Introduction

Snowflake Secure Data Share enables companies to access and share PredictHQ’s data in a controlled and efficient way. Access to a Secure Share of our events means you will experience a familiar and high-performance SQL interface with an up-to-date, clean, and complete set of PredictHQ’s data. This means you can immediately incorporate the data into your models, removing or greatly simplifying the need for ELT/ETL processes to pull event data into your data warehouse. You can check out Introduction to Secure Data Sharing page if you're interested to read more on Snowflake's Secure Data Sharing.

Don’t have a Snowflake account?

Reach out to us, we can help you set one up.

PredictHQ's Data Offerings on Snowflake Marketplace

PredictHQ has partnered with Snowflake to make forecast-grade data more easily available. Events API's data is the first to be offered on the Snowflake Marketplace. We are working to make other endpoints (eg: "Entities", "Live TV Events") data available on Snowflake Marketplace soon.

Sample Data Shares

PredictHQ offers multiple up-to-date event Data Shares on Snowflake's Marketplace. This, bundled up with Snowflake's 30-day trial, offers you a commitment-free opportunity to try PredictHQ's data for free. You can view and try these sample Data Shares on PredictHQ's page on Snowflake's Marketplace or by clicking on any of the links below:

  • Attended Events Sample Data Shares

    • Data Type: (Event's API Alternative)

    • Categories: sports, concerts, expos, conferences, festivals, performing-arts and community

    • Location: City of Seattle

    • Historical Visibility Time Window: 1 Year

    • Future Visibility Time Window: 30 Days

  • Non-Attended Events Sample Data Shares

    • Data Type: Events (Event's API Alternative)

    • Categories: public-holidays, school-holidays, observances, politics, daylight-savings and academic

    • Location: City of Seattle

    • Historical Visibility Time Window: 1 Year

    • Future Visibility Time Window: 30 Days

  • Unscheduled Events Sample Data Shares

    • Data Type: Events (Event's API Alternative)

    • Categories: severe-weather, disasters, airport-delays, terror and health-warnings

    • Location: State of California

    • Historical Visibility Time Window: 1 Year

    • Future Visibility Time Window: 30 Days

Samples are just... Samples!

The sample Data Shares are not limited in terms of columns or fields making them valuable for business and data science evaluations. However as it's been outlined above, these data shares are limited to a specific location and restricted time window that might not suit your use case. These samples therefore represent a small fraction of the data we have. We offer Customized Data Shares which are filtered to match customers' preferences.

Customized Data Shares

Customized Data Shares can be set up to match your preferences in terms of data type, location, time window and business use case. They are secure, easy to set up and usually don't require business resources for data integrations if you're already in the Snowflake platform. Get in touch with us to discuss your needs and we will come back to you as soon as possible.

Accessing Customized Data Shares In Snowflake's Snowsight Web UI

Once a Customized Data Share has been set up for you, head over to the Snowflake's Snowsight Web UI. You might need to enter your Snowflake "Account Name" and your account credentials to log-in.

alt

Once you have logged into Snowsight, navigate to Data >> Shared Data >> Shared With Me to see the new incoming or previously shared data shares. As you can see below, the incoming data shares would be under the Ready To Get section. You might need to switch to another Role from the top left corner of the navbar to be able to see the Ready To Get section.

Snowflake requires a user with the ACCOUNTADMIN role or with IMPORT SHARE privileges to perform data sharing tasks including viewing and accepting an incoming data share. If you cannot see anything listed under the Data >> Shared Data >> Shared With Me >> Ready to Get, make sure that there isn't another role you can select or reach out to your Snowflake administrator.

alt

PredictHQ's Customized Data Shares would be in CUSTOMER_<ORG> format where the <ORG> would be your business name. You would see the pop-up below to accept the incoming data share once you click on it:

alt

A custom name can be chosen for the database which would be created to contain the data share. You can also add additional roles who should be able to access the newly created database.

Data Description For Snowflake Data Shares

Event Data Shares:

The following data description applies to both sample and Customized Data Share for Events. All dates are in UTC but would displayed in Snowflake session's timezone.

Field (Column) Type Description
event_id VARCHAR(16777216) The unique identifier of the event
create_dt TIMESTAMP_LTZ(9) The date the event first entered our database in ISO 8601 format. This value may be missing on some events, and should not be considered an event announcement date.
update_dt TIMESTAMP_LTZ(9) The last modification date of the event in ISO 8601 format.
title VARCHAR(16777216) The title of the event.
category VARCHAR(16777216) The category of the event.
labels ARRAY The labels associated with the event.
description VARCHAR(16777216) A description of the event.
event_start TIMESTAMP_LTZ(9) The start date of the event in ISO 8601 format. If an event has a start time of midnight (in the event's time zone) this is an indication that the actual time may be unknown.
event_end TIMESTAMP_LTZ(9) The end date of the event in ISO 8601 format.
predicted_end TIMESTAMP_LTZ(9) The predicted end date of the event in ISO 8601 format. This field will only be present if an actual end time is not available for an event and we have a predicted end time.
timezone VARCHAR(16777216) The time zone of the event in TZ Database format. This is helpful so you know which time zone to convert the dates to (if needed).
entities ARRAY An array of entities linked to the event. An object in the array named entity_type would indicate the type of the entity which could be either of event-group or venue.
geo GEOGRAPHY A geometry object associated with the event in the GeoJSON format. An object in the array named entity_type would indicate the type of the entity which could be either of Point, Polygon or MultiPolygon.
country_code VARCHAR(16777216) An indicator of the event's country in ISO-3166-1 alpha-2 format.
place_hierarchies ARRAY An array of place hierarchies for the event. Each hierarchy is an array of place ids. The final place in a hierarchy is a specific place the event applies to. Each place is a sub-place of the place immediately preceding it in the hierarchy. For example, a hierarchy might contain the following places in this order: Earth > Europe > United Kingdom > England > Nottingham. Note that the place_hierarchies value can be an empty array in some cases.
phq_attendance NUMBER(38,0) A numerical value that reflects the predicted attendance number for supported attendance-based categories. The following categories are supported: concerts, performing arts, sports, expos, conferences, community and festivals. phq_attendance reflects the entire attendance for multi-day events (the number of people attending across the full duration of the event) except for some categories like conferences where it is the daily attendance. See Handling multi-day and Umbrella events for more details.
phq_rank NUMBER(38,0) A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event independent of its geographical location.
local_rank NUMBER(38,0) Similar to PHQ Rank, this is a log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event on its local geographical area. Local Rank is calculated for events in the categories community, concerts, conferences, expos, sports, festivals, performing-arts.
aviation_rank NUMBER(38,0) A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. Aviation Rank indicates how much an event will impact flight bookings by considering both domestic and international travel. It can be mapped to the predicted increase in demand based on people flying to an event. Therefore, events with higher Aviation Rank are expected to result in more people taking flights than lower Aviation Rank events.
status VARCHAR(16777216) The publication state of the event. Possible values are active which means the event is published and valid and deleted which means the event was removed, either because it was cancelled or is a duplicate.
brand_safe BOOLEAN A boolean value indicating whether the event is potentially safe for your brand to be associated with. Some examples of non-brand safe events include content that promotes hate, violence or discrimination, coarse language or content that is sexually-suggestive or explicit. We use title and description of events to determine whether an event is brand-safe or not.
parent_event_id VARCHAR(16777216) Used to indicate if this event is part of a larger event. These types of events are called umbrella events in the system. For example, a large multi-day parent umbrella event may have individual child events for sessions on different days. For example the Formula 1 2019 United States Grand Prix had child events for the qualification, 3 practice events, a concert that occurs at the Grand Prix, and the actual race events (there are 12 child events in total). See umbrella events for details on this field and details on what umbrella events are. Note that this field only shows if a child event has a parent id. It does not indicate if a parent event has child events.
cancelled_dt TIMESTAMP_LTZ(9) The date the event was set to cancelled in the system in ISO 8601 format.
postponed_dt TIMESTAMP_LTZ(9) The date the event was set to postponed in the system in ISO 8601 format.
scope VARCHAR(16777216) The geographical scope the events apply to. Possible values are locality, localadmin, county, region, country.
row_inserted_dt TIMESTAMP_LTZ(9) The date the event was first inserted into the data share in ISO 8601 format.
row_updated_dt TIMESTAMP_LTZ(9) The date the event was lastly updated in the data share in ISO 8601 format.

Snowflake SQL Query Examples

Using SQL queries in Snowflake the shared data can be manipulated to fit into a variety of tasks. Feeding the data into analyses, dashboards and machine learning models are only a few examples that are easily possible with PredictHQs data and SQL data manipulation in Snowflake.

The following SQL examples are based on the Sample Data Shares we described above. Here our database name is marketplace but in your account, it would be the custom name that was chosen when the sample data shares were being added to your Snowflake account.

  • Filtering on the ARRAY column labels, to find events with a construction label, using ARRAY_CONTAINS:

SELECT event_id,
       title,
       category,
       phq_rank,
       labels
FROM   marketplace.predicthq.seattle_attended_events AS EVENTS
WHERE  ARRAY_CONTAINS('construction' :: variant, EVENTS.labels);
EVENT_ID TITLE CATEGORY PHQ_RANK LABELS
8VmnC4u2qhHiYBnZQ7 Zak World of Facades Seattle conferences 0 [ "conference", "construction" ]
MPCit98VjwyuLikWZB TecHome Builder Summit Fall conferences 42 [ "conference", "construction" ]
7qmG3u42V3cVxNi9qD Northwest Remodeling Expo expos 67 [ "architecture", "construction", "expo" ]
EuGFPmDdpLcHkfZwHa Northwest Remodeling Expo expos 64 [ "construction", "expo", "product" ]
GxVVHErJWx83mizc9m MCAA GreatFutures Forum conferences 28 [ "business", "conference", "construction" ]
  • Filtering the nested ARRAY column place_hierarchies to find events at King County using FLATTEN: (Note that this uses the place id value, which is described further here)

SELECT event_id,
       title,
       category,
       labels,
       place_hierarchies,
       place_hierarchies_flattened.value                           AS place_hierarchy_list,
       place_hierarchy_flattened.value                             AS place_hierarchy_item
FROM   marketplace.predicthq.seattle_attended_events               AS events,
       LATERAL FLATTEN(input => events.place_hierarchies)          AS place_hierarchies_flattened,
       LATERAL FLATTEN(input => place_hierarchies_flattened.value) AS place_hierarchy_flattened
WHERE  place_hierarchy_flattened.value = '5799783';
EVENT_ID TITLE CATEGORY LABELS PLACE_HIERARCHIES PLACE_HIERARCHY_LIST PLACE_HIERARCHY_ITEM
E438FkX7HLqHpN2Lkc Organ Plus in Thomsen Chapel concerts [ "concert", "music" ] [ [ "6295630", "6255149", "6252001", "5815135", "5799783", "5788165" ], [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ] ] [ "6295630", "6255149", "6252001", "5815135", "5799783", "5788165" ] "5799783"
3xECVFn9uP7Mw2dJe7 Sleepy Hallow concerts [ "concert", "music" ] [ [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ] ] [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ] "5799783"
bqGwjqsPVRLKtbYLYs Comedian Thomas Nichols performing-arts [ "concert", "entertainment", "music", "performing-arts" ] [ [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ], [ "6295630", "6255149", "6252001", "5815135", "5799783", "5814442" ] ] [ "6295630", "6255149", "6252001", "5815135", "5799783", "5814442" ] "5799783"
keHoeBLK7Ww8V4uWkv Sub Urban & Bella Poarch - Virgil's Mania Tour concerts [ "concert", "music" ] [ [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ], [ "6295630", "6255149", "6252001", "5815135", "5799783", "7153941" ] ] [ "6295630", "6255149", "6252001", "5815135", "5799783", "7153941" ] "5799783"
6bBjPjujVN8Q8Cq44P Whales and Hairitage concerts [ "concert", "music" ] [ [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ] ] [ "6295630", "6255149", "6252001", "5815135", "5799783", "5809844" ] "5799783"
SELECT event_id,
       title,
       category,
       phq_rank,
       labels,
       phq_rank,
       geo
FROM   marketplace.predicthq.seattle_attended_events AS EVENTS
WHERE  ST_DISTANCE(EVENTS.geo, ST_MAKEPOINT(-122.336498, 47.623257)) <= 500;
EVENT_ID TITLE CATEGORY PHQ_RANK LABELS GEO
EYL7o5uE65TcQFpmQD Sarah Christine and Taylor Tuke concerts 0 [ "concert", "music" ] { "coordinates": [ -122.34197, 47.62218 ], "type": "Point" }
E4zrz8MTdb2UfNQfPi South Lake Union Winter Market festivals 54 [ "festival" ] { "coordinates": [ -122.33621, 47.622634 ], "type": "Point" }
X9rWASDN88DcRWhW8Z AUG 26! An Evening w/ Sarah Christine The Collective Seattle concerts 0 [ "concert", "music" ] { "coordinates": [ -122.34197340000003, 47.6221728 ], "type": "Point" }
FAmiEHQvanW3cfpRzB Parlor Tricks Comedy Night performing-arts 0 [ "concert", "music", "performing-arts" ] { "coordinates": [ -122.3421581, 47.62294079999999 ], "type": "Point" }
B7T6izj2gSJSr6Um5u The Collective Patio Sets: Sarah Christine community 0 [ "concert", "music" ] { "coordinates": [ -122.3421581, 47.62294079999999 ], "type": "Point" }

First, we define the shape we are interested in restricting the search to, as shown in the image below.

alt
Example polygon used for Snowflake query example

The coordinates of this shape are used in the following query.

SELECT seattle_events.event_id,
       title,
       category,
       phq_rank,
       geo
FROM marketplace.predicthq.seattle_attended_events seattle_events
WHERE st_within(seattle_events.geo, st_makepolygon(to_geography(
 ('LINESTRING(-122.30263710021973 47.62467785241324,-122.33980178833006 47.60442694445526,
  -122.34134674072267 47.58451555263637,-122.27954864501955 47.616925574159424,
  -122.30263710021973 47.62467785241324)'))))
EVENT_ID TITLE CATEGORY PHQ_RANK GEO
84xmSc634UTHwAkT5b NWSL - OL Reign vs Kansas City Current sports 78 { "coordinates": [ -122.33163939999997, 47.5951518 ], "type": "Point" }
3y8VCcVthX6dRk88aK The Antlers concerts 43 { "coordinates": [ -122.31962, 47.613938 ], "type": "Point" }
BH7ffgQGiJx9yux2qd Boris Brejcha concerts 59 { "coordinates": [ -122.3322862, 47.5933082 ], "type": "Point" }
BaqF5i5yYxBzFUGFLj Kill The Noise concerts 52 { "coordinates": [ -122.3205827, 47.6137242 ], "type": "Point" }
HpuMrnfLhdEgcBJu4q Convent Bonfires, Bonsai Trees, and Good Job. concerts 36 { "coordinates": [ -122.3343786, 47.6005944 ], "type": "Point" }

Using the geo field of the events, these can be plotted on a map as shown below.

alt

Frequently Asked Questions

  • How does costing work on the data shares? Snowflake's architecture separates data warehousing into different distinct layers: storage, virtual warehouses (compute), and cloud services. By using PredictHQ's data through Snowflake, you would only pay for the amount of computation that you would perform on the shared data. In other words, there is no cost for you to store the data and you would only be billed for running queries on the shared data. You can read more about Snowflake's costing on their website.

  • How frequently is the shared data updated? PredictHQ's data shares get updated at midnight in the region's timezone. This frequency is generally sufficient for most business use cases. If the business use case requires more up-to-date data, it can be provided down to minute-level frequency. However, this would incur extra computation costs.

  • Can I make a change in the shared data or re-share it? All database objects shared between accounts are read-only. They cannot be directly modified or deleted, including adding or modifying table data. Shared databases and all the objects in the database cannot be re-shared with other accounts.