CSV/Parquet Data Structure for ADX
Data can be provided as NDJSON, CSV or Parquet. This document describes the CSV/Parquet data structure.
© 2024 PredictHQ Ltd

| Field | Description |
|---|---|
| EVENT_ID | The unique identifier of the event. |
| CREATE_DT | The date and time the event was first seen by PredictHQ, in your Snowflake session timezone. Also called `first_seen` in the Events API. |
| UPDATE_DT | The date and time the event was last updated, in your Snowflake session timezone. |
| TITLE | The title of the event. |
| CATEGORY | The category of the event. |
| LABELS | The labels associated with the event. Use the count endpoint to fetch a list of available labels. |
| DESCRIPTION | The description of the event. |
| EVENT_START | The date and time that the event starts, in your Snowflake session timezone. If the timezone is not considered relevant (e.g., Independence Day), the value is recorded in UTC, represents the same relative time in all timezones, and the `TIMEZONE` field will be `null`. Additionally, a start time of midnight (in the event time zone) indicates that the actual time may be unknown. You may wish to omit the time when displaying these events. |
| EVENT_END | The date and time that the event ends, in your Snowflake session timezone. If the timezone is not considered relevant (e.g., Independence Day), the value is recorded in UTC, represents the same relative time in all timezones, and the `TIMEZONE` field will be `null`. |
| PREDICTED_END | The date and time that PredictHQ predicts that the event will end, in the timezone of the event. |
| TIMEZONE | The time zone of the event in TZ Database format. This tells you which time zone to convert the dates to (if needed). If the time zone is `null`, the start and end dates should be regarded as time zone agnostic and already in local time. |
| ENTITIES | An array of entities linked to the event. This is a complex data type; please see the Events API for details. |
| GEO | The geographic details (location) of the event. |
| SCOPE | The geographical scope the event applies to. Possible values are `locality`, `localadmin`, `county`, `region`, and `country`. |
| PLACEKEY | The Placekey identifier for the physical address where the event takes place. See Placekey. This field will be `null` if the What part or the Where part of the Placekey for the event address couldn't be retrieved. |
| COUNTRY_CODE | The country code in ISO 3166-1 alpha-2 format. The country value will usually be present, but in some cases where the event location is not within a country (e.g., an earthquake in the middle of the ocean) it can be empty. |
| PLACE_HIERARCHIES | An array of place hierarchies for the event. Each hierarchy is an array of place ids. The final place in a hierarchy is a specific place the event applies to. Each place is a sub-place of the place immediately preceding it in the hierarchy. An empty array is possible and valid. See also the Place Hierarchies guide. |
| PHQ_ATTENDANCE | A numerical value that reflects the predicted attendance number for supported attendance-based categories. The following categories are supported: concerts, performing arts, sports, expos, conferences, community, and festivals. `phq_attendance` reflects the entire attendance for multi-day events (the number of people attending across the full duration of the event), except for some categories, such as conferences, where it is the daily attendance. For details see our Predicted Attendance guide. |
| PHQ_RANK | A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event independent of its geographical location. See also our PHQ Rank docs. |
| LOCAL_RANK | Similar to PHQ Rank, this is a log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event on its local geographical area. Local Rank is calculated for events in the categories community, concerts, conferences, expos, sports, festivals, and performing-arts. If `local_rank` is not intended to be available for an event, this field will be `null`. See also our Local Rank docs. |
| AVIATION_RANK | A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. Aviation Rank indicates how much an event will impact flight bookings by considering both domestic and international travel. It can be mapped to the predicted increase in demand based on people flying to an event, so events with a higher Aviation Rank are expected to result in more people taking flights than events with a lower Aviation Rank. Aviation Rank is calculated for events in the categories concerts, conferences, expos, sports, festivals, performing-arts, observances, public-holidays, and school-holidays. If `aviation_rank` is not intended to be available for an event, or we couldn't calculate it, this field will be `null`. See also our Aviation Rank docs. |
| STATUS | The publication state of the event. Possible values are `active` (the event is published and valid) and `deleted` (the event was removed, either because it was cancelled or is a duplicate). |
| BRAND_SAFE | Whether or not this event is considered brand-safe. Examples of brand-unsafe events include content that promotes hate, violence, or discrimination, coarse language, content that is sexually suggestive or explicit, etc. |
| PARENT_EVENT_ID | Indicates that this event is part of a larger event. These larger events are called umbrella events in the system. For example, a large multi-day parent umbrella event may have individual child events for sessions on different days. This field is only populated on a child event that has a parent id. It does not indicate whether a parent event has child events. For details see our Umbrella Events docs. |
| CANCELLED_DT | The date and time the event was set to cancelled, in your Snowflake session timezone. This field will be `null` if `deleted_reason` is not set to `cancelled` or the cancelled date is not available. |
| POSTPONED_DT | The date and time the event was set to postponed, in your Snowflake session timezone. This field will be `null` if `deleted_reason` is not `postponed` or the postponed date is not available. Note that this field is not the new date and time of the postponed event. |
| IMPACT_PATTERNS | Also known as “Demand impact patterns”. This field shows impact for leading days (days before the event), lagging days (days after the event), and the days the event occurs. It contains details such as the industry vertical the impact pattern applies to, the type of impact shown in the impact pattern, and an array of objects, one per day, showing the date in the local timezone of the event and the value of the `impact_type` for that day. |
| PREDICTED_EVENT_SPEND_ACCOMMODATION | The total predicted event spend for the accommodation industry. This field will be `null` if predicted event spend is not supported for this event. |
| PREDICTED_EVENT_SPEND_HOSPITALITY | The total predicted event spend for the hospitality industry. This field will be `null` if predicted event spend is not supported for this event. |
| PREDICTED_EVENT_SPEND_TRANSPORTATION | The total predicted event spend for the transportation industry. This field will be `null` if predicted event spend is not supported for this event. |
| PHQ_LABELS | The PHQ Labels associated with the event. This field will be `null` if there are no PHQ Labels for this event. |
| ALTERNATE_IDS | All alternate IDs for the event. Any event IDs that may have been used for this event in the past are included here. The current event ID is not included. |
| ROW_INSERTED_DT | The date and time this row was inserted, in your local Snowflake session timezone. The row may have been deleted and inserted multiple times, so this value does not reflect when an event was first seen; use `CREATE_DT` for that. |
| ROW_UPDATED_DT | The date and time this row was last updated, in your local Snowflake session timezone. |
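
The EVENT_START, EVENT_END and TIMEZONE descriptions above imply a small display rule: convert to the event's time zone when TIMEZONE is set, treat the value as already local when it is null, and consider hiding a midnight start time. The following is a minimal sketch of that rule, not an official implementation. It assumes the exported timestamps are ISO 8601 strings expressed in UTC (the actual format depends on how your export was produced), and the function names `to_event_local` and `display_start` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def to_event_local(ts: str, tz_name: Optional[str]) -> datetime:
    """Return the event's wall-clock time as a naive datetime.

    Assumption: `ts` is an ISO 8601 string in UTC, e.g. "2024-07-04T00:00:00".
    """
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # Assumption: values without an explicit offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if tz_name is None:
        # TIMEZONE is null: the value is time zone agnostic and is read
        # as local time everywhere, so only the UTC marker is dropped.
        return parsed.astimezone(timezone.utc).replace(tzinfo=None)
    # TIMEZONE is a TZ Database name such as "America/New_York".
    return parsed.astimezone(ZoneInfo(tz_name)).replace(tzinfo=None)


def display_start(ts: str, tz_name: Optional[str]) -> str:
    """Format EVENT_START, omitting the time when it is exactly midnight."""
    local = to_event_local(ts, tz_name)
    if (local.hour, local.minute, local.second) == (0, 0, 0):
        # Midnight in the event time zone suggests the time may be unknown.
        return local.strftime("%Y-%m-%d")
    return local.strftime("%Y-%m-%d %H:%M")
```

For example, `display_start("2024-07-04T00:00:00", None)` yields a date-only string, because the event is time zone agnostic and starts at midnight.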
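
The PLACE_HIERARCHIES description above defines an ordering: each place is a sub-place of the one before it, and the last entry is the specific place the event applies to. The sketch below shows two common lookups under the assumption that the column has already been parsed into a list of lists of place id strings (how it is serialized in your CSV or Parquet file is not covered here). The helper names and the place ids in the usage note are illustrative only.

```python
from typing import List


def leaf_places(hierarchies: List[List[str]]) -> List[str]:
    """Return the final (most specific) place id of each hierarchy."""
    # Empty hierarchies are skipped; an empty outer array is valid
    # and simply yields an empty result.
    return [hierarchy[-1] for hierarchy in hierarchies if hierarchy]


def is_within(hierarchies: List[List[str]], place_id: str) -> bool:
    """Return True if the event falls inside `place_id` at any level."""
    # Every place in a hierarchy contains the places that follow it, so
    # membership anywhere in a hierarchy means the event is inside it.
    return any(place_id in hierarchy for hierarchy in hierarchies)
```

With the made-up input `[["1", "10", "100"], ["1", "20"]]`, `leaf_places` returns `["100", "20"]` and `is_within(..., "10")` is true.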
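
Because ALTERNATE_IDS lists IDs that were previously used for an event, it can be used to map stale IDs stored in your own systems back to the current EVENT_ID. The sketch below is one way to do that. It assumes each row is a dict keyed by the column names above and that ALTERNATE_IDS has already been parsed into a list (or is `None`); `build_id_index` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Mapping


def build_id_index(rows: Iterable[Mapping]) -> Dict[str, str]:
    """Map every current or alternate event ID to the current EVENT_ID."""
    rows = list(rows)
    index: Dict[str, str] = {}
    # First pass: current IDs always resolve to themselves.
    for row in rows:
        index[row["EVENT_ID"]] = row["EVENT_ID"]
    # Second pass: alternate IDs resolve to the current ID, without
    # overriding an ID that is itself currently in use.
    for row in rows:
        for old_id in row.get("ALTERNATE_IDS") or []:
            index.setdefault(old_id, row["EVENT_ID"])
    return index
```

A lookup such as `build_id_index(rows).get(stored_id)` then returns the current ID, or `None` if the ID is unknown.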
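
As noted for PARENT_EVENT_ID, the link is stored only on the child, so finding the children of an umbrella event requires building the reverse mapping yourself. A minimal sketch, assuming rows are dicts keyed by the column names above and that a missing parent appears as `None` or an empty string (an assumption about how nulls are represented in your export); `children_by_parent` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def children_by_parent(rows: Iterable[Mapping]) -> Dict[str, List[str]]:
    """Group child EVENT_IDs under their PARENT_EVENT_ID."""
    children: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        parent = row.get("PARENT_EVENT_ID")
        if parent:  # skips None and empty strings
            children[parent].append(row["EVENT_ID"])
    # Return a plain dict so lookups for unknown parents raise KeyError
    # instead of silently creating empty entries.
    return dict(children)
```

An event that never appears as a key in the result has no child events in the rows that were supplied.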