CSV/Parquet Data Structure for ADX

Data can be provided as NDJSON, CSV or Parquet. This document describes the CSV/Parquet data structure.

Field
Description

EVENT_ID

The unique identifier of the event.

CREATE_DT

The date and time the event was first seen by PredictHQ in UTC. Also called first_seen in the Events API.

UPDATE_DT

The date and time the event was last updated in UTC.

TITLE

The title of the event.

CATEGORY

The category of the event.

LABELS

The labels associated with the event. The LABELS field holds PredictHQ's legacy labels; use the PHQ_LABELS field instead where possible. See also our Labels docs.

DESCRIPTION

The description of the event.

EVENT_START

The date and time the event starts, recorded in UTC. If the TIMEZONE field is null, the time represents the same relative time across all timezones.

Additionally, if an event has a start time of midnight in its local timezone, this may indicate that the actual time is unknown. You may wish to omit the time when displaying such events.

EVENT_END

The date and time the event ends, recorded in UTC. If the TIMEZONE field is null, the time represents the same relative time across all timezones.

PREDICTED_END

The date and time PredictHQ predicts the event will end, recorded in UTC. If the TIMEZONE field is null, the time represents the same relative time across all timezones. This value is present where an actual EVENT_END is unknown.
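As a rough illustration of how EVENT_END and PREDICTED_END relate, the sketch below picks whichever end time is available for a row. It assumes each row has been loaded as a dictionary keyed by column name and that a missing value arrives as an empty string or None; both are assumptions about your loader, not part of the export specification. The helper name effective_end is hypothetical.

```python
from typing import Mapping, Optional


def effective_end(row: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return EVENT_END when known, otherwise fall back to PREDICTED_END."""
    # Treat empty strings the same as missing values (assumption about the loader).
    actual_end = row.get("EVENT_END") or None
    predicted_end = row.get("PREDICTED_END") or None
    return actual_end if actual_end is not None else predicted_end
```

The function returns None when neither column is populated, so callers can decide how to handle events with no known or predicted end.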

TIMEZONE

The time zone of the event in TZ Database format (for example, America/New_York). Use it to convert the UTC dates to local time if needed. If the time zone is null, treat the start and end dates as time zone agnostic and already in local time.
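The conversion described above could look something like the following sketch. It assumes the timestamp columns are ISO 8601 strings (the exact serialization in your export may differ) and uses Python's standard zoneinfo module. The function name to_local is hypothetical. The EVENT_START_LOCAL, EVENT_END_LOCAL and PREDICTED_END_LOCAL fields already carry local values, so a conversion like this is only needed if you work from the UTC columns.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo  # Python 3.9+


def to_local(timestamp_utc: str, tz_name: Optional[str]) -> datetime:
    """Convert a UTC timestamp string to the event's local time.

    Assumes an ISO 8601 string such as "2024-07-04T23:00:00Z".
    """
    parsed = datetime.fromisoformat(timestamp_utc.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # No offset in the string: interpret the value as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if not tz_name:
        # Null TIMEZONE: the value is time zone agnostic and already local,
        # so keep the wall-clock time and drop the UTC marker.
        return parsed.replace(tzinfo=None)
    return parsed.astimezone(ZoneInfo(tz_name))
```

For a null TIMEZONE the function returns a naive datetime, which makes it harder to accidentally shift a time zone agnostic event (such as a public holiday) into another zone later on.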

ENTITIES

An array of entities linked to the event. This is a complex data type; see the Events API for details.

GEO

The geographic details (location) of the event in GeoJSON format. See geolocation guides for more information on handling GEO data.

IMPACT_PATTERNS

Also known as “Predicted Impact Patterns”. This field shows impact for leading days (days before the event), lagging days (days after the event) and the days the event occurs. Each impact pattern contains the industry vertical it applies to, the type of impact it shows, and an array of objects, one per day, giving the date in the local timezone of the event and the value of the impact_type for that day. See also our Impact Patterns docs.
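To make the nested shape more concrete, here is a minimal sketch that flattens one impact pattern into a date-to-value mapping. It assumes the column is delivered as a JSON string and that the keys are named vertical, impact_type, impacts, date_local and value. Those key names and the sample values are assumptions for illustration; check the Impact Patterns docs for the authoritative structure.

```python
import json
from typing import Dict


def daily_impact(impact_patterns_json: str, vertical: str, impact_type: str) -> Dict[str, float]:
    """Return {local date: impact value} for the first matching pattern."""
    # An empty cell is treated as "no impact patterns" (assumption).
    patterns = json.loads(impact_patterns_json) if impact_patterns_json else []
    for pattern in patterns:
        if pattern.get("vertical") == vertical and pattern.get("impact_type") == impact_type:
            return {day["date_local"]: day["value"] for day in pattern.get("impacts", [])}
    return {}


# Made-up sample data, shaped according to the assumptions above.
sample = json.dumps([
    {
        "vertical": "accommodation",
        "impact_type": "phq_rank",
        "impacts": [
            {"date_local": "2024-07-03", "value": 40},
            {"date_local": "2024-07-04", "value": 72},
        ],
    }
])
accommodation_by_day = daily_impact(sample, "accommodation", "phq_rank")
```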

SCOPE

The geographical scope the event applies to. Possible values are:

  • locality

  • localadmin

  • county

  • region

  • country

PLACEKEY

The Placekey identifier for the physical address where the event takes place. See Placekey. This field will be null if the "What" part or the "Where" part of the Placekey for the event address couldn't be retrieved.

COUNTRY_CODE

The country code in ISO 3166-1 alpha-2 format. This value is typically present, but in some cases, such as events occurring outside any country (e.g. an earthquake in the middle of the ocean), it may be empty.

PLACE_HIERARCHIES

An array of place hierarchies for the event. Each hierarchy is an array of place ids. The final place in a hierarchy is a specific place the event applies to. Each place is a sub-place of the place immediately preceding it in the hierarchy. An empty array is possible and valid.

See also the Place Hierarchies guide.
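Because each place in a hierarchy is a sub-place of the one before it, checking whether an event falls within a given place reduces to a membership test. The sketch below assumes the column has already been parsed into a list of lists of place id strings; the helper names and the ids in the example are hypothetical.

```python
from typing import List, Sequence


def event_is_within(place_hierarchies: Sequence[Sequence[str]], place_id: str) -> bool:
    """True if any hierarchy passes through the given place id."""
    return any(place_id in hierarchy for hierarchy in place_hierarchies)


def specific_places(place_hierarchies: Sequence[Sequence[str]]) -> List[str]:
    """The final (most specific) place id of each non-empty hierarchy."""
    return [hierarchy[-1] for hierarchy in place_hierarchies if hierarchy]


# Hypothetical ids: continent -> country -> region -> city.
hierarchies = [["1", "10", "100", "1000"], ["1", "10", "100", "1001"]]
in_region = event_is_within(hierarchies, "100")
leaf_places = specific_places(hierarchies)
```

An empty PLACE_HIERARCHIES array simply yields False and an empty list, which matches the note above that an empty array is valid.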

PHQ_ATTENDANCE

A numerical value that reflects the predicted attendance for supported attendance-based categories. Supported categories include concerts, performing arts, sports, expos, conferences, community, and festivals. Some academic and school holiday events may also include a phq_attendance value to indicate student numbers.

For multi-day events, phq_attendance represents total attendance across the entire duration, except for certain categories like conferences, where it reflects daily attendance.

For details see our Predicted Attendance guide.

PHQ_RANK

A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event independent of its geographical location.

See also our PHQ Rank docs.

LOCAL_RANK

Similar to PHQ Rank, this is a log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. It is designed to represent the potential impact of an event on its local geographical area.

Local Rank is calculated for events in the categories community, concerts, conferences, expos, sports, festivals, performing-arts. If local_rank is not intended to be available for an event, this field will be null.

See also our Local Rank docs.

AVIATION_RANK

A log scale numerical value between 0 and 100 with a five-level hierarchical impact schema. Aviation Rank indicates how much an event will impact flight bookings by considering both domestic and international travel.

Aviation Rank is no longer actively supported. For more information, see the Aviation Rank docs.

STATUS

The publication state of the event.

Possible values:

  • active - The event is an active event.

  • postponed - The event is a postponed event, and is expected to occur at a later date.

  • cancelled - The event is a cancelled event and is not expected to occur at a later date.

  • predicted - The event is a predicted event. For details, see our Predicted Events page.

BRAND_SAFE

Whether or not this event is considered brand-safe. Examples of brand-unsafe events include content that promotes hate, violence, or discrimination, coarse language, content that is sexually suggestive or explicit, etc.

PARENT_EVENT_ID

Indicates whether this event is part of a larger event. These larger events are called umbrella events in the system. For example, a large multi-day parent umbrella event may have individual child events for sessions on different days. This field is only populated on a child event that has a parent ID; it does not indicate whether a parent event has child events. For details see our Umbrella Events docs.

CANCELLED_DT

The date and time the event was marked as cancelled, presented in the UTC timezone. This field will be null if STATUS is not set to "cancelled" or if the cancellation date is unavailable.

POSTPONED_DT

The date and time the event was marked as postponed, presented in the UTC timezone. This field will be null if STATUS is not set to "postponed" or if the postponement date is unavailable. Note that this field does not represent the new date and time of the postponed event.

PREDICTED_EVENT_SPEND_TOTAL

The total predicted event spend (USD) across all supported industries: accommodation, hospitality and transportation. This field will be null if the predicted event spend is not supported for this event. See also our Predicted Event Spend docs.

PREDICTED_EVENT_SPEND_ACCOMMODATION

The total predicted event spend (USD) for the accommodation industry. This field will be null if the predicted event spend is not supported for this event. See also our Predicted Event Spend docs.

PREDICTED_EVENT_SPEND_HOSPITALITY

The total predicted event spend (USD) for the hospitality industry. This field will be null if the predicted event spend is not supported for this event. See also our Predicted Event Spend docs.

PREDICTED_EVENT_SPEND_TRANSPORTATION

The total predicted event spend (USD) for the transportation industry. This field will be null if the predicted event spend is not supported for this event. See also our Predicted Event Spend docs.

PHQ_LABELS

The PHQ Labels associated with the event. This field will be null if there are no PHQ Labels for this event. See also our Labels docs.

ALTERNATE_IDS

All alternate IDs for the event. Any event IDs that may have been used for this event in the past will be included here. It does not include the current event ID.

EVENT_START_LOCAL

The date and time when the event begins, expressed in the event's local time zone.

EVENT_END_LOCAL

The date and time when the event ends, expressed in the event's local time zone.

PREDICTED_END_LOCAL

The date and time when the event is predicted to end, expressed in the event's local time zone.

REGION

The region in which the event will be occurring. This field will be null if the event covers more than a single region.

LOCALITY

The locality in which the event will be occurring. A locality is most commonly referred to as a city or town. This field will be null if the event covers more than a single locality.

POSTCODE

The postal code or ZIP code in which the event will be occurring. This field will be null if the event covers more than a single postcode.

FORMATTED_ADDRESS

A full formatted address which can include street addresses, locality, postcode, region, and country.

ROW_INSERTED_DT

The date and time this row was inserted in UTC. The row may have been deleted and inserted multiple times, so this value does not reflect when the event was first seen. Use CREATE_DT for that.

ROW_UPDATED_DT

The date and time this row was last updated in UTC.

CHANGE_ACTION

Indicates whether the record has been inserted, updated or deleted. Use this value when processing the data file to keep your database up to date.

Possible values:

  • insert - new record, not previously seen.

  • update - existing record, updated values.

  • delete - deleted record, remove from your dataset.
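One way to apply the three actions above when processing a CSV file is sketched below. It assumes records are keyed by EVENT_ID and uses an in-memory dictionary as a stand-in for your database; the function name apply_changes and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def apply_changes(store: Dict[str, dict], rows: Iterable[Mapping[str, str]]) -> None:
    """Apply insert/update/delete rows to a store keyed by EVENT_ID."""
    for row in rows:
        action = row["CHANGE_ACTION"]
        event_id = row["EVENT_ID"]
        if action == "delete":
            # Remove the record; ignore ids that are not in the store.
            store.pop(event_id, None)
        elif action in ("insert", "update"):
            # Both actions end with the latest row stored under the id.
            store[event_id] = dict(row)
        else:
            raise ValueError(f"Unexpected CHANGE_ACTION: {action!r}")


# Made-up sample file containing only the columns this sketch needs.
sample_csv = io.StringIO(
    "EVENT_ID,TITLE,CHANGE_ACTION\n"
    "abc123,Old Title,insert\n"
    "abc123,New Title,update\n"
    "def456,Removed Event,insert\n"
    "def456,Removed Event,delete\n"
)
events: Dict[str, dict] = {}
apply_changes(events, csv.DictReader(sample_csv))
```

Treating insert and update as the same upsert keeps the processing tolerant of a row that was deleted and re-inserted, as described for ROW_INSERTED_DT.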
