Frequently Asked Questions - Data Science Guides
1. How is event data different from other datasets?
Event data is particularly challenging because it is dynamic rather than static. Events also vary in size and impact, which makes correlating event data complicated. There are many different types of events, such as conferences, expos, sports, festivals and concerts, and each type can impact different businesses in different ways. Because event data changes constantly, our most successful customers, once they have finished testing the data through our Control Center (online dashboard) or Data Exporter (UI for CSV or JSON exports), call our API regularly to ensure the most up-to-date information reaches their systems and models.
2. What does PHQ Rank™ represent?
All events in the PredictHQ API are ranked by an underlying model that reflects each event's impact on a logarithmic scale: the higher the rank value, the higher the impact. We provide three ranks (PHQ Rank™, Local Rank™ and Aviation Rank™), and a variety of insights can be derived from each, whether they are used individually or in conjunction with one another. PHQ Rank™ indicates the estimated attendance of an event.
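To illustrate what a logarithmic scale implies, here is a toy sketch. It is not PredictHQ's rank model, which is proprietary; the function `toy_log_rank` and the `MAX_ATTENDANCE` bound are hypothetical. On a log scale, equal ratios of attendance map to equal differences in score, so doubling attendance adds the same number of points whether an event is small or large.

```python
import math

# Toy illustration only: PredictHQ's actual rank model is proprietary.
# This hypothetical function maps attendance onto a 0-100 logarithmic scale.

MAX_ATTENDANCE = 1_000_000  # assumed attendance that maps to a score of 100


def toy_log_rank(attendance: int) -> float:
    """Return a 0-100 score proportional to log10(attendance), capped at 100."""
    if attendance < 1:
        return 0.0
    score = 100 * math.log10(attendance) / math.log10(MAX_ATTENDANCE)
    return min(score, 100.0)


# Doubling attendance adds the same number of points at any size.
small_step = toy_log_rank(2_000) - toy_log_rank(1_000)
large_step = toy_log_rank(200_000) - toy_log_rank(100_000)
print(round(small_step, 2), round(large_step, 2))  # 5.02 5.02
```

The practical consequence is that a difference of a few rank points near the top of the scale can represent far more people than the same difference near the bottom.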
3. How do your ranks work?
PredictHQ’s ranks draw on a broad and deep set of factors including proprietary entities, customized labels and unique repositories of historical data. Our rankings are core to our business. While we can’t reveal all of our secret sauce, below are some examples of what makes our data reliable and unique.
To arrive at reliable event rankings, we have built a large number of algorithms and machine learning models that power different ranking protocols for different event categories. Our categories range from conferences and concerts to public holidays and natural disasters, and each requires a different way of gauging impact. Our categories can be grouped into three types:
Attendance-based categories, such as concerts and conferences, draw on historical attendance, venue capacity, available tickets, average ticket sales, performer entities, sports entities, performer popularity and more.
Holiday events, such as public holidays and observances, draw on the size and scope of the holiday (such as regional, local or national), who it affects (majority or minority population), its duration and more.
Disaster events, such as terrorism and hurricanes, draw on an assigned damage value, the number of people impacted, the size of the area affected and more.
4. What does your coverage look like?
We have events in every country and cover 30,000 cities. In total, we have more than 25 million verified events. Because there is no complete reference set of all events to measure against, and we are the largest provider of verified event data in the world, it is not possible to express our coverage as a percentage. Instead, we measure coverage and quality in a variety of ways, including:
- the number of events relative to population density
- the number of large-scale events and clusters of smaller events (over the next 30 days)
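As a rough illustration of the metrics listed above, the sketch below computes events per 100,000 residents for each city and counts large-scale events in the next 30 days. The `Event` fields, the `LARGE_EVENT_RANK` threshold and the sample data are hypothetical assumptions, not PredictHQ's schema or cut-offs, and clustering of smaller events is omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sketch: the Event fields, the rank threshold and the sample
# data are illustrative assumptions, not PredictHQ's schema or thresholds.

LARGE_EVENT_RANK = 70  # assumed cut-off for a "large-scale" event


@dataclass
class Event:
    city: str
    start: date
    rank: int


def events_per_100k(events, populations):
    """Events per 100,000 residents for each city with a known population."""
    counts = {}
    for e in events:
        counts[e.city] = counts.get(e.city, 0) + 1
    return {
        city: 100_000 * n / populations[city]
        for city, n in counts.items()
        if city in populations
    }


def large_events_next_30_days(events, today):
    """Count events at or above the assumed rank threshold in the next 30 days."""
    horizon = today + timedelta(days=30)
    return sum(
        1 for e in events
        if today <= e.start <= horizon and e.rank >= LARGE_EVENT_RANK
    )


today = date(2024, 1, 1)
sample = [
    Event("Springfield", date(2024, 1, 5), 80),
    Event("Springfield", date(2024, 1, 20), 40),
    Event("Shelbyville", date(2024, 3, 1), 90),
]
print(events_per_100k(sample, {"Springfield": 200_000, "Shelbyville": 50_000}))
print(large_events_next_30_days(sample, today))
```

Normalizing by population keeps a small town with a handful of events from looking under-covered next to a large city.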
5. What is an entity?
Entities are things with defined attributes, such as performers, music bands, venues, sports teams and more. Our entities system is integral to our verification, enrichment, deduplication and ranking processes. Because entities are stable, unlike a moving target such as ticket sales, they allow us to estimate attendance and impact accurately.
For example, a concert takes place at a venue entity, where people watch a performer entity. Our system factors in the popularity of a performer like Beyoncé when ranking her concert, so a Beyoncé concert would likely rank much higher than a concert by a lesser-known artist.
Venues and recurring event entities are publicly available in our API response.
Customers can track, analyze, and export all entity information for each event. For example, if a customer knows that a particular artist or venue affects them most, they can track every event matching the corresponding entity.
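A minimal sketch of tracking every event linked to one entity, assuming each event record carries an `entities` list whose items have an `entity_id` field. These field names and the sample IDs are assumptions for illustration rather than a documented schema; `events_for_entity` is a hypothetical helper.

```python
# Hypothetical sketch: assumes each event is a dict with an "entities" list,
# where each entity has an "entity_id" and a "type". Field names and sample
# IDs are illustrative assumptions, not a documented schema.


def events_for_entity(events, entity_id):
    """Return every event that references the given entity."""
    return [
        event for event in events
        if any(ent.get("entity_id") == entity_id for ent in event.get("entities", []))
    ]


events = [
    {"title": "Stadium Concert", "entities": [{"entity_id": "venue-123", "type": "venue"}]},
    {"title": "Trade Expo", "entities": [{"entity_id": "venue-456", "type": "venue"}]},
    {"title": "Encore Night", "entities": [{"entity_id": "venue-123", "type": "venue"}]},
]

tracked = events_for_entity(events, "venue-123")
print([e["title"] for e in tracked])  # ['Stadium Concert', 'Encore Night']
```

Matching on a stable entity ID rather than on free-text venue or artist names avoids missing events whose titles are spelled differently across sources.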
See our Category Info guides for more information on entities.
6. How do I know your data is accurate?
We have a rigorous set of models and algorithms to ensure we provide a clean and verified dataset. Our machine learning models run continuously, which is crucial for event data because it is constantly changing.
Every event in our API goes through multiple steps to ensure quality and accuracy. Some example steps are:
Standardization: All events follow the same schema for ease of ingestion, comparability and compatibility.
Aggregation: We pull in events and entities from hundreds of different sources and compare them for quality and accuracy.
Enrichment: We categorize, label and add entities to all events to help reduce noise. We also ensure all events have a date, time and location.
Spam Filtering: Bad data is worse than no data at all. We ensure you only have access to events that are actually happening.
Between 30% and 40% of the events we receive are spam, add-ons or duplicates; after filtering, our data has a 0% spam rate.
Geocoding: Every event has a latitude and longitude, allowing for precise mapping. Events are also labeled with place IDs from the open-source GeoNames database. For instance, every event in California carries multiple IDs, and 5332921 (the GeoNames ID for California) is always among them. We also provide the venue name and formatted address whenever possible.
De-duping: We combine duplicate events into one reliable record. For example, our system may find a football game with 30,000 expected attendees listed eight times across five different sources. Our deduplication model keeps a single event that aggregates the details from all of the listings.
Ranking: See the details above on what goes into our rankings. We track accuracy with a variety of metrics and refine our models daily. The top metrics we track are spam rate, number of duplicated events, location accuracy and competitive comparison.
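The De-duping and Geocoding steps can be sketched as follows. This is a hypothetical illustration, not PredictHQ's model: the listing fields, the `dedupe_key` matching rule and the merge logic are assumptions. Only the GeoNames ID 5332921 for California comes from the text; the other place IDs are placeholders. Rounding coordinates is a naive proximity check, since listings just across a rounding boundary would not match.

```python
from collections import defaultdict

# Hypothetical sketch of the De-duping and Geocoding steps. Listing fields,
# the matching key and the merge rules are illustrative assumptions.

CALIFORNIA_GEONAMES_ID = 5332921  # GeoNames ID for California (from the text)


def dedupe_key(listing):
    """Naive match key: normalized title, start date and rounded coordinates."""
    return (
        listing["title"].strip().lower(),
        listing["start"],
        round(listing["lat"], 2),
        round(listing["lon"], 2),
    )


def deduplicate(listings):
    """Merge listings that share a key into one record with aggregated detail."""
    groups = defaultdict(list)
    for listing in listings:
        groups[dedupe_key(listing)].append(listing)

    merged = []
    for group in groups.values():
        record = dict(group[0])
        record.pop("source", None)
        record["sources"] = sorted({item["source"] for item in group})
        record["place_ids"] = sorted({pid for item in group for pid in item["place_ids"]})
        merged.append(record)
    return merged


def in_place(events, place_id):
    """Keep only events whose place IDs include the given GeoNames ID."""
    return [event for event in events if place_id in event["place_ids"]]


# 5332921 is California; 1001-1003 are placeholder IDs for other places.
listings = [
    {"title": "Home Team vs Away Team", "start": "2024-09-08", "lat": 37.4033,
     "lon": -121.9694, "source": "A", "place_ids": [1001, 5332921]},
    {"title": "home team vs away team ", "start": "2024-09-08", "lat": 37.4031,
     "lon": -121.9697, "source": "B", "place_ids": [5332921, 1002]},
    {"title": "Jazz Night", "start": "2024-09-08", "lat": 40.7128,
     "lon": -74.0060, "source": "A", "place_ids": [1001, 1003]},
]

merged = deduplicate(listings)
california = in_place(merged, CALIFORNIA_GEONAMES_ID)
print(len(merged), len(california), california[0]["sources"])  # 2 1 ['A', 'B']
```

Because every California event always includes 5332921 among its place IDs, a single membership test is enough to select the state, with no polygon lookup.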
7. Are all ranks treated equally across your categories (e.g. holiday versus concert)?
No. While our ranks (PHQ Rank™, Local Rank™ and Aviation Rank™) provide the most accurate representation of the impact an event will have, the algorithms behind each rank differ. Below are examples of the categories we cover, organized into three types:
- Attended Impact (attendance-based)
  - Performing Arts
- Holiday Impact
  - Public Holidays
  - Daylight Savings
- Unscheduled Event Impact
  - Airport Delays
  - Severe Weather
Ranks are not comparable across the three category types. Ranks for attendance-based events are comparable across seven categories: conferences, expos, sporting events, concerts, festivals, performing arts and community events. Likewise, ranks for public-holidays, school-holidays and observances can be compared with one another.
Why do we organize our data this way? Different types of categories affect demand in different ways. This categorization ensures our ranks are as reliable as possible.
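The comparability rules above can be encoded as a small lookup. This is a hypothetical helper, not part of the PredictHQ API: the `RANK_GROUPS` dictionary and the category IDs (for example `sports` for sporting events, `community` for community events) are assumptions. Categories not listed in either group are treated conservatively as comparable only with themselves.

```python
# Hypothetical helper encoding the comparability rules stated above.
# Category IDs and the grouping dictionary are illustrative assumptions.

RANK_GROUPS = {
    "attendance": {"conferences", "expos", "sports", "concerts",
                   "festivals", "performing-arts", "community"},
    "holiday": {"public-holidays", "school-holidays", "observances"},
}


def ranks_comparable(category_a, category_b):
    """True if both categories share a comparable group (or are identical)."""
    if category_a == category_b:
        return True  # a category is assumed comparable with itself
    return any(category_a in group and category_b in group
               for group in RANK_GROUPS.values())


print(ranks_comparable("concerts", "conferences"))      # True
print(ranks_comparable("concerts", "public-holidays"))  # False
```

A check like this can guard a model's feature pipeline against summing or averaging ranks across category types.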
8. Since each event has a pinpoint lat/long, how should events be treated if they affect other areas? (e.g. the Super Bowl or away soccer matches)
Each event has a latitude and longitude, but we also provide its scope (local, regional or national), which helps determine the area that the event impacts.
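A minimal sketch of combining scope with coordinates to decide whether an event reaches a given location. The field names (`scope`, `country`, `region`, `lat`, `lon`), the scope values and the 10 km `LOCAL_RADIUS_KM` are hypothetical assumptions, not PredictHQ's schema; the haversine great-circle formula handles the local case.

```python
import math

# Hypothetical sketch: field names, scope values and the local radius are
# illustrative assumptions, not PredictHQ's schema.

LOCAL_RADIUS_KM = 10.0  # assumed reach of a local-scope event


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def affects(event, location):
    """Return True if the event's scope reaches the given location."""
    if event["scope"] == "national":
        return event["country"] == location["country"]
    if event["scope"] == "regional":
        return (event["country"] == location["country"]
                and event["region"] == location["region"])
    # Local scope: fall back to distance from the event's lat/long.
    distance = haversine_km(event["lat"], event["lon"], location["lat"], location["lon"])
    return distance <= LOCAL_RADIUS_KM


final = {"scope": "national", "country": "US", "region": "CA", "lat": 37.40, "lon": -121.97}
local_gig = {"scope": "local", "country": "US", "region": "CA", "lat": 37.40, "lon": -121.97}
store_far = {"country": "US", "region": "NY", "lat": 40.71, "lon": -74.01}

print(affects(final, store_far), affects(local_gig, store_far))  # True False
```

This keeps the pinpoint location for local events while letting national events, such as a championship final, register demand impact far from the venue.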