Correlating events to demand - a guide for Data Scientists¶
This guide is for data scientists and technical teams working on correlation scenarios.
It maps out the steps you will need to take to correlate event impact data with your demand data so you can identify how events drive your demand. This enables you to factor these events into your forecasts in advance, leading to more profitable and efficient operations.
This guide builds on concepts that we covered in our getting started guide for data scientists. We recommend you read that guide first.
This guide is written in Jupyter notebook format so you can download and run the code samples to get up and running quickly. We provide sample data files with the guide, along with Python code samples for every topic covered, giving you a step-by-step resource you can adapt to experiment with your own data and correlation scenarios.
After running through this guide you should know what our Aggregate Event Impact data is and how to use it. You should have an understanding of an approach for using time series modeling to extract incremental demand. You will be able to correlate aggregate impact data with the incremental demand you have extracted.
In this guide, we use the following open source technologies:
- Python 3
- Jupyter notebook for live coding and visualisation
- Pandas library for data manipulation
- rstl library for time series decomposition
- Plotly for interactive visualisation
See also our YouTube video on correlating events and demand.
Now let's dive in and get started!
What is Aggregate Event Impact?¶
The Aggregate Event Impact endpoint is a feature that allows customers to aggregate event impacts for a location in a given time period. Because PHQ Rank™ and Aviation Rank™ are log scale numerical values, simply summing up all the rank values does not deliver a meaningful result.
PredictHQ’s Aggregate Event Impact tool does this calculation for you. It can be used with PHQ Rank™ or Aviation Rank™. It results in a numerical output that can be used directly in modelling.
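To see why a naive sum of log-scale ranks is misleading, consider the sketch below. The rank-to-linear mapping here is purely illustrative (the real mapping from PHQ Rank™ to attendance is internal to PredictHQ); it only demonstrates that on a log scale, one very large event can dwarf several medium events even when the raw rank values sum to the same number.

```python
def rank_to_linear(rank):
    """Hypothetical inverse of a log-scale rank (illustration only)."""
    return 10 ** (rank / 20)

two_medium_events = [50, 50]  # two mid-sized events
one_large_event = [100]       # one very large event

# Naive sums of the log-scale ranks look identical...
naive_a = sum(two_medium_events)  # 100
naive_b = sum(one_large_event)    # 100

# ...but on a linear scale the large event dwarfs the two medium ones.
linear_a = sum(rank_to_linear(r) for r in two_medium_events)
linear_b = sum(rank_to_linear(r) for r in one_large_event)
```

This is the kind of distortion Aggregate Event Impact avoids by producing a linear, directly usable impact value.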
How our ranks work at a glance:
- PHQ Rank™ uses factors like attendance information, venue size and related factors to calculate impact. The impact value for the PHQ Rank™ based Aggregate Event Impact reflects the approximate attendance of events on a given day.
- Aviation Rank™ identifies an event’s impact based on airline demand. The impact value for the Aviation Rank™ based Aggregate Event Impact is a proxy for passenger numbers travelling for events on a given day.
Aggregate Event Impact is useful for businesses seeking to correlate their demand with verified event data. Once you have achieved correlation, you can use Aggregate Event Impact to predict future impact on your business.
The location value for Aggregate Event Impact can be a city, or a latitude/longitude point with a radius. Use a latitude and longitude with a radius around a specific point to find the impact for a branch of your business, such as a store, hotel or warehouse. Use a city-level location or a larger radius if you want to look at impact across multiple physical locations.
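If you are unsure which of your sites fall inside a given catchment, a quick great-circle distance check can help. The coordinates below are illustrative, and the haversine helper is a standard formula rather than anything PredictHQ-specific.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    r = 3958.8  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

seattle_center = (47.6062, -122.3321)  # illustrative center point
nearby_site = (47.6101, -122.2015)     # a point a few miles east

dist = haversine_miles(*seattle_center, *nearby_site)
within_radius = dist <= 20  # would this site fall inside a 20 mile catchment?
```

A check like this is handy when deciding between one large radius covering several sites and separate, smaller per-site radii.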
To use Aggregate Event Impact:
- Begin by identifying your location. If you are using a physical brick-and-mortar building, you can use the latitude and longitude of the building and a radius around it to look at events. For example, if you wanted to identify event demand around a hotel in Seattle, you might create a 20 mile radius around it and use that as your location.
- Next, identify the timeframe you want to investigate - whether that’s a day, week, month or more.
- Choose which rank you want to use to calculate the impact value - PHQ Rank™ or Aviation Rank™.
- Finally, call our Aggregate Event Impact endpoint to retrieve the event impact values for that area and timeframe.
- Remember you can use our data exporter to export a list of events around a location and look at their impact on your business. See our getting started guide for more details on how to do that.
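The steps above can be sketched as a set of request parameters. Note that the parameter names, formats and endpoint path below are assumptions for illustration only; check PredictHQ's technical documentation for the exact API contract on your plan.

```python
# Hypothetical Seattle hotel location (illustrative coordinates).
hotel = (47.6097, -122.3331)

# Assumed parameter names -- see PredictHQ's API docs for the real contract.
params = {
    "location": f"@{hotel[0]},{hotel[1]}",  # point of interest
    "radius": "20mi",                       # catchment around the hotel
    "date_from": "2019-05-01",              # start of the timeframe
    "date_to": "2019-05-31",                # end of the timeframe
    "rank_type": "phq",                     # PHQ Rank based impact
}

# With an access token, the call itself might look like:
# import requests
# resp = requests.get(AEI_ENDPOINT_URL, params=params,
#                     headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
```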
Let's take an example of how Aggregate Event Impact works:
- On May 28th, 2019, there were a number of events in Seattle: for example, the Northwest Folklife Festival, the Seattle International Film Festival, a Texas Rangers vs Seattle Mariners baseball game, the Georgetown Orbits concert, the Annual Meeting of the Consortium of Multiple Sclerosis Centers and many more.
- The Aggregate Impact value for the whole of Seattle using PHQ Rank™ on that day is 38,253. This represents the total combined impact of all the events on that day. Please note this is an indicative number, not the exact total of expected attendees.
- Once this value has been correlated with your demand, you can use it to identify how much additional demand to prepare for.
Note that you can choose your location and radius and then filter the impact down to events around a specific location. You can also filter aggregate impact by rank (for example, to include only higher-ranked events) or by the categories that have the most impact on your business.
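Once you have per-category impact values, this kind of filtering is a one-liner in pandas. The frame below uses made-up numbers purely to show the shape of the operation; the real column names match the AEI fields described later in this guide.

```python
import pandas as pd

# Toy AEI-style frame with made-up numbers, showing how you might narrow
# aggregate impact down to the categories that matter most to your business.
toy = pd.DataFrame(
    {
        "impact": [38253, 12000],
        "sports_impact": [21000, 9000],
        "concerts_impact": [9000, 1500],
        "observances_impact": [8253, 1500],
    },
    index=pd.to_datetime(["2019-05-28", "2019-05-29"]),
)

# A hotel might only care about the attendance-driving categories:
relevant_impact = toy[["sports_impact", "concerts_impact"]].sum(axis=1)
```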
Aggregate Event Impact Data¶
PredictHQ's Aggregate Event Impact endpoint not only includes the overall daily impact for the attendance-based event categories, but also counts and impact for individual categories and rank levels.
Here is a summary of the fields returned by the PHQ Rank™ based Aggregate Event Impact endpoint.

| Field | Description |
| --- | --- |
| date | Date of the aggregate event impact |
| impact | The total impact of all active events on a given day |
| count | The number of active events on a given day |
| categories | A map of the categories supported by the endpoint and the count of events for each category |
| categories_impact | A map of the categories supported by the endpoint and the impact for each category |
| rank_levels | A map of the count of events per rank level returned for a given day |
| rank_levels_impact | A map of the total impact per rank level returned for a given day |
More information on Aggregate Event Impact is available in PredictHQ's technical documentation.
To get you up and running quickly, we have exported AEI data into a CSV file for use in this guide. The file contains data for Seattle in 2019: we took a geographical point in the center of Seattle and retrieved all events with a PHQ Rank™ based AEI within a 20 mile radius of that point.
If you want to query aggregate impact data directly for your own testing you can sign up for a developer plan and use our data exporter or API to retrieve the data. Our data exporter allows you to export data into CSV or JSON format.
```python
import pandas as pd
from rstl import STL
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```
Load AEI data for 2019.
```python
# Load AEI data
aei_df = pd.read_csv(
    'https://raw.githubusercontent.com/predicthq/phq-data-science-docs/master/correlation/aei.csv',
    index_col=0,
)
aei_df.index = pd.to_datetime(aei_df.index)
```
In this guide, we will be using the overall daily aggregated event impact across categories.
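Pulling out that overall daily series and sanity-checking it takes only a few lines. The snippet below uses a small synthetic stand-in frame so it runs on its own; swap in the `aei_df` loaded above to work with the real data.

```python
import pandas as pd

# Synthetic stand-in for aei_df (made-up impact values for one week).
idx = pd.date_range("2019-01-01", periods=7, freq="D")
demo_df = pd.DataFrame({"impact": [100, 80, 120, 300, 90, 110, 95]}, index=idx)

impact = demo_df["impact"]
peak_day = impact.idxmax()            # the day with the largest impact
weekly_total = impact.resample("W").sum()  # aggregate up to weekly impact
```

Resampling to weekly or monthly totals is often a useful first look before correlating the daily series with your own demand data.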
The data should look like the following:

```
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 365 entries, 2019-01-01 to 2019-12-31
Data columns (total 22 columns):
count                    365 non-null float64
impact                   365 non-null float64
community_count          365 non-null float64
concerts_count           365 non-null float64
festivals_count          365 non-null float64
performing-arts_count    365 non-null float64
sports_count             365 non-null float64
conferences_count        365 non-null float64
expos_count              365 non-null float64
public-holidays_count    365 non-null float64
school-holidays_count    365 non-null float64
observances_count        365 non-null float64
community_impact         365 non-null float64
concerts_impact          365 non-null float64
festivals_impact         365 non-null float64
performing-arts_impact   365 non-null float64
sports_impact            365 non-null float64
conferences_impact       365 non-null float64
expos_impact             365 non-null float64
public-holidays_impact   365 non-null float64
school-holidays_impact   365 non-null float64
observances_impact       365 non-null float64
dtypes: float64(22)
memory usage: 65.6 KB
```
Let's visualize the daily aggregate event impact data.
```python
fig = go.Figure([
    go.Scatter(x=aei_df.index, y=aei_df['impact'], name='aei', line_color='#f2327e')
])
fig.update_layout(title='Figure 1: Aggregate Event Impact')
fig.show()
```