Using Event Data in Power BI
Learn how to connect PredictHQ data to Power BI using multiple methods, and build an example report.
In today's data-driven landscape, leveraging powerful analytical tools is essential for making informed decisions and uncovering hidden insights. This step-by-step guide focuses on Power BI as an industry standard robust, user-friendly platform. Power BI is used here as an example of a reporting suite that enables users to integrate data from various sources, create interactive reports, and share insights across an organization, to leverage PredictHQ data for powerful insights.
Use Cases
Demand Forecasting, Dynamic Pricing, Workforce Optimization, Demand Analytics, Inventory Management, Event Visibility
Relevant Industries
Accommodation, Consumer Packaged Goods, Grocery and Supermarkets, Leisure, Travel and Tourism, Marketing and Advertising, Parking, Restaurants, Retail, Transportation and Delivery and Others
Overview
This tutorial covers how to connect PredictHQ data to Power BI via two sources, CSV upload and direct API connection using one of our APIs - the Events API.
The data used in this guide is based on a popular location, in our case San Francisco City as a whole. Please change the location from San Francisco to the location you want to look at.
Below are the main steps involved in this guide:
Building Report Parameters around a Location
Example Parameters for this Guide
Select an Input method
CSV upload
Snowflake Connection
API connection
Guide to Building the Report
Example report download
Requirements:
Access to PredictHQ data via 3 methods with 3 different requirements:
CSV: PredictHQ account - Sign up here if you don’t already have an account.
Snowflake: PredictHQ Snowflake Data Share
API: API Access Token
Microsoft Power BI reporting software
Building Report Parameters around a Location
For the purposes of this tutorial, parameters will be fixed for a standard example. Parameters are defined below, focusing on San Francisco city for attended events in a 3 month period.
All of our parameters are able to be modified based on user needs, see our filtering guide for details on what these parameters mean and how they can be modified to suit different use cases.
Example Parameters for this Guide:
Date: user-defined, this tutorial uses a 3-month period from January 1st to March 31st 2024
Categories: community, conferences, concerts, expos, festivals, performing-arts, sports - these are our attended categories
Event State: Active and Predicted
PHQ Attendance: attended events only - filtered to events with an attendance of at least 1
Location: San Francisco city (place ID 5391959)
Location could be substituted for a specific latitude and longitude relating to an individual store, or could be scoped even wider depending on need. We suggest utilizing our Suggested Radius API to hone in on a specific shop location and pull only events within a more accurate radius based on those results. For now, we will look at the citywide events in San Francisco as our example.
The report provided in this example shows a graph of the total number of people attending events around the location per day, as well as a list of the events happening at the location sorted by the highest attendance events first.
We find many customers want to know what is happening around a business location such as around a hotel, restaurant, store, or other location. The graph of total attendance per day shows you peaks and dips in physically attended events. This allows you to see upcoming busy days or potential demand surges as well as quieter days. The list of events allows you to see events happening on a given day in more detail.
Our customers use this in a variety of ways, for example, an accommodation customer may use a report like this to set their hotel room pricing per day and may increase the price on days with a lot of events happening. A restaurant customer looking at staffing might roster more people when they see a lot of events happening near their location and perhaps reduce staff levels when fewer events are happening. And so on. See our use case guide for more examples.
The end result of the exercise will be a report like this:
Select an Input Method
There are several ways to connect PHQ data to Power BI or other reporting software. Below are three of the main methods users can utilize to connect and start creating reports.
CSV Upload: The quick and easy way to connect data straight from our PredictHQ Control Center into reporting software. If a static view of data is all you need, this method gets it done fast. This method does not refresh or update the data when it changes. Events are dynamic and get canceled, postponed, move location, and so on. Using a CSV is a good way to do initial modeling but we’d suggest calling the API or connecting to a data warehouse moving forward.
Snowflake Connection: Choosing Snowflake as the data source for Power BI is highly recommended due to its robust data warehousing capabilities and seamless integration. Snowflake provides dynamic scalability and real-time data access, enhancing the accuracy and efficiency of reports. Snowflake offers straightforward connectivity and powerful query performance.
API Connection: Another preferred method for connecting our dynamic events data to Business Intelligence software is to use our robust APIs. This way the report is connected to an ever-updating data source and is always up to date.
CSV Upload Method
We will use PredictHQ Control Center Search to get our CSV. Filter the events based on the parameters laid out in the Example Parameters for this Guide. For more information on using Control Center Search use this guide. Fill in the parameters and hit search.
Once the search has completed hit the Export button on the right to get a CSV. For more details on exporting see the CSV Export Guide. Once the export has been downloaded, it’s ready for use in Power BI. The filename by default should be “Events-Export-zzzz-on-xxxx” where x is the date of the export and z is the location - feel free to rename this to anything else.
In Power BI, create a new report and press Get Data -> Text/CSV
Upload the CSV export and hit the transform data option.
Right-click the Query under Queries and go to the Advanced Editor option. The Query will be named the same as the uploaded CSV name.
This opens up a Power Query window which allows code to transform the data for us. Below is a Power Query code that will transform the columns automatically for use in the report.
This code expands out the 'impact_patterns' column (see Impact Patterns in our technical documentation for more information) and filters it to accommodation and actual attendance distribution. It renames some essential columns. It also transforms some column formats for easier use in reporting. It is an involved process with multiple steps - the Power Query below is the final output of this multi-stage transformation.
Paste the Power Query below after the first existing 4 lines, after the "Changed Type" step, replacing everything from the existing “in” down.
As you can see we start with a comma to add on to the existing line, its positioning can be changed to the end of the existing line if you prefer, but its function is the same. The final pasted code should look something like this:
Hit Done. Hit Close & Apply and wait for the data transformation to finish processing.
After completing these steps, we have successfully loaded a CSV extract of PHQ Events data into Power BI ready for use in visuals and reporting. See the Building the Report step below for the next steps.
Snowflake Connection Method
To connect using Snowflake you will need the following knowledge about your organization's Snowflake environment. Ask your Snowflake Administrators for these settings or refer to Snowflake's official documentation links for the variables below:
Server Name: Usually in format <account_name>.snowflakecomputing.com
Warehouse: A warehouse is what you use to run queries. See which are available to you using the SHOW WAREHOUSES function.
Table Location: the Database, Schema, and Table Name for your PredictHQ Data Share Events table.
To start, navigate to the Snowflake data connection via:
Get Data -> More -> Database -> Snowflake
The Server and Warehouse info we gathered above will need to be entered at this step. It should look something like the below, replacing square bracket placeholder variables for your Server and Warehouse info.
Expand Advanced options and scroll down.
Fill the Database where the PredictHQ Events table lies in your Snowflake structure (case sensitive). Paste this SQL in the SQL box after substituting the Schema and Table Name. This code assumes no columns have been renamed:
This code is performing the data transformation and filtering in code. It filters to the parameters laid out in the Example Parameters for this Guide section, and transforms some columns we will be using for ease of use in the report. The most important transformed column is the 'impact_patterns' column which we use to find the attendance spread per day across a multi-day event. See Impact Patterns in our technical documentation for more information.
This is what it should look like when filled in - with all square bracket placeholder text in the FROM condition replaced.
Click "OK". Click "Load Data" on the next screen.
Connection settings: DirectQuery is recommended for constant database connection. Import for one-off import of data from the database.
After completing these steps, we have successfully connected Events data from Snowflake into Power BI ready for use in visuals and reporting and automatic data refreshes. See the Building the Report step below for the next steps.
Connecting to other Data Warehouses
See Loading Event Data into a Data Warehouse for an example of how to load event data into Google BigQuery or other data warehouses. See this guide on how to connect PowerBI to Google BigQuery.
API Connection Method
PredictHQ has a few APIs that can be used to build reports, for this example, we will stick to the Events API. Starting this process assumes a PredictHQ API access token has been created by following the API Quickstart guide.
Power BI will connect using the URL from the Events API: https://api.predicthq.com/v1/events/ but, query parameters must be added to this URL for the Power BI connection, in line with the parameters outlined in the Example Parameters for this Guide.
Following these parameters and the Events API documentation we will end up with a URL string like the one below:
Note: Scope uses the Place ID (geonames ID) for San Francisco (see our tech docs for info on Place ID). If you were looking for events happening around a business location you would use the within parameter with the latitude and longitude of your business location and the radius from the suggested radius API. Time zone parameter (active.tz) filters results based on that given time zone. Limit parameter allows for more results returned per “page” which allows for faster loading, rather than the default 10 per page.
See also our filtering guide for details on how to query the Events API for events impacting your locations.
With this API query string, event data can start to be loaded into Power BI.
First, start a new report. Then select Get Data -> Web
Choose the Advanced tab, not the Basic default. Because the PredictHQ API is Bearer token authorized, the Advanced tab must be selected to include the API Access Token request header.
Add the HTTP request header with the following information:
URL parts: our created Events API URL from the above:
https://api.predicthq.com/v1/events/?active.gte=2024-01-01&active.lt=2024-04-01&active.tz=America/Los_Angeles&category=community,conferences,concerts,expos,festivals,performing-arts,sports&state=active,predicted&phq_attendance.gte=1&place.scope=5391959&limit=500
HTTP request header parameters:
Put
Authorization
in the first fieldPut
Bearer api_token
in the field on the right of the first field withAuthorization
. where[api_token]
will be replaced with your PHQ API Access Token. Just replace ‘[api_token]
’ with your actual API Access Token. Leave the ‘Bearer ’ part in
The filled-out information should look like this:
After clicking “OK”, the Data Transformation page will open where data shaping options can be made before building the report.
Rename the Query to something relevant, as it defaults to the connection URL string parameters and we need a string to reference in the Power Query code below. We recommend the Query be renamed to “PredictHQ Connection”.
In order to transform the columns, open Power Query and paste the code below to format and expand some columns for easy use. To do this, go to the Advanced Editor for this Query, right click on the Query name under Queries and click Advanced Editor:
Replace the entire existing Power Query code with the one below, changing the 2 lines (Lines 4 and 8) that refer to ‘[api_token]’ with the PHQ API Access Token used previously.
This code expands out the 'impact_patterns' column (see Impact Patterns in our technical documentation for more information) and filters it to accommodation and actual attendance distribution. It renames some essential columns. It also accounts for our API pagination, making sure all results are returned. It is an involved process with multiple steps - the Power Query below is the final output of this multi-stage transformation.
If you renamed the Query to something other than "PredictHQ Connection" as per our steps above, you must also rename the reference in lines 2 and 11.
Click Close & Apply and wait for the data transformation to finish processing through multiple API pages.
After this step the data is now ready to start building a report with, as it has been successfully loaded and transformed in Power BI. A template of this API Connection report pre-built is available at the end in the Example API Connection Report Template section.
Guide to Building the Report
Using either of the two methods above will get PredictHQ Events data loaded and transformed in the same format ready to be used in a report. Not all the columns were transformed, just the ones used in this guide.
This guide creates a connected chart and table that covers the defined time period and shows the attendance per day in the chosen location - in the example San Francisco city as a whole. The chart breaks up attendance per day for the visualization, but the table shows event details and attendance in full, not split by day. Date results are shown in UTC, use the "_local" date columns for the local date.
To begin, insert a blank chart and table visualization using the Insert -> New Visual tab options, with the chart on top taking up half the screen, and the table on the bottom filling the other half.
Group as one (shift-click both boxes, right-click on one of them, and click the Group -> Group option).
Before the next step of filling in the chart and table, add Filters for the page: drag the 'date_local' field from the Data tab on the right to the “Filters on this page” section under Filters.
Change the drop-down to Advanced filtering and add the following:
“Is on or after” start of the selected date range AND “is before” the day after the date range ends - click “apply filter” in the bottom right of the filter menu. In the example, those dates are anything on or after the 1st of January 2024 and anything before 1st of April 2024.
Now fill the chart axis. The X-axis gets filled with the date_local field The Y-axis gets filled with the attendance_per_day field (this should default to a SUM which is correct)
For the table, drag these fields over and resize the columns as needed to fit everything:
Table: id, title, category, phq_attendance, start_local, end_local
For all fields that involve a date (date_local, start_local, end_local), remove the default Date Hierarchy format to get the actual date showing. Use the dropdown in the Visualizations column and select the field name instead of “Date Hierarchy”. If Date Hierarchy is preferred, feel free to leave this as is.
For phq_attendance in the table use the drop down to remove the summary, this summary isn’t actually grouping anything so is an unnecessary default that should be removed. Note that we only want to stop the summarization in the table, leave the chart as is.
Rename the chart title by clicking on the chart and going to the Visualizations tab -> General -> Title. Let’s rename it to “Event Attendance per day in San Francisco”.
Click on the "phq_attendance" column in the table twice to sort by highest to lowest attendance.
The final result should look like the following:
The picture above shows how this analysis can be used; by clicking on a spike (or any period on the chart) the report shows the events active during that period. The table data does not show the attendance per day like the chart, but the overall attendance of the event's full duration.
A useful addition to this basic view could be a drill down on the table by adding a new table visual to the group that has the 'id', 'date_local', and 'attendance_per_day' columns, showing how the attendance of an event has been spread out over multiple days (if it is a multi-day event). For more understanding of multi-day events, see our Working with Multi-day Events documentation.
Customers can add their own data to this chart to compare peaks and troughs of attendance vs sales in a basic comparison report. For deeper analysis into these kinds of reports, we suggest using our Beam functionality to provide a deeper insight as to which types of events impact demand, as the Events API will only give a high-level view of the story without any additional analysis from PredictHQ to provide more in-depth information.
Example API Connection Report Template
Below is a downloadable Power BI template that will automatically create the example report used throughout this guide, using the API Connection method.
Upon opening the template, you will be prompted to enter an API Access Token. Inputting this token will enable the report to automatically populate and build according to the parameters set forth in this guide. Please wait 10-20 seconds between each step as data populates and data runs in the background.
Once the data connection has loaded for a bit you might be prompted for a connection method screen like below. Please select Anonymous and click Connect.
If there are any issues with this template please refer to the API Connection Method and ensure all settings match with those steps.
Example Report:
Last updated