Loading Event Data into a Data Warehouse

Take PredictHQ API data and load it into a data warehouse, in this example, GCP BigQuery.


This guide outlines the process for integrating PredictHQ's Events data into your data lake. Customers commonly want to take data from our APIs and store it in their own data lake, and this guide walks you through that integration process.

We use Google Cloud Platform (GCP) as our primary example, but the methods for structuring the data table and making PredictHQ Events API calls can be applied to various data management systems. This guide is designed for users with a basic understanding of GCP or similar data warehousing solutions and the necessary permissions to use them.

Please note that this guide focuses exclusively on integrating PredictHQ’s Events API data and does not cover the Features API.

If you are using Snowflake or AWS Data Exchange (ADX) for your data lake, PredictHQ integrates with those products. See Receive Data via Snowflake and Receive Data via AWS Data Exchange for details.

Overview

This guide details the data structure of our Events API within a data warehouse environment and outlines two methodologies for creating an Events data table in GCP.

Throughout this guide, we will use the fictional example from our Filtering Guide, "Tom’s Pizzeria". This scenario gives us a practical illustration of the methods described and demonstrates how to tailor API queries to specific business needs. For an understanding of how the parameters were selected for this example, refer to the Filtering Guide.

Requirements:

  • Access to PredictHQ Data

    • JSONL: Requires a PredictHQ account. Sign up here if you don’t already have one.

    • API: An API Access Token is necessary for accessing the data programmatically.

  • GCP permissions:

    • Ensure you have "BigQuery Data Owner" and "BigQuery User" permissions over the BigQuery environment.

    • A GCP BigQuery Service Account and its corresponding JSON key are required. See the GCP documentation for details.

Scenario: Tom’s Pizzeria

In this guide, we'll explore a hypothetical use case for “Tom’s Pizzeria”, a chain of restaurants with locations throughout the US, headquartered in Seattle, Washington. Tom is interested in understanding how local events might influence his business operations and customer flow. Tom’s inventory system, website, and tools all run on GCP. He wants to use event data in his staffing and inventory management systems to help anticipate the demand caused by events, and to show his staff upcoming events near each store. To do this, he needs to download events into his data lake.

Tom's Data Parameters:

  • Attended Categories: community, conferences, concerts, expos, festivals, performing-arts, sports.

  • Date Range: Events active within the range June 1, 2024, to June 30, 2024.

  • Event Rank: Events with a rank greater than 30 indicate a significant likelihood of impacting local traffic and attendance.

  • Event Status: Both ‘active’ and ‘predicted’ events to ensure a comprehensive overview.

  • Location: Bring through all of Seattle first; Tom can then filter down to his individual store locations once the data is in BigQuery, using our Suggested Radius API.

For comprehensive details on selecting appropriate filters for this scenario, refer to our Filtering Guide. It explains which events could impact Tom's business and how to configure the data queries accordingly. For our load into GCP, we want to bring through a larger number of events and then filter down to a specific pizzeria location using BigQuery once loaded. See the Querying the Loaded Data section for more.

For the purposes of this guide, we have limited the example load to a single city for Tom to filter on. When doing an actual load, you can bring through as much data as you have access to or require. Data warehouse customers often pull down all the data they have access to and then query it for the relevant locations and data from their applications.

Data Load Methods

There are several methods available for integrating PredictHQ data with GCP BigQuery or other data warehouse solutions. This guide outlines two primary approaches, both compatible with each other’s data structure. Regardless of the method chosen for the initial data load, ongoing updates will require API code.

JSONL File Upload Method: This method provides a straightforward, code-free approach to the initial data upload by exporting data from PredictHQ’s WebApp. We recommend using JSONL uploads for the initial population of your data lake, especially where there is a substantial volume of data, such as multiple years of historical data. Subsequent updates should then be managed through API calls to ensure the data remains current.

API Connection Method: Connect to the PredictHQ Events API to download the latest data into your data lake. This guide takes you through the Python API connection method for GCP, but similar steps apply to other programming languages. Events data is dynamic: events are canceled, postponed, relocated, or otherwise updated all the time, and new events are constantly being announced. We recommend having a process that updates your data on a regular frequency, such as daily.

This guide will walk you through the initial data load, providing you with the tools and understanding necessary to create a robust connection to GCP. You'll learn how to structure your data effectively within your warehouse, setting the stage for potential automation and regular updates that you can implement as needed.

Table Data Structure

Regardless of the method chosen for initial data creation and loading, the table structure remains consistent. This ensures that both methods are interchangeable, delivering data in a uniform format. The data structure for the table is detailed below:

Field Name                         Datatype    Mode
id                                 STRING      REQUIRED
parent_event                       JSON        NULLABLE
title                              STRING      NULLABLE
alternate_titles                   JSON        NULLABLE
description                        STRING      NULLABLE
category                           STRING      NULLABLE
labels                             JSON        NULLABLE
phq_labels                         JSON        NULLABLE
rank                               INTEGER     NULLABLE
local_rank                         INTEGER     NULLABLE
phq_attendance                     INTEGER     NULLABLE
entities                           JSON        NULLABLE
duration                           INTEGER     NULLABLE
start                              TIMESTAMP   NULLABLE
start_local                        TIMESTAMP   NULLABLE
end                                TIMESTAMP   NULLABLE
end_local                          TIMESTAMP   NULLABLE
predicted_end                      TIMESTAMP   NULLABLE
predicted_end_local                TIMESTAMP   NULLABLE
updated                            TIMESTAMP   NULLABLE
first_seen                         TIMESTAMP   NULLABLE
timezone                           STRING      NULLABLE
location                           JSON        NULLABLE
geo                                JSON        NULLABLE
scope                              STRING      NULLABLE
country                            STRING      NULLABLE
place_hierarchies                  JSON        NULLABLE
state                              STRING      NULLABLE
private                            BOOLEAN     NULLABLE
impact_patterns                    JSON        NULLABLE
predicted_event_spend              INTEGER     NULLABLE
predicted_event_spend_industries   JSON        NULLABLE
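If you prefer to create this table with SQL DDL rather than the console upload flow or the Python schema script shown later, a sketch like the following produces the same structure. It assumes the google-cloud-bigquery Python client; the dataset and table names are placeholders to replace with your own.

from google.cloud import bigquery

# Placeholder names - replace with your own key file, dataset and table.
SERVICE_ACCOUNT_JSON = '~/file_location/service_account_key.json'
dataset_id = "[dataset_name]"
table_id = "[table_name]"

client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_JSON)

# DDL mirroring the Table Data Structure above.
ddl = f"""
CREATE TABLE IF NOT EXISTS `{client.project}.{dataset_id}.{table_id}` (
  id STRING NOT NULL,
  parent_event JSON,
  title STRING,
  alternate_titles JSON,
  description STRING,
  category STRING,
  labels JSON,
  phq_labels JSON,
  rank INT64,
  local_rank INT64,
  phq_attendance INT64,
  entities JSON,
  duration INT64,
  `start` TIMESTAMP,
  start_local TIMESTAMP,
  `end` TIMESTAMP,
  end_local TIMESTAMP,
  predicted_end TIMESTAMP,
  predicted_end_local TIMESTAMP,
  updated TIMESTAMP,
  first_seen TIMESTAMP,
  timezone STRING,
  location JSON,
  geo JSON,
  scope STRING,
  country STRING,
  place_hierarchies JSON,
  state STRING,
  private BOOL,
  impact_patterns JSON,
  predicted_event_spend INT64,
  predicted_event_spend_industries JSON
)
"""
client.query(ddl).result()  # Run the DDL and wait for completion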

JSONL file Upload Method

This method is recommended for large initial loads, as it handles the transfer of large volumes of data more efficiently than direct API calls.

Search Events in the WebApp and Export JSONL

To locate and export the relevant event data into a JSONL file, we use the PredictHQ WebApp. It allows precise querying of events based on specific criteria, ensuring that you retrieve only the most relevant information for your needs. For detailed instructions on optimizing your search, refer to the guide on the WebApp search capabilities.

Typically you would download all the data you have access to into your data warehouse; in that case, run a search for all events and export the results. Many customers bulk load everything they have access to and then query it in their data lake. In the context of Tom's Pizzeria, that would mean downloading events for all of the US and then querying for specific locations. For this example, however, we downloaded events only for Seattle: we searched for Seattle in the WebApp for the relevant period, status, and attended categories.

After configuring your filters and executing the search, simply select the Export option. For more details on exporting, see the CSV export guide, except select the JSONL file format instead of CSV. This JSONL file can then be uploaded directly to your BigQuery setup, as detailed in the next section.

Create a Table via JSONL Upload

Setting up a BigQuery table with a JSONL file is a straightforward process, provided you have the necessary permissions on GCP. Before beginning, ensure you are clear about which dataset will host the data. Here are the steps to create the table once you have found and highlighted the dataset in GCP BigQuery:

  1. Click Create Table: Navigate to the dataset you wish to create the table in, click the hamburger menu, and select “Create Table”.

  2. Select the File Location: Select the JSONL export that you downloaded earlier to your computer.

  3. Name the Table: Give the table a suitable name.

  4. Manually Define Schema: Specify the schema details manually. You must accurately define each column, ensuring that the datatype and column names precisely match those in the Table Data Structure section above. Any discrepancies in spelling or datatype will lead to errors during the upload. You can add or remove columns based on your specific data requirements, but this guide focuses on the recommended fields listed above.

  5. Advanced Options: Expand the Advanced Options and tick the “Unknown Values” checkbox. This setting allows the system to gracefully handle missing information in specific columns of some records, ensuring that rows with incomplete data are not rejected and do not throw errors during the upload.

  6. Create the Table: Click the "Create Table" button to finalize the creation.

This method lets you initialize your BigQuery table with a JSONL dataset and is suitable for bulk data uploads. However, it does not support ongoing data refreshes. See the Keep Event Data Updated section for advice on setting up a regularly updated table after initialization.
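If you prefer to script this upload rather than click through the console, a BigQuery load job achieves the same result. The sketch below assumes the google-cloud-bigquery library, a service account key file, and an export file named seattle_events.jsonl (a hypothetical name); adjust the placeholders to your environment.

from google.cloud import bigquery

SERVICE_ACCOUNT_JSON = '~/file_location/service_account_key.json'
dataset_id = "[dataset_name]"
table_id = "[table_name]"
jsonl_path = "seattle_events.jsonl"  # Hypothetical name for the WebApp JSONL export

client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_JSON)
table_ref = client.dataset(dataset_id).table(table_id)

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,              # Let BigQuery infer the schema; you may prefer the explicit schema from Table Data Structure
    ignore_unknown_values=True,   # Equivalent of ticking "Unknown Values" in the console
)

with open(jsonl_path, "rb") as source_file:
    load_job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

load_job.result()  # Wait for the load job to finish
print(f"Loaded {load_job.output_rows} rows into {dataset_id}.{table_id}")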

API Connection Method

This method works well for smaller datasets as an initial upload and must be used for the continuous updates to the table recommended in our Keep Event Data Updated section.

Table Creation Code

To establish the required data structure in BigQuery, you can use the following Python script. It explicitly defines the columns and data types laid out in the Table Data Structure section, configuring them exactly as needed for your BigQuery table. Before executing this script, ensure you have the following prerequisites:

  • SERVICE_ACCOUNT_JSON: This is your service account JSON key, which is typically stored in a secure file. If your organization uses a different method to handle service account keys, modify the code accordingly.

  • dataset_id: Specify whether this is a new dataset or an existing one in which you want to place this table.

  • table_id: Determine a name for your new PredictHQ data table in BigQuery.

It is advisable to manage this table creation code separately from other data processing scripts to maintain clarity and ease of updates.

from google.cloud import bigquery

#Set Service Account Key connection
SERVICE_ACCOUNT_JSON='~/file_location/service_account_key.json' 

# Set BigQuery variables
dataset_id = "[dataset_name]"
table_id = "[table_name]"

# Initialize the BigQuery client
client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_JSON)

# Manually create schema
schema = [
    bigquery.SchemaField("id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("parent_event", "JSON"),
    bigquery.SchemaField("title", "STRING"),
    bigquery.SchemaField("alternate_titles", "JSON"),
    bigquery.SchemaField("description", "STRING"),
    bigquery.SchemaField("category", "STRING"),
    bigquery.SchemaField("labels", "JSON"),
    bigquery.SchemaField("phq_labels", "JSON"),
    bigquery.SchemaField("rank", "INTEGER"),
    bigquery.SchemaField("local_rank", "INTEGER"),
    bigquery.SchemaField("phq_attendance", "INTEGER"),
    bigquery.SchemaField("entities", "JSON"),
    bigquery.SchemaField("duration", "INTEGER"),
    bigquery.SchemaField("start", "TIMESTAMP"),
    bigquery.SchemaField("start_local", "TIMESTAMP"),
    bigquery.SchemaField("end", "TIMESTAMP"),
    bigquery.SchemaField("end_local", "TIMESTAMP"),
    bigquery.SchemaField("predicted_end", "TIMESTAMP"),
    bigquery.SchemaField("predicted_end_local", "TIMESTAMP"),
    bigquery.SchemaField("updated", "TIMESTAMP"),
    bigquery.SchemaField("first_seen", "TIMESTAMP"),
    bigquery.SchemaField("timezone", "STRING"),
    bigquery.SchemaField("location", "JSON"),
    bigquery.SchemaField("geo", "JSON"),
    bigquery.SchemaField("scope", "STRING"),
    bigquery.SchemaField("country", "STRING"),
    bigquery.SchemaField("place_hierarchies", "JSON"),
    bigquery.SchemaField("state", "STRING"),
    bigquery.SchemaField("private", "BOOLEAN"),
    bigquery.SchemaField("impact_patterns", "JSON"),
    bigquery.SchemaField("predicted_event_spend", "INTEGER"),
    bigquery.SchemaField("predicted_event_spend_industries", "JSON")
]

#basic create code with no error handling
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # Make an API request to create the table.
print("Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id))

This script sets up the initial table structure within BigQuery, providing a foundation for subsequent data loads.
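As a quick sanity check, you can confirm the table exists and has the expected 32 columns, reusing the client, dataset_id and table_id from the script above:

# Optional check, reusing client, dataset_id and table_id from the table creation script.
table = client.get_table(f"{client.project}.{dataset_id}.{table_id}")
print(f"{table.table_id} created with {len(table.schema)} columns")  # Expecting 32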

Extract from Events API

This section outlines how to query the PredictHQ Events API using Python, using the Tom’s Pizzeria example. The approach is designed to ensure the data extracted is directly relevant to Tom’s operational needs; the methodology and rationale behind the extraction parameters used below are explained in the Scenario section of this guide.

Before initiating the script, ensure you have configured the following prerequisites:

  • access_token: Insert your PredictHQ access token here. For enhanced security, consider referencing the token indirectly (for example, from an environment variable) rather than hard-coding it into your script.

  • SERVICE_ACCOUNT_JSON: This is your Google Cloud Platform service account JSON key, which should be securely stored. Adjust the access method in the script if your organization's practices differ.

  • dataset_id: Designate whether this is a new or existing dataset in which you plan to house the PredictHQ data.

  • table_id: Assign a name to your new PredictHQ data table in BigQuery.

  • params: Modify these parameters as needed to align with the data you intend to extract from PredictHQ.

The Python script below fetches the necessary data from the Events API. It loops through the paginated response and pulls all results and columns. The next section covers transforming this data before we push it for upload. For more details on any of the parameters used in the code, see our Events API documentation, keeping in mind our scenario of pulling attended events in Seattle for Tom.

import time
import requests
import json
from google.cloud import bigquery

base_url = "https://api.predicthq.com/v1/events"
access_token = "[phq_access_token]"  
dataset_id = "[dataset_name]"
table_id = "[table_name]" 
SERVICE_ACCOUNT_JSON='~/file_location/service_account_key.json'

headers = {
    "Authorization": f"Bearer {access_token}",  
    "Accept": "application/json"
}

params = {      # adjust these parameters to correspond to the data you want to pull down
    "categories": "community,conferences,concerts,expos,festivals,performing-arts,sports",
    "place.scope": "5809844",     #Seattle geonames ID
    "active.gte": "2024-06-01",
    "active.lte": "2024-06-30",
    "state": "active,predicted",
    "limit": 500       # Maximum number of events per page
}

# Initialize the BigQuery client
client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_JSON)

def fetch_all_pages(base_url, headers, params):
    print("calling Events API")
    
    results = []
    next_url = base_url  # Start with the base URL

    while next_url:
        response = requests.get(next_url, headers=headers, params=params)
        params = None  # After the first request, prevent re-sending initial parameters
        if response.status_code == 200:
            data = response.json()
            results.extend(data.get('results', []))
            next_url = data.get('next')  # Update the next_url from the 'next' field in the response
        else:
            print(f"Failed to fetch data: {response.status_code}, Message: {response.text}")
            break

    print("total events fetched", len(results))

    return results

events_data = fetch_all_pages(base_url, headers, params)
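The loop above stops as soon as it receives a non-200 response. If you extract large volumes of data you may occasionally hit the API rate limit (HTTP 429), so you may want to wrap the request in a small retry helper along these lines. This is an optional sketch; the backoff values are illustrative.

import time
import requests

def get_with_backoff(url, headers, params=None, max_attempts=5):
    """Call the Events API, waiting and retrying if a 429 rate-limit response is returned."""
    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers, params=params)
        if response.status_code != 429:
            return response
        # Wait for the period suggested by the API if provided, otherwise back off exponentially.
        wait = int(response.headers.get("Retry-After", 2 ** attempt))
        print(f"Rate limited, retrying in {wait} seconds")
        time.sleep(wait)
    return response

# Inside fetch_all_pages, replace requests.get(...) with get_with_backoff(...)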

Transform API Responses

To ensure compatibility with the BigQuery table structure, we apply a transformation function to the data retrieved from the Events API in the previous step. The function performs essential adjustments, including:

  • Handling null or empty responses to maintain data integrity.

  • Restricting output to only the columns defined in our table structure, ensuring consistency and relevance.

  • Formatting complex fields to JSON, suitable for BigQuery ingestion.

Append this transformation code to the extraction code from above. After successful data extraction and transformation, the processed data is ready for loading into BigQuery.

def prepare_data_for_bigquery(events_data):
    schema_fields = set([
        'id', 'parent_event', 'title', 'alternate_titles', 'description', 'category', 'labels',  
        'phq_labels', 'rank', 'local_rank', 'phq_attendance', 'entities', 'duration', 'start', 
        'start_local', 'end', 'end_local', 'updated', 'predicted_end', 'predicted_end_local', 'first_seen', 
        'timezone', 'location', 'geo', 'scope', 'country', 'place_hierarchies', 'state', 'private', 
        'impact_patterns', 'predicted_event_spend', 'predicted_event_spend_industries'
    ])

    # Fields that are to be in JSON format
    json_fields = {'phq_labels', 'entities', 'geo', 'location', 'impact_patterns', 
                   'predicted_event_spend_industries', 'labels', 'place_hierarchies', 
                   'parent_event', 'alternate_titles'} 

    cleaned_events = []

    for event in events_data:
        filtered_event = {}

        for key in schema_fields:
            if key in event and event[key] is not None:
                # If key is in json_fields, serialize it as JSON string
                if key in json_fields:
                    # Ensure the data is properly structured as a JSON string
                    filtered_event[key] = json.dumps(event[key])
                else:
                    filtered_event[key] = event[key]
            else:
                # Provide default None for missing fields, or an empty JSON structure as appropriate
                filtered_event[key] = 'null' if key in json_fields else None

        cleaned_events.append(filtered_event)

    return cleaned_events

transformed_events_data = prepare_data_for_bigquery(events_data)

Load Data into the Table

Once the data has been successfully extracted from the API and transformed to meet our schema requirements, the next step is to load it into the previously established BigQuery table using Python and the BigQuery API.

Below is the code block to append to the end of the extraction and transformation script. It includes a basic retry mechanism to handle occasional upload failures, which are common in network-related operations. Depending on your requirements for reliability and data integrity, you might consider implementing more advanced retry logic.

def insert_data_with_retry(table_ref, data, max_attempts=3):
    for attempt in range(max_attempts):
        errors = client.insert_rows_json(table_ref, data)
        if not errors:
            print("Data inserted successfully.")
            return
        else:
            print(f"Attempt {attempt + 1} failed with errors: {errors}")
            time.sleep(2)  # Wait for 2 seconds before the next retry

    print("Failed to insert data after several attempts.")

table_ref = client.dataset(dataset_id).table(table_id)

#Run insert data with retry
insert_data_with_retry(table_ref, transformed_events_data)
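The retry above re-sends the whole batch on any failure. insert_rows_json reports errors per row, so one refinement (a sketch, assuming the same client and the standard per-row error format returned by insert_rows_json) is to re-send only the rows that failed, with an exponential backoff:

def insert_failed_rows_with_retry(table_ref, data, max_attempts=3):
    """Variation on the retry above: only re-send the rows that failed, backing off between attempts."""
    pending = data
    for attempt in range(max_attempts):
        errors = client.insert_rows_json(table_ref, pending)
        if not errors:
            print("Data inserted successfully.")
            return
        # insert_rows_json reports errors per row; keep only the rows that failed.
        failed_indexes = {e["index"] for e in errors}
        pending = [row for i, row in enumerate(pending) if i in failed_indexes]
        print(f"Attempt {attempt + 1}: {len(pending)} rows failed, retrying")
        time.sleep(2 ** attempt)
    print(f"Gave up with {len(pending)} rows not inserted.")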

With this step completed, the data from the PredictHQ Events API is now populated into your BigQuery table and is ready for analytical querying. This setup caters to a single initial load of data; to maintain the relevance and timeliness of your data, consider adapting the script to periodically update the dataset based on changes reflected in the "updated" timestamp column of the source data. See the Keep Event Data Updated section below on updating your data.

Below is the full code, combining all three parts above into one executable script.

Full Code
import time
import requests
import json
from google.cloud import bigquery

base_url = "https://api.predicthq.com/v1/events"
access_token = "[phq_access_token]"  
dataset_id = "[dataset_name]"
table_id = "[table_name]" 
SERVICE_ACCOUNT_JSON='~/file_location/service_account_key.json'


headers = {
    "Authorization": f"Bearer {access_token}",  
    "Accept": "application/json"
}

params = {      # adjust these parameters to correspond to the data you want to pull down
    "categories": "community,conferences,concerts,expos,festivals,performing-arts,sports",
    "place.scope": "5809844",     #Seattle geonames ID
    "active.gte": "2024-06-01",
    "active.lte": "2024-06-30",
    "state": "active,predicted",
    "limit": 500       # Maximum number of events per page
}

# Initialize the BigQuery client
client = bigquery.Client.from_service_account_json(SERVICE_ACCOUNT_JSON)

def fetch_all_pages(base_url, headers, params):
    print("calling Events API")
    
    results = []
    next_url = base_url  # Start with the base URL

    while next_url:
        response = requests.get(next_url, headers=headers, params=params)
        params = None  # After the first request, prevent re-sending initial parameters
        if response.status_code == 200:
            data = response.json()
            results.extend(data.get('results', []))
            next_url = data.get('next')  # Update the next_url from the 'next' field in the response
        else:
            print(f"Failed to fetch data: {response.status_code}, Message: {response.text}")
            break

    print("total events fetched", len(results))

    return results

events_data = fetch_all_pages(base_url, headers, params)

def prepare_data_for_bigquery(events_data):
    schema_fields = set([
        'id', 'parent_event', 'title', 'alternate_titles', 'description', 'category', 'labels',  
        'phq_labels', 'rank', 'local_rank', 'phq_attendance', 'entities', 'duration', 'start', 
        'start_local', 'end', 'end_local', 'updated', 'predicted_end', 'predicted_end_local', 'first_seen', 
        'timezone', 'location', 'geo', 'scope', 'country', 'place_hierarchies', 'state', 'private', 
        'impact_patterns', 'predicted_event_spend', 'predicted_event_spend_industries'
    ])

    # Fields that are to be in JSON format
    json_fields = {'phq_labels', 'entities', 'geo', 'location', 'impact_patterns', 
                   'predicted_event_spend_industries', 'labels', 'place_hierarchies', 
                   'parent_event', 'alternate_titles'} 

    cleaned_events = []

    for event in events_data:
        filtered_event = {}

        for key in schema_fields:
            if key in event and event[key] is not None:
                # If key is in json_fields, serialize it as JSON string
                if key in json_fields:
                    # Ensure the data is properly structured as a JSON string
                    filtered_event[key] = json.dumps(event[key])
                else:
                    filtered_event[key] = event[key]
            else:
                # Provide default None for missing fields, or an empty JSON structure as appropriate
                filtered_event[key] = 'null' if key in json_fields else None

        cleaned_events.append(filtered_event)

    return cleaned_events

transformed_events_data = prepare_data_for_bigquery(events_data)

def insert_data_with_retry(table_ref, data, max_attempts=3):
    for attempt in range(max_attempts):
        errors = client.insert_rows_json(table_ref, data)
        if not errors:
            print("Data inserted successfully.")
            return
        else:
            print(f"Attempt {attempt + 1} failed with errors: {errors}")
            time.sleep(2)  # Wait for 2 seconds before the next retry

    print("Failed to insert data after several attempts.")

table_ref = client.dataset(dataset_id).table(table_id)

#Run insert data with retry
insert_data_with_retry(table_ref, transformed_events_data)

Keep Event Data Updated

Event data is dynamic and events can change frequently: events are canceled, postponed, or have their details updated. PredictHQ’s pipeline is also constantly fetching new events, so new future events are always being added and can be downloaded via the API.

We recommend running a daily update process (such as a cron job) that calls the PredictHQ API and updates the data in your data lake. See Keep Data Updated via API for more. Use code similar to that above, with the ‘updated’ parameter added to filter for recently changed events; this will extract all new events as well as updates to existing events. Check for events updated since your last table update using the ‘updated’ timestamp column. You will need to write the code that updates and replaces data in BigQuery according to your preferred data update standards, but the structure will be the same as outlined above.
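As a sketch of what that daily job could look like, the snippet below reuses the functions and variables from the full script above and adds an updated.gte filter (assuming the updated filter follows the same .gte convention as the active filter used earlier). The delete-then-insert merge strategy shown here is one simple option; note that rows written very recently via streaming inserts may not yet be modifiable by DML.

from datetime import date, timedelta

# Reuses base_url, headers, params, client, dataset_id, table_id, table_ref,
# fetch_all_pages, prepare_data_for_bigquery and insert_data_with_retry from the full script.
update_params = dict(params)
update_params["updated.gte"] = (date.today() - timedelta(days=1)).isoformat()  # changed since yesterday

updated_events = fetch_all_pages(base_url, headers, update_params)
rows = prepare_data_for_bigquery(updated_events)

if rows:
    # Delete any existing copies of these events, then insert the fresh versions.
    ids = [row["id"] for row in rows]
    delete_query = f"DELETE FROM `{client.project}.{dataset_id}.{table_id}` WHERE id IN UNNEST(@ids)"
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ArrayQueryParameter("ids", "STRING", ids)]
    )
    client.query(delete_query, job_config=job_config).result()
    insert_data_with_retry(table_ref, rows)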

Querying the Loaded Data

Once the data is successfully loaded into BigQuery, you can begin querying it to derive insights relevant to your use case. This section provides an example of a BigQuery SQL query tailored to Tom’s scenario as outlined earlier. While the initial data load might have utilized specific filters via the WebApp or API parameters, it's often useful to perform additional queries directly within BigQuery. This capability is particularly valuable if you have loaded a broader dataset and need to perform dynamic or complex filtering post-load.

Some common fields to query are:

  • start and end: filter on the start and end dates falling within a specified time period. You might be looking at events in the next week, month, or longer.

  • rank: commonly used to filter out smaller events. Filter where rank is greater than or equal to a threshold value.

  • category: filter down to the relevant categories most likely to impact your industry.

  • geo: this field contains GeoJSON data describing the event’s location. For attended events it typically holds the latitude and longitude of the point at which the event occurs; it can also describe events that cover a wide area, like marathons or severe weather events. For marathons, the polygon shows the route of the race. Query on this field to find all events in an area, such as within a radius.

See the Filtering Guide for more examples. The SQL example below shows how to query these fields in the database.

Below is a sample BigQuery SQL query that aligns with the parameters specified for our example. This query filters events based on the categories, date range, event rank, and geographical proximity to Tom’s location.

This type of query finds all events within a specified radius around a business location: for example, a restaurant, hotel, store, parking garage, or any other site. Once you have the data loaded into GCP you will want to find how events are impacting your locations; use this type of query for each location to get all the events around it.

SELECT *
FROM `predicthq_dataset_test.phq_api_json_format`
WHERE category IN ('concerts','conferences','festivals','performing-arts')
  AND `start_local` >= '2024-06-01' --Change this to the date range you want to display events for - e.g. the next 30 days
  AND `end_local` <= '2024-06-30'
  AND rank >= 30
  AND ST_Distance(
      ST_GeogPoint(
          CAST(JSON_EXTRACT_SCALAR(geo, '$.geometry.coordinates[0]') AS FLOAT64),
          CAST(JSON_EXTRACT_SCALAR(geo, '$.geometry.coordinates[1]') AS FLOAT64)),
          ST_GeogPoint(-122.33, 47.60)  --Tom's store location in our example
      ) <= 2400 -- Approximately 1.48 miles in meters
	--When implementing this use the Suggested Radius API to find the radius for your location
  AND state IN ('active', 'predicted');

This query retrieves records that meet all the specified criteria, allowing Tom to identify events that could influence the operations and traffic at his pizzeria in Seattle. Modify the query to fit the specific fields and data types of your table if they differ from this example, and fill in the latitude and longitude for your own locations, with a radius suggested by our Suggested Radius API.

Visually, this type of query pulls all the events within a radius around a location (see the radius map example on our website).

A common example: customers often look at events occurring in the next 1 to 3 months and display this information in their application, in a BI tool, or in other products and tools. A common approach is to keep a table listing your business locations with a latitude and longitude for each. For each location, call the Suggested Radius API, store the radius, and then look up your store locations in the table. For example, you may have a table of locations like the one below:

location          latitude   longitude   radius   radius_unit   date_start   date_end
store1-chicago    41.8131    -87.6586    4.11     mi            2023-07-01   2023-12-31
Hyde Park         51.50736   -0.16411    2.06     mi            2024-01-01   2024-03-31
store10-new-yor   40.73061   -73.93524   ...      ...           ...          ...

You can look up a location’s details from this table (for example, to find the latitude and longitude for a store) and then use them to find events with SQL like the example shown above. See our Snowflake Data Science Guide for an example of doing this in Snowflake.

Using BigQuery for these queries lets you leverage powerful, scalable SQL analytics over large datasets, enabling responsive decision-making based on the latest event data available in your system.
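As an illustration of that lookup pattern, the sketch below loops over a hypothetical business_locations table (the table name and columns are assumptions matching the example above, with the radius stored in miles) and runs a parameterized version of the radius query for each store.

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('~/file_location/service_account_key.json')

# Hypothetical table holding one row per business location, as in the example table above.
locations = client.query(
    "SELECT location, latitude, longitude, radius FROM `[dataset_name].business_locations`"
).result()

events_sql = """
SELECT id, title, category, phq_attendance, start_local
FROM `[dataset_name].[table_name]`
WHERE rank >= 30
  AND state IN ('active', 'predicted')
  AND ST_Distance(
        ST_GeogPoint(
            CAST(JSON_EXTRACT_SCALAR(geo, '$.geometry.coordinates[0]') AS FLOAT64),
            CAST(JSON_EXTRACT_SCALAR(geo, '$.geometry.coordinates[1]') AS FLOAT64)),
        ST_GeogPoint(@longitude, @latitude)
      ) <= @radius_m
"""

for loc in locations:
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("longitude", "FLOAT64", loc.longitude),
        bigquery.ScalarQueryParameter("latitude", "FLOAT64", loc.latitude),
        bigquery.ScalarQueryParameter("radius_m", "FLOAT64", loc.radius * 1609.34),  # miles to meters
    ])
    events = list(client.query(events_sql, job_config=job_config).result())
    print(f"{loc.location}: {len(events)} events within {loc.radius} miles")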

Utilizing PredictHQ Data in Your Data Warehouse

Having integrated PredictHQ's rich events data into your data warehouse, the opportunities to leverage this data are extensive. By now, you've successfully set up your data structure within Google Cloud Platform's BigQuery and have a solid understanding of the JSONL file Upload or the API Connection methods. Here’s how you can maximize the value of PredictHQ data within your organization:

1. Cross-referencing with Internal Datasets

Enhance the granularity and relevance of your internal analytics by cross-referencing PredictHQ events data with your own datasets. For instance, you can correlate sales data with event occurrences to analyze the impact of local events on sales performance. This cross-analysis can be crucial for demand forecasting and strategic planning.

Customers sometimes use fields like Placekey to join events data with other location data.

2. Demand Analysis and Forecasting

PredictHQ data can significantly enhance your demand forecasting models, especially for businesses that are impacted by local events, such as retail, hospitality, and transportation. By understanding when significant events are happening, you can better predict and prepare for attendance surges or declines.

You can use event data from your data warehouse in your demand forecast to improve forecast accuracy. See our Snowflake Data Science Guide for an example of implementing ML features for demand forecasting in a data warehouse; although that example uses Snowflake, a similar approach applies to other data warehouses. See also Improving Demand Forecasting Models with Event Features.

3. Building Tailored Reports

Use the data within BigQuery to create detailed reports and dashboards that monitor the effects of events on your business operations. These reports can provide actionable insights to business users across your organization, from marketing teams planning campaigns around major events to supply chain management preparing for increased activity.

See Using Event Data in Power BI for an example of building reports in Power BI. You can connect Power BI or other BI tools to your database to build dashboards and reports. See also How to Connect Power BI to Google BigQuery.

4. Enhancing Customer Experience

Inform your customers about local events that might impact their experience with your service or product. For example, a transportation company could provide passengers with real-time updates about events that might affect travel times or service availability.

5. Event-Driven Marketing

Plan and execute marketing campaigns that align with upcoming events to capitalize on increased foot traffic or digital engagement. This targeted approach can improve marketing ROI by reaching audiences when they are most receptive.

By tapping into the detailed and predictive insights provided by PredictHQ, your business can not only anticipate the impact of external events but also strategize proactively to harness potential opportunities or mitigate risks. Whether through enhancing predictive analytics, refining customer engagement strategies, or driving operational efficiencies, PredictHQ's events data serves as a powerful tool in your data-driven decision-making arsenal.
