Live TV Events with the Broadcasts API

PredictHQ’s Live TV Events feature enables our customers to identify how many people will watch a TV broadcast at a given location.

The initial release of Live TV Events focuses on live sports broadcasts in the United States. With this release, customers can query how many people within a given county will watch a live sports game. For example, for the Green Bay Packers vs San Francisco 49ers game in San Jose, you can use our API to find that 129,085 people in Alameda County (Oakland) will watch the live broadcast of that game on November 24, 2019.

Using our Broadcasts API, customers can query the number of people watching a game in a location. We provide both historic data (from 1 Jan 2018) and predicted viewership for the two weeks following the current date. Customers can use the historic data to model how Live TV Events have affected their business in the past, then use our forward-looking TV viewership data to inform their forecasts.

A key use case for this is demand forecasting. For example, delivery companies can find out when relevant sports games are being played so they can plan for spikes in deliveries. Using our Broadcasts API, companies can integrate Live TV Events features into their forecasting models to increase forecast accuracy.

This page outlines how to get started with the Broadcasts API so customers can use it to find Live TV Events information. The Broadcasts API provides Live TV Events data by returning broadcast records. A broadcast record represents how many people will watch a physical event broadcast on a television network at a specific date, time and location.


Search broadcasts by location and time

Find broadcasts televised in specific locations and time ranges

For the following example, we want to find all broadcasts televised in two counties in California during November 2020.

The location.place_id parameter allows us to filter live sports events by their broadcast locations. For the counties in our example, we will use location.place_id=5368381,5391832, which are the respective place_ids for Los Angeles County and San Diego County in California.

These IDs were found using the Places API. To make it easier to discover the place_id for every county and state in the US, we also provide a downloadable CSV file of broadcast places.

We can also use the start.* parameters to filter broadcasts by time. For the time range in our example, we will use start.gte=2020-11-01 and start.lte=2020-11-30. Setting start.tz=America/Los_Angeles tells the API to interpret those dates and times in the America/Los_Angeles time zone; without it, they are treated as UTC.

  • GET /v1/broadcasts/?location.place_id=5368381,5391832&start.gte=2020-11-01&start.lte=2020-11-30&start.tz=America/Los_Angeles HTTP/1.1
    Host: api.predicthq.com
    Authorization: Bearer $ACCESS_TOKEN
    
  • curl -X GET -G "https://api.predicthq.com/v1/broadcasts/" \
         -d location.place_id=5368381,5391832 \
         -d start.gte=2020-11-01 \
         -d start.lte=2020-11-30 \
         -d start.tz=America/Los_Angeles \
         -H "Authorization: Bearer $ACCESS_TOKEN"
    
  • import requests
    
    response = requests.get(
        url="https://api.predicthq.com/v1/broadcasts/",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer $ACCESS_TOKEN"
        },
        params={
            "location.place_id": "5368381,5391832",
            "start.gte": "2020-11-01",
            "start.lte": "2020-11-30",
            "start.tz": "America/Los_Angeles"
        }
    )
    
    print(response.json())
    

A snippet of the results is shown below:

{
    "count": 501,
    "results": [
        {
            "broadcast_id": "6v4RTik6YVhkLr9HpyddHJ",
            "dates": {
                "start": "2020-11-01T18:00:00Z",
                "start_local": "2020-11-01T10:00:00",
                "timezone": "America/Los_Angeles"
            },
            "location": {
                "places": [
                    {
                        "place_id": "5391832",
                        "name": "San Diego County",
                        ...
                    }
                ],
                "country": "US"
            },
            "phq_viewership": 184637,
            "event": {
                "event_id": "24HwKWjyPw3xLBhGza",
                "title": "Los Angeles Rams vs Miami Dolphins",
                "labels": ["american-football", "nfl", "closed-doors", "sport"],
                ...

In this example, the Broadcasts API found 501 broadcasts. The snippet shows one of them: an NFL game that 184,637 people in San Diego County will watch.
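
To work with a response like this in code, it can help to flatten each broadcast record into just the fields you need. The following is a minimal sketch rather than part of the API or any client library: summarize_broadcast is a hypothetical helper, and the sample record only copies fields that appear in the snippet above. It also compares start with start_local to show the UTC offset implied by the two timestamps.

```python
from datetime import datetime


def summarize_broadcast(broadcast: dict) -> dict:
    """Flatten one broadcast record (hypothetical helper, not an API call).

    Only fields visible in the response snippet are read.
    """
    dates = broadcast["dates"]
    start_utc = datetime.strptime(dates["start"], "%Y-%m-%dT%H:%M:%SZ")
    start_local = datetime.strptime(dates["start_local"], "%Y-%m-%dT%H:%M:%S")
    return {
        "title": broadcast["event"]["title"],
        "county": broadcast["location"]["places"][0]["name"],
        "viewership": broadcast["phq_viewership"],
        "start_local": start_local,
        # Local start minus UTC start, expressed in hours.
        "utc_offset_hours": (start_local - start_utc).total_seconds() / 3600,
    }


# Sample record built from the fields shown in the response snippet.
sample = {
    "broadcast_id": "6v4RTik6YVhkLr9HpyddHJ",
    "dates": {
        "start": "2020-11-01T18:00:00Z",
        "start_local": "2020-11-01T10:00:00",
        "timezone": "America/Los_Angeles",
    },
    "location": {
        "places": [{"place_id": "5391832", "name": "San Diego County"}],
        "country": "US",
    },
    "phq_viewership": 184637,
    "event": {
        "event_id": "24HwKWjyPw3xLBhGza",
        "title": "Los Angeles Rams vs Miami Dolphins",
        "labels": ["american-football", "nfl", "closed-doors", "sport"],
    },
}

summary = summarize_broadcast(sample)
print(f"{summary['title']}: {summary['viewership']:,} viewers in {summary['county']}")
```

In a real integration you would apply the same function to every item in the response's results list.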


Find broadcasts for specific sports events

Find broadcasts televised in all locations for an event

In this example, we want to find the broadcasts for the Super Bowl game, New England Patriots vs Los Angeles Rams, played on February 3, 2019.

To find broadcasts for an event, we can use the event.event_id parameter. This parameter retrieves one broadcast record for each county in which the game has viewership. So, for a game televised nationwide, the API would return over 3,000 broadcast records, each with the viewership for one county.

The 2019 Super Bowl game's event_id is ePQLUqbPnMn3mQhe35, so we need to filter broadcasts using event.event_id=ePQLUqbPnMn3mQhe35. The event_id was found using our Events API. See our Events API documentation to discover how to query for other events.

  • GET /v1/broadcasts/?event.event_id=ePQLUqbPnMn3mQhe35 HTTP/1.1
    Host: api.predicthq.com
    Authorization: Bearer $ACCESS_TOKEN
    
  • curl -X GET "https://api.predicthq.com/v1/broadcasts/?event.event_id=ePQLUqbPnMn3mQhe35" \
         -H "Authorization: Bearer $ACCESS_TOKEN"
    
  • import requests
    
    response = requests.get(
        url="https://api.predicthq.com/v1/broadcasts/",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer $ACCESS_TOKEN"
        },
        params={
            "event.event_id": "ePQLUqbPnMn3mQhe35"
        }
    )
    
    print(response.json())
    

A snippet of the results is shown below:

{
    "count": 3055,
    "results": [
        {
            "broadcast_id": "V75qDgQ8wrq4Qp36nFVThm",
            "dates": {
                "start": "2019-02-03T23:30:00Z",
                "start_local": "2019-02-03T18:30:00",
                "timezone": "America/New_York"
            },
            "location": {
                "places": [
                    {
                        "place_id": "4945455",
                        "name": "Norfolk County",
                        "region": "Massachusetts",
                        ...
                    }
                ],
                "country": "US"
            },
            "phq_viewership": 308374,
            "event": {
                "event_id": "ePQLUqbPnMn3mQhe35",
                "title": "Super Bowl - New England Patriots vs Los Angeles Rams",
                ...

The API found 3,055 broadcasts, which means the Super Bowl game was televised in 3,055 counties in the US. The snippet shows one of the results: a broadcast in Norfolk County, Massachusetts, where 308,374 people will watch the game.
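
Because each record covers one county, a national figure for an event has to be computed by summing across records. The sketch below shows one way to do that, assuming every page of results has already been collected into a single list. Both function names are hypothetical, and apart from the Norfolk County record the sample values are placeholders, not real viewership numbers.

```python
def total_viewership(broadcasts: list) -> int:
    """Sum phq_viewership over all broadcast records for one event."""
    return sum(b["phq_viewership"] for b in broadcasts)


def top_counties(broadcasts: list, n: int = 3) -> list:
    """Return (county name, viewership) pairs for the n largest audiences."""
    ranked = sorted(broadcasts, key=lambda b: b["phq_viewership"], reverse=True)
    return [
        (b["location"]["places"][0]["name"], b["phq_viewership"])
        for b in ranked[:n]
    ]


broadcasts = [
    # Taken from the response snippet above.
    {"phq_viewership": 308374, "location": {"places": [{"name": "Norfolk County"}]}},
    # Placeholder records with made-up values, for illustration only.
    {"phq_viewership": 2500, "location": {"places": [{"name": "Placeholder County A"}]}},
    {"phq_viewership": 1000, "location": {"places": [{"name": "Placeholder County B"}]}},
]

print(total_viewership(broadcasts))
print(top_counties(broadcasts, n=2))
```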


Aggregating Live TV Events data

Calculate the total and maximum daily broadcast viewership by county and sports league

In this example, we want to aggregate phq_viewership, per day, per sport, per league, for all broadcasts scheduled to be televised in three counties in the next two weeks.

We will use three query parameters to retrieve the data:

  • start.gt=TODAY: replace TODAY with today’s date in YYYY-MM-DD format.
  • limit=500: return up to 500 broadcasts per page of results.
  • location.place_id=5128594,5391997,4684904: the respective place_id values for New York County (NY), San Francisco County (CA) and Dallas County (TX).

The place_id values were found using the Places API. We provide a CSV file of broadcast places to download, to make it easier to discover the place_id for all counties and states in the US.
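
The three parameters above can be assembled in code so that TODAY is filled in automatically. This is a small sketch under the assumption that the parameters are passed as a plain dictionary, as in the earlier requests examples; build_query_params is a hypothetical name, not a function from the example script.

```python
from datetime import date
from typing import Optional


def build_query_params(today: Optional[date] = None) -> dict:
    """Build the query parameters for the aggregation example.

    `today` can be passed explicitly to make the result reproducible;
    it defaults to the current date.
    """
    today = today or date.today()
    return {
        # date.isoformat() yields the YYYY-MM-DD format the API expects.
        "start.gt": today.isoformat(),
        "limit": 500,
        "location.place_id": "5128594,5391997,4684904",
    }


api_query_params = build_query_params()
print(api_query_params)
```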

View the Python script for this aggregation example here: aggregating_live_tv_events_example.py. Key sections are highlighted below.

A snippet of the results is shown below:

{
    "count": 687,
    "next": "https://api.predicthq.com/v1/broadcasts/?location.place_id=5128594%2C5391997%2C4684904&start.gt=2020-12-01&limit=500&offset=500",
    "previous": null,
    "overflow": false,
    "results": [
        {
            "broadcast_id": "N4kBTtJWGWJCMey8FZsFpV",
            "phq_viewership": 74151,
            ...
        },
        {
            "broadcast_id": "puJVLfeJaNymkDA9yUGhYX",
            "phq_viewership": 44555,
            ...

We will need to follow the pagination links in the response’s next field to ensure we retrieve all 687 broadcasts before performing the aggregation. On the last page of results, the next field is null. The example Python script uses the pagination links in the get_all_broadcasts function. The relevant snippet is shown below.

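from typing import Optional


def get_all_broadcasts(params: Optional[dict] = None) -> list:
    ...
    url = "https://api.predicthq.com/v1/broadcasts/"
    # The first request carries the query parameters.
    broadcasts, next_url = get_request(url, headers, params=params)

    # Each next link already includes the parameters; it is None on the last page.
    while next_url is not None:
        results, next_url = get_request(next_url, headers)
        broadcasts += results
    ...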
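
The get_request helper called above is not shown on this page. As a rough sketch of what such a helper could look like, the version below fetches one page and returns the results together with the next link. It is an assumption, not the script's actual implementation: it uses the standard library's urllib instead of requests, and split_page is a hypothetical name introduced here to keep the parsing step separate from the network call.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple


def split_page(payload: dict) -> Tuple[list, Optional[str]]:
    """Return (results, next_url) from one decoded page of the response."""
    return payload.get("results", []), payload.get("next")


def get_request(
    url: str, headers: dict, params: Optional[dict] = None
) -> Tuple[list, Optional[str]]:
    """Fetch one page of broadcasts (hypothetical stand-in for the script's helper)."""
    if params:
        # Only the first request needs params; next links already carry them.
        url = f"{url}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return split_page(json.load(response))
```

Returning the next link alongside the results is what lets the calling loop stop as soon as the link is null.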

After all broadcasts are retrieved we can use any technique to perform the aggregation. In the example Python script we use pandas.

import pandas as pd
from datetime import datetime

all_broadcasts = get_all_broadcasts(api_query_params)

df = pd.DataFrame(all_broadcasts)
df["start_date_local"] = df.dates.apply(
    lambda start_dt: datetime.strptime(start_dt["start_local"], "%Y-%m-%dT%H:%M:%S").date()
)
df["county_place_id"] = df.location.apply(lambda location: location["places"][0]["place_id"])

# get_matching_label is a helper function to extract the sport type and league from the
# labels of a broadcast's physical event.
df["sport_type"] = df.event.apply(lambda event: get_matching_label(event["labels"], SPORTS))
df["league"] = df.event.apply(lambda event: get_matching_label(event["labels"], LEAGUES))

aggregated_df = (
    df.groupby(["start_date_local", "county_place_id", "sport_type", "league"])
    .agg(
        max_daily_viewership=("phq_viewership", "max"),
        total_daily_viewership=("phq_viewership", "sum")
    )
    .reset_index()
)

# Export the aggregated results as CSV.
aggregated_df.to_csv("aggregated-broadcasts.csv", index=False)

A snippet of the CSV results file is shown below.

start_date_local,county_place_id,sport_type,league,max_daily_viewership,total_daily_viewership
2020-12-13,4684904,american-football,nfl,164537,1441319
2020-12-13,4684904,basketball,nba,4511,26745
2020-12-13,4684904,basketball,ncaa,7387,83442
2020-12-13,5128594,american-football,nfl,76260,480747
2020-12-13,5128594,basketball,nba,3312,18195
2020-12-13,5128594,basketball,ncaa,3687,42466
2020-12-13,5391997,american-football,nfl,52400,462293
2020-12-13,5391997,basketball,nba,1578,9468
2020-12-13,5391997,basketball,ncaa,1710,18767
2020-12-14,4684904,american-football,nfl,153084,153084
2020-12-14,4684904,basketball,nba,4503,29089
2020-12-14,4684904,basketball,ncaa,6579,34244
2020-12-14,5128594,american-football,nfl,70952,70952
2020-12-14,5128594,basketball,nba,2724,18998
2020-12-14,5128594,basketball,ncaa,3471,17542
2020-12-14,5391997,american-football,nfl,48728,48728
2020-12-14,5391997,basketball,nba,1456,10192
2020-12-14,5391997,basketball,ncaa,1472,7724
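
The aggregation code relies on get_matching_label, SPORTS and LEAGUES, none of which are defined on this page. The sketch below is one plausible way to write them, offered as an assumption rather than the script's actual code. The label sets contain only labels that appear in the examples on this page; a real integration would likely need a longer list.

```python
from typing import Iterable, Optional

# Assumed label sets, limited to labels seen in the examples on this page.
SPORTS = {"american-football", "basketball"}
LEAGUES = {"nfl", "nba", "ncaa"}


def get_matching_label(labels: Iterable[str], candidates: set) -> Optional[str]:
    """Return the first label found in candidates, or None if nothing matches."""
    return next((label for label in labels if label in candidates), None)


labels = ["american-football", "nfl", "closed-doors", "sport"]
print(get_matching_label(labels, SPORTS))
print(get_matching_label(labels, LEAGUES))
```

Note that pandas groupby drops rows whose group keys are missing by default, so broadcasts with no matching sport or league would be left out of the aggregated output unless dropna=False is passed.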