Increase Accuracy with the Features API

Once you’ve familiarized yourself with our data, you’ll likely find that focusing on individual spikes often leaves you with a data set too small to correlate accurately. Working at an aggregate level instead, a data science team can look at the volume of spike days to establish a correlation between demand and events, based on category features.

The Features API aggregates PHQ Attendance figures, PHQ Viewership figures and PHQ Rank counts (in buckets by rank range) for a given category feature in a particular location on a given day, and returns the statistics you request (see Feature statistics). These statistics let you quickly gauge the demand impact on a location for a given day and category. For example, suppose that on a future date in Sydney there is a major sports game, a street fair, an international film festival, a symphony orchestra performance, and more. Summing the aggregated attendance values across the categories might give a total of 150,000, possibly spread across a hundred events or more. That total is a prediction that 150,000 people will attend events in the location on that day.
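
To make the aggregation concrete, the sketch below sums the per-category sum statistics for a single day. The figures are copied from the sample Chicago response for 2020-02-01 shown later in this guide. The helper name total_attendance is made up for illustration and is not part of the API or the SDK.

```python
# One day of Features API output, trimmed to the attendance "sum" values.
# Figures come from the sample response for 2020-02-01 in this guide.
day = {
    "date": "2020-02-01",
    "phq_attendance_community": {"stats": {"sum": 3135}},
    "phq_attendance_concerts": {"stats": {"sum": 25478}},
    "phq_attendance_conferences": {"stats": {"sum": 5100}},
    "phq_attendance_sports": {"stats": {"sum": 34259}},
}


def total_attendance(day):
    """Add up the "sum" stat of every phq_attendance_* feature in one day.

    Hypothetical helper for illustration only.
    """
    return sum(
        feature["stats"]["sum"]
        for name, feature in day.items()
        if name.startswith("phq_attendance_")
    )


print(day["date"], total_attendance(day))  # 2020-02-01 67972
```

Viewership features are left out on purpose here, since viewers and attendees are different measures and adding them together would blur the prediction.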

The Features API returns the requested statistical values (sum, count, average, min, max, median, std_dev) per day for a specified date range, across a specified attendance category feature (see PHQ Attendance Response). Similarly, it returns the requested statistical values across a specified viewership category feature (see PHQ Viewership Response). For non-attendance-based events, the ranks of the events impacting the location on each day are bucketed into rank ranges in the response (see PHQ Rank Response). When calling the API you specify filters to select events at a specific location, above a specific rank value, in specific categories, and so on. The values returned are aggregations that measure the impact (total predicted attendance, for example) of all the events that match the filters.

Making Requests

Examples of these requests are available in raw HTTP, cURL, and the Python Requests library; the cURL version is shown here.

See the API documentation for more details on the API. See also our feature engineering guide for how to use Features API features in demand forecasting.

Features API Endpoint

Find events that cause high demand

In this example we will use PHQ Attendance features to find high demand days in the city of Chicago in February 2020.

When using the Features API endpoint you need to specify a location, either as a latitude, longitude and radius or as a place ID. A common use case is to look at the impact of events in a city, but you can choose whatever location makes sense for your use case. You also need to specify a date range, using the active.gte and active.lte fields (or other active date range fields).
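
As a rough illustration of those two required filters, the sketch below builds the location and active parts of a request body. The helper names build_location and build_active are hypothetical. The JSON shape for the geo variant is an assumption that mirrors the location__geo argument used in the Python SDK examples later in this guide, so check it against the API documentation before relying on it.

```python
from datetime import date


def build_location(place_id=None, geo=None):
    """Return the "location" filter from a place ID or a geo dict, not both."""
    if (place_id is None) == (geo is None):
        raise ValueError("pass exactly one of place_id or geo")
    if place_id is not None:
        return {"place_id": [place_id]}
    missing = {"lat", "lon", "radius"} - set(geo)
    if missing:
        raise ValueError(f"geo is missing keys: {sorted(missing)}")
    # Assumed JSON shape, mirroring the SDK's location__geo argument.
    return {"geo": dict(geo)}


def build_active(start, end):
    """Return the "active" date range filter for two datetime.date values."""
    if start > end:
        raise ValueError("start must not be after end")
    return {"gte": start.isoformat(), "lte": end.isoformat()}


# Chicago by place ID for February 2020, as in the example that follows.
body = {
    "active": build_active(date(2020, 2, 1), date(2020, 2, 29)),
    "location": build_location(place_id=4887398),
}
print(body)
```

Rejecting a call that passes both a place ID and a geo dict (or neither) keeps an ambiguous location from reaching the API.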

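To find high demand days for the city of Chicago (place_id=4887398) during the month of February 2020 (active.gte=2020-02-01 and active.lte=2020-02-29), the request below uses the community, concerts, conferences, and sports PHQ Attendance features and asks for the count, sum, and avg stats fields. The sample request also includes the phq_viewership_sports_american_football_nfl PHQ Viewership feature with the same stats.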

More examples

For more examples, see the example data science notebook: Feature API Notebook.

curl -X POST https://api.predicthq.com/v1/features \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     --data @<(cat <<EOF
    {
        "active": {
            "gte": "2020-02-01",
            "lte": "2020-02-29"
        },
        "location": {
            "place_id": [
                4887398
            ]
        },
        "phq_attendance_community": {
            "stats": [
                "count",
                "sum",
                "avg"
            ]
        },
        "phq_attendance_concerts": {
            "stats": [
                "count",
                "sum",
                "avg"
            ]
        },
        "phq_attendance_conferences": {
            "stats": [
                "count",
                "sum",
                "avg"
            ]
        },
        "phq_attendance_sports": {
            "stats": [
                "count",
                "sum",
                "avg"
            ]
        },
        "phq_viewership_sports_american_football_nfl": {
            "stats": [
                "count",
                "sum",
                "avg"
            ]
        }
    }
EOF
)
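
For readers who prefer Python to cURL, here is a rough equivalent of the request above using only the standard library. It builds the request without sending it, so nothing here touches the network. The helper name build_features_request is made up for illustration, and the Content-Type header is an assumption about what the API expects for a JSON body.

```python
import json
import urllib.request

FEATURES_URL = "https://api.predicthq.com/v1/features"
STATS = ["count", "sum", "avg"]
FEATURES = [
    "phq_attendance_community",
    "phq_attendance_concerts",
    "phq_attendance_conferences",
    "phq_attendance_sports",
    "phq_viewership_sports_american_football_nfl",
]


def build_features_request(access_token):
    """Build (but do not send) the Chicago, February 2020 request."""
    body = {
        "active": {"gte": "2020-02-01", "lte": "2020-02-29"},
        "location": {"place_id": [4887398]},
    }
    # Ask for the same stats on every feature, as the cURL example does.
    for name in FEATURES:
        body[name] = {"stats": list(STATS)}
    return urllib.request.Request(
        FEATURES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",  # assumed to be required
        },
        method="POST",
    )


request = build_features_request("YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)

# To send it (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)["results"]
```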

A snippet of the full results is shown below:

"results": [
        {
            "date": "2020-02-01",
            "phq_attendance_community": {
                "stats": {
                    "count": 24,
                    "sum": 3135,
                    "avg": 130.625
                }
            },
            "phq_attendance_concerts": {
                "stats": {
                    "count": 38,
                    "sum": 25478,
                    "avg": 670.4736842105264
                }
            },
            "phq_attendance_conferences": {
                "stats": {
                    "count": 2,
                    "sum": 5100,
                    "avg": 2550.0
                }
            },
            "phq_attendance_sports": {
                "stats": {
                    "count": 6,
                    "sum": 34259,
                    "avg": 5709.833333333333
                }
            },
            "phq_viewership_sports_american_football_nfl": {
                "stats": {
                    "count": 2,
                    "sum": 16544,
                    "avg": 8272
                }
            }
        },
        {
            "date": "2020-02-02",
            ...
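
Once the daily results are in hand, one simple way to surface high demand days is to total the attendance sums for each day and sort. The sketch below does that on a hand-made list shaped like the results array above. Only the 2020-02-01 figures come from the sample response; the other two days are invented placeholders so that the sort has something to order. The helper rank_days_by_attendance is hypothetical and not part of the API or SDK.

```python
def rank_days_by_attendance(results, top_n=3):
    """Return the top_n (date, total attendance) pairs, highest first."""
    totals = []
    for day in results:
        total = sum(
            feature["stats"]["sum"]
            for name, feature in day.items()
            if name.startswith("phq_attendance_")
        )
        totals.append((day["date"], total))
    return sorted(totals, key=lambda item: item[1], reverse=True)[:top_n]


results = [
    {
        # Figures from the sample response (attendance features only).
        "date": "2020-02-01",
        "phq_attendance_community": {"stats": {"sum": 3135}},
        "phq_attendance_concerts": {"stats": {"sum": 25478}},
        "phq_attendance_conferences": {"stats": {"sum": 5100}},
        "phq_attendance_sports": {"stats": {"sum": 34259}},
    },
    {
        # Invented placeholder values, not real API output.
        "date": "2020-02-02",
        "phq_attendance_sports": {"stats": {"sum": 1000}},
    },
    {
        # Invented placeholder values, not real API output.
        "date": "2020-02-03",
        "phq_attendance_concerts": {"stats": {"sum": 90000}},
    },
]

for day_date, total in rank_days_by_attendance(results):
    print(day_date, total)
```

Sorting by total is only a starting point; a threshold relative to a location's typical daily total may be a better definition of "high demand" for your use case.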

Querying Features API Endpoint with Python SDK

Installing and Using the Python SDK

More details are available in the Python SDK documentation and the Python SDK GitHub repo.

The following example obtains features for events that are active between 2017-12-31 and 2018-01-02, with place_id 4671654.

Requested features:

  • rank_levels for public_holidays

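  • count and median of attendance at sporting events which have a phq_rank greater than 50

  • count and avg of viewership of sporting events which have a phq_rank greater than 75

  • count, sum and avg of viewership of NBA basketball events which have a phq_rank greater than 50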

from predicthq import Client

phq = Client(access_token="abc123")


for feature in phq.features.obtain_features(
        active__gte="2017-12-31",
        active__lte="2018-01-02",
        location__place_id=[4671654],
        phq_rank_public_holidays=True,
        phq_attendance_sports__stats=['count', 'median'],
        phq_attendance_sports__phq_rank={
            "gt": 50
        },
        phq_viewership_sports__stats=["count", "avg"],
        phq_viewership_sports__phq_rank={
            "gt": 75
        },
        phq_viewership_sports_basketball_nba__stats=["count", "sum", "avg"],
        phq_viewership_sports_basketball_nba__phq_rank={
            "gt": 50
        }
):
    # Print each requested value once for every day in the range.
    print(
        feature.date,
        feature.phq_rank_public_holidays.rank_levels,
        feature.phq_attendance_sports.stats.count,
        feature.phq_attendance_sports.stats.median,
        feature.phq_viewership_sports.stats.count,
        feature.phq_viewership_sports.stats.avg,
        feature.phq_viewership_sports_basketball_nba.stats.count,
        feature.phq_viewership_sports_basketball_nba.stats.sum,
        feature.phq_viewership_sports_basketball_nba.stats.avg,
    )

The following example mirrors the previous one, except that it filters by a geopoint and radius instead of a place_id.

Requested features:

  • rank_levels for public_holidays

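  • count and median of attendance at sporting events which have a phq_rank greater than 50

  • count and avg of viewership of sporting events which have a phq_rank greater than 75

  • count, sum and avg of viewership of NBA basketball events which have a phq_rank greater than 50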

from predicthq import Client

phq = Client(access_token="abc123")


for feature in phq.features.obtain_features(
        active__gte="2017-12-31",
        active__lte="2018-01-02",
        location__geo={
            "lon": -97.74306,
            "lat": 30.26715,
            "radius": "150km"
        },
        phq_rank_public_holidays=True,
        phq_attendance_sports__stats=['count', 'median'],
        phq_attendance_sports__phq_rank={
            "gt": 50
        },
        phq_viewership_sports__stats=["count", "avg"],
        phq_viewership_sports__phq_rank={
            "gt": 75
        },
        phq_viewership_sports_basketball_nba__stats=["count", "sum", "avg"],
        phq_viewership_sports_basketball_nba__phq_rank={
            "gt": 50
        }
):
    # Print each requested value once for every day in the range.
    print(
        feature.date,
        feature.phq_rank_public_holidays.rank_levels,
        feature.phq_attendance_sports.stats.count,
        feature.phq_attendance_sports.stats.median,
        feature.phq_viewership_sports.stats.count,
        feature.phq_viewership_sports.stats.avg,
        feature.phq_viewership_sports_basketball_nba.stats.count,
        feature.phq_viewership_sports_basketball_nba.stats.sum,
        feature.phq_viewership_sports_basketball_nba.stats.avg,
    )