Receive Data via SFTP

PredictHQ can deliver Event and Broadcast data via SFTP as regularly updated files. This is a good option if you want file-based delivery without using AWS Data Exchange or APIs.

SFTP delivery follows the same full + incremental data model used by our other bulk data integrations.

Overview

When integrating via SFTP, PredictHQ delivers data as a feed of files. Here’s what you can expect (same delivery model as our AWS Data Exchange exports):

Initial Full Dump: upon setup, you will receive a full dataset covering all records you have access to.
Incremental Updates: after the initial dump, we provide incremental updates containing only the new or changed records since the last update. By default, these updates are delivered daily.
Occasional Full Dumps: at times (either by request or operational need), we may deliver a full dump without prior notice. You can distinguish these by the presence of full (not incremental) in the filename.

Processing Order & Change Action

To maintain a complete and accurate dataset, process deliveries in the order they are delivered (typically by the datetime folder, oldest to newest). Within a single delivery, the individual files can be processed in any order or in parallel.

For incremental updates, check the change_action column to determine what action to take with each record (insert, update or delete).
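As a minimal sketch of how change_action might be applied, the function below updates an in-memory dictionary. It assumes each record is a dict, that records carry a unique "id" field (a hypothetical key name), and that the change_action values are the lowercase strings insert, update and delete; check the actual files before relying on any of these.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def apply_incremental(store: Dict[str, Record], records: Iterable[Record]) -> None:
    """Apply incremental records to an in-memory store keyed by record id."""
    for record in records:
        action = record["change_action"]
        key = record["id"]  # assumption: each record has a unique "id" field
        if action in ("insert", "update"):
            # Treat both as an upsert so re-applying a delivery is harmless.
            store[key] = {k: v for k, v in record.items() if k != "change_action"}
        elif action == "delete":
            # A default of None tolerates deletes for records never seen.
            store.pop(key, None)
        else:
            raise ValueError(f"Unexpected change_action: {action!r}")
```

Treating insert and update identically is a design choice in this sketch, not something the export requires. It keeps the step idempotent if a delivery is applied twice.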

File Naming

<delivery_config_id>/<datetime>/<data_type>/<delivery_type>-part-<number>.<ext>

delivery_config_id: PredictHQ identifier for your delivery configuration.

datetime: UTC export timestamp in YYYYMMDD-HHMM format.

data_type: The data being delivered. One of: event, broadcast.

delivery_type: Whether the delivery is a full export of all available data or an incremental export based on the previous export. Possible values: full, incremental.

number: The part number of the file. Each delivery is split into multiple files to keep file sizes manageable. Individual files will vary in size but will not exceed approximately 1 GB. Files within a single delivery are not ordered and do not need to be processed sequentially; they can be processed in parallel. However, deliveries themselves must be processed in chronological order to ensure data consistency.

ext: The file extension, which indicates the data structure and compression used. Possible values: parquet, ndjson (newline-delimited JSON), csv (comma-separated values), psv (pipe-separated values). If compression is enabled (configurable), the data is compressed using Snappy and the file extension is prefixed with snappy, e.g. snappy.parquet.

Within a single delivery, files can be processed in any order. Deliveries themselves should be processed oldest to newest.
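The naming pattern above can be split into its fields with a regular expression. The following is a sketch only: it assumes the part number is a plain run of digits and that paths are relative to the root of your SFTP directory, neither of which this page confirms. The names PATH_RE, DeliveryFile and parse_path are illustrative.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# Mirrors <delivery_config_id>/<datetime>/<data_type>/<delivery_type>-part-<number>.<ext>
PATH_RE = re.compile(
    r"^(?P<config_id>[^/]+)/"
    r"(?P<dt>\d{8}-\d{4})/"
    r"(?P<data_type>[^/]+)/"
    r"(?P<delivery_type>full|incremental)-part-"
    r"(?P<number>\d+)\."  # assumption: the part number is digits only
    r"(?P<ext>(?:snappy\.)?(?:parquet|ndjson|csv|psv))$"
)


class DeliveryFile(NamedTuple):
    config_id: str
    exported_at: datetime
    data_type: str
    delivery_type: str
    number: int
    ext: str
    compressed: bool


def parse_path(path: str) -> DeliveryFile:
    """Split a delivery file path into its named components."""
    m = PATH_RE.match(path)
    if m is None:
        raise ValueError(f"Path does not match the expected pattern: {path!r}")
    # The folder timestamp is documented as UTC in YYYYMMDD-HHMM format.
    exported_at = datetime.strptime(m["dt"], "%Y%m%d-%H%M").replace(tzinfo=timezone.utc)
    ext = m["ext"]
    return DeliveryFile(
        config_id=m["config_id"],
        exported_at=exported_at,
        data_type=m["data_type"],
        delivery_type=m["delivery_type"],
        number=int(m["number"]),
        ext=ext,
        compressed=ext.startswith("snappy."),
    )
```

Sorting parsed files by exported_at gives the oldest-to-newest delivery order. Files that share the same exported_at belong to one delivery and can be handled in any order.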

Data Retention

Files on the SFTP server are retained for a limited period and are automatically deleted after that period.

Your ingestion process should fetch and persist data promptly. Do not rely on long-term availability of files on the SFTP server.

Access and Authentication

PredictHQ will provide:

An SFTP URL
A private SSH key for authentication

You will use these credentials to connect to the PredictHQ-managed SFTP server and fetch data on your own schedule.

Typical Ingestion Flow

Most customers implement an automated process that:

Connects to the SFTP server
Lists available delivery folders
Selects the next unprocessed delivery
Downloads all files for that delivery
Applies records in order, using change_action for incrementals
Records the delivery as processed in their own system
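The steps above could be sketched roughly as follows. The code assumes the remote directory layout mirrors the file naming pattern, with the delivery_config_id folder at the root of your SFTP directory. It takes any client object exposing listdir and get, which paramiko.SFTPClient does. SftpLike, ingest_next_delivery, processed and apply_file are hypothetical names, and how you persist the set of processed deliveries is up to your own system.

```python
import os
from typing import Callable, List, Optional, Protocol, Set


class SftpLike(Protocol):
    """The two methods this sketch needs; paramiko.SFTPClient provides both."""

    def listdir(self, path: str) -> List[str]: ...

    def get(self, remotepath: str, localpath: str) -> None: ...


def ingest_next_delivery(
    sftp: SftpLike,
    config_id: str,
    processed: Set[str],
    local_root: str,
    apply_file: Callable[[str], None],
) -> Optional[str]:
    """Download and apply the oldest unprocessed delivery; return its folder name."""
    # YYYYMMDD-HHMM folder names sort chronologically as plain strings.
    pending = sorted(d for d in sftp.listdir(config_id) if d not in processed)
    if not pending:
        return None
    delivery = pending[0]
    remote_delivery = f"{config_id}/{delivery}"

    local_paths = []
    for data_type in sftp.listdir(remote_delivery):
        local_dir = os.path.join(local_root, delivery, data_type)
        os.makedirs(local_dir, exist_ok=True)
        for name in sftp.listdir(f"{remote_delivery}/{data_type}"):
            local_path = os.path.join(local_dir, name)
            sftp.get(f"{remote_delivery}/{data_type}/{name}", local_path)
            local_paths.append(local_path)

    # Apply only once every file of the delivery has been downloaded.
    for path in local_paths:
        apply_file(path)

    # Mark the delivery as processed last, so a failure above leads to a retry.
    processed.add(delivery)
    return delivery
```

Calling this function in a loop until it returns None works through the backlog one delivery at a time, oldest first.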

Backwards Compatible Changes

From time to time, PredictHQ may make backwards-compatible changes to SFTP exports, including:

Adding new fields or columns
Adding new files alongside existing ones
Introducing new event categories or labels

Your ingestion process should tolerate these changes.
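One way to tolerate added fields, shown here for NDJSON input, is to read only the fields you already know about and ignore the rest. This is a sketch under assumptions: read_known_fields is a hypothetical helper, and the field names in the usage note are placeholders rather than the actual export schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Sequence


def read_known_fields(
    lines: Iterable[str], fields: Sequence[str]
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per NDJSON line, restricted to the requested fields."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        raw = json.loads(line)
        # Unknown keys are dropped; requested keys that are absent become None.
        yield {name: raw.get(name) for name in fields}
```

For example, read_known_fields(open_file, ["id", "title", "change_action"]) would keep working unchanged if a new column appeared in a later delivery.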


