If either of the above approaches is taken, the Features API can be called for each location and the results saved into a table, instead of using SQL. These options skip the maintenance of SQL and are the recommended approach where possible.
Below is an example of calling the Features API with Python in Snowpark using the PredictHQ Python SDK. It loops over the SAVED_LOCATIONS table, calls the Features API via the SDK, and writes the results to a table, achieving a similar result to the SQL method but via the API. Modify this code to request the features relevant to your use case.
Note that this is not designed to be production-ready code and should not be used without modification. It is provided as an example; please test and optimize as needed.
Python Code
from snowflake.snowpark import Session
from predicthq import Client
import pandas as pd

# Snowflake connection parameters
connection_parameters = {
    "account": "<your_account>",
    "user": "<your_username>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>"
}

# Create a session
session = Session.builder.configs(connection_parameters).create()

# PredictHQ Client setup
phq = Client(access_token="your_predicthq_access_token")

# Fetch location data from the Snowflake table
locations_df = session.table("SAVED_LOCATIONS").to_pandas()

# Prepare a list to collect all data
all_data = []

# Iterate over each row in the location DataFrame
for index, location in locations_df.iterrows():
    # Prepare the radius in the appropriate unit, e.g. "2mi" or "5km"
    radius_with_unit = f"{location['RADIUS']}{location['RADIUS_UNIT']}"

    # Call the Features API for each location
    for feature in phq.features.obtain_features(
        active__gte=str(location['DATE_START']),
        active__lte=str(location['DATE_END']),
        location__geo={
            "lon": location['LON'],
            "lat": location['LAT'],
            "radius": radius_with_unit
        },
        phq_attendance_sports__stats=["sum"],
        phq_attendance_conferences__stats=["sum"]
        # add more ML features here like phq_attendance_community, phq_attendance_concerts,
        # phq_attendance_expos, phq_attendance_festivals, phq_attendance_performing_arts,
        # and so on.
    ):
        data_point = {
            'location': location['LOCATION'],
            'date': feature.date,
            'phq_attendance_conferences': getattr(feature.phq_attendance_conferences.stats, 'sum', 0),
            'phq_attendance_sports': getattr(feature.phq_attendance_sports.stats, 'sum', 0)
        }
        all_data.append(data_point)

# Convert all collected data to a pandas DataFrame
results_df = pd.DataFrame(all_data)

# Convert the pandas DataFrame to a Snowpark DataFrame
snow_df = session.create_dataframe(results_df)

# Append the results to the existing Snowflake table
snow_df.write.mode("append").save_as_table("ML_FEATURES_FOR_LOCATIONS")

# Print the contents of the table to verify
session.table("ML_FEATURES_FOR_LOCATIONS").show()

# Close the session
session.close()
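Since the example above is not production-ready, one common hardening step before running it over many locations is to retry transient API failures (for example, network timeouts or rate-limit responses) with exponential backoff. Below is a minimal, hypothetical retry helper; the `with_retries` name and its parameters are our own illustration, not part of the PredictHQ SDK, so adapt it to the specific exceptions the SDK raises in your environment.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn() and return its result, retrying on any exception.

    Waits base_delay, then 2x, then 4x, ... between attempts
    (exponential backoff). Re-raises the last exception if all
    attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In the loop above, the call could then be wrapped as, for example, `features = with_retries(lambda: list(phq.features.obtain_features(...)))`, so a single transient failure does not abort the whole run.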
Table Output
The output of the script above should look similar to the data below: