This guide shows three common patterns to extract data from a REST API using the Http worker, persist the response to a fileStore (e.g., S3), and load it into a staging layer:
-
Pattern A: Simple extraction to fileStore, then direct column mapping into a staging table (with truncate first).
-
Pattern B: Dump raw JSON into a “single-column” staging table and pivot via a staging view.
-
Pattern C: Handle paginated APIs (loop over pages using Link headers).
Sending API responses to a fileStore avoids overloading State logs and prevents data from being written to execution logs.

Prerequisites
-
The fileStore “extract” (or your chosen store) is configured and accessible by the Http and sqlInsertFromJson workers.
-
Staging database connection(s) exist (e.g., bevault-staging).
=> CFR https://dfakto.atlassian.net/wiki/x/AoDCjgE to see how to configure the different stores required
-
API credentials are available (often injected via HttpHeaders Authorization or similar).
-
Target staging tables exist (or you have DDL ready to create them).
Pattern A — Direct column mapping into a staging table
Goal:
-
Truncate the staging table at the start.
-
Call the API with the Http worker, streaming the response to fileStore.
-
Load selected fields into a staging table with sqlInsertFromJson.
This pattern keeps logs clean and maps only the columns you need.
State machine
Assumptions:
-
JiraParameters is present in state input (e.g., passed in or loaded earlier). If not, add an initial SQLQuery Reader step to fetch parameters from a ref table.
{
"StartAt": "truncate_staging_table",
"States": {
"truncate_staging_table": {
"Type": "Task",
"Resource": "Production-SQLQuery",
"ResultPath": "$.TruncateResult",
"Parameters": {
"ConnectionName": "bevault-staging",
"Query": "TRUNCATE TABLE stg.jira_cloud_users",
"Type": "NonQuery"
},
"Next": "extract_data"
},
"extract_data": {
"Type": "Task",
"Resource": "Production-Http",
"ResultPath": "$.jira_users",
"Parameters": {
"OutputFileStoreName": "extract",
"OutputFileName": "jira-cloud/users.json",
"Uri.$": "States.Format('{}/users/search?maxResults=500', $.JiraParameters.result[0].base_rest_api_url)",
"HttpHeaders": {
"Authorization.$": "$.JiraParameters.result[0].authorization_header",
"X-Force-Accept-Language": "true",
"X-Language": "en"
},
"HttpQueryParams": {}
},
"Next": "insert_in_staging"
},
"insert_in_staging": {
"Type": "Task",
"Resource": "Production-sqlInsertFromJson",
"ResultPath": "$.is_loaded",
"Parameters": {
"ConnectionName": "bevault-staging",
"Schemaname": "stg",
"TableName": "jira_cloud_users",
"FileToken.$": "$.jira_users.contentFileToken",
"Columns": {
"url": "self",
"technical_id": "accountId",
"type": "accountType",
"display_name": "displayName",
"active": "active"
}
},
"End": true
}
}
Notes:
-
The Http worker writes the response to fileStore and only returns a metadata envelope (including contentFileToken). This keeps logs small and avoids exposing payloads in execution logs.
Pattern B — Dump raw JSON to a single-column staging table, then pivot in a view
Use this when the API returns complex, nested JSON and your mapping will evolve. You:
-
Truncate the dump table.
-
Extract and write the API response to fileStore.
-
Load the entire JSON into a single text/JSON column.
-
Create a staging view to pivot/explode arrays and expose typed columns.
State machine
{
"StartAt": "truncate_dump",
"States": {
"truncate_dump": {
"Type": "Task",
"Resource": "Production-SQLQuery",
"Parameters": {
"ConnectionName": "bevault-staging",
"Query": "TRUNCATE TABLE stg.users_dump",
"Type": "NonQuery"
},
"ResultPath": "$.truncateResult",
"Next": "get_data"
},
"get_data": {
"Type": "Task",
"Resource": "Production-Http",
"Parameters": {
"Method": "GET",
"Uri.$": "States.Format('https://{}/api/users',$.baseUri)",
"HttpHeaders": {
"Authorization.$": "States.Format('Bearer {}', $.tokenResult.jsonContent.access_token)"
},
"outputFileName": "extract_api/users.json",
"outputFileStoreName": "extract"
},
"ResultPath": "$.usersResult",
"Next": "insert_data"
},
"insert_data": {
"Type": "Task",
"Resource": "Production-sqlInsertFromJson",
"Parameters": {
"connectionName": "bevault-staging",
"Schemaname": "stg",
"TableName": "users_dump",
"FileToken.$": "$.usersResult.contentFileToken",
"columns": {
"output": "$"
}
},
"ResultPath": "$.is_loaded",
"End": true
}
}
}
Staging table DDL
CREATE TABLE IF NOT EXISTS stg.users_dump (
output text
);
Example staging view to pivot the JSON
The example below:
-
Reads the API’s Records array from the stored JSON.
-
Exposes selected fields as columns.
-
Leaves nested ExternalIdentifiers as raw text (you can further normalize it if needed).
CREATE OR REPLACE VIEW stg.v_users AS
SELECT
rec ->> 'Id' AS person_unique_id,
rec ->> 'GivenName' AS first_name,
rec ->> 'FamilyName' AS last_name,
rec -> 'Email' ->> 'Address' AS email_address,
(rec -> 'ExternalIdentifiers')::text AS external_ids_raw
FROM stgusers_dump d
CROSS JOIN LATERAL jsonb_array_elements(
COALESCE(d.output::jsonb -> 'Records', '[]'::jsonb)
) AS rec;
Tips:
-
As API shapes evolve, you only update the view logic—your ingestion remains unchanged.
-
For deeply nested objects and arrays, combine jsonb_array_elements, jsonb_to_recordset, and additional lateral joins.
-
In this example, the mappings between the staging and the raw vault are done on the view. There will be no mapping on the staging table stg.users_dump.
-
You can create as many views as needed on the dump table to create specific staging view in case of multiple nested objects in the JSON.
Pattern C — Extract from a paginated API (example: Shopify with Link headers)
This pattern:
-
Loads API parameters from a reference table.
-
Truncates the dump table.
-
Fetches the first page to fileStore and loads it.
-
If a Link header contains rel="next", it extracts the next URL and loops until exhaustion.
State machine
{
"StartAt": "api_parameters",
"States": {
"api_parameters": {
"Type": "Task",
"Resource": "Production-SQLQuery",
"ResultPath": "$.ShopifyParameters",
"Parameters": {
"ConnectionName": "bevault-staging",
"Query": "SELECT * FROM ref.shopify_parameters",
"Type": "Reader"
},
"Next": "truncate_customers_table"
},
"truncate_customers_table": {
"Type": "Task",
"Resource": "Production-SQLQuery",
"ResultPath": "$.TruncateResult",
"Parameters": {
"ConnectionName": "bevault-staging",
"Query": "TRUNCATE TABLE stg.shopify_customer_dump",
"Type": "NonQuery"
},
"Next": "extract_customers"
},
"extract_customers": {
"Type": "Task",
"Resource": "Production-Http",
"ResultPath": "$.customers",
"Parameters": {
"OutputFileStoreName": "extract",
"OutputFileName": "shopify/customers.json",
"Uri.$": "States.Format('{}customers.json?limit=10', $.ShopifyParameters.result[0].base_rest_api_url)",
"HttpHeaders": {
"X-Shopify-Access-Token.$": "$.ShopifyParameters.result[0].authorization_header"
},
"HttpQueryParams": {}
},
"Next": "load_customers"
},
"load_customers": {
"Type": "Task",
"Resource": "Production-sqlInsertFromJson",
"ResultPath": "$.is_loaded",
"Parameters": {
"ConnectionName": "bevault-staging",
"Schemaname": "stg",
"TableName": "shopify_customer_dump",
"FileToken.$": "$.customers.contentFileToken",
"Columns": {
"customers": "customers"
}
},
"Next": "has_links"
},
"has_links": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.customers.httpHeaders.Link",
"IsPresent": true,
"Next": "read-link"
}
],
"Default": "done"
},
"read-link": {
"Type": "Task",
"Resource": "Production-SQLQuery",
"ResultPath": "$.link",
"Comment": "Extract the next-page URL from the Link header (rel=\"next\").",
"Parameters": {
"ConnectionName": "bevault-staging",
"Query.$": "States.Format('SELECT substring(split_part(link, '';'', 1), ''<(.*)>'') AS link FROM UNNEST(STRING_TO_ARRAY(''{}'', '','')) link WHERE substring(split_part(link, '';'', 2), '' rel=\"(.*)\"'') = ''next''', $.customers.httpHeaders.Link[0])",
"Type": "Reader"
},
"Next": "extract_more_choice"
},
"extract_more_choice": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.link.result[0].link",
"IsPresent": true,
"Next": "extract_more"
}
],
"Default": "done"
},
"extract_more": {
"Type": "Task",
"Resource": "Production-Http",
"ResultPath": "$.customers",
"Parameters": {
"OutputFileStoreName": "extract",
"OutputFileName": "shopify/customers.json",
"Uri.$": "$.link.result[0].link",
"HttpHeaders": {
"X-Shopify-Access-Token.$": "$.ShopifyParameters.result[0].authorization_header"
},
"HttpQueryParams": {}
},
"Next": "load_customers"
},
"done": {
"Type": "Pass",
"End": true
}
}
}
Good practices and notes
-
Keep payloads out of logs: Always use OutputFileStoreName/OutputFileName (or their lowercase variants) so the Http worker writes to fileStore. Downstream steps should pass around contentFileToken rather than the JSON body.
-
Idempotency: Truncate at the beginning of the flow when you are doing full refreshes.
-
Parameterization: Drive URIs, headers, and connection names from reference tables or State input to make flows reusable across projects/environments.
-
Pagination variants: Some APIs paginate with numeric page/offset or cursor parameters instead of Link headers. Adapt the loop to increment the page or pass the cursor returned by the previous call.
-
Rate limiting and retries: If your API enforces limits, configure retry policies on the Http task (backoff, max attempts) and consider pause/sleep steps between requests.
-
Schema drift: Prefer Pattern B for rapidly changing or deeply nested JSON. Use views to stabilize downstream consumption.