Version: 9.2

Sync Datasets

Data Sync pulls new and updated records from your sources, keeping your base and managed datasets current without unnecessary reloads.

Data Sync Tutorial

The following tutorial video shows how to add append and update sync logic to your dataset.


Before You Begin

In composite datasets, each joined entity can use its own Data Sync settings. Decide whether you want a full reload of the dataset or whether to add append and update sync logic.

  • Full reload (default): Reloads the entire dataset.
  • Append and update: Pulls new or changed rows and merges them into the existing dataset. For more information, see Enable Append and Update.

Enable Append and Update

To add append and update logic, a dataset entity needs to include the following attributes.

Unique Record Identifier

To identify which record to update during a sync, you must designate at least one field as a unique identifier. In many cases, a single column is enough. In composite datasets or joined entities, you might need to combine two or more fields to guarantee uniqueness.

You can add a unique ID to your dataset in the Columns tab.

  1. Select the three-dot menu for the relevant row in the table.

  2. Select Unique ID > On.

    Enable Unique ID

  3. If necessary, use the Join Output pill to create a composite key for the ID. For example, you can combine ProductID with an inventory row GUID or another field that uniquely identifies a product-inventory record.

Note: When you change the unique identifier and apply the design changes, Qrvey triggers a full reload to enable Elasticsearch to rebuild indexes corresponding to the identifier changes.
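
A minimal sketch of the composite-key idea from step 3, in Python. The field names (ProductID, InventoryGUID) and the delimiter are assumptions for illustration; Qrvey builds the key internally from the fields you flag as Unique ID.

    # Sketch only: combine fields until every record resolves to one key.
    def composite_key(record: dict) -> str:
        # ProductID alone can repeat across inventory rows, so pair it with
        # the inventory row GUID to get one key per product-inventory record.
        return f"{record['ProductID']}|{record['InventoryGUID']}"

    rows = [
        {"ProductID": "P-100", "InventoryGUID": "a1f3", "qty": 12},
        {"ProductID": "P-100", "InventoryGUID": "b772", "qty": 4},
    ]
    assert len({composite_key(r) for r in rows}) == len(rows)  # keys are unique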

Date-Time Column

To enable incremental syncs, each entity must provide a date-time column. Qrvey uses the column's value as a timestamp to find records whose timestamp is greater than or equal to the most recent pull time. Typical choices include columns like modified_at or last_updated.

You can select multiple columns for an entity. Multiple selections use OR logic. For example, a sync can detect records with a discontinued_date or modified_date that falls within the sync window.
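
Conceptually, the incremental pull keeps a record when any of the selected timestamp columns falls inside the sync window. The Python sketch below mirrors that OR logic; the column names and window boundaries are assumptions, not the exact query Qrvey generates.

    from datetime import datetime, timezone

    # Assumed timestamp columns selected for the entity (combined with OR).
    timestamp_columns = ["modified_date", "discontinued_date"]
    last_pull = datetime(2024, 5, 1, tzinfo=timezone.utc)   # start of the sync window
    now = datetime.now(timezone.utc)                        # end of the sync window

    def in_sync_window(record: dict) -> bool:
        """A record qualifies if ANY selected column falls inside the window."""
        return any(
            record.get(col) is not None and last_pull <= record[col] <= now
            for col in timestamp_columns
        )

    record = {"modified_date": None,
              "discontinued_date": datetime(2024, 5, 3, tzinfo=timezone.utc)}
    print(in_sync_window(record))  # True: discontinued_date is inside the window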

Additional Timestamp Rules

  • The sync window's end time is always the time the query runs. You can change the query's start time if you need to recover missed data, but the end time cannot be changed retroactively.
  • If your database time zone differs from the time zone of the AWS region where Qrvey is deployed, adjust the database time zone so the query converts to the correct time boundaries during execution (see the conversion sketch after this list).
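
The time zone point matters because the window boundaries are compared against values stored in your database. Below is a minimal conversion sketch; it assumes the database stores naive local timestamps in America/New_York while the deployment region runs on UTC.

    from datetime import datetime
    from zoneinfo import ZoneInfo

    DB_TZ = ZoneInfo("America/New_York")   # assumed database time zone
    DEPLOY_TZ = ZoneInfo("UTC")            # assumed deployment-region time zone

    # A window boundary computed in the deployment's time zone...
    window_start = datetime(2024, 5, 1, 4, 0, tzinfo=DEPLOY_TZ)

    # ...must be expressed in the database's local time before it is compared
    # against naive timestamps stored there, or boundary records are missed.
    window_start_db_local = window_start.astimezone(DB_TZ).replace(tzinfo=None)
    print(window_start_db_local)  # 2024-05-01 00:00:00 (Eastern local time)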

File Sources That Do Not Require a Timestamp Column

If your dataset source is a file upload, S3, or BLOB storage, you do not need a timestamp column. Qrvey can use the file creation or modification timestamps in the storage location to determine which files contain new or updated data.
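
For file-based sources, the same window idea applies to file metadata rather than a column. A minimal sketch, assuming a local folder named incoming stands in for the storage location:

    from datetime import datetime, timezone
    from pathlib import Path

    last_pull = datetime(2024, 5, 1, tzinfo=timezone.utc)  # assumed last sync time
    source_dir = Path("incoming")                          # hypothetical drop folder

    # Pick up only files modified since the last pull, mirroring how storage
    # modification timestamps identify new or updated data.
    new_files = [
        p for p in source_dir.glob("*.csv")
        if datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc) >= last_pull
    ]
    print(new_files)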

Set Up a Data Sync

The Data Sync pull-based mechanism runs queries on a schedule to detect new or changed records.

  1. Select a managed or base dataset to edit.

  2. Select the Data Sync tab.

  3. In the Data Source Settings section, select a Sync Type (Full Reload or Append and Update) for each entity listed in the table.

    Select Sync Type

    Tip: You do not have to sync all entities at the same frequency. This allows you to optimize performance and cost based on the frequency of changes for each data source.

  4. Define the schedule for your data sync:

    • Basic (default): Select a simple interval from the dropdown list.
    • CRON: Enter a cron expression for advanced timing. Select Test CRON Expression to validate your entry (a sample expression follows these steps).
  5. Select Apply Changes to save your schedule.

  6. To test your setup, select Sync now.

    This test runs the incremental sync logic and creates an entry in the dataset activity log showing the queries that ran, the time ranges used for each entity, and the number of records returned. You can check the log to verify that the sync pulled the expected rows.

    Note: Do not use Reload dataset to test Data Sync. Use Sync now.

  7. Toggle Data Sync on to enable Data Sync for your dataset.

    Applying the Data Sync schedule does not trigger a reload event; it only commits the schedule configuration.
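
Cron expressions use the standard five fields: minute, hour, day of month, month, and day of week. If you want to sanity-check an expression outside the UI, a small sketch using the third-party croniter package (an assumption; any cron parser works) shows how an expression translates into run times.

    from datetime import datetime
    from croniter import croniter  # third-party: pip install croniter

    expr = "0 2 * * *"  # every day at 02:00
    base = datetime(2024, 5, 1, 0, 0)

    itr = croniter(expr, base)
    print(itr.get_next(datetime))  # 2024-05-01 02:00:00
    print(itr.get_next(datetime))  # 2024-05-02 02:00:00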

Create a Push Workflow

You can create a push workflow in one of the following ways:

  • Use a push API as your connector.
  • Use APIs to push data into any dataset type, as in the sketch after this list.
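
The exact push endpoints and authentication depend on your environment and are documented separately. The sketch below only illustrates the general shape of an HTTP push; the URL, header names, and payload fields are all hypothetical placeholders, not the documented Qrvey Push API contract.

    import json
    from urllib.request import Request, urlopen

    # Hypothetical endpoint and API key -- replace with your environment's values.
    url = "https://example.qrvey.com/push/v1/datasets/DATASET_ID/records"
    payload = {"records": [{"ProductID": "P-100", "qty": 12}]}

    req = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": "YOUR_KEY"},
        method="POST",
    )
    # Uncomment to send the request:
    # with urlopen(req) as resp:
    #     print(resp.status)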

Troubleshooting

Append and Update Disabled

If append and update remains disabled after you select identifiers and timestamps, confirm that you selected Apply Changes in the dataset design. Design changes remain in draft until applied. Also check that your identifier makes records truly unique across the joined entities.

Checklist

  • For each entity, designate one or more unique identifier fields that guarantee a single record match.
  • For each entity, select at least one timestamp column unless the source is a file upload, S3, or BLOB storage.
  • Apply design changes. Expect a full reload if you change unique identifiers.
  • Set a schedule or cron expression for each entity as needed.
  • To enable the schedule, toggle Data Sync on.

Best Practices

  • A reliable modified-timestamp column is a simple, accurate way to detect changes.
  • When joining tables, think in terms of record uniqueness across the joined entities. Use composite keys when necessary.
  • Document time zones for your data sources and adjust the DB time zone if you notice gaps in sync windows.
  • Use cron expressions when you need precise control over timing. Otherwise, select a simple frequency.