Data Cleansing
Well-designed dataset transformations prevent fragmented insights and keep dashboards truthful. By applying the following techniques to ensure that your charts reflect reality, not accidental string differences.
Overview
Dirty data is the quiet reason dashboards lie. Small differences — a stray space, mixed case, or an unexpected newline — turn identical values into separate buckets. The cure is targeted data scrubbing: simple, repeatable transformations that normalize fields before analysis. Below I walk through a pragmatic, step‑by‑step approach for cleaning a country column using Qrvey dataset transformations.
Impact of Data Scrubbing
The analytics engine is literal: if a string differs by even one character, including whitespace, it is treated as a distinct value. That inflates counts, fragments groups, and breaks visualizations. Common issues include:
- Case differences — uppercase, lowercase, and camelCase variants.
- Leading or trailing whitespace — invisible but significant.
- Hidden characters — carriage returns or newlines embedded in a string.
General Tasks
The general approach I use is:
- Trim unwanted characters (usually whitespace) from relevant columns.
- Standardize case (for example, convert to uppercase).
- Apply targeted replacements for messy or inconsistent values using a custom formula.
- Reload the dataset when transformations must apply to existing records.
Data Cleansing Tutorial
The following tutorial video shows how to set up a dataset with transformations.
Normalize the country field
Step 1: Inspect the problem
Start by scanning the visualization or raw data to see how the country values are appearing. You might see separate buckets for "united states", "United States", "United States of America" and entries with trailing spaces or newline characters. This fragmentation is exactly what we want to eliminate.

Step 2: Open Dataset Transformations
From the dataset design page, click Transformations. This area contains functions for text manipulation, numeric operations, date arithmetic, JSON flattening, and a custom code option using JavaScript syntax.

Step 3: Remove Excess Whitespace from Fields
Use the trim transform first. You can select multiple columns at once — trimming is harmless on most text fields and saves time.
- Leave the trim value blank to remove standard whitespace.
- Select Update existing rather than creating new fields unless you want the original preserved.
- Give the transform a descriptive internal name so teammates can understand what it does.

Step 4: Standardize Case
To avoid case-based fragmentation, convert the country values to a consistent case. I prefer uppercase for country codes and names because it’s unambiguous in charts and joins.
Select the country column, choose the uppercase transform, and again choose Update existing.

Step 5: Apply Transforms to Existing Records
Some transforms apply only to new or updated records unless you explicitly reload the dataset. If you want the changes reflected across historical data, perform a full reload after creating your transforms. Otherwise, transforms will only affect records appended or updated after the transform was created.

Step 6: Handle Stubborn Cases with a Custom Formula
Trimming and case normalization won't catch embedded control characters or long variations like "United States of America." Instead of hunting for a specific invisible character, write a small conditional formula to standardize known patterns.
> if (country.includes("America")) {
> return "USA";
> } else {
> return country;
> }
This uses JavaScript syntax. It checks whether the country string contains the keyword America and replaces it with USA. Remember that the includes check is case sensitive unless you normalize case first.

Step 7: Update the Field and Reload
Set the custom formula to update the existing country column. Apply your changes and perform a reload if you need the transformations to take effect on historical rows. After the reload completes, verify that the chart or dataset now groups countries correctly.

Checklist
- Trim text fields that might contain accidental whitespace.
- Standardize case across categorical fields.
- Use targeted replacements for known variations and messy entries.
- Decide whether transforms should apply retroactively (full reload) or only to new data (no reload).
- Keep transform names descriptive for maintainability.
Best Practices
- Small transformation steps make your datasets reliable and repeatable.
- Implement non-destructive edits (create new fields) when experimenting, then switch to updating existing fields when you’re confident.
- Maintain a short library of common formulas for country names, product SKUs, and other frequently messy fields.