Skip to main content
Version: 9.2

Data Cleansing

Well-designed dataset transformations prevent fragmented insights and keep dashboards truthful. By applying the following techniques to ensure that your charts reflect reality, not accidental string differences.

Overview

Dirty data is the quiet reason dashboards lie. Small differences — a stray space, mixed case, or an unexpected newline — turn identical values into separate buckets. The cure is targeted data scrubbing: simple, repeatable transformations that normalize fields before analysis. Below I walk through a pragmatic, step‑by‑step approach for cleaning a country column using Qrvey dataset transformations.

Impact of Data Scrubbing

The analytics engine is literal: if a string differs by even one character, including whitespace, it is treated as a distinct value. That inflates counts, fragments groups, and breaks visualizations. Common issues include:

  • Case differences — uppercase, lowercase, and camelCase variants.
  • Leading or trailing whitespace — invisible but significant.
  • Hidden characters — carriage returns or newlines embedded in a string.

General Tasks

The general approach I use is:

  1. Trim unwanted characters (usually whitespace) from relevant columns.
  2. Standardize case (for example, convert to uppercase).
  3. Apply targeted replacements for messy or inconsistent values using a custom formula.
  4. Reload the dataset when transformations must apply to existing records.

Data Cleansing Tutorial

The following tutorial video shows how to set up a dataset with transformations.

Normalize the country field

Step 1: Inspect the problem

Start by scanning the visualization or raw data to see how the country values are appearing. You might see separate buckets for "united states", "United States", "United States of America" and entries with trailing spaces or newline characters. This fragmentation is exactly what we want to eliminate.

Dashboard bar chart titled 'Gross Sales by Country' showing duplicate country labels in different cases (such as 'Canada' and 'CANADA', 'Mexico' and 'MEXICO') with the app UI and presenter thumbnail.

Step 2: Open Dataset Transformations

From the dataset design page, click Transformations. This area contains functions for text manipulation, numeric operations, date arithmetic, JSON flattening, and a custom code option using JavaScript syntax.

Transformations panel with Text options highlighted including Text Replace, Trim and Uppercase

Step 3: Remove Excess Whitespace from Fields

Use the trim transform first. You can select multiple columns at once — trimming is harmless on most text fields and saves time.

  • Leave the trim value blank to remove standard whitespace.
  • Select Update existing rather than creating new fields unless you want the original preserved.
  • Give the transform a descriptive internal name so teammates can understand what it does.

Trim transform dropdown expanded showing column list with 'Country' highlighted and selection checkboxes; 'Update Existing' output option visible.

Step 4: Standardize Case

To avoid case-based fragmentation, convert the country values to a consistent case. I prefer uppercase for country codes and names because it’s unambiguous in charts and joins.

Select the country column, choose the uppercase transform, and again choose Update existing.

Qrvey transformations panel with 'Columns to Set to Upper Case' showing Country selected and the 'Update Existing' radio option chosen.

Step 5: Apply Transforms to Existing Records

Some transforms apply only to new or updated records unless you explicitly reload the dataset. If you want the changes reflected across historical data, perform a full reload after creating your transforms. Otherwise, transforms will only affect records appended or updated after the transform was created.

Dataset design screen with Country column visible and cursor hovering over the orange Reload Dataset button

Step 6: Handle Stubborn Cases with a Custom Formula

Trimming and case normalization won't catch embedded control characters or long variations like "United States of America." Instead of hunting for a specific invisible character, write a small conditional formula to standardize known patterns.

> if (country.includes("America")) {  
>   return "USA";
> } else {
>   return country;
> }

This uses JavaScript syntax. It checks whether the country string contains the keyword America and replaces it with USA. Remember that the includes check is case sensitive unless you normalize case first.

Transformations editor with the formula if ({Country}.includes(

Step 7: Update the Field and Reload

Set the custom formula to update the existing country column. Apply your changes and perform a reload if you need the transformations to take effect on historical rows. After the reload completes, verify that the chart or dataset now groups countries correctly.

High-clarity screenshot of the Analyze screen showing the 'Gross Sales by Country' horizontal bar chart (USA, Canada, France, Germany, Mexico) with the interface header visible and a small presenter thumbnail overlay.

Checklist

  • Trim text fields that might contain accidental whitespace.
  • Standardize case across categorical fields.
  • Use targeted replacements for known variations and messy entries.
  • Decide whether transforms should apply retroactively (full reload) or only to new data (no reload).
  • Keep transform names descriptive for maintainability.

Best Practices

  • Small transformation steps make your datasets reliable and repeatable.
  • Implement non-destructive edits (create new fields) when experimenting, then switch to updating existing fields when you’re confident.
  • Maintain a short library of common formulas for country names, product SKUs, and other frequently messy fields.