CSV red flags that signal bad data
Warning signs that should make you pause and investigate.
Dec 15, 2024 · 4 min read
Some patterns in data scream 'something is wrong.' Learn to recognize them and you'll catch problems early.
Suspiciously round numbers
Real data is messy. If every value ends in 00 or every percentage is a multiple of 5, someone might be estimating or fabricating.
- Measured values tend to have irregular trailing digits
- Too-round numbers suggest estimates
- Check whether the precision matches how the value was collected
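One rough way to quantify "too round" is to measure what share of a column lands exactly on a round multiple. The sketch below is only an illustration: the function name `round_share`, the sample columns, and the choice of 100 as the multiple are all assumptions, and it expects integer values (the modulo test is unreliable on floats).

```python
def round_share(values, multiple=100):
    """Return the fraction of values that are exact multiples of `multiple`."""
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if v % multiple == 0)
    return hits / len(values)


# Hypothetical columns: one looks estimated, one looks measured.
reported_revenue = [1200, 4500, 300, 9800, 2100]
measured_revenue = [1187, 4523, 311, 9800, 2096]

print(round_share(reported_revenue))  # 1.0 (every value ends in 00)
print(round_share(measured_revenue))  # 0.2 (one round value out of five)
```

A high share is not proof of fabrication. Prices, quotas, and survey scales are often round by design, so treat the number as a prompt to ask how the column was collected.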
Impossible values
Negative ages, birth dates in the future, percentages over 100. Values like these should never exist, yet they turn up in almost every dataset.
- Age should be positive and reasonable
- Dates should be within expected range
- Percentages that describe a share of a whole should stay between 0 and 100
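The three checks above can be sketched as a small range validator. Everything specific here is an assumption made for illustration: the column names (`age`, `signup_date`, `completion_pct`), the 0 to 120 age bound, the ISO date format, and the sample rows.

```python
import csv
import io
from datetime import date

# Hypothetical sample file; line 1 is the header.
SAMPLE = """name,age,signup_date,completion_pct
Ana,34,2023-05-01,87.5
Ben,-2,2024-01-15,40
Cy,41,2031-09-09,120
"""


def find_impossible(rows, today):
    """Return (line_number, column, raw_value) for each out-of-range value."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # data starts on line 2
        if not 0 <= int(row["age"]) <= 120:
            problems.append((line_no, "age", row["age"]))
        if date.fromisoformat(row["signup_date"]) > today:
            problems.append((line_no, "signup_date", row["signup_date"]))
        if not 0 <= float(row["completion_pct"]) <= 100:
            problems.append((line_no, "completion_pct", row["completion_pct"]))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = find_impossible(rows, today=date(2024, 12, 15))
for line_no, column, value in problems:
    print(f"line {line_no}: suspicious {column} = {value}")
```

Passing `today` in explicitly keeps the check repeatable. The bounds themselves depend on the dataset: a growth rate can legitimately exceed 100, and a scheduled delivery date can legitimately be in the future.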
Too many nulls
Some nulls are normal. A column that's 80% empty suggests a data collection problem or a field no one fills out.
- Calculate null percentage per column
- Investigate columns with high nulls
- Consider dropping mostly-empty columns
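Calculating the null percentage per column takes only a few lines. This is a minimal sketch with assumptions: the sample data is made up, and the set of strings treated as null (`NULL_TOKENS`) is a guess that should be adjusted to whatever placeholders your source system writes.

```python
import csv
import io

# Hypothetical sample file.
SAMPLE = """id,email,fax,notes
1,a@example.com,,
2,,,
3,c@example.com,,call back
4,d@example.com,,
5,e@example.com,555-0100,
"""

# Assumed placeholders for "no value"; extend as needed.
NULL_TOKENS = {"", "na", "n/a", "null", "none"}


def null_percentages(rows, fieldnames):
    """Return {column: percentage of rows where the value is null-like}."""
    total = len(rows)
    result = {}
    for name in fieldnames:
        missing = sum(
            1 for row in rows
            if (row[name] or "").strip().lower() in NULL_TOKENS
        )
        result[name] = 100.0 * missing / total if total else 0.0
    return result


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
pct = null_percentages(rows, reader.fieldnames)
for name, value in pct.items():
    print(f"{name}: {value:.0f}% null")
```

In this sample, `fax` and `notes` are each 80% empty. Whether to drop them is a judgment call: a sparse `notes` column may still carry the most useful information in the file.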
Duplicate explosion
When your row count is much higher than expected, look for duplicates. Bad joins and import bugs create row multiplication.
- Compare row count to expectation
- Check for duplicate key values
- Verify joins didn't multiply rows
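A quick way to run the first two checks is to count how often each key appears and compare the row count with what you expected. The sketch below assumes a hypothetical `order_id` key column, made-up sample rows, and an expected count of 3 that you would normally get from the source system.

```python
import csv
import io
from collections import Counter

# Hypothetical sample file in which one order was imported three times.
SAMPLE = """order_id,customer,total
1001,Ana,25.00
1002,Ben,40.00
1002,Ben,40.00
1003,Cy,15.50
1002,Ben,40.00
"""


def duplicate_keys(rows, key):
    """Return {key_value: count} for key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
expected_rows = 3  # assumed figure, e.g. the order count in the source system
dupes = duplicate_keys(rows, "order_id")

print(f"rows: {len(rows)}, expected: {expected_rows}")  # rows: 5, expected: 3
print(dupes)  # {'1002': 3}
```

To check a join, run the same count on the join key before and after: if a key that was unique on one side now repeats, the other side has multiple matches per key.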
Key takeaway
Trust but verify. Red flags don't always mean bad data, but they always mean you should check.