Remove duplicate lines from exports, allowlists and log extracts
Concatenated CSV exports, copy-pasted allowlists and grep output often contain the same key repeated dozens of times. Manual deduping in Excel breaks when rows differ by invisible whitespace. This tool removes duplicate lines in the browser, so credential lists, email domains and ID batches are never uploaded to a remote service. Use it before you format SQL IN clauses, before you count unique entries for a stakeholder report or after you merge multiple log pulls into one file.
Dedupe workflow
- Paste lines from a spreadsheet column or log extract.
- Choose case sensitivity and whether to trim whitespace.
- Review the unique count versus original line count.
- Convert or import the cleaned list into the next tool.
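The steps above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual implementation: the function name `dedupe_lines` and its `case_sensitive` and `trim` parameters are assumptions chosen to mirror the options in the list, and the sketch keeps the first occurrence of each line.

```python
def dedupe_lines(text, case_sensitive=True, trim=True):
    """Return (unique_lines, original_count), keeping the first occurrence of each line.

    Hypothetical helper: names and defaults are assumptions, not the tool's API.
    """
    lines = text.splitlines()
    seen = set()
    unique = []
    for line in lines:
        value = line.strip() if trim else line
        # casefold() is a locale-independent, aggressive lowercase for comparisons.
        key = value if case_sensitive else value.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique, len(lines)


pasted = "alice@example.com\nbob@example.com \nAlice@example.com\nalice@example.com"
unique, original = dedupe_lines(pasted, case_sensitive=False)
print(f"{len(unique)} unique of {original} lines")  # 2 unique of 4 lines
```

Comparing the two numbers in the final line is the same sanity check as reviewing the unique count against the original line count.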
Duplicates that look unique
Trailing spaces, Windows CRLF versus LF line endings and zero-width characters let lines that look identical survive a naive compare as separate entries. Trim and normalize line endings when counts still look high. IDs with leading zeros may already be corrupted if something upstream coerced them to numbers (007 becomes 7), so fix types in your CSV tooling first and dedupe afterwards.
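To see why such lines escape a plain comparison, here is a minimal normalization sketch. The helper name `normalize_lines` and the exact set of characters it strips are assumptions for illustration. Removing zero-width joiners can change legitimate text in some scripts and emoji sequences, so treat this as suitable for IDs and similar keys rather than prose.

```python
# Zero-width space, non-joiner, joiner, word joiner and byte-order mark.
# Mapping each code point to None makes str.translate delete it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_lines(text):
    """Hypothetical helper: unify line endings, drop zero-width characters, trim."""
    # Treat CRLF and bare CR as LF before splitting.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line.translate(ZERO_WIDTH).strip() for line in text.split("\n")]


raw = "id-001\r\nid-001 \nid\u200b-001\nid-002"
cleaned = normalize_lines(raw)
print(len(set(raw.split("\n"))), "distinct before,", len(set(cleaned)), "after")
# 4 distinct before, 2 after
```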
Text and data companions
Measure narrative length with the Word Counter, normalize delimiters with the Text ⇄ CSV Converter and rename fields with the Case Converter before you paste into SQL. Move unique rows into JSON via the CSV ⇄ JSON Converter when the list becomes structured data.
Operational habits
Keep the raw file in the ticket and attach the deduped output separately so auditors can reproduce counts. Document whether duplicates were expected (retry storms) or a supplier bug. Redact secrets before sharing lists in chat, even though deduping stays local.
Order-sensitive lists
Deduping by sorting destroys the original order, and even an order-preserving pass has to pick which occurrence to keep, so confirm whether your downstream job expects first-seen or last-seen to win. Blank lines may count as values; trim policy should match your importer. For case-insensitive dedupe, remember that locale rules affect which strings collapse together: under Turkish casing, for example, an uppercase "I" lowercases to a dotless "ı", so "FILE" and "file" may not match. Copy the cleaned list to your clipboard and spot-check the first and last ten lines, because edge duplicates often hide at file boundaries where headers repeat.
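The difference between the ordering policies is easiest to see side by side. The sketch below is illustrative only. The function names are hypothetical, and it relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
def dedupe_first_seen(lines):
    # Each value keeps the position of its first occurrence.
    return list(dict.fromkeys(lines))


def dedupe_last_seen(lines):
    # Walk backwards so the final occurrence wins, then restore forward order.
    return list(dict.fromkeys(reversed(lines)))[::-1]


lines = ["b", "a", "b", "c", "a"]
print(dedupe_first_seen(lines))  # ['b', 'a', 'c']
print(dedupe_last_seen(lines))   # ['b', 'c', 'a']
print(sorted(set(lines)))        # ['a', 'b', 'c'] (original order lost)
```

All three results contain the same distinct values; only the order differs, which matters when a downstream job reads the list positionally.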
When to keep duplicates
Event logs and time-series exports may legitimately repeat the same message at different timestamps, so deduping by line text alone can erase valid history. Filter by a key column in SQL or JSON tooling when uniqueness should apply per id, not per rendered line. Document the rule you used so downstream analysts can reproduce the same distinct set.
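As a rough illustration of per-key uniqueness, the sketch below keeps the first row for each value of a chosen column using Python's standard `csv` module. The column names (`id`, `timestamp`, `message`), the sample rows and the helper `dedupe_by_key` are made up for the example. Whether the first or last row per key should win is a policy decision the sketch does not settle.

```python
import csv
import io


def dedupe_by_key(csv_text, key):
    """Hypothetical helper: keep the first row seen for each value of `key`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    rows = []
    for row in reader:
        if row[key] not in seen:
            seen.add(row[key])
            rows.append(row)
    return rows


# Invented sample data: the same message repeats, but ids and timestamps differ.
export = (
    "id,timestamp,message\n"
    "7,2024-01-01T10:00:00Z,retry\n"
    "7,2024-01-01T10:00:05Z,retry\n"
    "8,2024-01-01T10:00:07Z,retry\n"
)
rows = dedupe_by_key(export, "id")
print([row["id"] for row in rows])  # ['7', '8']
```

Deduping this sample on the message text alone would leave a single row and lose the event for id 8, while a whole-line comparison would keep all three rows, including the repeat for id 7.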