
How to anonymize a contract before using ChatGPT

To run a client contract through ChatGPT (or Claude/Perplexity) without leaking confidential data, strip the personal data locally first. anonymizer is a free, open-source tool that replaces names, companies, financial IDs, addresses, emails and phone numbers with structured tokens in .docx, .pdf and .xlsx — entirely on your machine. Nothing is uploaded.

Why it matters

ChatGPT stores your uploads and, depending on your plan and settings, may use them for training. Whether AI chat logs stay privileged is unsettled in court. Pasting a raw contract exposes client names, deal terms, financial figures and personal data to OpenAI's infrastructure. That creates client confidentiality risk and, depending on jurisdiction, GDPR or 152-ФЗ liability. Tokenizing first removes the identifying details from what you upload, provided you review the result for anything the detector missed.

Steps

  1. Install: uv tool install docs-anonymizer (or the one-line installer).
  2. Run anonymize and drop your contract into the local web UI (127.0.0.1).
  3. Review the tokenized document — names become [PERSON_1], companies [ORG_2], etc. Adjust any missed entities with the in-UI mask tool before proceeding.
  4. Upload or paste the tokenized file into ChatGPT and ask your question. The model sees clause structure, not real client data.
  5. Tokens stay consistent within a session — hover any token in the preview to reveal its original text, then substitute the real names back into ChatGPT's answer.
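
Step 5 can also be done in bulk once you have noted which token stands for which original. The sketch below is only an illustration of that substitution: the `TOKEN_MAP` dictionary, the `restore_tokens` helper and the sample names are hypothetical, and it assumes tokens follow the `[PERSON_1]` / `[ORG_2]` shape shown in step 3. It does not use or imply any anonymizer export format or API.

```python
import re

# Hypothetical token-to-original map, filled in by hand from the preview.
# This is not an anonymizer file format; it is only an assumption for the sketch.
TOKEN_MAP = {
    "[PERSON_1]": "Jane Doe",
    "[ORG_2]": "Acme Ltd",
}

# Matches tokens shaped like [PERSON_1] or [ORG_2].
TOKEN_PATTERN = re.compile(r"\[[A-Z]+_\d+\]")


def restore_tokens(text: str, token_map: dict[str, str]) -> str:
    """Replace each known token with its original; leave unknown tokens untouched."""
    return TOKEN_PATTERN.sub(
        lambda match: token_map.get(match.group(0), match.group(0)),
        text,
    )


answer = "[PERSON_1] may terminate if [ORG_2] misses a payment; [PERSON_3] is not bound."
restored = restore_tokens(answer, TOKEN_MAP)
print(restored)
```

Unknown tokens are left in place on purpose, so a token you forgot to map stays visible in the final text instead of silently disappearing.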

FAQ

Is it safe to upload a contract to ChatGPT after anonymizing?
Yes, provided you review the tokenized output first. anonymizer replaces every detected entity with a token before you touch ChatGPT, so the model sees clause structure, not client identity.
Does it work offline?
Yes. All detection runs locally (Natasha + spaCy + rules). A build-time test asserts the engine opens no socket.
Which formats are supported?
.docx, .pdf and .xlsx — document structure is preserved, metadata cleared.
How is it different from Find & Replace in Word?
Find & Replace misses spelling variants, headers/footers, metadata and entities you forgot. anonymizer detects them with NLP.
Is it free for law firms?
Yes. AGPL-3.0 — no fees, no subscription, no seat limits.
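
The offline answer above mentions a build-time test asserting that the engine opens no socket. The project's actual test is not shown here, so the following is only a minimal sketch of how such a check can work in Python: socket creation is temporarily replaced with a function that raises. `run_without_network`, `NetworkBlocked` and `detect_locally` are hypothetical names, and `detect_locally` is a toy stand-in, not anonymizer's detector.

```python
import socket


class NetworkBlocked(RuntimeError):
    """Raised when code tries to create a socket during an offline run."""


def run_without_network(func, *args, **kwargs):
    """Call func with socket creation disabled, then restore the original class."""
    original_socket = socket.socket

    def blocked(*_args, **_kwargs):
        raise NetworkBlocked("socket opened during offline run")

    socket.socket = blocked
    try:
        return func(*args, **kwargs)
    finally:
        socket.socket = original_socket


# Toy stand-in for a local detection call (assumption, not the real engine).
def detect_locally(text: str) -> list[str]:
    return [word for word in text.split() if word.istitle()]


result = run_without_network(detect_locally, "Contract between Alice and Bob")
print(result)
```

If the wrapped function tried to create a socket, `NetworkBlocked` would be raised and the check would fail; purely local code runs unchanged.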

Get started

$ uv tool install docs-anonymizer

Full documentation — including CLI reference, format coverage, and detector configuration — is at anonymizer.site/docs.