ftfy (Fixes Text For You)
What it is: Python library that fixes broken Unicode and text encoding errors. Automatically detects and repairs mojibake, smart quotes, and other text corruption from encoding issues.
What It Does Best
Fixes mojibake. Turns "CafÃ©" back into "Café" and "donâ€™t" into "don't". Automatically detects and reverses encoding mistakes.
Normalizes Unicode. Handles multiple representations of the same character (NFC normalization by default). Removes invisible control characters that break string matching.
Smart defaults. Call fix_text() and it handles the most common text issues. Its encoding repair is designed to avoid false positives, so it rarely over-corrects text that is already fine.
Key Features
Encoding repair: Fixes UTF-8 text that was mis-decoded as Latin-1, Windows-1252, or similar single-byte encodings
Unicode normalization: Standardizes different character representations
Smart quotes: Converts curly quotes to straight quotes (on by default; can be turned off)
Control characters: Removes invisible characters that break processing
One-liner: Single function call fixes most text issues
Pricing
Free: Open source, Apache 2.0 license
No restrictions: Commercial use permitted; the license mainly requires keeping its copyright and license notices
Community support: Active GitHub repository with examples
When to Use It
✅ Scraping data from the web with mixed encodings
✅ Legacy databases with encoding issues
✅ User-submitted text with copy-paste artifacts
✅ Files exported from Excel or other tools
✅ You see weird characters in your data (Ã, â€, etc.)
When NOT to Use It
❌ Text already clean and properly encoded
❌ Need language-specific text processing (use spaCy)
❌ Processing data that shouldn't be modified
❌ Working with binary data or non-text formats
❌ Need to preserve exact original encoding
Common Use Cases
Web scraping: Clean HTML content with mixed encodings
Social media data: Fix emoji and special character issues
Legacy migration: Clean old database exports before import
User input: Normalize text pasted from various sources
Text analysis: Prepare corpus data for NLP processing
ftfy vs Alternatives
vs chardet: chardet detects encoding, ftfy fixes broken text
vs unicodedata: unicodedata normalizes correctly decoded text; ftfy also repairs text that was decoded with the wrong encoding
vs manual regex: ftfy comprehensive, regex error-prone
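The difference from the standard-library approaches can be illustrated without ftfy at all. This sketch shows how mojibake arises, why unicodedata cannot undo it, and why a hard-coded reversal is fragile: it must know the wrong codec in advance and breaks on text that was never damaged. Deciding automatically whether and how to reverse the damage is the part ftfy handles.

```python
import unicodedata

original = "Café"

# How mojibake happens: UTF-8 bytes decoded with the wrong codec
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # CafÃ©

# Normalization cannot help: the characters are valid, just the wrong ones
print(unicodedata.normalize("NFC", mojibake) == mojibake)  # True

# A hard-coded reversal works when you already know which codec was wrong...
print(mojibake.encode("latin-1").decode("utf-8"))  # Café

# ...but applied to text that was never damaged, it fails
try:
    original.encode("latin-1").decode("utf-8")
except UnicodeDecodeError:
    print("manual reversal broke clean text")
```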
Unique Strengths
Automatic detection: Figures out what's wrong without configuration
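Conservative: Encoding repair is applied only when text looks broken, so correctly decoded text is left alone (default cleanups such as quote straightening and NFC normalization still apply)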
Single purpose: Does one thing exceptionally well
Battle-tested: Mature and widely used, including in NLP text-preprocessing pipelines
Bottom line: Solves a specific problem very well. When you see weird characters in your text data, ftfy is the first thing to try. One function call fixes most encoding disasters. Keep it in your toolkit.