Table of Contents
- Overview
- Quick Start
- Installation
- Usage Flow A: Direct DataFrame API
- Usage Flow B: Spark SQL Extension Functions
- Docs Build
- Supported Metrics
  - Token-based metrics
    - Jaccard
    - Sorensen-Dice
    - Overlap Coefficient
    - Cosine
    - Braun-Blanquet
    - Monge-Elkan
  - Matrix / edit-distance metrics
    - Levenshtein
    - LCS Similarity
    - Jaro
    - Jaro-Winkler
    - Needleman-Wunsch
    - Smith-Waterman
    - Affine Gap
  - Phonetic encoders
    - Soundex
    - Refined Soundex
    - Double Metaphone
- Tokenization modes
- Configurable parameters
- Fuzzy Testing
  - Summary
  - How to read the table
  - Reproducing
- Benchmarks
  - Summary
  - How to read the table
  - Reproducing