Fuzzy Testing

This report checks parity between the Spark-native implementations of the string-similarity metrics and the Java SecondString library. Both implementations are run on the same randomly generated input pairs, and their output scores are compared row by row.

Summary

Total compared rows: 197509

Overall rows within +-5% agreement: 74.71%

Metric with lowest >30% drift: smith_waterman (0.00%)
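
The overall +-5% figure can be cross-checked against the per-metric table in this report. The sketch below assumes the overall number is a row-weighted average of the per-metric +-5% shares. The report does not state the formula, so this is an assumption, and the `per_metric` dictionary is only an illustrative container for values copied from the table.

```python
# (rows, share of rows within +-5%) per metric, copied from the report table.
# Assumption: the overall figure is a row-weighted average of these shares.
per_metric = {
    "needleman_wunsch": (50000, 83.96),
    "smith_waterman": (50000, 100.00),
    "jaro_winkler": (50000, 93.91),
    "monge_elkan": (47509, 18.14),
}

total_rows = sum(rows for rows, _ in per_metric.values())
weighted = sum(rows * pct for rows, pct in per_metric.values()) / total_rows

print(total_rows)          # 197509
print(round(weighted, 2))  # 74.71
```

Under that assumption both the total row count and the 74.71% figure are reproduced, which suggests that monge_elkan, with the fewest rows but by far the lowest agreement, is what pulls the overall number down.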

Metric            Rows   Pearson   Spearman  +-5%     +-10%    +-30%    >30%
needleman_wunsch  50000  0.999108  0.996912  83.96%   7.13%    7.34%    1.57%
smith_waterman    50000  1.000000  1.000000  100.00%  0.00%    0.00%    0.00%
jaro_winkler      50000  0.979174  0.991773  93.91%   2.62%    0.63%    2.85%
monge_elkan       47509  0.814753  0.856769  18.14%   2.68%    21.19%   57.99%
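
As a rough illustration of how the four percentage columns could be derived, the sketch below assigns each compared row to one band and reports the share of rows per band. The drift definition used here (absolute difference divided by the larger of the two scores) and the names `drift_band` and `band_shares` are assumptions made for illustration. The report does not say whether the tool measures relative or absolute difference.

```python
from collections import Counter

BANDS = ["+-5%", "+-10%", "+-30%", ">30%"]


def drift_band(native: float, reference: float) -> str:
    """Assign one compared row to a band (relative drift is an assumption)."""
    scale = max(abs(native), abs(reference))
    if scale == 0.0:
        return "+-5%"  # both scores are zero, so they agree exactly
    drift = abs(native - reference) / scale
    if drift <= 0.05:
        return "+-5%"
    if drift <= 0.10:
        return "+-10%"
    if drift <= 0.30:
        return "+-30%"
    return ">30%"


def band_shares(pairs):
    """Return the percentage of rows in each band; the shares sum to 100."""
    counts = Counter(drift_band(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {band: 100.0 * counts[band] / total for band in BANDS}


# Four hypothetical (native, reference) score pairs, one per band.
sample = [(1.0, 1.0), (0.93, 1.0), (0.8, 1.0), (0.2, 1.0)]
shares = band_shares(sample)
print(shares)
```

Because every row lands in exactly one band, the shares in a row of the table add up to 100%, which matches the figures reported above.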

How to read the table

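The four percentage columns read as mutually exclusive bands: each row sums to 100% (up to rounding), so +-5% is the share of rows that agree within 5%, +-10% the share between 5% and 10%, +-30% the share between 10% and 30%, and >30% everything beyond that. Pearson and Spearman are the linear and rank correlations between the two implementations' scores.

A high >30% share typically indicates a known algorithmic difference rather than a bug (e.g. the Spark-native Monge-Elkan uses a symmetric average while SecondString uses a one-directional score).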
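
To see why the Monge-Elkan difference mentioned above can produce large drift, the sketch below contrasts a one-directional score with a symmetric average of both directions. It is a toy illustration and not the code of either implementation: the whitespace tokenizer, the exact-match inner similarity `exact_match`, and the function names are all assumptions chosen to keep the arithmetic easy to follow.

```python
def exact_match(x: str, y: str) -> float:
    """Toy inner similarity (hypothetical stand-in for a real token metric)."""
    return 1.0 if x == y else 0.0


def monge_elkan(a: str, b: str, sim=exact_match) -> float:
    """One-directional score: mean over tokens of a of the best match in b."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = (max(sim(x, y) for y in tokens_b) for x in tokens_a)
    return sum(best) / len(tokens_a)


def monge_elkan_symmetric(a: str, b: str, sim=exact_match) -> float:
    """Symmetric variant: average of the two one-directional scores."""
    return (monge_elkan(a, b, sim) + monge_elkan(b, a, sim)) / 2.0


short = "john smith"
long = "john fitzgerald kennedy smith"

print(monge_elkan(short, long))            # 1.0
print(monge_elkan(long, short))            # 0.5
print(monge_elkan_symmetric(short, long))  # 0.75
```

When the two strings have different token counts, the one-directional score depends heavily on which string is scanned, so a symmetric average can differ from either direction by well over 5%. That is consistent with the low agreement reported for monge_elkan.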

Reproducing

sbt "fuzzy-testing/runMain io.github.semyonsinchenko.sparkss.fuzzy.FuzzyTestingCli \
  --seed 42 --rows 100000 \
  --out target/reports/fuzzy-report.md \
  --save-output target/reports/fuzzy-csv"

Artifact source: fuzzy-testing/target/reports/fuzzy-report.md