# Fuzzy Testing

This report compares the Spark-native string-similarity implementations against the reference Java SecondString library. Both implementations are run on the same randomly generated input pairs, and their output scores are compared row by row.
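The row-by-row comparison can be sketched as follows. This is a minimal Python illustration (the actual harness is Scala/Spark); `native_score` and `reference_score` are toy stand-ins for the two implementations under test, not the project's real API.

```python
import random

def native_score(a: str, b: str) -> float:
    # Toy stand-in for a Spark-native metric: normalized common-prefix length.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)

def reference_score(a: str, b: str) -> float:
    # Toy stand-in for the SecondString reference; identical here,
    # whereas the real libraries may diverge on some inputs.
    return native_score(a, b)

def compare_pairs(pairs, tol=0.05):
    """Row-by-row parity check: fraction of rows whose absolute
    score difference is within `tol`."""
    within = sum(
        1 for a, b in pairs
        if abs(native_score(a, b) - reference_score(a, b)) <= tol
    )
    return within / len(pairs)

# Seeded random input pairs, mirroring the report's reproducible setup.
random.seed(42)
alphabet = "abcde"
pairs = [
    ("".join(random.choices(alphabet, k=5)),
     "".join(random.choices(alphabet, k=5)))
    for _ in range(1000)
]
print(compare_pairs(pairs))  # 1.0 here, since both toy scorers agree
```

The real harness does the same thing per metric over Spark DataFrames, then aggregates the per-row differences into the statistics shown below.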
## Summary

- Total compared rows: 197509
- Overall rows within +-5% agreement: 74.71%
- Metric with the lowest >30% drift: smith_waterman (0.00%)
| Metric | Rows | Pearson | Spearman | +-5% | +-10% | +-30% | >30% |
|---|---|---|---|---|---|---|---|
| needleman_wunsch | 50000 | 0.999108 | 0.996912 | 83.96% | 7.13% | 7.34% | 1.57% |
| smith_waterman | 50000 | 1.000000 | 1.000000 | 100.00% | 0.00% | 0.00% | 0.00% |
| jaro_winkler | 50000 | 0.979174 | 0.991773 | 93.91% | 2.62% | 0.63% | 2.85% |
| monge_elkan | 47509 | 0.814753 | 0.856769 | 18.14% | 2.68% | 21.19% | 57.99% |
## How to read the table
- Pearson / Spearman: correlation coefficients between the native and reference scores. Values close to 1.0 indicate strong agreement.
- +-5% / +-10% / +-30%: percentage of rows whose absolute score difference falls in that bucket. The buckets are disjoint (within 5%; between 5% and 10%; between 10% and 30%), so the four percentage columns in each row sum to 100%.
- >30%: percentage of rows with more than 30% absolute difference, indicating significant divergence.
High >30% counts typically indicate a known algorithmic difference rather than a bug (e.g. Monge-Elkan uses a symmetric average while SecondString uses a one-directional score).
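The Monge-Elkan asymmetry can be seen in a small example. This sketch uses a toy exact-match inner similarity (the real libraries use a secondary metric such as Jaro-Winkler), so the numbers only illustrate the directional effect:

```python
def sim(a: str, b: str) -> float:
    # Toy inner similarity: 1 for an exact token match, else 0.
    return 1.0 if a == b else 0.0

def monge_elkan(src_tokens, tgt_tokens):
    """One-directional Monge-Elkan: the average, over source tokens,
    of the best inner similarity against any target token."""
    return sum(
        max(sim(s, t) for t in tgt_tokens) for s in src_tokens
    ) / len(src_tokens)

a = ["new", "york", "city"]
b = ["new", "york"]

one_way = monge_elkan(a, b)  # 2/3: "city" finds no match in b
symmetric = (monge_elkan(a, b) + monge_elkan(b, a)) / 2  # (2/3 + 1) / 2 = 5/6
print(one_way, symmetric)
```

Because the one-directional score depends on which string is the source, a symmetric average can differ substantially from it whenever the two token lists have different lengths, which is consistent with the large >30% share for monge_elkan above.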
## Reproducing

```shell
sbt "fuzzy-testing/runMain io.github.semyonsinchenko.sparkss.fuzzy.FuzzyTestingCli \
  --seed 42 --rows 100000 \
  --out target/reports/fuzzy-report.md \
  --save-output target/reports/fuzzy-csv"
```
Artifact source: `fuzzy-testing/target/reports/fuzzy-report.md`