Benchmarks

Performance comparison of Spark-native Catalyst expressions vs. equivalent UDF wrappers around the Java SecondString library. Higher ops/s is better; the diff column shows the relative throughput change (negative means the native implementation is faster).

Summary

Algorithms compared: 5

Best relative delta (closest to parity): jaro_winkler (-33.86%)

Algorithm          spark-native (ops/s)   UDF (ops/s)      diff (UDF vs. native)
affine_gap         11.30 +/- 2.20         5.80 +/- 1.01    -48.68%
jaro_winkler       19.59 +/- 4.77         12.95 +/- 2.73   -33.86%
monge_elkan        11.45 +/- 1.94         4.90 +/- 0.66    -57.20%
needleman_wunsch   14.76 +/- 3.00         6.68 +/- 0.98    -54.78%
smith_waterman     12.90 +/- 3.06         7.07 +/- 1.23    -45.21%

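How to read the table

Each cell gives the mean throughput followed by its reported +/- error, in operations per second. The diff column is the UDF throughput relative to the Spark-native throughput, computed as (UDF - spark-native) / spark-native × 100. For example, affine_gap gives (5.80 - 11.30) / 11.30 × 100 ≈ -48.7%, meaning the UDF wrapper sustains roughly half of the native throughput.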
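The diff values and the "closest to parity" pick in the Summary can be recomputed from the table means. The sketch below assumes diff = (UDF - spark-native) / spark-native × 100, which reproduces the published percentages to within rounding of the two-decimal means. It also checks whether the mean +/- error ranges of the two implementations overlap; the names RESULTS, relative_diff, ranges_overlap and summarize are illustrative only and are not part of the benchmark suite.

```python
# Means and +/- error values (ops/s) copied from the comparison table.
# Layout: name -> ((native_mean, native_err), (udf_mean, udf_err))
RESULTS = {
    "affine_gap": ((11.30, 2.20), (5.80, 1.01)),
    "jaro_winkler": ((19.59, 4.77), (12.95, 2.73)),
    "monge_elkan": ((11.45, 1.94), (4.90, 0.66)),
    "needleman_wunsch": ((14.76, 3.00), (6.68, 0.98)),
    "smith_waterman": ((12.90, 3.06), (7.07, 1.23)),
}


def relative_diff(native: float, udf: float) -> float:
    """Percent change of UDF throughput relative to Spark-native throughput."""
    return (udf - native) / native * 100.0


def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the [mean - err, mean + err] ranges of two measurements intersect."""
    (mean_a, err_a), (mean_b, err_b) = a, b
    return mean_a - err_a <= mean_b + err_b and mean_b - err_b <= mean_a + err_a


def summarize(results):
    """Return (rows, closest), where rows are (name, diff %, overlap flag)."""
    rows = []
    for name, (native, udf) in results.items():
        rows.append((name, relative_diff(native[0], udf[0]), ranges_overlap(native, udf)))
    # "Closest to parity" = the algorithm whose diff has the smallest magnitude.
    closest = min(rows, key=lambda row: abs(row[1]))[0]
    return rows, closest


if __name__ == "__main__":
    rows, closest = summarize(RESULTS)
    for name, diff, overlap in rows:
        print(f"{name:<18} {diff:+7.2f}%  ranges overlap: {overlap}")
    print("closest to parity:", closest)
```

With these numbers, jaro_winkler is the only algorithm whose native and UDF ranges overlap. How much weight that deserves depends on what the +/- figure represents (for example a confidence interval or a standard deviation), which this report does not state.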

Reproducing

Run the benchmark suite and regenerate the comparison table:

./dev/benchmarks_suite.sh --mode compare-only

Artifact source: benchmarks/target/reports/suite/compare-table.txt
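To post-process results after a run, the generated artifact can be read back into numbers. This is a minimal sketch that assumes each data row of compare-table.txt looks like `affine_gap 11.30 +/- 2.20 ops/s 5.80 +/- 1.01 ops/s -48.68%` (columns separated by any amount of whitespace); that row format, the function parse_compare_table and its field names are assumptions for illustration, not a documented interface of the suite.

```python
import re
from pathlib import Path

# Assumed row format: name, native mean +/- err ops/s, UDF mean +/- err ops/s, diff%.
# Header lines (e.g. "Algorithm spark-native UDF diff") do not match and are skipped.
ROW = re.compile(
    r"^(?P<name>\w+)\s+"
    r"(?P<native>[\d.]+)\s*\+/-\s*(?P<native_err>[\d.]+)\s*ops/s\s+"
    r"(?P<udf>[\d.]+)\s*\+/-\s*(?P<udf_err>[\d.]+)\s*ops/s\s+"
    r"(?P<diff>[+-]?[\d.]+)%\s*$"
)


def parse_compare_table(text: str) -> dict[str, dict[str, float]]:
    """Map algorithm name -> {native, native_err, udf, udf_err, diff} as floats."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            fields = match.groupdict()
            name = fields.pop("name")
            rows[name] = {key: float(value) for key, value in fields.items()}
    return rows


if __name__ == "__main__":
    artifact = Path("benchmarks/target/reports/suite/compare-table.txt")
    if artifact.exists():
        for name, values in parse_compare_table(artifact.read_text()).items():
            print(name, values)
```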