Benchmarks

Performance comparison of Spark-native Catalyst expressions vs. equivalent UDF wrappers around the Java SecondString library. Higher ops/s is better. The diff column is the UDF throughput relative to the spark-native throughput, so a negative value means the native implementation is faster.

Summary

Algorithms compared: 5

Best relative delta (closest to parity): jaro_winkler (-34.00%)

Algorithm          spark-native              UDF                       diff
affine_gap         11.00 +/- 1.71 ops/s      5.62 +/- 0.88 ops/s       -48.94%
jaro_winkler       20.03 +/- 4.57 ops/s      13.22 +/- 2.64 ops/s      -34.00%
monge_elkan        11.73 +/- 2.51 ops/s      5.04 +/- 1.02 ops/s       -57.01%
needleman_wunsch   15.42 +/- 3.40 ops/s      7.27 +/- 1.48 ops/s       -52.81%
smith_waterman     14.46 +/- 2.38 ops/s      7.22 +/- 1.54 ops/s       -50.09%

How to read the table
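
As a worked example of the diff column, the sketch below recomputes it from the throughput means in the table. It assumes diff = (UDF - spark-native) / spark-native, a definition inferred from the published numbers rather than taken from the benchmark scripts. The names relative_diff and native_over_udf are illustrative and do not come from the suite.

```python
# Throughput means in ops/s and the published diff, copied from the table.
# Layout: name -> (spark-native, UDF, published diff in percent)
results = {
    "affine_gap": (11.00, 5.62, -48.94),
    "jaro_winkler": (20.03, 13.22, -34.00),
    "monge_elkan": (11.73, 5.04, -57.01),
    "needleman_wunsch": (15.42, 7.27, -52.81),
    "smith_waterman": (14.46, 7.22, -50.09),
}


def relative_diff(native_ops, udf_ops):
    """Assumed definition: UDF throughput change relative to spark-native, in percent."""
    return (udf_ops - native_ops) / native_ops * 100.0


def native_over_udf(native_ops, udf_ops):
    """Same comparison expressed as a speedup factor of spark-native over the UDF."""
    return native_ops / udf_ops


for name, (native, udf, published) in results.items():
    print(
        f"{name:18s} recomputed={relative_diff(native, udf):7.2f}%  "
        f"published={published:7.2f}%  "
        f"native/UDF={native_over_udf(native, udf):.2f}x"
    )

# The row closest to parity is the one whose diff is nearest to zero.
closest = max(results, key=lambda n: relative_diff(results[n][0], results[n][1]))
print("closest to parity:", closest)
```

Under this assumed definition the recomputed values agree with the published column to within 0.1 percentage points. The remaining gap is presumably because the table shows rounded means.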

Reproducing

Run the benchmark suite and regenerate the comparison table:

./dev/benchmarks_suite.sh --mode compare-only

Artifact source: benchmarks/target/reports/suite/compare-table.txt
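
To check regenerated numbers against the table above, the rows can be parsed into numeric fields. The following is a minimal sketch that assumes each row of the artifact looks like the rows shown in this document (name, two "value +/- error ops/s" cells, and a percentage). The actual layout of compare-table.txt may differ, and parse_row is a hypothetical helper, not part of the benchmark suite.

```python
import re

# Assumed row shape, based on the rows shown in this document:
#   affine_gap 11.00 +/- 1.71 ops/s 5.62 +/- 0.88 ops/s -48.94%
ROW = re.compile(
    r"^(?P<name>\w+)\s+"
    r"(?P<native>[\d.]+)\s*\+/-\s*(?P<native_err>[\d.]+)\s*ops/s\s+"
    r"(?P<udf>[\d.]+)\s*\+/-\s*(?P<udf_err>[\d.]+)\s*ops/s\s+"
    r"(?P<diff>[-+]?[\d.]+)%$"
)


def parse_row(line):
    """Return a dict of numeric fields for a data row, or None for any other line."""
    match = ROW.match(line.strip())
    if match is None:
        return None  # header, blank line, or an unexpected format
    row = {"name": match.group("name")}
    for key in ("native", "native_err", "udf", "udf_err", "diff"):
        row[key] = float(match.group(key))
    return row


sample = [
    "Algorithm spark-native UDF diff",
    "affine_gap 11.00 +/- 1.71 ops/s 5.62 +/- 0.88 ops/s -48.94%",
    "jaro_winkler 20.03 +/- 4.57 ops/s 13.22 +/- 2.64 ops/s -34.00%",
]
rows = [r for r in map(parse_row, sample) if r is not None]
for r in rows:
    print(r["name"], r["native"], r["udf"], r["diff"])
```

Lines that do not match the assumed shape are skipped rather than raising, so the header line and any blank lines pass through harmlessly.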