Benchmarks
Scipy
...: N = 100_000_000
...: dataf32 = np.arange(1, N, dtype=np.float32)
...: dataf64 = np.arange(1, N, dtype=np.float64)
...: %timeit f(dataf32, 5, 2, mode = "wrap")
...: %timeit f(dataf64, 5, 2, mode = "wrap")
f32 savgol (5,2) for 100000000 elements
1.38 s ± 31.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
f64 savgol (5,2) for 100000000 elements
1.69 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Time isn't scaling with bit width (f64 moves twice the bytes but is only about 1.2x slower) -> probably not memory bound!
- All benchmarking issues are mine; please let me know if I've goofed
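
For anyone who wants to check the numbers outside IPython, here is a rough standalone sketch of the same measurement. It assumes `f` in the session above is `scipy.signal.savgol_filter` (the session does not show where `f` comes from), uses a hypothetical helper called `time_savgol`, and defaults to a much smaller `n` so it finishes quickly. Absolute timings will differ by machine, so only the f64/f32 ratio is worth comparing.

```python
import timeit

import numpy as np
from scipy.signal import savgol_filter  # assumption: this is the `f` timed above


def time_savgol(dtype, n=1_000_000, repeats=3):
    """Best wall-clock time in seconds for one (5, 2) Savitzky-Golay pass.

    `time_savgol` is a hypothetical helper for this sketch, not part of SciPy.
    """
    # Same construction as the session above: np.arange(1, n) has n - 1 elements.
    data = np.arange(1, n, dtype=dtype)
    timer = timeit.Timer(lambda: savgol_filter(data, 5, 2, mode="wrap"))
    # Take the minimum over repeats to reduce noise from other processes.
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    t32 = time_savgol(np.float32)
    t64 = time_savgol(np.float64)
    print(f"f32: {t32:.4f} s, f64: {t64:.4f} s, ratio f64/f32: {t64 / t32:.2f}")
```

A ratio near 2 would point towards a memory-bound run, while a ratio near 1 would match the observation above. Pass `n=100_000_000` to match the original problem size if enough RAM is available.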