Consider adding prepared statements #23

gbwey9 · 2018-07-15T22:56:28Z

Hi,
This is an excellent library. I would like to use this package instead
of hdbc-odbc but the only thing missing are prepared statements.
Is that something you are thinking of adding?

chrisdone · 2018-07-16T10:22:18Z

I'll consider adding them, but there are trade-offs:

The difficulty with prepared statements lies in making sure they are safely handled as resources, which isn't particularly easy. Probably do-able, but only if the effort is justified.
Apparently, if we add parameter binding to directly executed statements, we can reap the same rewards as prepared statements in modern SQL Server, which applies across different connections too: https://docs.microsoft.com/en-us/sql/relational-databases/native-client-odbc-queries/executing-statements/direct-execution?view=sql-server-2017
Does prepared statements for your current program significantly increase performance?

If you're convinced this is substantially faster and can demonstrate some numbers (e.g. using hdbc-odbc?), I'll look into adding this to odbc with a set of benchmarks proving their performance advantages. 👍

gbwey · 2018-07-16T22:01:05Z

I ran some speed comparisons between odbc and hdbc and saw signifcant speed differences in favour of prepared statements. It's possible I messed this up somehow. Can you take a look?
Thanks,

I inserted 100,000 rows with fairly large data using
odbc TH
odbc plain
perf.zip

hdbc prepared statements
hdbc raw
hdbc runraw
Each time I ran this, it showed prepared statements were faster by a factor of at least two.

Here is one run:
16 17:17:49 D:\haskell\hdbcperf>stack exec hdbcperf r
create table OdbcTH
create table OdbcRaw
odbc TH starting ...
odbc TH: 141.39708733558655
odbc raw starting ...
odbc raw: 123.21104717254639
create table Prepared
create table Run
create table RunRaw
hdbc prepared starting ...
hdbc prepared: 53.47105836868286
hdbc run starting ...
hdbc run: 127.43228888511658
hdbc runraw starting ...
hdbc runraw: 74.92328548431396

chrisdone · 2018-07-17T07:02:46Z

Awesome! Could you link to a copy of your hdbcperf repo? I assume that the number are legit but I’d like to see how the data is generated and timed to confirm.

…

On Tue, 17 Jul 2018 at 00:01, gbwey ***@***.***> wrote: I ran some speed comparisons between odbc and hdbc and saw signifcant speed differences in favour of prepared statements. It's possible I messed this up somehow. Can you take a look? Thanks, I inserted 100,000 rows with fairly large data using odbc TH odbc plain perf.zip <https://github.com/fpco/odbc/files/2199464/perf.zip> hdbc prepared statements hdbc raw hdbc runraw Each time I ran this, it showed prepared statements were faster by a factor of at least two. Here is one run: 16 17:17:49 D:\haskell\hdbcperf>stack exec hdbcperf r create table OdbcTH create table OdbcRaw odbc TH starting ... odbc TH: 141.39708733558655 odbc raw starting ... odbc raw: 123.21104717254639 create table Prepared create table Run create table RunRaw hdbc prepared starting ... hdbc prepared: 53.47105836868286 hdbc run starting ... hdbc run: 127.43228888511658 hdbc runraw starting ... hdbc runraw: 74.92328548431396 — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#23 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAArC9-p2B35e2DQL9PyRGFVeBXnicKDks5uHQ0hgaJpZM4VQbKK> .

chrisdone · 2018-07-17T07:06:45Z

There should not be, for example, any difference in speed between odbc TH and manual construction, the former is just syntactic sugar for the latter, indicating that you need something like criterion to get the real average.

…

On Tue, 17 Jul 2018 at 09:02, Christopher Done ***@***.***> wrote: Awesome! Could you link to a copy of your hdbcperf repo? I assume that the number are legit but I’d like to see how the data is generated and timed to confirm. On Tue, 17 Jul 2018 at 00:01, gbwey ***@***.***> wrote: > I ran some speed comparisons between odbc and hdbc and saw signifcant > speed differences in favour of prepared statements. It's possible I messed > this up somehow. Can you take a look? > Thanks, > > I inserted 100,000 rows with fairly large data using > odbc TH > odbc plain > perf.zip <https://github.com/fpco/odbc/files/2199464/perf.zip> > > hdbc prepared statements > hdbc raw > hdbc runraw > Each time I ran this, it showed prepared statements were faster by a > factor of at least two. > > Here is one run: > 16 17:17:49 D:\haskell\hdbcperf>stack exec hdbcperf r > create table OdbcTH > create table OdbcRaw > odbc TH starting ... > odbc TH: 141.39708733558655 > odbc raw starting ... > odbc raw: 123.21104717254639 > create table Prepared > create table Run > create table RunRaw > hdbc prepared starting ... > hdbc prepared: 53.47105836868286 > hdbc run starting ... > hdbc run: 127.43228888511658 > hdbc runraw starting ... > hdbc runraw: 74.92328548431396 > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#23 (comment)>, or mute > the thread > <https://github.com/notifications/unsubscribe-auth/AAArC9-p2B35e2DQL9PyRGFVeBXnicKDks5uHQ0hgaJpZM4VQbKK> > . >

gbwey · 2018-07-17T19:16:04Z

I just pushed the repo hdbcperf which now uses criterion.

The results are in results.txt

gbwey · 2018-07-17T21:54:01Z

I added an extra test called hdbc PreparedCommit which does a commit for each insert statement which is probably what odbc is doing (autocommit on). The timings are very similar to odbc.

results2
results3

chrisdone · 2018-07-18T15:05:51Z

Nice work, thanks! I've reviewed your numbers and code so far, and it seems that you've found the actual speed difference. If I add exec conn "set implicit_transactions on" to each odbc test, then odbc outperform HDBC's prepared statements (see results below).

All HDBC connections set implicit transactions on by default, which explains that. We could add a withTransaction combinator to odbc, or else enable transactions at the beginning of the connection like HDBC does.

It may still be that prepared statements are faster than regular statements, but if we're already as fast as or faster than HDBC with prepared statements, you may be happy with it like this.

What do you think?

benchmarking small/odbc TH
time                 154.7 ms   (132.7 ms .. 192.1 ms)
                     0.966 R²   (0.875 R² .. 1.000 R²)
mean                 163.8 ms   (154.2 ms .. 174.5 ms)
std dev              14.53 ms   (10.87 ms .. 17.71 ms)
variance introduced by outliers: 26% (moderately inflated)

benchmarking small/odbc Raw
time                 143.9 ms   (124.3 ms .. 168.8 ms)
                     0.985 R²   (0.970 R² .. 0.999 R²)
mean                 136.1 ms   (129.1 ms .. 143.1 ms)
std dev              9.606 ms   (7.303 ms .. 12.06 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking small/hdbc prepared
time                 244.9 ms   (148.7 ms .. 341.0 ms)
                     0.961 R²   (0.862 R² .. 1.000 R²)
mean                 270.4 ms   (252.6 ms .. 300.6 ms)
std dev              29.19 ms   (3.099 ms .. 37.94 ms)
variance introduced by outliers: 19% (moderately inflated)

gbwey · 2018-07-19T03:00:58Z

Here is a last benchmark which has just uses one commit at the very end to keep things consistent. I also added another table with 20 columns . I am seeing hdbc prepared gives around 50% improvement over odbc raw. Interesting is the bigger slow down by odbc TH with the 20 column inserts.
I use a fork of hdbc-odbc that seems to runs faster than the one on hackage which might explain the differences we are seeing.

results6.txt

chrisdone · 2018-07-19T08:10:20Z

I am using your stack repo which links to the same fork of hdbc-odbc from github (I saw it clone them), so we are using the exact same package versions and GHC version.

My performance numbers don't match yours presently, odbc raw is still fastest:

https://gist.github.com/chrisdone/6b3714808ea43326e72b85daedcd7492

I think if anything is different, it'll be our ODBC driver version and SQL Server version. What are yours? Mine are:

OS: Ubuntu Linux
SQL Server: microsoft/mssql-server-linux:2017-latest (from dockerhub https://hub.docker.com/r/microsoft/mssql-server-linux/)
Driver: ODBC Driver 17 for SQL Server

You appear to be using DRIVER=SQL Server Native Client 11.0, have you considered using the ODBC driver? According to Microsoft, the ODBC driver is the only client that is maintained after 2012:

https://stackoverflow.com/questions/40842719/how-to-install-sql-server-native-client-11-0-on-ubuntu/40844822#40844822

The SQL Server Native Client OLE DB provider was included in SQL Server Native Client in SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012. After SQL Server 2012, the SQL Server Native Client OLE DB provider will no longer be included in SQL Server Native Client.

After SQL Server 2012, the ODBC driver will be updated for the most recent server features, including Microsoft Windows Azure SQL Database, and released as the Microsoft ODBC Driver for SQL Server.

I'm on Linux so I won't be able to reproduce your numbers with the native client, which I think was only available for Windows. Would you mind testing with the ODBC Driver? I'd be interested to see whether it's faster, or slower, and whether this affects the HDBC/ODBC speed difference.

If the new driver is faster or slower, it would be worth noting in this package's README and Haddocks, and give a recommendation to users.

Alternatively, it could be a mysterious Windows vs Linux difference. But I doubt that and more suspect the ODBC driver.

Thanks for your patience and time spent on this issue. 👍 I think the work we're doing here will be useful for other people, too.

gbwey · 2018-07-19T23:35:20Z

I agree that this is very useful and I am quite happy to work on it.

Running with odbc 13 didn't seem to make much difference. Here are the results

I am using "ODBC Driver 13 for SQL Server"
Microsoft SQL Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (X64)
Aug 15 2017 10:23:29
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor)

Next week I should be able to run these benchmarks using ubuntu (virtual box) which might give us more information.

chrisdone · 2018-07-20T06:46:14Z

Ok, thanks, so the 13 driver doesn’t change anything. So I see two differences: * Driver 13 vs 17 * SQL Server 2012 vs 2017 Aside from Linux vs Windows these might cause the performance differences between us. I can try the 2013 driver version. But I can’t directly test that SQL Server version, as it predates the free Docker releases by MS. I can look into getting a copy of Windows to run in a VM. The newer SQL Server 2017 may have new optimisations not present in the 2012 release. Just speculating here. Anyway, in parallel I can take a look at adding prepared queries on a branch and see how they perform on my system with the odbc package.

…

On Fri, 20 Jul 2018 at 01:35, gbwey ***@***.***> wrote: I agree that this is very useful and I am quite happy to work on it. Running with odbc 13 didn't seem to make much difference. Here are the results <https://github.com/gbwey/hdbcperf/blob/master/results_odbcdriver2.txt> I am using "ODBC Driver 13 for SQL Server" Microsoft SQL Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (X64) Aug 15 2017 10:23:29 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor) Next week I should be able to run these benchmarks using ubuntu (virtual box) which might give us more information. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#23 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAArC5tQoaLKJoXVBO26UlW-dZEdyrzGks5uIRe4gaJpZM4VQbKK> .

gbwey · 2018-07-26T18:10:29Z

I ran several benchmarks using ubuntu1604 with virtualbox (odbc 13 driver) and one with windows as a baseline. I ran them several times as the numbers were unexpected!

Everything on ubuntu was really slow, except for these four exceptions (below), which approached windows speed:

ubuntu "small/hdbc prepared commit" was 10x faster than the other ubuntu odbc/hdbc stuff
ubuntu "small/hdbc prepared wide" was 4x faster than the other ubuntu odbc/hdbc stuff
ubuntu "mediuml/hdbc prepared commit" was 10x faster than the other ubuntu odbc/hdbc stuff
ubuntu "medium/hdbc prepared wide" was 5x faster than the other ubuntu odbc/hdbc stuff

So other than these cases, the benchmarks are so completely different from windows that I think something else is going on here.

I added Comp.hs to help compare separate benchmarks using "mean". (eg dump fn1 fn2)

There is one extra benchmark type with "hdbc ... commit" in the title that does a commit for each insert.

Here are the results:
windows baseline
ubuntu test1
ubuntu test2
ubuntu test3 shuffled the order

chrisdone changed the title ~~prepared statements~~ Consider adding prepared statements Jul 16, 2018

chrisdone added the enhancement New feature or request label Jul 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding prepared statements #23

Consider adding prepared statements #23

gbwey9 commented Jul 15, 2018

chrisdone commented Jul 16, 2018 •

edited

Loading

gbwey commented Jul 16, 2018

chrisdone commented Jul 17, 2018 via email

chrisdone commented Jul 17, 2018 via email

gbwey commented Jul 17, 2018 •

edited

Loading

gbwey commented Jul 17, 2018

chrisdone commented Jul 18, 2018 •

edited

Loading

gbwey commented Jul 19, 2018

chrisdone commented Jul 19, 2018

gbwey commented Jul 19, 2018

chrisdone commented Jul 20, 2018 via email

gbwey commented Jul 26, 2018 •

edited

Loading

Consider adding prepared statements #23

Consider adding prepared statements #23

Comments

gbwey9 commented Jul 15, 2018

chrisdone commented Jul 16, 2018 • edited Loading

gbwey commented Jul 16, 2018

chrisdone commented Jul 17, 2018 via email

chrisdone commented Jul 17, 2018 via email

gbwey commented Jul 17, 2018 • edited Loading

gbwey commented Jul 17, 2018

chrisdone commented Jul 18, 2018 • edited Loading

gbwey commented Jul 19, 2018

chrisdone commented Jul 19, 2018

gbwey commented Jul 19, 2018

chrisdone commented Jul 20, 2018 via email

gbwey commented Jul 26, 2018 • edited Loading

chrisdone commented Jul 16, 2018 •

edited

Loading

gbwey commented Jul 17, 2018 •

edited

Loading

chrisdone commented Jul 18, 2018 •

edited

Loading

gbwey commented Jul 26, 2018 •

edited

Loading