Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding prepared statements #23

Open
gbwey9 opened this issue Jul 15, 2018 · 12 comments
Open

Consider adding prepared statements #23

gbwey9 opened this issue Jul 15, 2018 · 12 comments
Labels
enhancement New feature or request

Comments

@gbwey9
Copy link

gbwey9 commented Jul 15, 2018

Hi,
This is an excellent library. I would like to use this package instead
of hdbc-odbc but the only thing missing are prepared statements.
Is that something you are thinking of adding?

@chrisdone
Copy link
Contributor

chrisdone commented Jul 16, 2018

I'll consider adding them, but there are trade-offs:

If you're convinced this is substantially faster and can demonstrate some numbers (e.g. using hdbc-odbc?), I'll look into adding this to odbc with a set of benchmarks proving their performance advantages. 👍

@chrisdone chrisdone changed the title prepared statements Consider adding prepared statements Jul 16, 2018
@chrisdone chrisdone added the enhancement New feature or request label Jul 16, 2018
@gbwey
Copy link

gbwey commented Jul 16, 2018

I ran some speed comparisons between odbc and hdbc and saw signifcant speed differences in favour of prepared statements. It's possible I messed this up somehow. Can you take a look?
Thanks,

I inserted 100,000 rows with fairly large data using
odbc TH
odbc plain
perf.zip

hdbc prepared statements
hdbc raw
hdbc runraw
Each time I ran this, it showed prepared statements were faster by a factor of at least two.

Here is one run:
16 17:17:49 D:\haskell\hdbcperf>stack exec hdbcperf r
create table OdbcTH
create table OdbcRaw
odbc TH starting ...
odbc TH: 141.39708733558655
odbc raw starting ...
odbc raw: 123.21104717254639
create table Prepared
create table Run
create table RunRaw
hdbc prepared starting ...
hdbc prepared: 53.47105836868286
hdbc run starting ...
hdbc run: 127.43228888511658
hdbc runraw starting ...
hdbc runraw: 74.92328548431396

@chrisdone
Copy link
Contributor

chrisdone commented Jul 17, 2018 via email

@chrisdone
Copy link
Contributor

chrisdone commented Jul 17, 2018 via email

@gbwey
Copy link

gbwey commented Jul 17, 2018

I just pushed the repo hdbcperf which now uses criterion.

The results are in results.txt

@gbwey
Copy link

gbwey commented Jul 17, 2018

I added an extra test called hdbc PreparedCommit which does a commit for each insert statement which is probably what odbc is doing (autocommit on). The timings are very similar to odbc.

results2
results3

@chrisdone
Copy link
Contributor

chrisdone commented Jul 18, 2018

Nice work, thanks! I've reviewed your numbers and code so far, and it seems that you've found the actual speed difference. If I add exec conn "set implicit_transactions on" to each odbc test, then odbc outperform HDBC's prepared statements (see results below).

All HDBC connections set implicit transactions on by default, which explains that. We could add a withTransaction combinator to odbc, or else enable transactions at the beginning of the connection like HDBC does.

It may still be that prepared statements are faster than regular statements, but if we're already as fast as or faster than HDBC with prepared statements, you may be happy with it like this.

What do you think?

benchmarking small/odbc TH
time                 154.7 ms   (132.7 ms .. 192.1 ms)
                     0.966 R²   (0.875 R² .. 1.000 R²)
mean                 163.8 ms   (154.2 ms .. 174.5 ms)
std dev              14.53 ms   (10.87 ms .. 17.71 ms)
variance introduced by outliers: 26% (moderately inflated)

benchmarking small/odbc Raw
time                 143.9 ms   (124.3 ms .. 168.8 ms)
                     0.985 R²   (0.970 R² .. 0.999 R²)
mean                 136.1 ms   (129.1 ms .. 143.1 ms)
std dev              9.606 ms   (7.303 ms .. 12.06 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking small/hdbc prepared
time                 244.9 ms   (148.7 ms .. 341.0 ms)
                     0.961 R²   (0.862 R² .. 1.000 R²)
mean                 270.4 ms   (252.6 ms .. 300.6 ms)
std dev              29.19 ms   (3.099 ms .. 37.94 ms)
variance introduced by outliers: 19% (moderately inflated)

@gbwey
Copy link

gbwey commented Jul 19, 2018

Here is a last benchmark which has just uses one commit at the very end to keep things consistent. I also added another table with 20 columns . I am seeing hdbc prepared gives around 50% improvement over odbc raw. Interesting is the bigger slow down by odbc TH with the 20 column inserts.
I use a fork of hdbc-odbc that seems to runs faster than the one on hackage which might explain the differences we are seeing.

results6.txt

@chrisdone
Copy link
Contributor

I am using your stack repo which links to the same fork of hdbc-odbc from github (I saw it clone them), so we are using the exact same package versions and GHC version.

My performance numbers don't match yours presently, odbc raw is still fastest:

https://gist.github.com/chrisdone/6b3714808ea43326e72b85daedcd7492

I think if anything is different, it'll be our ODBC driver version and SQL Server version. What are yours? Mine are:

You appear to be using DRIVER=SQL Server Native Client 11.0, have you considered using the ODBC driver? According to Microsoft, the ODBC driver is the only client that is maintained after 2012:

https://stackoverflow.com/questions/40842719/how-to-install-sql-server-native-client-11-0-on-ubuntu/40844822#40844822

The SQL Server Native Client OLE DB provider was included in SQL Server Native Client in SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012. After SQL Server 2012, the SQL Server Native Client OLE DB provider will no longer be included in SQL Server Native Client.

After SQL Server 2012, the ODBC driver will be updated for the most recent server features, including Microsoft Windows Azure SQL Database, and released as the Microsoft ODBC Driver for SQL Server.

I'm on Linux so I won't be able to reproduce your numbers with the native client, which I think was only available for Windows. Would you mind testing with the ODBC Driver? I'd be interested to see whether it's faster, or slower, and whether this affects the HDBC/ODBC speed difference.

If the new driver is faster or slower, it would be worth noting in this package's README and Haddocks, and give a recommendation to users.

Alternatively, it could be a mysterious Windows vs Linux difference. But I doubt that and more suspect the ODBC driver.

Thanks for your patience and time spent on this issue. 👍 I think the work we're doing here will be useful for other people, too.

@gbwey
Copy link

gbwey commented Jul 19, 2018

I agree that this is very useful and I am quite happy to work on it.

Running with odbc 13 didn't seem to make much difference. Here are the results

I am using "ODBC Driver 13 for SQL Server"
Microsoft SQL Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (X64)
Aug 15 2017 10:23:29
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor)

Next week I should be able to run these benchmarks using ubuntu (virtual box) which might give us more information.

@chrisdone
Copy link
Contributor

chrisdone commented Jul 20, 2018 via email

@gbwey
Copy link

gbwey commented Jul 26, 2018

I ran several benchmarks using ubuntu1604 with virtualbox (odbc 13 driver) and one with windows as a baseline. I ran them several times as the numbers were unexpected!

Everything on ubuntu was really slow, except for these four exceptions (below), which approached windows speed:

  1. ubuntu "small/hdbc prepared commit" was 10x faster than the other ubuntu odbc/hdbc stuff
  2. ubuntu "small/hdbc prepared wide" was 4x faster than the other ubuntu odbc/hdbc stuff
  3. ubuntu "mediuml/hdbc prepared commit" was 10x faster than the other ubuntu odbc/hdbc stuff
  4. ubuntu "medium/hdbc prepared wide" was 5x faster than the other ubuntu odbc/hdbc stuff

So other than these cases, the benchmarks are so completely different from windows that I think something else is going on here.

I added Comp.hs to help compare separate benchmarks using "mean". (eg dump fn1 fn2)

There is one extra benchmark type with "hdbc ... commit" in the title that does a commit for each insert.

Here are the results:
windows baseline
ubuntu test1
ubuntu test2
ubuntu test3 shuffled the order

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants