Posted to dev@drill.apache.org by Christophe Cgr <ch...@gmail.com> on 2016/12/05 13:42:28 UTC

[Jdbc-storage] Change Request for performance enhancement

Hi everyone,
I'm new to Drill and I use it to work with data across Hadoop and
Teradata.

During my tests, I found performance issues with reads through the JDBC
storage plugin:
  1. No CPU parallelism while fetching the result set.
  2. Low network throughput, with low CPU usage.

I have tested an enhancement for the second issue (about 3x faster) that uses
prepared statements, and I'd like the community's opinion and help testing it
with other JDBC clients.
Can you please tell me whether this is a good idea and, if so, what the best
way is to get it into a future release?

Here is the context of the optimisation.
Objective: use a PreparedStatement to activate FastExport mode for JDBC exports.

1.    My environment:
        Drill runs on Red Hat 6.6

2.    Storage-jdbc:
        -Database access is to Teradata, with JDBC driver 15.10
        -Driver documentation is at:
https://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html
            -> Driver parameters are explained under the keyword "Enabling JDBC
FastExport"
        -A code sample is here:
http://developer.teradata.com/doc/connectivity/jdbc/reference/current/samp/T20306JD.java.txt
(look at the object "pstmt2")

        -My plugin configuration is:
                {
                  "type": "jdbc",
                  "driver": "com.teradata.jdbc.TeraDriver",
                  "url": "jdbc:teradata://DBC/TYPE=FASTEXPORT,CHARSET=UTF8",
                  "username": "myuser",
                  "password": "mypassword",
                  "enabled": true
                }
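
The "url" value above is a host followed by comma-separated NAME=VALUE driver parameters. A minimal Java sketch of assembling such a URL follows; the class and method names (TeradataUrl, build) are hypothetical and are not part of Drill or of the Teradata driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TeradataUrl {
    // Hypothetical helper: joins driver parameters into the comma-separated
    // NAME=VALUE form used in the plugin configuration above.
    static String build(String host, Map<String, String> params) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joined.add(entry.getKey() + "=" + entry.getValue());
        }
        return "jdbc:teradata://" + host + "/" + joined;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("TYPE", "FASTEXPORT");
        params.put("CHARSET", "UTF8");
        System.out.println(build("DBC", params));
    }
}
```

Adding one more entry such as LOG=INFO to the map would yield the URL used for the log analysis in section 5.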

3.    Code changes:

        -The modification is in
https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java
        -The changes to the file are:
                    add:
                            import java.sql.PreparedStatement;
                    replace (lines 176-177):
                            statement = connection.createStatement();
                            resultSet = statement.executeQuery(sql);
                    with:
                            PreparedStatement statement = connection.prepareStatement(sql);
                            resultSet = statement.executeQuery();
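
To make the change above easier to try outside Drill, here is a minimal, self-contained Java sketch of the same pattern: prepare the SQL first, then call executeQuery() with no argument. The class PreparedRead and its methods are hypothetical. The fake connection is built from dynamic proxies only so the sketch runs without a database, and the try-with-resources handling is the sketch's own choice, not taken from JdbcRecordReader.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PreparedRead {
    // Reads the first column of every row through a PreparedStatement.
    // try-with-resources closes the result set, then the statement.
    static List<String> firstColumn(Connection connection, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(sql);
             ResultSet resultSet = statement.executeQuery()) {
            while (resultSet.next()) {
                values.add(resultSet.getString(1));
            }
        }
        return values;
    }

    // Stand-in connection (assumption: for demonstration only). It returns the
    // given strings as single-column rows and ignores the SQL text.
    static Connection fakeConnection(String... rows) {
        int[] cursor = {-1};
        ResultSet resultSet = proxy(ResultSet.class, (p, method, args) -> {
            switch (method.getName()) {
                case "next": return ++cursor[0] < rows.length;
                case "getString": return rows[cursor[0]];
                default: return null; // close() and anything else: no-op
            }
        });
        PreparedStatement statement = proxy(PreparedStatement.class,
            (p, method, args) -> method.getName().equals("executeQuery") ? resultSet : null);
        return proxy(Connection.class,
            (p, method, args) -> method.getName().equals("prepareStatement") ? statement : null);
    }

    // Creates a dynamic proxy that implements a single interface.
    static <T> T proxy(Class<T> type, InvocationHandler handler) {
        return type.cast(Proxy.newProxyInstance(
            PreparedRead.class.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(firstColumn(fakeConnection("row1", "row2"), "SELECT c FROM t"));
    }
}
```

With a real driver on the classpath, fakeConnection(...) would be replaced by a connection from DriverManager.getConnection(url, user, password).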

4.    Performance test results:
        With a large table (>5GB), the speed test results are:
                -With standard mode
                    One CPU used, 4MB/s on the network and low CPU consumption
                -With FastExport activated
                    Still one CPU used, 10MB/s on the network and 100% CPU consumption
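
To put the two rates in perspective, here is a small Java calculation of the transfer time for a table at the 5GB lower bound. The 5 * 1024 MB size is an assumption (the tested tables were only described as larger than 5GB), and the class name ExportTime is hypothetical.

```java
public class ExportTime {
    // Seconds needed to move sizeMb megabytes at rateMbPerSec megabytes per second.
    static double seconds(double sizeMb, double rateMbPerSec) {
        return sizeMb / rateMbPerSec;
    }

    public static void main(String[] args) {
        double sizeMb = 5 * 1024;                   // assumption: 5GB counted as 5 * 1024 MB
        double standard = seconds(sizeMb, 4.0);     // standard mode, 4MB/s
        double fastExport = seconds(sizeMb, 10.0);  // FastExport mode, 10MB/s
        System.out.println("standard seconds:   " + standard);
        System.out.println("fastexport seconds: " + fastExport);
        System.out.println("speed-up factor:    " + standard / fastExport);
    }
}
```

Under that assumption the export takes 1280 seconds in standard mode and 512 seconds with FastExport, so the ratio of the two measured rates is 2.5.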


5.    Log analysis:
           -With the Teradata driver, I changed the JDBC storage URL to "url":
"jdbc:teradata://DBC/TYPE=FASTEXPORT,CHARSET=UTF8,LOG=INFO"
           -This shows a new Java class,
"com.teradata.jdbc.jdk6.JDK6_FastExport_Connection", in drillbit.out

6.    Build tests:
         -I get no errors with the MySQL automated tests. (The Derby tests were
already failing before the change.)


Thanks for your feedback.
I'm still working on this kind of JDBC connection, and I'm facing issues
with timestamps in CET with microsecond precision. I may send another report
about this soon, after more analysis.

Regards
Christophe