Posted to user@spark.apache.org by Filipa Sousa <Fi...@criticaltechworks.com> on 2020/09/02 14:34:01 UTC

Adding isolation level when reading from DB2 with spark.read

    Hello,

    We are trying to read from an IBM DB2 database using a PySpark job.
    We have a requirement to add an isolation level, Read Uncommitted (WITH UR), to the JDBC queries when reading DB2 data.
    We found the "isolationLevel" parameter in the Spark documentation, but it seems to apply only to writing (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html). Do you know if there is a similar one for reading?

    isolationLevel - The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.
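
    For context, a minimal sketch of how that option is used on the write path (df stands for an already-built DataFrame, and "TARGET_TABLE" is a placeholder):

            # isolationLevel as documented: applies only to the write path
            (
                df.write.format("jdbc")
                .option("url", self.db_settings["jdbc_url"])
                .option("dbtable", "TARGET_TABLE")
                .option("user", self.db_settings["db_username"])
                .option("password", self.db_settings["db_password"])
                .option("isolationLevel", "READ_UNCOMMITTED")
                .save()
            )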

    We also tested appending "WITH UR" directly to the query, but since the isolation clause must be at the outermost level of the query and Spark always wraps the query in parentheses (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html), it throws an error.

    Our last attempt was to add this "predicates" option when reading with Spark, but it is being ignored:

    predicates = "PART_NR != '0' with UR"
    input_df = (
        self.spark.read.format("jdbc")
        .option("url", self.db_settings["jdbc_url"])
        .option("dbtable", db_table)
        .option("user", self.db_settings["db_username"])
        .option("password", self.db_settings["db_password"])
        .option("predicates", predicates)
        .option("fetchsize", self.fetch_size)
        .load()
    )
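
    From the PySpark API docs, predicates appears to be a list argument to spark.read.jdbc() rather than a reader option, with one WHERE-clause fragment per partition, which would explain why the option above is silently ignored. A sketch with our settings:

            # predicates as documented: a list of WHERE fragments, one per
            # partition; not a place to attach an isolation clause
            input_df = self.spark.read.jdbc(
                url=self.db_settings["jdbc_url"],
                table=db_table,
                predicates=["PART_NR != '0'"],
                properties={
                    "user": self.db_settings["db_username"],
                    "password": self.db_settings["db_password"],
                },
            )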

    Do you have any advice on how we can do this?


Best regards,
Filipa Sousa

RE: Adding isolation level when reading from DB2 with spark.read

Posted by Filipa Sousa <Fi...@criticaltechworks.com>.
Hello Luca,

Thank you for your fast response!

It is an interesting suggestion, but unfortunately an isolation level change statement such as SET ISOLATION is not permitted while connected to a DB2 database (https://www.ibm.com/support/knowledgecenter/th/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0001946.html). The isolation level must be set prior to the connection, and Spark is the one establishing that connection.

Thank you anyway,
Filipa. 


RE: Adding isolation level when reading from DB2 with spark.read

Posted by Luca Canali <Lu...@cern.ch>.
Hi Filipa,

The Spark JDBC data source has an option to add a "sessionInitStatement".
It is documented in https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html and https://issues.apache.org/jira/browse/SPARK-21519
I guess you could use that option to "inject" a SET ISOLATION statement, although I am not familiar with the details of DB2.
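For illustration, a rough sketch of what I mean; the SET CURRENT ISOLATION statement is only my guess at the DB2 syntax, so please verify it on the DB2 side:

    input_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", db_table)
        .option("user", db_username)
        .option("password", db_password)
        # sessionInitStatement is executed once per JDBC connection,
        # right after Spark opens it and before any query runs
        .option("sessionInitStatement", "SET CURRENT ISOLATION = UR")
        .load()
    )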
Would that be useful for your use case?

Best,
Luca


Re: Adding isolation level when reading from DB2 with spark.read

Posted by Jörg Strebel <jo...@gmail.com>.
Hello!

You can set the DB2 JDBC driver options in the JDBC connection string:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_rjvdsprp.html


The DB2 JDBC driver has an option called "defaultIsolationLevel"
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052038.html

Maybe you should also try the option "concurrentAccessResolution"
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052607.html

See also
https://www.ibm.com/support/pages/how-set-isolation-level-db2-jdbc-database-connections
and https://www.idug.org/p/fo/et/thread=45083
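
For illustration, a rough sketch of passing the property through the URL (host, port, and database are placeholders; 1 is the java.sql.Connection constant for TRANSACTION_READ_UNCOMMITTED, but please check the accepted values against the IBM docs above):

    # DB2 JDBC URLs take driver properties after the database name,
    # separated by a colon and each terminated by a semicolon
    jdbc_url = "jdbc:db2://db2host:50000/MYDB:defaultIsolationLevel=1;"
    input_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", db_table)
        .option("user", db_username)
        .option("password", db_password)
        .load()
    )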

Regards

Jörg Strebel


-- 
Jörg Strebel
Aachener Straße 2
80804 München


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org