You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 22:08:23 UTC

[GitHub] [beam] kennknowles opened a new issue, #19069: Timeout handling in JdbcIO with Oracle java driver

kennknowles opened a new issue, #19069:
URL: https://github.com/apache/beam/issues/19069

   Hello, we have a batch pipeline that is reading data from Oracle, writing it to BigQuery.
   Using JdbcIO, BigQueryIO and Oracle driver [3].
   
   We run this job 2 times in a day, the pipeline usually works fine, it completed within 10 minutes.
   
   But sometimes we have a [1] warning from this job, it never finished until we cancelling it.
   (sometimes means 4 times in last month)
   
   Even we set timeout configuration for our code [2], warning log says [1] Processing stuck in step Read JDBC.
    
   Would you explain and correct us this kind of problem? 
   
   We already contacted Google Support Team, they said our configuration code is looked fine them. Then  they instructed us to ask here.
   
   Thanks,
   Bya
   
   [1] Warning Log
   
   ```
   
   Processing stuck in step Read JDBC/JdbcIO.Read/JdbcIO.ReadAll/ParDo(Read) for at least 13h40m00s without
   outputting or completing in state process at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197)
   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:144)
   at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82) at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
   at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101) at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
   at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98) at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612) at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:59) at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:747)
   at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:904) at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1082)
   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3780) at
   oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1343) at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3822)
   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1165)
   at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
   at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
   at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
   at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
   at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:601) at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn$DoFnInvoker.invokeProcessElement(Unknown
   Source)
   
   ```
   
   
    
   
   [2] Jdbc configure
   ```
   
   import org.apache.beam.sdk.io.jdbc.JdbcIO;
   import oracle.jdbc.driver.OracleConnection;
   
   JdbcIO.DataSourceConfiguration
    dataSource = JdbcIO
                   .DataSourceConfiguration.create("oracle.jdbc.driver.OracleDriver",
   oracleUrl)
                   .withUsername(oracleUser)
                   .withPassword(oraclePassword)
   
                  .withConnectionProperties(String.format("%s=%s",
                           OracleConnection.CONNECTION_PROPERTY_THIN_NET_CONNECT_TIMEOUT,
   10000))
                   .withConnectionProperties(String.format("%s=%s",
                           OracleConnection.CONNECTION_PROPERTY_THIN_READ_TIMEOUT,
   600000))
                   .withConnectionProperties(String.format("%s=%s",
                          
   OracleConnection.CONNECTION_PROPERTY_TIMEZONE_AS_REGION, false))
                   .withConnectionProperties(String.format("%s=%s",
   
                          OracleConnection.CONNECTION_PROPERTY_NET_KEEPALIVE, false))
           ;
   
   pipeline.apply(JdbcIO.<TableRow>read()
   
      .withDataSourceConfiguration(dataSource)
       .withQuery(readQuery)
       .withCoder(TableRowJsonCoder.of())
   
      .withRowMapper(new TableRowRowMapper())
   );
   
   ```
   
    
   
   [3] Oracle DB driver: ojdbc8 12.2.0.1
   
    
   
    
   
   Imported from Jira [BEAM-5305](https://issues.apache.org/jira/browse/BEAM-5305). Original Jira may contain additional context.
   Reported by: bya.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Timeout handling in JdbcIO with Oracle java driver [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #19069:
URL: https://github.com/apache/beam/issues/19069#issuecomment-1764876957

   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #19069: Timeout handling in JdbcIO with Oracle java driver

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #19069:
URL: https://github.com/apache/beam/issues/19069#issuecomment-1716400737

   This is great. Would you contribute a fix to Beam?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] thevijayshekhawat commented on issue #19069: Timeout handling in JdbcIO with Oracle java driver

Posted by "thevijayshekhawat (via GitHub)" <gi...@apache.org>.
thevijayshekhawat commented on issue #19069:
URL: https://github.com/apache/beam/issues/19069#issuecomment-1713241604

   Hello, We were facing a similar issue in production while using the JDBC IO for writes, here is how we solved it.
   
   Background: We run over close to a hundred streaming dataflow pipelines to write to Postgres, Every few of those pipelines would get stuck in JDBC IO write operation and would never recover, We had to close these pipelines and since these were streaming pipelines this would happen all around the clock causing a lot of on-call burden.
   
   The issue here is apparently happening on the Network Layer, Many JDBC drivers, have a default `socketTimeout` of "infinite" which means if the application in this case (Dataflow) sends a read/write request to the database and the database successfully executes that request. It sends back the result but it never reaches the application (i.e. Packet Loss or Connection Drop). This would result in this Limbo state where the application thinks the database is still working on the request and the database on the other hand thinks it has successfully executed the request. 
   
   How we fixed it:
   
   Add `socketTimeout` to JDBC Driver configuration: This will ensure that any network operation (read/write) will time out after a specified period, preventing indefinite hangs.
   
   Optional(If not enabled already): Add `statementTimeout`: If a SQL query exceeds the statementTimeout, the database engine will cancel the query execution. This prevents long-running or potentially stuck queries from occupying system resources indefinitely.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Timeout handling in JdbcIO with Oracle java driver [beam]

Posted by "thevijayshekhawat (via GitHub)" <gi...@apache.org>.
thevijayshekhawat commented on issue #19069:
URL: https://github.com/apache/beam/issues/19069#issuecomment-1763437279

   Sure, Happy to Contribute


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org