Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2017/10/25 20:38:00 UTC

[jira] [Commented] (BEAM-793) JdbcIO can create a deadlock when parallelism is greater than 1

    [ https://issues.apache.org/jira/browse/BEAM-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219480#comment-16219480 ] 

Eugene Kirpichov commented on BEAM-793:
---------------------------------------

This appears to be a MySQL issue: it can hit deadlocks even when nothing is wrong with what the application is doing (see https://bugs.mysql.com/bug.php?id=52020).

That said, MySQL's guidance is to "just reissue the transaction in case of deadlock", and that's what JdbcIO should do, roughly as implemented in Guillaume's last comment. I don't know whether we should retry indefinitely or only up to some limit.
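The retry approach described above can be sketched in plain JDBC as follows. This is a hypothetical helper: `DeadlockRetry`, `withRetry` and `SqlAction` are illustrative names, not JdbcIO API, and this is not Guillaume's code from the issue. It treats MySQL vendor error 1213 or SQLSTATE `40001` as a deadlock and caps the number of attempts, which is one possible answer to the "indefinitely or up to some limit" question. The action passed in should be one complete transaction, because InnoDB rolls back the victim transaction when it detects a deadlock.

```java
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of JdbcIO): re-runs a database action when
 * the database reports a deadlock, up to a fixed number of attempts.
 */
final class DeadlockRetry {

  /** A unit of database work that may throw SQLException, e.g. one whole transaction. */
  @FunctionalInterface
  interface SqlAction<T> {
    T run() throws SQLException;
  }

  // MySQL reports deadlocks as vendor error 1213 (ER_LOCK_DEADLOCK) with SQLSTATE 40001.
  // 40001 is also the standard "serialization failure" state, so it matches other
  // databases' retryable conflicts too.
  static boolean isDeadlock(SQLException e) {
    return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
  }

  /**
   * Runs {@code action}, retrying after a deadlock with linear backoff.
   * Non-deadlock errors, and the last deadlock once maxAttempts is reached, are rethrown.
   */
  static <T> T withRetry(SqlAction<T> action, int maxAttempts, long backoffMillis)
      throws SQLException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return action.run();
      } catch (SQLException e) {
        if (!isDeadlock(e) || attempt >= maxAttempts) {
          throw e; // not retryable, or out of attempts
        }
        try {
          Thread.sleep(backoffMillis * attempt); // wait a bit longer after each deadlock
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt and give up
          throw e;
        }
      }
    }
  }
}
```

Usage, with a hypothetical `insertBatch(connection)` that runs one complete transaction and returns its update counts: `DeadlockRetry.withRetry(() -> insertBatch(connection), 5, 100L);`. The attempt cap is a judgment call: it lets a persistent problem surface as a failure (which the runner may then handle at the bundle level) instead of looping forever.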

To an earlier point: this code runs on multiple workers in multiple threads, so there's no "lock" you can grab while inserting into the database; and even if there were, using it probably wouldn't be a good idea:

- Someone else might be using the database at the same time, so you might still get a deadlock
- Many databases handle many clients issuing update statements in parallel quite well; in that case, giving up parallelism means giving up performance

JB, why was this moved from 2.2.0 to 2.3.0? That might be valid, but when changing a bug's fix version, it would be good to include an explanation of why the issue is not important enough for the earlier release.

> JdbcIO can create a deadlock when parallelism is greater than 1
> ---------------------------------------------------------------
>
>                 Key: BEAM-793
>                 URL: https://issues.apache.org/jira/browse/BEAM-793
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>             Fix For: 2.3.0
>
>
> With the following JdbcIO configuration, if the parallelism is greater than 1, we can get a {{Deadlock found when trying to get lock; try restarting transaction}} error.
> {code}
>         MysqlDataSource dbCfg = new MysqlDataSource();
>         dbCfg.setDatabaseName("db");
>         dbCfg.setUser("user");
>         dbCfg.setPassword("pass");
>         dbCfg.setServerName("localhost");
>         dbCfg.setPortNumber(3306);
>         p.apply(Create.of(data))
>                 .apply(JdbcIO.<Tuple5<Integer, Integer, ByteString, Long, Long>>write()
>                         .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(dbCfg))
>                         .withStatement("INSERT INTO smth(loc,event_type,hash,begin_date,end_date) VALUES(?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE event_type=VALUES(event_type),end_date=VALUES(end_date)")
>                         .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<Tuple5<Integer, Integer, ByteString, Long, Long>>() {
>                             public void setParameters(Tuple5<Integer, Integer, ByteString, Long, Long> element, PreparedStatement statement)
>                                     throws Exception {
>                                 statement.setInt(1, element.f0);
>                                 statement.setInt(2, element.f1);
>                                 statement.setBytes(3, element.f2.toByteArray());
>                                 statement.setLong(4, element.f3);
>                                 statement.setLong(5, element.f4);
>                             }
>                         }));
> {code}
> This may be caused by {{autocommit}}. I'm going to investigate.
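The issue description suspects autocommit. The following is only an illustration, not JdbcIO's implementation and not a confirmed diagnosis. It shows how one batch can be written inside a single explicit transaction in plain JDBC. Autocommit is turned off, the batch is committed once, and any SQLException triggers a rollback, so the whole batch can be re-run by a deadlock retry loop. `TransactionalBatchWriter`, `writeBatch` and `Setter` are hypothetical names. `Setter` mirrors the shape of `JdbcIO.PreparedStatementSetter` from the configuration quoted above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical sketch, not JdbcIO's actual write implementation. */
final class TransactionalBatchWriter {

  /** Binds one element's values to the statement (same shape as JdbcIO.PreparedStatementSetter). */
  @FunctionalInterface
  interface Setter<T> {
    void setParameters(T element, PreparedStatement statement) throws SQLException;
  }

  /**
   * Writes all elements as one JDBC batch inside a single explicit transaction.
   * On any SQLException the transaction is rolled back, so the caller can
   * re-run the whole batch (for example after a deadlock).
   */
  static <T> void writeBatch(Connection connection, String sql, List<T> elements, Setter<T> setter)
      throws SQLException {
    boolean previousAutoCommit = connection.getAutoCommit();
    connection.setAutoCommit(false); // group the whole batch into one transaction
    try (PreparedStatement statement = connection.prepareStatement(sql)) {
      for (T element : elements) {
        statement.clearParameters();
        setter.setParameters(element, statement);
        statement.addBatch();
      }
      statement.executeBatch();
      connection.commit(); // all rows become visible together
    } catch (SQLException e) {
      try {
        connection.rollback(); // undo partial work so a retry starts clean
      } catch (SQLException rollbackError) {
        e.addSuppressed(rollbackError); // keep the original failure as the primary error
      }
      throw e;
    } finally {
      connection.setAutoCommit(previousAutoCommit); // restore the caller's setting
    }
  }
}
```

With autocommit left on, each statement would typically commit on its own, so a failure partway through a batch could leave earlier rows committed. The `ON DUPLICATE KEY UPDATE` statement above may make re-applying them harmless, but a batch that either fully commits or fully rolls back is easier to retry safely.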


