Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/24 21:03:00 UTC

[jira] [Work logged] (BEAM-14161) Add dynamic splitting to JdbcIO.readWithPartitions

     [ https://issues.apache.org/jira/browse/BEAM-14161?focusedWorklogId=747453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747453 ]

ASF GitHub Bot logged work on BEAM-14161:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Mar/22 21:02
            Start Date: 24/Mar/22 21:02
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on pull request #16863:
URL: https://github.com/apache/beam/pull/16863#issuecomment-1078291316


   Looking at the database monitoring, the workload looks about the same in both cases. However, this is just a simple test database that is not serving any other load, so I am not sure this monitoring information supports any hypothesis:
   
   Reading from the database with non-splittable reads:
   
   ![image](https://user-images.githubusercontent.com/1301740/160008947-7b9a8f32-1352-4e2c-9306-245d7b6fac23.png)
   
   
   Reading from the database with splittable reads:
   
   ![image](https://user-images.githubusercontent.com/1301740/160007694-653e83d5-06fc-434e-95cc-46d63e71497c.png)
   
   The test is relatively simple, so we would probably need a more complex setup to get a stronger indication of the tradeoffs here.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 747453)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add dynamic splitting to JdbcIO.readWithPartitions
> --------------------------------------------------
>
>                 Key: BEAM-14161
>                 URL: https://issues.apache.org/jira/browse/BEAM-14161
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-jdbc
>            Reporter: Pablo Estrada
>            Assignee: Jean-Baptiste Onofré
>            Priority: P2
>             Fix For: Not applicable
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the JDBC IO is basically a {{DoFn}} executed with a {{ParDo}}. This means that parallelism is "limited": the read is executed on one executor. ReadWithPartitions does some preliminary partitioning of the data, but any skew in the data range or workload will create an unbalanced workload.
>  
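To make the skew problem in the description concrete, here is a minimal, self-contained Java sketch. It is not JdbcIO code and uses no Beam API: `RangeSplitSketch`, `staticPartitions`, `trySplit`, `countRows` and the sample ids are all hypothetical. It assumes a numeric partition column whose range is cut into equal-width partitions before reading starts (in the spirit of readWithPartitions' preliminary partitioning), then shows how a dynamic split could hand the unread remainder of an overloaded partition to another worker, loosely modeled on how splittable DoFn restriction trackers split an offset range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Beam's JdbcIO implementation: contrasts a fixed,
// up-front split of a numeric partition column with a dynamic split of the
// unread part of one partition.
final class RangeSplitSketch {

  /** Half-open range [from, to) over the values of a numeric partition column. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from > to");
      }
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Static partitioning: numPartitions equal-width ranges, fixed before any row is read. */
  static List<Range> staticPartitions(long lower, long upper, int numPartitions) {
    List<Range> out = new ArrayList<>();
    long width = upper - lower;
    for (int i = 0; i < numPartitions; i++) {
      out.add(new Range(lower + width * i / numPartitions, lower + width * (i + 1) / numPartitions));
    }
    return out;
  }

  /**
   * Dynamic split: a worker reading `range` has already read every value below `next`.
   * The primary keeps [from, splitPoint) and [splitPoint, to) goes to another worker,
   * where splitPoint sits `fraction` of the way through the unread remainder.
   * Returns {primary, residual}, or null if the remainder is too small to split.
   */
  static Range[] trySplit(Range range, long next, double fraction) {
    if (next < range.from || next > range.to || fraction < 0 || fraction > 1) {
      throw new IllegalArgumentException("bad split request");
    }
    // The primary always keeps at least the value at `next`.
    long splitPoint = next + Math.max(1, (long) ((range.to - next) * fraction));
    if (splitPoint >= range.to) {
      return null;
    }
    return new Range[] {new Range(range.from, splitPoint), new Range(splitPoint, range.to)};
  }

  /** Number of rows whose partition-column value falls in r. */
  static long countRows(long[] ids, Range r) {
    long n = 0;
    for (long id : ids) {
      if (id >= r.from && id < r.to) {
        n++;
      }
    }
    return n;
  }

  /** Made-up skewed column: 80 rows packed into ids 0..19, 20 rows spread over 20..96. */
  static long[] skewedIds() {
    long[] ids = new long[100];
    for (int i = 0; i < 80; i++) {
      ids[i] = i % 20; // 4 rows for each id 0..19
    }
    for (int i = 0; i < 20; i++) {
      ids[80 + i] = 20 + 4L * i; // one row each for 20, 24, ..., 96
    }
    return ids;
  }

  public static void main(String[] args) {
    long[] ids = skewedIds();
    for (Range r : staticPartitions(0, 100, 4)) {
      System.out.println(r + " -> " + countRows(ids, r) + " rows");
    }
    // The worker on the first partition has read ids 0..3; give away half of what is left.
    Range[] split = trySplit(new Range(0, 25), 4, 0.5);
    System.out.println("primary " + split[0] + ", residual " + split[1]
        + " -> " + countRows(ids, split[1]) + " rows");
  }
}
```

Running `main` reports 82, 6, 6 and 6 rows for the four fixed ranges, so one executor does most of the work. The split then leaves `[0, 14)` with the first worker and moves `[14, 25)`, 26 rows, to another. Because the split point is chosen by key value rather than by row count, the two halves are only balanced when rows are spread evenly within the remaining range.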



--
This message was sent by Atlassian Jira
(v8.20.1#820001)