
Posted to commits@beam.apache.org by "Jérémie Vexiau (JIRA)" <ji...@apache.org> on 2017/08/25 09:04:00 UTC

[jira] [Updated] (BEAM-2803) JdbcIO read is very slow when query return a lot of rows

     [ https://issues.apache.org/jira/browse/BEAM-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jérémie Vexiau updated BEAM-2803:
---------------------------------
    Attachment: test1M.png
                test500k.png
                test1500K.png

> JdbcIO read is very slow when query return a lot of rows
> --------------------------------------------------------
>
>                 Key: BEAM-2803
>                 URL: https://issues.apache.org/jira/browse/BEAM-2803
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>    Affects Versions: Not applicable
>            Reporter: Jérémie Vexiau
>            Assignee: Reuven Lax
>              Labels: performance
>             Fix For: Not applicable
>
>         Attachments: test1500K.png, test1M.png, test500k.png
>
>
> Hi,
> I'm using the JdbcIO reader in batch mode with the PostgreSQL driver.
> My select query returns more than 5 million rows,
> using cursors with Statement.setFetchSize().
> These ParDo steps are OK:
> {code:java}
>           .apply(ParDo.of(new ReadFn<>(this))).setCoder(getCoder())
>           .apply(ParDo.of(new DoFn<T, KV<Integer, T>>() {
>             private Random random;
>             @Setup
>             public void setup() {
>               random = new Random();
>             }
>             @ProcessElement
>             public void processElement(ProcessContext context) {
>               context.output(KV.of(random.nextInt(), context.element()));
>             }
>           }))
> {code}
> But the reshuffle step is very, very slow.
> It must be the GroupByKey, which has to handle more than 5 million keys.
> {code:java}
> .apply(GroupByKey.<Integer, T>create())
> {code}
> Is there a way to optimize the reshuffle, or to use another method to prevent fusion?
> Thanks in advance,
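
To put a rough number on the key-count concern in the report, the sketch below uses plain Java (no Beam dependency) to compare how many distinct keys come out of random.nextInt(), as in the quoted DoFn, against a bounded variant random.nextInt(numShards). The class name KeySpaceSketch, the method names, the seed, and the numShards value are all hypothetical and chosen only for illustration. Whether a bounded key space would actually make the GroupByKey faster on a given runner is an assumption; the report does not confirm it, and fewer keys also means less parallelism after the shuffle.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of Beam: counts distinct shuffle keys only.
class KeySpaceSketch {

    // Mirrors the quoted DoFn: every row gets a key from the full int range.
    static int distinctUnboundedKeys(int rows, long seed) {
        Random random = new Random(seed);
        Set<Integer> keys = new HashSet<>();
        for (int i = 0; i < rows; i++) {
            keys.add(random.nextInt());
        }
        return keys.size();
    }

    // Assumed alternative: keys drawn from [0, numShards), so at most numShards groups.
    static int distinctBoundedKeys(int rows, int numShards, long seed) {
        Random random = new Random(seed);
        Set<Integer> keys = new HashSet<>();
        for (int i = 0; i < rows; i++) {
            keys.add(random.nextInt(numShards));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        int rows = 100_000;   // scaled down from the 5 million rows in the report
        int numShards = 64;   // illustrative value, not a recommendation
        System.out.println("distinct keys, unbounded: " + distinctUnboundedKeys(rows, 42L));
        System.out.println("distinct keys, bounded:   " + distinctBoundedKeys(rows, numShards, 42L));
    }
}
```

With unbounded keys, nearly every row ends up in its own group, so the GroupByKey handles about as many keys as there are rows. With bounded keys, the number of groups can never exceed numShards.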



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)