
Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 16:07:00 UTC

[jira] [Resolved] (KUDU-3054) Init kudu.write_duration accumulator lazily

     [ https://issues.apache.org/jira/browse/KUDU-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-3054.
-------------------------------
    Fix Version/s: NA
       Resolution: Duplicate

> Init kudu.write_duration accumulator lazily
> -------------------------------------------
>
>                 Key: KUDU-3054
>                 URL: https://issues.apache.org/jira/browse/KUDU-3054
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>    Affects Versions: 1.9.0
>            Reporter: liupengcheng
>            Priority: Major
>             Fix For: NA
>
>         Attachments: durationHisto_large.png, durationhisto.png, read_kudu_and_shuffle.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We encountered an issue in kudu-spark that causes Spark SQL queries to fail:
> ```
> Job aborted due to stage failure: Total size of serialized results of 942 tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2.0 GB)
> ```
> After careful debugging, we found that the kudu.write_duration accumulator makes each Spark task's serialized result larger than 2 MB, so the combined size of all task results in the stage exceeds the limit.
> However, this stage only reads a Kudu table and performs a shuffle exchange; it does not write to any Kudu table.
> So I think we should initialize this accumulator lazily in KuduContext to avoid such issues.
> !https://issues.apache.org/jira/secure/attachment/12993451/durationHisto_large.png!
>  
> !https://issues.apache.org/jira/secure/attachment/12993452/durationhisto.png!
> !https://issues.apache.org/jira/secure/attachment/12993453/read_kudu_and_shuffle.png!
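
The lazy-initialization idea proposed in the description can be sketched roughly as follows. This is a language-neutral illustration in plain Python, not the kudu-spark code: `HistogramAccumulator`, `ContextSketch`, `read_rows`, and `write_rows` are hypothetical names, and the real KuduContext is Scala and registers its accumulators with Spark on the driver, which this sketch does not model. It only shows that an expensive member created on first use costs nothing on a read-only path.

```python
from functools import cached_property


class HistogramAccumulator:
    """Hypothetical stand-in for a large histogram accumulator (not Kudu's class)."""

    # Counts constructions so the effect of lazy creation is observable.
    instances_created = 0

    def __init__(self, buckets: int) -> None:
        HistogramAccumulator.instances_created += 1
        self.counts = [0] * buckets

    def add(self, micros: int) -> None:
        # Bucket by bit length (roughly log2), clamped to the last bucket.
        idx = min(micros.bit_length(), len(self.counts) - 1)
        self.counts[idx] += 1


class ContextSketch:
    """Hypothetical context object; only the write path touches the accumulator."""

    @cached_property
    def write_duration(self) -> HistogramAccumulator:
        # Built on first access only; a read-only job never reaches this line.
        return HistogramAccumulator(buckets=64)

    def read_rows(self) -> list[int]:
        # Read path: does not reference write_duration at all.
        return [1, 2, 3]

    def write_rows(self, rows: list[int]) -> None:
        # Write path: first call creates the accumulator, later calls reuse it.
        self.write_duration.add(micros=1500)
```

With this shape, a context that only calls `read_rows` never constructs the accumulator, while repeated `write_rows` calls share a single instance. Whether the same approach carries over directly to Spark depends on how and when the accumulator is registered and captured in task closures, which is outside the scope of this sketch.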



--
This message was sent by Atlassian Jira
(v8.3.4#803005)