Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2023/03/13 09:15:00 UTC

[jira] [Comment Edited] (KUDU-3455) Improve space complexity about prune hash partitions for in-list predicate

    [ https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699530#comment-17699530 ] 

shenxingwuying edited comment on KUDU-3455 at 3/13/23 9:14 AM:
---------------------------------------------------------------

I have patched the information in the description.


was (Author: shenxingwuying):
I have patched the information.

> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>         Attachments: image-2023-03-06-17-23-35-119.png, image-2023-03-11-16-57-16-589.png
>
>
> My partner (Chenbo Lu) encountered an OOM problem in his application, which uses the Kudu Java client. He collected information and did a lot of analysis of the problem; I am sharing his work in this issue.
> The OS killed the application very frequently because of OOM. With an 8 GB Java heap (about 5.5 GB of it available), an in-list predicate with more than 10000 values would not work (OOM occurred). The Kudu table in his case has about 1500 columns. His scan requests look like '{*}select * from profile_wos where id in (...){*}'.
>  
> The problem only happens when the KuduScanPredicate is an in-list predicate; other predicates do not have this problem.
> He found that memory consumption is positively correlated with (number of ids × number of columns). I think the number of values in each in-list column is also a key factor.
>  
> When the Kudu API is used to build a scanner, memory usage reaches a very high watermark, and using multiple threads makes the problem worse. The picture below illustrates this and shows that the in-list predicate consumes a great deal of memory.
>  
> !image-2023-03-11-16-57-16-589.png!
>
> Reduce the space complexity of pruning hash partitions for in-list predicates
>     The Java client's logic for pruning hash partitions for in-list
>     predicates has high space complexity and can cause the Java client
>     to run out of memory. It also makes many deep copies of PartialRow,
>     which may be slow.
>  
> !image-2023-03-06-17-23-35-119.png!
>
> So we need to fix this problem to reduce the space complexity and improve speed.
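The scaling reported above (memory growing with the number of ids times the number of columns, and with the number of values in each in-list column) can be checked with simple arithmetic. The Java sketch below is a hypothetical cost model, not Kudu code: it assumes that pruning materializes one row per combination of in-list values and that each row holds per-column state for the whole schema. `InListCostModel` and its methods are illustrative names only.

```java
import java.util.List;

// Hypothetical cost model, not Kudu code. Assumption: one row is materialized
// per combination of in-list values, and each row is sized by the full schema.
public class InListCostModel {

    // Number of value combinations = product of the in-list sizes.
    static long combinations(List<Integer> inListSizes) {
        long product = 1;
        for (int size : inListSizes) {
            product = Math.multiplyExact(product, size); // throws on overflow
        }
        return product;
    }

    // Per-column slots allocated if every combination becomes a full-width row.
    static long materializedSlots(List<Integer> inListSizes, int schemaColumns) {
        return Math.multiplyExact(combinations(inListSizes), schemaColumns);
    }

    public static void main(String[] args) {
        // One in-list column with 10000 ids on a 1500-column table.
        System.out.println(materializedSlots(List.of(10_000), 1_500)); // prints 15000000
        // A second in-list column with 10 values multiplies the combinations.
        System.out.println(materializedSlots(List.of(10_000, 10), 1_500)); // prints 150000000
    }
}
```

Under this model, the issue's numbers give 15 million per-column slots, and every additional in-list column multiplies that figure by its own value count, which is why the number of values in each in-list column matters and not only the number of ids.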

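The description does not spell out the fix. One way to bound the extra space is to enumerate in-list combinations depth-first, writing each column's value into a single reusable key buffer and recording the resulting hash bucket in a BitSet, rather than first building a list of deep-copied rows. The Java sketch below illustrates that idea under stated assumptions: `hashBucket` is a hypothetical stand-in that uses `java.util.Arrays.hashCode`, not Kudu's key encoding or hash function, and `InListHashPruning` and its methods are not Kudu client APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not the Kudu client implementation.
public class InListHashPruning {

    // Stand-in hash (assumption): NOT Kudu's encoding/hash, just deterministic.
    static int hashBucket(long[] key, int numBuckets) {
        return Math.floorMod(Arrays.hashCode(key), numBuckets);
    }

    // Naive approach: materialize every combination as its own array first.
    // Extra space grows with (product of in-list sizes) * (number of columns).
    static BitSet pruneMaterialized(List<long[]> inLists, int numBuckets) {
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[inLists.size()]);
        for (int col = 0; col < inLists.size(); col++) {
            List<long[]> next = new ArrayList<>();
            for (long[] row : rows) {
                for (long value : inLists.get(col)) {
                    long[] copy = row.clone(); // one deep copy per combination
                    copy[col] = value;
                    next.add(copy);
                }
            }
            rows = next;
        }
        BitSet buckets = new BitSet(numBuckets);
        for (long[] row : rows) {
            buckets.set(hashBucket(row, numBuckets));
        }
        return buckets;
    }

    // Depth-first approach: one reusable key buffer and recursion depth equal
    // to the number of in-list columns, so extra space is O(columns + buckets).
    static BitSet pruneDepthFirst(List<long[]> inLists, int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        dfs(inLists, 0, new long[inLists.size()], numBuckets, buckets);
        return buckets;
    }

    private static void dfs(List<long[]> inLists, int col, long[] key,
                            int numBuckets, BitSet buckets) {
        if (col == inLists.size()) {
            buckets.set(hashBucket(key, numBuckets)); // key is complete
            return;
        }
        for (long value : inLists.get(col)) {
            key[col] = value; // overwrite in place instead of copying the row
            dfs(inLists, col + 1, key, numBuckets, buckets);
        }
    }

    public static void main(String[] args) {
        List<long[]> inLists = List.of(new long[] {1, 2, 3}, new long[] {10, 20});
        BitSet naive = pruneMaterialized(inLists, 8);
        BitSet depthFirst = pruneDepthFirst(inLists, 8);
        System.out.println(naive.equals(depthFirst)); // prints true
    }
}
```

Both methods visit the same combinations and therefore mark the same buckets; the difference is that the depth-first version holds only one key at a time, so its extra memory no longer depends on the number of combinations. Running time is still proportional to the number of combinations.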


--
This message was sent by Atlassian Jira
(v8.20.10#820010)