You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@livy.apache.org by "runzhiwang (Jira)" <ji...@apache.org> on 2019/09/18 08:17:00 UTC

[jira] [Comment Edited] (LIVY-667) Support query a lot of data.

    [ https://issues.apache.org/jira/browse/LIVY-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930215#comment-16930215 ] 

runzhiwang edited comment on LIVY-667 at 9/18/19 8:16 AM:
----------------------------------------------------------

There are several design to support query a lot of data.

1.Merge the result rdd to one partition, and save as a single file in hdfs. And livy reads the file line by line directly.  Cons: it's slow to read line by line.
2.Repartition each partition into fixed size, and save in hdfs. And livy reads by toLocalIterator which read one partition into memory at one time. Cons: there are a lot of files in hdfs if the size of each partition is too small.
3.Cache rdd, and read each partition by batch. Cons: the shortage of memory and disk will cause the recompute of rdd, which maybe time-consuming
4.Save rdd to hdfs without repartition. and read each partition by batch. Cons: a little complicated to implement.


was (Author: runzhiwang):
I'm working on it

> Support query a lot of data.
> ----------------------------
>
>                 Key: LIVY-667
>                 URL: https://issues.apache.org/jira/browse/LIVY-667
>             Project: Livy
>          Issue Type: Bug
>          Components: Thriftserver
>    Affects Versions: 0.6.0
>            Reporter: runzhiwang
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When enable livy.server.thrift.incrementalCollect, thrift use toLocalIterator to load one partition at each time instead of the whole rdd to avoid OutOfMemory. However, if the largest partition is too big, the OutOfMemory still occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)