Posted to jira@arrow.apache.org by "Alexander Ocsa (Jira)" <ji...@apache.org> on 2021/10/15 14:26:00 UTC

[jira] [Updated] (ARROW-14330) Create DataHolder that can be used for caching during exec plans

     [ https://issues.apache.org/jira/browse/ARROW-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Ocsa updated ARROW-14330:
-----------------------------------
    Description: 
The purpose of this task is to make an ExecNode that can provide the following functionality.
 # Be able to obtain heuristics about our memory consumption and have a memory consumption threshold

 # Be able to write an incoming ExecBatch to disk if memory consumption is above the threshold, storing either the ExecBatch or a handle to its file in a queue.

 # Provide an API for pulling an ExecBatch from the queue. It should favor pulling all of the batches that are in memory first and then the ones that are handles to files.
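
A minimal sketch of the three points above, for discussion only. It does not use any Arrow API: Batch is a hypothetical stand-in for ExecBatch, and DataHolder, FileHandle, Push, Pull, the raw binary spill format and the spill file naming are all assumptions made for illustration. It also assumes a batch is spilled when keeping it would push the tracked in-memory bytes past the threshold, and it uses two queues (one for in-memory batches, one for file handles) as one possible way to favor in-memory batches on pull.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <fstream>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ExecBatch: a flat column of doubles.
struct Batch {
  std::vector<double> values;
  std::size_t bytes() const { return values.size() * sizeof(double); }
};

// Hypothetical handle to a batch that was written to disk.
struct FileHandle {
  std::string path;
  std::size_t num_values;
};

class DataHolder {
 public:
  DataHolder(std::size_t threshold_bytes, std::string spill_prefix)
      : threshold_(threshold_bytes), prefix_(std::move(spill_prefix)) {}

  // Keep the batch in memory while the threshold allows it; otherwise
  // write it to a file and queue only the handle.
  void Push(Batch batch) {
    if (in_memory_bytes_ + batch.bytes() <= threshold_) {
      in_memory_bytes_ += batch.bytes();
      memory_queue_.push_back(std::move(batch));
    } else {
      file_queue_.push_back(Spill(batch));
    }
  }

  // Return in-memory batches first, then spilled ones; nullopt when empty.
  std::optional<Batch> Pull() {
    if (!memory_queue_.empty()) {
      Batch batch = std::move(memory_queue_.front());
      memory_queue_.pop_front();
      assert(in_memory_bytes_ >= batch.bytes());
      in_memory_bytes_ -= batch.bytes();
      return batch;
    }
    if (!file_queue_.empty()) {
      FileHandle handle = std::move(file_queue_.front());
      file_queue_.pop_front();
      return Load(handle);
    }
    return std::nullopt;
  }

  // Memory consumption currently tracked by this holder.
  std::size_t in_memory_bytes() const { return in_memory_bytes_; }
  // Number of batches currently held only as file handles.
  std::size_t spilled_count() const { return file_queue_.size(); }

 private:
  // Write the raw values to a new file (assumed format: plain binary dump).
  FileHandle Spill(const Batch& batch) {
    FileHandle handle{prefix_ + std::to_string(next_id_++) + ".bin",
                      batch.values.size()};
    std::ofstream out(handle.path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(batch.values.data()),
              static_cast<std::streamsize>(batch.bytes()));
    return handle;
  }

  // Read a spilled batch back and delete its file.
  Batch Load(const FileHandle& handle) {
    Batch batch;
    batch.values.resize(handle.num_values);
    {
      std::ifstream in(handle.path, std::ios::binary);
      in.read(reinterpret_cast<char*>(batch.values.data()),
              static_cast<std::streamsize>(batch.bytes()));
    }
    std::remove(handle.path.c_str());
    return batch;
  }

  std::size_t threshold_;
  std::string prefix_;
  std::size_t in_memory_bytes_ = 0;
  std::size_t next_id_ = 0;
  std::deque<Batch> memory_queue_;
  std::deque<FileHandle> file_queue_;
};
```

In this sketch, a caller would construct a DataHolder with a byte threshold and a file prefix, call Push for each incoming batch, and call Pull until it returns std::nullopt. A real ExecNode would additionally need thread safety, error handling for file I/O, and a proper serialization format; none of that is attempted here.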

 

PRs to reference

[https://github.com/apache/arrow/pull/11017/]

Discussion around the subject (these documents are just for thoughts; they are not decisions)

[https://docs.google.com/document/d/15X0ePnVJqDmT7og1seikZ31Zmtd3KRyJbL33jym5G3A/edit#]

  was:
The purpose of this task is to make an ExecNode that can provide the following functionality.
 # Be able to obtain heuristics about our memory consumption and have a memory consumption threshold

 # Be able to write an incoming ExecBatch to disk if memory consumption is above the threshold, storing either the ExecBatch or a handle to its file in a queue.

 # Provide an API for pulling an ExecBatch from the queue. It should favor pulling all of the batches that are in memory first and then the ones that are handles to files.

 

PRs to reference

[https://github.com/apache/arrow/pull/11017/]

Discussion around the subject (these documents are just for thoughts; they are not decisions)

[https://www.notion.so/voltrondata/Caches-and-Cache-Data-d8822213fec5402aa691ca76912a5b3d#ca60341a3e3f4b5487f146f545b19b2c]

[https://docs.google.com/document/d/15X0ePnVJqDmT7og1seikZ31Zmtd3KRyJbL33jym5G3A/edit#]


> Create DataHolder that can be used for caching during exec plans
> ----------------------------------------------------------------
>
>                 Key: ARROW-14330
>                 URL: https://issues.apache.org/jira/browse/ARROW-14330
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Alexander Ocsa
>            Assignee: Alexander Ocsa
>            Priority: Major
>              Labels: query-engine
>
> The purpose of this task is to make an ExecNode that can provide the following functionality.
>  # Be able to obtain heuristics about our memory consumption and have a memory consumption threshold
>  # Be able to write an incoming ExecBatch to disk if memory consumption is above the threshold, storing either the ExecBatch or a handle to its file in a queue.
>  # Provide an API for pulling an ExecBatch from the queue. It should favor pulling all of the batches that are in memory first and then the ones that are handles to files.
>  
> PRs to reference
> [https://github.com/apache/arrow/pull/11017/]
> Discussion around the subject (these documents are just for thoughts; they are not decisions)
> [https://docs.google.com/document/d/15X0ePnVJqDmT7og1seikZ31Zmtd3KRyJbL33jym5G3A/edit#]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)