Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/19 23:29:00 UTC
[jira] [Commented] (DRILL-6307) Handle empty batches in record batch sizer correctly

    [ https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444985#comment-16444985 ] 

ASF GitHub Bot commented on DRILL-6307:
---------------------------------------

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/1228

    DRILL-6307: Handle empty batches in record batch sizer correctly

    When we get an empty batch, the record batch sizer calculates the row width as zero. In that case, we do not do accounting and memory allocation correctly for outgoing batches.
    
    For example, for a left outer join, if the right side batch is empty, we still have to include the right side columns as nulls in the outgoing batch. Say the first batch is empty. Then, for the outgoing batch, we allocate empty vectors with zero capacity. When we read the next batch with data, we end up going through the realloc loop as we write values. Also, if we use a right side row width of 0 in the outgoing row width calculation, the number of rows we calculate for the outgoing batch will be too high, and later, when we get a non-empty batch, we might exceed the memory limits.
    
    This PR tries to address these problems by allocating memory based on the standard (std) size of each column when the input batch is empty. It uses this allocation width as the width of the batch when calculating the number of rows for binary operators. For unary operators, this is not a problem, since we drop empty batches without doing any processing.
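
The fallback described above can be sketched roughly as follows. This is a minimal illustration and not the code in the pull request: the class `EmptyBatchWidthSketch`, the `ColumnStats` type, the per-column standard widths, and the 1 MiB budget are all assumptions made for the example, and none of them are Drill APIs.

```java
import java.util.Arrays;
import java.util.List;

class EmptyBatchWidthSketch {

    /** Hypothetical per-column summary; not one of Drill's actual classes. */
    static final class ColumnStats {
        final int stdWidthBytes; // assumed standard width for the column's type
        final long dataBytes;    // bytes observed in the incoming batch

        ColumnStats(int stdWidthBytes, long dataBytes) {
            this.stdWidthBytes = stdWidthBytes;
            this.dataBytes = dataBytes;
        }
    }

    /**
     * Row width to use for allocation and for row-count math.
     * An empty batch falls back to the standard width instead of reporting zero.
     */
    static int allocationWidth(List<ColumnStats> columns, int rowCount) {
        int width = 0;
        for (ColumnStats c : columns) {
            if (rowCount == 0) {
                width += c.stdWidthBytes;
            } else {
                // Average bytes per row, rounded up.
                width += (int) ((c.dataBytes + rowCount - 1) / rowCount);
            }
        }
        return width;
    }

    /** Rows that fit in the outgoing batch of a binary operator. */
    static int outgoingRowCount(long memoryBudgetBytes, int leftWidth, int rightWidth) {
        int rowWidth = Math.max(1, leftWidth + rightWidth);
        return (int) (memoryBudgetBytes / rowWidth);
    }

    public static void main(String[] args) {
        // Empty right-side batch: no data bytes were observed for either column.
        List<ColumnStats> right = Arrays.asList(
            new ColumnStats(4, 0),    // e.g. a 4-byte integer column
            new ColumnStats(50, 0));  // e.g. a variable-width column, assumed 50 bytes
        int rightWidth = allocationWidth(right, 0);
        System.out.println("right allocation width: " + rightWidth);
        System.out.println("outgoing rows: " + outgoingRowCount(1L << 20, 46, rightWidth));
    }
}
```

With a non-zero width for the empty side, the outgoing vectors get a real initial capacity and the row count stays within the budget once data arrives.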
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-6307

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1228
    
----
commit cd78209e9f75a59edc68df3e416f3936fb00f917
Author: Padma Penumarthy <pp...@...>
Date:   2018-04-06T19:56:06Z

    DRILL-6307: Handle empty batches in record batch sizer correctly

----


> Handle empty batches in record batch sizer correctly
> ----------------------------------------------------
>
>                 Key: DRILL-6307
>                 URL: https://issues.apache.org/jira/browse/DRILL-6307
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When we get an empty batch, the record batch sizer calculates the row width as zero. In that case, we do not do accounting and memory allocation correctly for outgoing batches.
> For example, in merge join, for a left outer join, if the right side batch is empty, we still have to include the right side columns as nulls in the outgoing batch.
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty vectors with zero capacity. When we read the next batch with data, we end up going through the realloc loop. If we use a right side row width of 0 in the outgoing row width calculation, the number of rows we calculate will be too high, and later, when we get a non-empty batch, we might exceed the memory limits.
> One possible workaround/solution: allocate memory based on the standard (std) size when the input batch is empty, and use the allocation width as the width of the batch in the number-of-rows calculation.
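
To make the row-count problem in the description concrete, here is a small arithmetic sketch. The class `RowCountOvershootSketch`, its method names, the 1 MiB budget, and the 50-byte and 150-byte row widths are illustrative assumptions, not values or code taken from Drill.

```java
class RowCountOvershootSketch {

    /** Rows that fit in the budget for a given combined row width. */
    static long rowsThatFit(long memoryBudgetBytes, int leftWidth, int rightWidth) {
        return memoryBudgetBytes / Math.max(1, leftWidth + rightWidth);
    }

    /** Bytes the outgoing batch needs once both sides carry real data. */
    static long bytesNeeded(long rows, int leftWidth, int rightWidth) {
        return rows * (leftWidth + rightWidth);
    }

    public static void main(String[] args) {
        long budget = 1L << 20;      // assumed 1 MiB outgoing batch budget
        int leftWidth = 50;          // assumed left side row width in bytes
        int actualRightWidth = 150;  // width seen once the right side has data

        // Row count computed while the right side was empty (width taken as 0).
        long rows = rowsThatFit(budget, leftWidth, 0);
        // Memory those rows need when the right side turns out to be non-empty.
        long needed = bytesNeeded(rows, leftWidth, actualRightWidth);

        System.out.println("rows planned: " + rows);
        System.out.println("bytes needed: " + needed + " (budget " + budget + ")");
        System.out.println("exceeds budget: " + (needed > budget));
    }
}
```

Under these assumed numbers, the row count planned with a zero right side width needs about four times the budget once the right side has data.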



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)