Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/02/18 18:40:18 UTC

[jira] [Commented] (DRILL-4411) HashJoin should not only depend on number of records, but also on size

    [ https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152698#comment-15152698 ] 

ASF GitHub Bot commented on DRILL-4411:
---------------------------------------

GitHub user minji-kim opened a pull request:

    https://github.com/apache/drill/pull/381

    DRILL-4411: hash join should limit batch based on size and number of records

    Right now, hash joins can run out of memory if records are large, since the batch is limited only by the number of records (4000).  This patch implements a simple heuristic: if the allocator for the outputs grows larger than 10 MB before outputting 4000 records (say, after 2000), then the batch size limit is set to 2000 for future batches.
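
The heuristic in the description above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the pull request: the class name AdaptiveBatchLimiter, its methods, and the way the allocated size is passed in are all hypothetical, and the 4000-record and 10 MB figures are simply the values quoted in the description.

```java
/**
 * Hypothetical helper illustrating the heuristic; not Drill's actual code.
 * A batch is flushed when it reaches the record limit, or when the memory
 * allocated for it exceeds a byte threshold. In the second case the record
 * limit is lowered so that future batches stop at the same record count.
 */
final class AdaptiveBatchLimiter {
    private final long maxBatchBytes; // e.g. 10 MB, per the description
    private int recordLimit;          // starts at e.g. 4000 and may shrink

    AdaptiveBatchLimiter(int initialRecordLimit, long maxBatchBytes) {
        if (initialRecordLimit < 1 || maxBatchBytes < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.recordLimit = initialRecordLimit;
        this.maxBatchBytes = maxBatchBytes;
    }

    /**
     * @param recordsInBatch number of records written to the current batch
     * @param allocatedBytes memory currently allocated for the output batch
     *                       (assumed to be reported by the caller)
     * @return true if the current batch should be sent downstream now
     */
    boolean shouldFlush(int recordsInBatch, long allocatedBytes) {
        if (recordsInBatch >= recordLimit) {
            return true; // normal case: the record count limit was reached
        }
        if (allocatedBytes > maxBatchBytes) {
            // Records are large: remember this count as the new limit
            // for future batches (never below one record).
            recordLimit = Math.max(1, recordsInBatch);
            return true;
        }
        return false;
    }

    int recordLimit() {
        return recordLimit;
    }
}
```

With an initial limit of 4000 records and a 10 MB threshold, a batch that exceeds 10 MB after 2000 records is flushed at that point, and later batches are capped at 2000 records. Note that in this sketch the limit only ever decreases; whether it should be allowed to grow again when records become smaller is a separate design question.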

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/minji-kim/drill DRILL-4411

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/381.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #381
    
----
commit 2e3b1c75273e1b87679d79bdc4f3877b72603e3c
Author: Minji Kim <mi...@dremio.com>
Date:   2016-02-18T17:05:51Z

    DRILL-4411: hash join should limit batch based on size as well as number of records

----


> HashJoin should not only depend on number of records, but also on size
> ----------------------------------------------------------------------
>
>                 Key: DRILL-4411
>                 URL: https://issues.apache.org/jira/browse/DRILL-4411
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server
>            Reporter: MinJi Kim
>            Assignee: MinJi Kim
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH (4000) records.  The limit should depend not only on the number of records but also on their size (in case of extremely large records).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)