Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/07/18 23:21:00 UTC

[jira] [Created] (ARROW-17115) [C++] HashJoin fails if it encounters a batch with more than 32Ki rows

Weston Pace created ARROW-17115:
-----------------------------------

             Summary: [C++] HashJoin fails if it encounters a batch with more than 32Ki rows
                 Key: ARROW-17115
                 URL: https://issues.apache.org/jira/browse/ARROW-17115
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Weston Pace
            Assignee: Weston Pace


The new swiss join assumes that its input has already been split according to the morsel/batch model, and that each batch therefore has at most 32Ki rows (signed 16-bit indices are used in various places).
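
To make the limit concrete, here is a minimal sketch of the arithmetic behind it. The names kMaxRowsForInt16Index and RowIndicesFitInInt16 are hypothetical and are not taken from the Arrow code base; the sketch only assumes that row indices within a batch are stored as int16_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical constant: the largest batch whose row indices 0..N-1 all fit
// in a signed 16-bit integer. int16_t max is 32767, so N = 32768 (32Ki).
constexpr int64_t kMaxRowsForInt16Index =
    static_cast<int64_t>(std::numeric_limits<int16_t>::max()) + 1;

// Hypothetical helper: returns true if every row index of a batch with
// `num_rows` rows can be represented as int16_t.
inline bool RowIndicesFitInInt16(int64_t num_rows) {
  assert(num_rows >= 0);
  return num_rows <= kMaxRowsForInt16Index;
}
```

Under that assumption, a batch of exactly 32Ki rows is still safe, while a batch with one more row already has an index that cannot be represented.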

However, we do not currently slice all of our inputs into batches this small. This is causing Conbench to fail, and it would likely be a problem for any large input.

We should fix this by slicing batches in the engine to the appropriate maximum size.
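
As a rough illustration of the proposed fix (not the actual patch), the slicing could be sketched as below. SliceRanges is a hypothetical helper, and the default of 32 * 1024 rows is taken from the limit described in this issue. It only computes (offset, length) pairs, which keeps the sketch free of Arrow dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split a batch of `num_rows` rows into consecutive
// (offset, length) ranges, each covering at most `max_rows` rows.
inline std::vector<std::pair<int64_t, int64_t>> SliceRanges(
    int64_t num_rows, int64_t max_rows = 32 * 1024) {
  assert(num_rows >= 0);
  assert(max_rows > 0);
  std::vector<std::pair<int64_t, int64_t>> ranges;
  for (int64_t offset = 0; offset < num_rows; offset += max_rows) {
    // The last range may be shorter than max_rows.
    const int64_t length = std::min(max_rows, num_rows - offset);
    ranges.emplace_back(offset, length);
  }
  return ranges;
}
```

In the engine, each pair could then be handed to a zero-copy slice such as RecordBatch::Slice(offset, length) before the rows reach the join. Where exactly that slicing would happen is not specified in this issue.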


