Posted to issues@arrow.apache.org by "rowillia (via GitHub)" <gi...@apache.org> on 2023/06/20 20:18:17 UTC

[GitHub] [arrow] rowillia opened a new issue, #36187: [C++] Display the name of the problematic field when returning status "Data type ... is not supported in join non-key field" for HashJoin

rowillia opened a new issue, #36187:
URL: https://github.com/apache/arrow/issues/36187

   ### Describe the enhancement requested
   
   Joining two tables where one of them has a column of type `list` (even if it's not the join column) raises an exception. For example:
   
   ```python
   import pyarrow as pa

   NUM_ITEMS = 30
   # Left table: a binary join key plus a list-typed (non-key) column.
   t1 = pa.Table.from_pydict({
       'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
       'array_column': [list(range(3)) for _ in range(NUM_ITEMS)],
   })
   # Right table: the same keys plus a plain integer column.
   t2 = pa.Table.from_pydict({
       'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
       'value': list(range(NUM_ITEMS)),
   })
   # Raises ArrowInvalid even though the join key is 'id', not 'array_column'.
   t1.join(t2, 'id', join_type='inner')
   ```
   This results in the following exception:
   `ArrowInvalid: Data type list<item: int64> is not supported in join non-key field`
   
   This [exception](https://github.com/apache/arrow/blob/f959a2e05c79351255227a91cb36d6ca39d01a3d/cpp/src/arrow/acero/hash_join_node.cc#L235-L248) is fairly unintuitive (I spent a few hours today trying to work out which column was causing it) and could be made much clearer by including the field name when it is available. I'm new to Arrow, but I believe the name should be available: https://github.com/apache/arrow/blob/f959a2e05c79351255227a91cb36d6ca39d01a3d/cpp/src/arrow/type.h#L1829-L1831
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace closed issue #36187: [C++] Display the name of the problematic field when returning status "Data type ... is not supported in join non-key field" for HashJoin

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #36187: [C++] Display the name of the problematic field when returning status "Data type ... is not supported in join non-key field" for HashJoin
URL: https://github.com/apache/arrow/issues/36187




[GitHub] [arrow] vibhatha commented on issue #36187: [C++] Display the name of the problematic field when returning status "Data type ... is not supported in join non-key field" for HashJoin

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on issue #36187:
URL: https://github.com/apache/arrow/issues/36187#issuecomment-1625241291

   take

