Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/07 12:11:36 UTC

[GitHub] [arrow-datafusion] syntonym opened a new issue #693: Using string datatype from python raises "Exception: The type 13 is not valid"

syntonym opened a new issue #693:
URL: https://github.com/apache/arrow-datafusion/issues/693


   I'm trying to use datafusion from Python with the `datafusion` package 0.2.0 and pyarrow 4.0.1. Using a string datatype raises `Exception: The type 13 is not valid` when constructing a dataframe (see the code below).
   
   It seems that, at least with my pyarrow 4.0.1, the string datatype has id 13 instead of the expected 21.
   
   In datafusion-python the ids are set in python/src/types.rs, where 21 is mapped to UTF8 and 13 is not mapped because it is treated as unsupported. Changing 21 to 13 there and rebuilding the package fixes the error, and datafusion then works with my data as expected. In arrow the type ids seem to come from an enum in arrow/python/pyarrow/includes/libarrow.pxd, where string is the 21st entry. I thought I might be using an old pyarrow version, but the most recent code changes in that area are 13 months old.
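
The lookup described above can be modelled roughly as follows. This is only a Python sketch of the behaviour, not the actual Rust code in python/src/types.rs: the dictionary and function names are hypothetical, and only the two ids mentioned in this report (21 and 13) are included.

```python
# Hypothetical model of the type-id lookup; the real code is written in Rust
# and lives in python/src/types.rs. Names below are made up for illustration.
ORIGINAL_MAPPING = {21: "Utf8"}  # as described: 21 -> UTF8, 13 is unmapped
PATCHED_MAPPING = {13: "Utf8"}   # the local workaround: 21 changed to 13


def data_type_from_id(type_id, mapping):
    """Return the type name for type_id, or raise like the reported error."""
    try:
        return mapping[type_id]
    except KeyError:
        raise Exception(f"The type {type_id} is not valid") from None


def try_lookup(type_id, mapping):
    """Return the mapped name, or the error message if the id is unmapped."""
    try:
        return data_type_from_id(type_id, mapping)
    except Exception as exc:
        return str(exc)


print(try_lookup(13, ORIGINAL_MAPPING))  # id 13 is unmapped, so this fails
print(try_lookup(13, PATCHED_MAPPING))   # id 13 now resolves to the string type
```

With the original mapping, id 13 falls through to the error path, which matches the message in the title. With the patched mapping, the same id resolves to the string type.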
   
   **To Reproduce**
   
   ```
   import datafusion
   import pyarrow
   
   f = datafusion.functions
   
   batch = pyarrow.RecordBatch.from_arrays(
       [pyarrow.array(["a", "b", "c"]), pyarrow.array([4, 5, 6])],
       names=["a", "b"],
   )
   
   ctx = datafusion.ExecutionContext()
   df = ctx.create_dataframe([[batch]])
   ```
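
To see which numeric ids a given pyarrow build assigns, a small helper along these lines may be useful. It is a sketch that relies only on each field exposing `name` and `type.id`. The helper name `field_type_ids` is hypothetical, and the stand-in object below uses the id reported in this issue so that the snippet runs without pyarrow installed.

```python
from types import SimpleNamespace


def field_type_ids(schema):
    """Map each field name to the numeric id of its data type."""
    return {field.name: field.type.id for field in schema}


# Stand-in for a schema: one field shaped like a pyarrow field (name, type.id).
# The id 13 is the value observed in this report, not read from pyarrow here.
stub_schema = [SimpleNamespace(name="a", type=SimpleNamespace(id=13))]

print(field_type_ids(stub_schema))
```

With pyarrow available, calling `field_type_ids(batch.schema)` on the batch from the reproduction should report the id that pyarrow assigns to the string column `a`.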

