
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 19:54:51 UTC

[GitHub] [arrow] rileyhun opened a new issue #12396: Error: Unable to read map data type

rileyhun opened a new issue #12396:
URL: https://github.com/apache/arrow/issues/12396


   We have some data stored in Parquet format from a `pyspark` pipeline and we are trying to read it using `pyarrow`. Unfortunately, `pyarrow` is not able to interpret one of the stored data types. We would prefer to read the data without relying on `pyspark`. I am using `pyarrow=7.0`.
   
   Example:
   
   ```
   import s3fs
   import pyarrow.parquet as pq
   
   fs = s3fs.S3FileSystem()
   bucket_uri = 's3://data/batch=1000doc/part=0'
   
   dataset = pq.ParquetDataset(bucket_uri, filesystem=fs)
   table = dataset.read()
   table.to_pandas()
   ```
   
   Error:
   
   ```
   ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] wjones127 commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

wjones127 commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035660882


   @rileyhun Does the following snippet fail for you? I'm seeing conversion of that schema go through fine in this example.
   
   ```python
   import pyarrow as pa
   import pandas as pd
   
   # Innermost level: two map<string, double> values.
   data = [[('x', 1.0), ('y', 0.0)], [('a', 2.0), ('b', 45.0)]]
   ty = pa.map_(pa.string(), pa.float64())
   inner = pa.array(data, type=ty)
   # Wrap twice; offsets [0, 1] mean each level holds one map with a single entry.
   array = pa.MapArray.from_arrays([0,1], ['one', 'two'], inner)
   array = pa.MapArray.from_arrays([0,1], [1], array)
   table = pa.table({'col': array})
   array.type
   # MapType(map<int64, map<string, map<string, double>>>)
   table.to_pandas()
   #                                           col
   # 0  [(1, [('one', [('x', 1.0), ('y', 0.0)])])]
   ```



[GitHub] [arrow] rileyhun closed issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

rileyhun closed issue #12396:
URL: https://github.com/apache/arrow/issues/12396


   



[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035630460


   Hello @wjones127 - thanks so much for your response. 
   
   Here is the data type for that column: `(MapType(map<int64, map<string, map<string, double>>>))`



[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1042707528


   Thanks @wjones127 - I will try that example shortly. Thanks again for your assistance. 



[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1042706762


   @jorisvandenbossche - sorry for the delay in my response. We found a workaround, so going to close the issue. 



[GitHub] [arrow] wjones127 commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

wjones127 commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035618025


   Could you provide the full column type? You can use `print(table.column('col_name').type)` to print it out.
   
   I was not able to reproduce that conversion issue with the following code on PyArrow 7.0.0 (nor 6.0.1):
   
   ```python
   import pyarrow as pa
   import pandas as pd
   
   data = [[('x', 1.0), ('y', 0.0)], [('a', 2.0), ('b', 45.0)]]
   ty = pa.map_(pa.string(), pa.float64())
   table = pa.table({'col': pa.array(data, type=ty)})
   table.to_pandas() # Works fine for me.
   ```
   
   Please verify that this works for you, and double-check your pyarrow version as well.



[GitHub] [arrow] jorisvandenbossche commented on issue #12396: Error: Unable to read map data type

Posted by GitBox <gi...@apache.org>.

jorisvandenbossche commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1040002674


   @rileyhun can you also verify the pyarrow version that you are using (e.g. check `pyarrow.__version__` at runtime, to ensure you are actually using the latest 7.0)?
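
A minimal sketch of such a runtime check, assuming the concern is that a different environment's pyarrow is being picked up. It uses only the standard library; `installed_version` and `import_location` are hypothetical helper names introduced here for illustration.

```python
import importlib.metadata
import importlib.util


def installed_version(package):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def import_location(module):
    """Return the file the interpreter would import for `module`, or None if not found."""
    spec = importlib.util.find_spec(module)
    return spec.origin if spec else None


# Shows which pyarrow (if any) this interpreter sees, and where it lives on disk.
print(installed_version("pyarrow"), import_location("pyarrow"))
```

Running this in the same interpreter that raises the error (for example the notebook kernel or the job's Python) shows whether the expected 7.0 install is the one actually being imported.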

