Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 19:54:51 UTC
[GitHub] [arrow] rileyhun opened a new issue #12396: Error: Unable to read map data type
rileyhun opened a new issue #12396:
URL: https://github.com/apache/arrow/issues/12396
We have some data stored in Parquet format from a `pyspark` pipeline, and we are trying to read it using `pyarrow`. Unfortunately, `pyarrow` is not able to interpret one of the stored data types. We would prefer to read the data without relying on `pyspark`. I am using `pyarrow=7.0`.
Example:
```
import s3fs
import pyarrow.parquet as pq
fs = s3fs.S3FileSystem()
bucket_uri = 's3://data/batch=1000doc/part=0'
dataset = pq.ParquetDataset(bucket_uri, filesystem=fs)
table = dataset.read()
table.to_pandas()
```
Error:
```
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wjones127 commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
wjones127 commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035660882
@rileyhun Does the following snippet fail for you? I'm seeing the conversion of that schema go through fine in this example.
```python
import pyarrow as pa
import pandas as pd
data = [[('x', 1.0), ('y', 0.0)], [('a', 2.0), ('b', 45.0)]]
ty = pa.map_(pa.string(), pa.float64())
inner = pa.array(data, type=ty)
array = pa.MapArray.from_arrays([0,1], ['one', 'two'], inner)
array = pa.MapArray.from_arrays([0,1], [1], array)
table = pa.table({'col': array})
array.type
# MapType(map<int64, map<string, map<string, double>>>)
table.to_pandas()
# col
# 0 [(1, [('one', [('x', 1.0), ('y', 0.0)])])]
```
[GitHub] [arrow] rileyhun closed issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
rileyhun closed issue #12396:
URL: https://github.com/apache/arrow/issues/12396
[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035630460
Hello @wjones127 - thanks so much for your response.
Here is the data type for that column: `(MapType(map<int64, map<string, map<string, double>>>))`
[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1042707528
Thanks @wjones127 - I will try that example shortly. Thanks again for your assistance.
[GitHub] [arrow] rileyhun commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
rileyhun commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1042706762
@jorisvandenbossche - sorry for the delay in my response. We found a workaround, so going to close the issue.
[GitHub] [arrow] wjones127 commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
wjones127 commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1035618025
Could you provide the full column type? You can use `print(table.column('col_name').type)` to print it out.
I was not able to reproduce that conversion issue with the following code on PyArrow 7.0.0 (nor 6.0.1):
```python
import pyarrow as pa
import pandas as pd
data = [[('x', 1.0), ('y', 0.0)], [('a', 2.0), ('b', 45.0)]]
ty = pa.map_(pa.string(), pa.float64())
table = pa.table({'col': pa.array(data, type=ty)})
table.to_pandas() # Works fine for me.
```
Maybe verify that this works for you, and double-check your pyarrow version as well.
[GitHub] [arrow] jorisvandenbossche commented on issue #12396: Error: Unable to read map data type
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #12396:
URL: https://github.com/apache/arrow/issues/12396#issuecomment-1040002674
@rileyhun can you also verify the pyarrow version that you are using (e.g. check `pyarrow.__version__` at runtime, to ensure you are actually using the latest 7.0)?
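A minimal sketch of that runtime check, in case it helps. The helper name `describe_module` is hypothetical (it is not part of pyarrow); it only reports the `__version__` and file path of whatever module the interpreter actually imports, which can differ from what the package manager lists when several environments are installed.

```python
import importlib


def describe_module(name):
    """Return the version and file path of the module that actually gets imported.

    `describe_module` is a hypothetical helper for illustration, not a pyarrow API.
    Returns None when the module cannot be imported in this environment.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        # Not every module defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
        # The path shows which installation the interpreter picked up.
        "path": getattr(module, "__file__", None),
    }


# Prints None if pyarrow is not importable from this interpreter.
print(describe_module("pyarrow"))
```

If the reported version is older than expected, the path should indicate which environment the import is resolving to.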