Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/12 00:30:32 UTC
[GitHub] [arrow-rs] chadbrewbaker edited a comment on issue #703: Empty or null list of struct cannot be written to parquet
chadbrewbaker edited a comment on issue #703:
URL: https://github.com/apache/arrow-rs/issues/703#issuecomment-991809095
The line of JSON below makes json2parquet panic with:
```text
thread 'main' panicked at 'Cannot filter indices on a non-primitive array, found List(true)'
```
https://github.com/apache/arrow-rs/blob/e0abda2c178be0c38d4257d22de2e4a3bfafde82/parquet/src/arrow/levels.rs#L757
```json
{"ts":1331901001.88,"fuid":"Fd3cGk2agqUftBeFx4","tx_hosts":["192.168.229.251"],"rx_hosts":["192.168.202.79"],"conn_uids":["CaJMZy195M8cuXfxn4"],"source":"HTTP","depth":0,"analyzers":[],"mime_type":"text/html","duration":0.0,"is_orig":false,"seen_bytes":1433,"total_bytes":1433,"missing_bytes":0,"overflow_bytes":0,"timedout":false}
```
The Python bindings handle this just fine.
```python
from pyarrow import json
fn = 'mini.json'
table = json.read_json(fn)
print(table)
```
```text
pyarrow.Table
ts: double
fuid: string
tx_hosts: list<item: string>
  child 0, item: string
rx_hosts: list<item: string>
  child 0, item: string
conn_uids: list<item: string>
  child 0, item: string
source: string
depth: int64
analyzers: list<item: null>
  child 0, item: null
mime_type: string
duration: double
is_orig: bool
seen_bytes: int64
total_bytes: int64
missing_bytes: int64
overflow_bytes: int64
timedout: bool
----
ts: [[1331901001.88]]
fuid: [["Fd3cGk2agqUftBeFx4"]]
tx_hosts: [[["192.168.229.251"]]]
rx_hosts: [[["192.168.202.79"]]]
conn_uids: [[["CaJMZy195M8cuXfxn4"]]]
source: [["HTTP"]]
depth: [[0]]
analyzers: [[0 nulls]]
mime_type: [["text/html"]]
duration: [[0]]
...
```
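In the output above, the only column inferred as `list<item: null>` is `analyzers`, which is the empty list `[]` in the JSON. Assuming that empty list is what trips the writer (the issue title suggests so, but nothing here confirms it), a minimal sketch for finding such fields in a record before conversion could look like this. It uses only the standard library; `empty_list_fields` and the shortened `sample` record are hypothetical names for illustration, not part of json2parquet or arrow-rs.

```python
import json


def empty_list_fields(line):
    """Return the top-level keys of a JSON record whose value is an empty list."""
    record = json.loads(line)
    return [
        key
        for key, value in record.items()
        if isinstance(value, list) and not value
    ]


# Shortened copy of the record from this comment (assumption: the other
# fields do not matter for this check).
sample = (
    '{"fuid":"Fd3cGk2agqUftBeFx4",'
    '"tx_hosts":["192.168.229.251"],'
    '"analyzers":[],'
    '"depth":0}'
)

print(empty_list_fields(sample))  # prints ['analyzers']
```

Running this over each line of a file would show which columns have no non-empty value to infer an item type from.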