Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/13 03:16:07 UTC

[GitHub] [arrow-rs] chadbrewbaker opened a new issue #1036: JSON reader barfs on {"emptylist":[]}

chadbrewbaker opened a new issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036


   **Describe the bug**
   JSON reader barfs on {"emptylist":[]}
   
   ```bash
   thread 'main' panicked at 'Cannot filter indices on a non-primitive array, found List(true)', /PATH/parquet-6.1.0/src/arrow/levels.rs:757:18
   ```
   
   
   
   **To Reproduce**
   {"emptylist":[]}
   
   **Expected behavior**
   Same as pyarrow, which reads this input without failing:
   
   ```bash
   pyarrow.Table
   emptylist: list<item: null>
     child 0, item: null
   ----
   emptylist: [[0 nulls]]
   ```
   
   **Additional context**
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036


   





[GitHub] [arrow-rs] alamb commented on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-998720897


   I am not an expert in this code @novemberkilo -- I think @nevi-me is currently focused on other things, so I am not sure he will have time to answer. Perhaps looking at the "blame" (or history) of the relevant code might point to others you could ask?
   
   I also think @tustvold  has been looking at this code recently





[GitHub] [arrow-rs] novemberkilo commented on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
novemberkilo commented on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-994274942


   I am interested in picking this up please // @alamb 





[GitHub] [arrow-rs] novemberkilo edited a comment on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
novemberkilo edited a comment on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-997521410


   @nevi-me @alamb I started with `json2parquet` and found the shape of the RecordBatch that corresponded to `{"emptylist": []}` (see below). This then guided me to writing the test that I've committed for now. I [get the same panic and error message](https://github.com/apache/arrow-rs/runs/4577227397?check_suite_focus=true#step:6:1648) so I think I am on the right track. Any suggestions for where the actual fix might be? I'm spelunking around but if either of you (or anyone else familiar with the code here) can help orient me, that would help.
   
   I ran `json2parquet` on `{"emptylist": []}` and placed a `dbg!` on what is sent to the writer:
   
   ```
   [src/main.rs:182] &batch = Ok(
       RecordBatch {
           schema: Schema {
               fields: [
                   Field {
                       name: "emptylist",
                       data_type: List(
                           Field {
                               name: "item",
                               data_type: Null,
                               nullable: true,
                               dict_id: 0,
                               dict_is_ordered: false,
                               metadata: None,
                           },
                       ),
                       nullable: true,
                       dict_id: 0,
                       dict_is_ordered: false,
                       metadata: None,
                   },
               ],
               metadata: {},
           },
           columns: [
               ListArray
               [
                 NullArray(0),
               ],
           ],
       },
   )
   thread 'main' panicked at 'Cannot filter indices on a non-primitive array, found List(true)', /home/navin/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.3.0/src/arrow/levels.rs:757:18
   ```
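
For readers unfamiliar with the layout in the dump above: the column holds one row, and that row is an empty list, so the child array has length 0. Below is a minimal sketch of that layout using only the standard library. It does not use the `arrow` crate, and `ListColumn` and its methods are hypothetical names made up for illustration (Arrow's `List` type stores 32-bit offsets; `usize` is used here only for simplicity).

```rust
/// Hypothetical stand-in for an Arrow list column (not the arrow crate's type).
/// Row `i` owns the child values in the range `offsets[i]..offsets[i + 1]`.
struct ListColumn {
    /// One more entry than there are rows; always starts at 0.
    offsets: Vec<usize>,
    /// Total number of values in the child array.
    child_len: usize,
}

impl ListColumn {
    /// Build the column from the length of each row's list.
    fn from_row_lengths(lengths: &[usize]) -> Self {
        let mut offsets = Vec::with_capacity(lengths.len() + 1);
        offsets.push(0);
        let mut end = 0;
        for len in lengths {
            end += len;
            offsets.push(end);
        }
        ListColumn { offsets, child_len: end }
    }

    /// Number of rows (lists) in the column.
    fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Number of child values that belong to `row`.
    fn row_len(&self, row: usize) -> usize {
        self.offsets[row + 1] - self.offsets[row]
    }
}
```

Calling `ListColumn::from_row_lengths(&[0])` models `{"emptylist": []}`: one row, offsets `[0, 0]`, and an empty child. That is the case a writer has to handle without assuming the child holds at least one value.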





[GitHub] [arrow-rs] novemberkilo commented on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
novemberkilo commented on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-1053940094


   This is relevant to this issue https://github.com/apache/arrow-rs/pull/1063#issuecomment-1053939744








[GitHub] [arrow-rs] alamb commented on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-993927676


   Thanks for the report @chadbrewbaker  -- 





[GitHub] [arrow-rs] nevi-me commented on issue #1036: JSON input barfs on {"emptylist":[]}

Posted by GitBox <gi...@apache.org>.
nevi-me commented on issue #1036:
URL: https://github.com/apache/arrow-rs/issues/1036#issuecomment-994094653


   These parquet bugs are mostly/all my fault. I was working on fixing them a few months ago, but there have been significant changes in my time, and I left them hanging. I really apologise for that.

