Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/01 02:55:25 UTC

[GitHub] [arrow-rs] zhaoyanggh opened a new issue #1515: cannot read parquet file

zhaoyanggh opened a new issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515


   **Describe the bug**
   I want to read back a parquet file I generated, but when I use the `get_row_iter` API I get this error:
   thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/yzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-11.0.0/src/record/reader.rs:132:52
   
   **To Reproduce**
   Steps to reproduce the behavior:
   This is my schema:
   message table {
     REPEATED group table_info {
       REQUIRED BYTE_ARRAY name;
       REPEATED group cols {
         REQUIRED BYTE_ARRAY name;
         REQUIRED INT32 type;
         OPTIONAL INT32 length;
       }
       REPEATED group tags {
         REQUIRED BYTE_ARRAY name;
         REQUIRED INT32 type;
         OPTIONAL INT32 length;
       }
     }
   }
   
   I can read the parquet file successfully if I change the schema to:
   message table {
     REPEATED group table_info {
       REQUIRED BYTE_ARRAY name;
       REPEATED group cols {
         REQUIRED BYTE_ARRAY name;
         REQUIRED INT32 type;
         OPTIONAL INT32 length;
       }
     }
   }
   
   **Expected behavior**
   I can read my generated parquet file successfully with parquet-tools on my Mac:
   ![Screen Shot 2022-04-01 at 10 50 44 AM](https://user-images.githubusercontent.com/47117543/161186191-d82cf613-d2b5-4afb-8d28-2f8676ac3a37.png)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1086128907


   The current `List` handling in `get_arrow_field` is incorrect. I proposed a fix in #1517.
   
   But after the fix, you still get:
   
   ```
   panicked at 'Failed to read into array!: ArrowError("Reading repeated field (\"cols\") is not supported yet!")'
   ```
   
   I think that is a separate issue.
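For context on why the remaining `cols` error is a separate problem: both `cols` and `tags` are REPEATED groups nested inside the REPEATED group `table_info`. The following is a minimal, std-only Rust sketch (the `Node`, `Repetition` and `leaf_levels` names are hypothetical and are not the `parquet` crate's API) that applies the standard Parquet rules for nested data: every REPEATED field on a leaf's path adds one to its maximum repetition level, and every OPTIONAL or REPEATED field adds one to its maximum definition level.

```rust
// Hypothetical, simplified model of a Parquet schema (not the `parquet` crate's `Type`).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

enum Node {
    Group { name: &'static str, repetition: Repetition, children: Vec<Node> },
    Leaf { name: &'static str, repetition: Repetition },
}

fn group(name: &'static str, repetition: Repetition, children: Vec<Node>) -> Node {
    Node::Group { name, repetition, children }
}

fn leaf(name: &'static str, repetition: Repetition) -> Node {
    Node::Leaf { name, repetition }
}

// The fields of the failing schema from this issue (the root `message table` is implicit).
fn issue_schema() -> Vec<Node> {
    use Repetition::*;
    let columns = |name: &'static str| {
        group(name, Repeated, vec![leaf("name", Required), leaf("type", Required), leaf("length", Optional)])
    };
    vec![group("table_info", Repeated, vec![leaf("name", Required), columns("cols"), columns("tags")])]
}

// Depth-first walk recording (dotted path, max repetition level, max definition level) per leaf.
fn collect(node: &Node, path: &mut Vec<&'static str>, parent_rep: u8, parent_def: u8, out: &mut Vec<(String, u8, u8)>) {
    let (name, repetition) = match node {
        Node::Group { name, repetition, .. } | Node::Leaf { name, repetition } => (*name, *repetition),
    };
    // REPEATED adds a repetition level; anything that is not REQUIRED adds a definition level.
    let rep = parent_rep + (repetition == Repetition::Repeated) as u8;
    let def = parent_def + (repetition != Repetition::Required) as u8;
    path.push(name);
    match node {
        Node::Group { children, .. } => {
            for child in children {
                collect(child, path, rep, def, out);
            }
        }
        Node::Leaf { .. } => out.push((path.join("."), rep, def)),
    }
    path.pop();
}

fn leaf_levels(fields: &[Node]) -> Vec<(String, u8, u8)> {
    let mut out = Vec::new();
    let mut path = Vec::new();
    for field in fields {
        collect(field, &mut path, 0, 0, &mut out);
    }
    out
}

fn main() {
    for (path, max_rep, max_def) in leaf_levels(&issue_schema()) {
        println!("{path}: max_rep={max_rep} max_def={max_def}");
    }
}
```

Under these assumptions every leaf of `cols` and `tags` has a maximum repetition level of 2, so a reader must rebuild two levels of nesting for those columns, whereas `table_info.name` only needs one.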
   





[GitHub] [arrow-rs] viirya commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1086710483


   This seems to be a known limitation, but I couldn't find a related issue. I think it is a separate issue; maybe you can open a new one.





[GitHub] [arrow-rs] zhaoyanggh commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
zhaoyanggh commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1085360702


   [139866508782784.parquet.zip](https://github.com/apache/arrow-rs/files/8394357/139866508782784.parquet.zip)
   





[GitHub] [arrow-rs] jhorstmann commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1085798994


   Can confirm the issue with the given file:
   
   ```
   thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', parquet/src/record/reader.rs:134:52
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
      1: core::panicking::panic_fmt
                at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
      2: core::panicking::panic
                at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:48:5
      3: core::option::Option<T>::unwrap
                at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/option.rs:729:21
      4: parquet::record::reader::TreeBuilder::reader_tree
                at ./parquet/src/record/reader.rs:134:31
      5: parquet::record::reader::TreeBuilder::reader_tree
                at ./parquet/src/record/reader.rs:301:38
      6: parquet::record::reader::TreeBuilder::reader_tree
                at ./parquet/src/record/reader.rs:281:34
      7: parquet::record::reader::TreeBuilder::reader_tree
                at ./parquet/src/record/reader.rs:301:38
      8: parquet::record::reader::TreeBuilder::reader_tree
                at ./parquet/src/record/reader.rs:281:34
      9: parquet::record::reader::TreeBuilder::build
                at ./parquet/src/record/reader.rs:79:26
     10: parquet::record::reader::TreeBuilder::as_iter
                at ./parquet/src/record/reader.rs:102:25
     11: <parquet::record::reader::RowIter as core::iter::traits::iterator::Iterator>::next
                at ./parquet/src/record/reader.rs:774:32
     12: parquet_read::main
                at ./parquet/src/bin/parquet-read.rs:84:15
     13: core::ops::function::FnOnce::call_once
                at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   ```
   
   The column path that cannot be found is `["tags", "name"]`, but it should be `["table_info", "tags", "name"]`.
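The sketch below illustrates the mismatch under simplified assumptions: a hypothetical std-only schema tree (not the `parquet` crate's types or its `TreeBuilder` internals) whose leaf paths are built by carrying the full parent prefix through the recursion, plus a hypothetical `find_column` lookup that uses `Option::ok_or_else` to turn a missing path into an error instead of panicking on `unwrap()`.

```rust
use std::collections::HashMap;

// Hypothetical, simplified schema tree: groups contain children, leaves are columns.
enum Node {
    Group(&'static str, Vec<Node>),
    Leaf(&'static str),
}

// The fields of the failing schema from this issue (repetition omitted for brevity).
fn issue_schema() -> Vec<Node> {
    let columns = |name: &'static str| {
        Node::Group(name, vec![Node::Leaf("name"), Node::Leaf("type"), Node::Leaf("length")])
    };
    vec![Node::Group("table_info", vec![Node::Leaf("name"), columns("cols"), columns("tags")])]
}

// Map every leaf's full path (e.g. ["table_info", "tags", "name"]) to its depth-first index.
fn column_paths(fields: &[Node]) -> HashMap<Vec<String>, usize> {
    fn walk(node: &Node, prefix: &mut Vec<String>, out: &mut HashMap<Vec<String>, usize>) {
        match node {
            Node::Group(name, children) => {
                // Keep the parent's name on the prefix while visiting its children.
                prefix.push(name.to_string());
                for child in children {
                    walk(child, prefix, out);
                }
                prefix.pop();
            }
            Node::Leaf(name) => {
                let mut path = prefix.clone();
                path.push(name.to_string());
                let index = out.len();
                out.insert(path, index);
            }
        }
    }
    let mut out = HashMap::new();
    let mut prefix = Vec::new();
    for field in fields {
        walk(field, &mut prefix, &mut out);
    }
    out
}

// Return an error instead of panicking when the requested path is not a leaf column.
fn find_column(paths: &HashMap<Vec<String>, usize>, path: &[&str]) -> Result<usize, String> {
    let key: Vec<String> = path.iter().map(|s| s.to_string()).collect();
    paths
        .get(&key)
        .copied()
        .ok_or_else(|| format!("column path {:?} not found in schema", path))
}

fn main() {
    let paths = column_paths(&issue_schema());
    // The path from the panic report is missing its "table_info" prefix.
    println!("{:?}", find_column(&paths, &["tags", "name"]));
    println!("{:?}", find_column(&paths, &["table_info", "tags", "name"]));
}
```

Looking up `["tags", "name"]` returns an `Err`, while `["table_info", "tags", "name"]` resolves to leaf index 4 in depth-first order.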





[GitHub] [arrow-rs] zhaoyanggh commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
zhaoyanggh commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1086526104


   Thank you so much for the help. For the new problem, should I (or someone else) open a new GitHub issue, or should it be fixed under this one?





[GitHub] [arrow-rs] zhaoyanggh commented on issue #1515: cannot read parquet file

Posted by GitBox <gi...@apache.org>.
zhaoyanggh commented on issue #1515:
URL: https://github.com/apache/arrow-rs/issues/1515#issuecomment-1085360781


   And this is my parquet file.

