Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/13 18:01:17 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

alamb opened a new issue, #1701:
URL: https://github.com/apache/arrow-rs/issues/1701

   **Describe the bug**
   After https://github.com/apache/arrow-rs/pull/1682 from @tustvold, some tests in DataFusion began to fail with:
   
   > "out of order projection is not supported"
   
   **To Reproduce**
   See the reproduction instructions on https://github.com/apache/arrow-datafusion/pull/2530
   
   
   **Expected behavior**
   The tests should pass.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1135966800

   Not beyond updating it to use the new ProjectionMask within ParquetExec, so changing from `column_reader_with_columns(iter)` to `column_reader_with_columns(ProjectionMask::roots(iter))`
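
As a rough illustration of why a mask-style projection cannot carry column order, here is a minimal sketch. `RootMask` and its methods are hypothetical names invented for this example; they are not the parquet crate's `ProjectionMask` API. The sketch only assumes that the new mask behaves like a set of selected root columns.

```rust
/// Hypothetical stand-in for a root-column projection mask. This is NOT the
/// parquet crate's `ProjectionMask`; it only illustrates the idea.
#[derive(Debug, PartialEq)]
pub struct RootMask {
    /// One flag per root column: `true` if the column is selected.
    selected: Vec<bool>,
}

impl RootMask {
    /// Build a mask over `num_columns` root columns from the given indices.
    /// Panics if an index is out of range.
    pub fn roots(num_columns: usize, indices: impl IntoIterator<Item = usize>) -> Self {
        let mut selected = vec![false; num_columns];
        for i in indices {
            selected[i] = true;
        }
        RootMask { selected }
    }

    /// Selected column indices, always in ascending (file) order.
    pub fn selected_indices(&self) -> Vec<usize> {
        self.selected
            .iter()
            .enumerate()
            .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
            .collect()
    }
}
```

Under this assumption, building the mask from `vec![0, 1]` or from `vec![1, 0]` gives the same value, so the order of the request is lost by construction.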




[GitHub] [arrow-rs] tustvold commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1126311359

   It silently ignored the out-of-order projection, i.e.
   
   ```
   #[test]
   fn test_out_of_order_projection() {
       // Open one of the standard parquet test files
       let testdata = arrow::util::test_util::parquet_test_data();
       let path = format!("{}/alltypes_plain.parquet", testdata);
       let file = File::open(&path).unwrap();
       let reader = SerializedFileReader::try_from(file).unwrap();
   
       let mut arrow_reader = ParquetFileArrowReader::new(Arc::new(reader));
       // Request the same two columns, first in order and then reversed
       let b1 = arrow_reader.get_record_reader_by_columns([0, 1], 2).unwrap();
       let b2 = arrow_reader.get_record_reader_by_columns([1, 0], 2).unwrap();
   
       // Both readers report the same schema: the requested order was ignored
       assert_eq!(b1.schema, b2.schema);
   }
   ```
   
   Would pass
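
Since the reader returns columns in file order regardless of the order requested, a caller that needs a different order has to restore it afterwards. The following is a minimal sketch of one way to do that. `plan_projection` and `reorder` are hypothetical helpers written for this example; they are not part of arrow-rs or DataFusion, and this is not necessarily how DataFusion handles it.

```rust
/// Hypothetical helper (not part of arrow-rs or DataFusion): splits a requested
/// projection into the ascending, de-duplicated indices to hand to a reader
/// and, for each requested column, its position in the reader's output.
pub fn plan_projection(requested: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let mut read_order: Vec<usize> = requested.to_vec();
    read_order.sort_unstable();
    read_order.dedup();

    let output_positions: Vec<usize> = requested
        .iter()
        .map(|col| {
            read_order
                .binary_search(col)
                .expect("every requested column is present in read_order")
        })
        .collect();

    (read_order, output_positions)
}

/// Rearranges columns that were read in ascending order back into the
/// order the caller originally requested.
pub fn reorder<T: Clone>(columns_read: &[T], output_positions: &[usize]) -> Vec<T> {
    output_positions
        .iter()
        .map(|&pos| columns_read[pos].clone())
        .collect()
}
```

For a request of `[2, 0]`, the reader would be given `[0, 2]`, and the positions `[1, 0]` are then used to swap the two columns back into the requested order.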




[GitHub] [arrow-rs] alamb commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1135978422

   OK, maybe I can find time to do so as part of preparing the arrow 15 release.




[GitHub] [arrow-rs] tustvold closed issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #1701:  "out of order projection is not supported" after Fix Parquet Arrow Schema Inference
URL: https://github.com/apache/arrow-rs/issues/1701




[GitHub] [arrow-rs] tustvold commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1136039400

   I'd be happy to help if you run into any roadblocks; it _should_ be straightforward.




[GitHub] [arrow-rs] alamb commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1136304413

   Update: @tustvold has kindly offered to take a stab at preparing a DataFusion PR to upgrade to the latest arrow, prior to #1727.




[GitHub] [arrow-rs] tustvold commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1126307390

   This is expected behaviour: DataFusion currently passes out-of-order projections to the parquet reader despite the ArrayReader having never supported it. We now complain explicitly that the reader does not support this, which IMO is better than silently ignoring it.
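
To make the difference between failing loudly and silently ignoring the order concrete, here is a minimal sketch of an explicit check. `check_projection_order` is a hypothetical function written for this example, not the actual arrow-rs code; it assumes "in order" means strictly ascending column indices, so it also rejects duplicates.

```rust
/// Hypothetical validation, not the actual arrow-rs code: rejects a projection
/// whose indices are not strictly ascending instead of silently reordering it.
pub fn check_projection_order(indices: &[usize]) -> Result<(), String> {
    // Compare each index with its successor; any non-increasing pair fails.
    for pair in indices.windows(2) {
        if pair[0] >= pair[1] {
            return Err("out of order projection is not supported".to_string());
        }
    }
    Ok(())
}
```

With a check like this, a request for `[1, 0]` returns an error the caller can see and fix, rather than quietly yielding the columns in file order.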




[GitHub] [arrow-rs] tustvold commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1139495796

   PR here - https://github.com/apache/arrow-datafusion/pull/2631




[GitHub] [arrow-rs] alamb commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1135964916

   Do we need to make any changes to DataFusion?




[GitHub] [arrow-rs] alamb commented on issue #1701: "out of order projection is not supported" after Fix Parquet Arrow Schema Inference

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1701:
URL: https://github.com/apache/arrow-rs/issues/1701#issuecomment-1126308402

   > despite the ArrayReader having never supported it. 
   
   What happened if the reader was passed out of order projections? Was data ignored?

