Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/24 10:13:14 UTC

[GitHub] [arrow-rs] iyupeng opened a new issue, #1735: IPC reader may break on projection

iyupeng opened a new issue, #1735:
URL: https://github.com/apache/arrow-rs/issues/1735

   **Describe the bug**
   The function `read_record_batch` in `arrow/src/ipc/reader.rs` handles `projection`.
   
   The current logic does not always advance `node_index` and `buffer_index` correctly: it does not call `create_array` for skipped fields, so both indices are left pointing at the nodes and buffers of the skipped field.
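   
   To illustrate the index problem in isolation, here is a minimal sketch that does not use the arrow crate. `ToyField`, `example_fields`, `read_without_skipping` and `read_with_skipping` are hypothetical names made up for this illustration, and the buffer counts (2 for `UInt32` and `Boolean`, 3 for `Utf8`) are an assumption based on the Arrow columnar layout, not values taken from `reader.rs`.
   
   ```rust
   /// Toy stand-in for a schema field: only its name and the number of
   /// IPC buffers it occupies. Hypothetical type, not part of the arrow crate.
   #[derive(Clone, Copy, Debug)]
   struct ToyField {
       name: &'static str,
       buffer_count: usize,
   }
   
   /// The three fields from the reproduction, with assumed buffer counts:
   /// validity + values for UInt32 and Boolean, validity + offsets + values for Utf8.
   fn example_fields() -> Vec<ToyField> {
       vec![
           ToyField { name: "f0", buffer_count: 2 },
           ToyField { name: "f1", buffer_count: 3 },
           ToyField { name: "f2", buffer_count: 2 },
       ]
   }
   
   /// Models the suspected bug: the cursor only moves for projected fields.
   /// Returns (field name, index of the first buffer used for that field).
   fn read_without_skipping(
       fields: &[ToyField],
       projection: &[usize],
   ) -> Vec<(&'static str, usize)> {
       let mut buffer_index = 0;
       let mut out = Vec::new();
       for (i, field) in fields.iter().enumerate() {
           if projection.contains(&i) {
               out.push((field.name, buffer_index));
               buffer_index += field.buffer_count;
           }
           // Skipped fields leave `buffer_index` unchanged here.
       }
       out
   }
   
   /// Models the expected behavior: the cursor moves past every field,
   /// whether or not it is part of the projection.
   fn read_with_skipping(
       fields: &[ToyField],
       projection: &[usize],
   ) -> Vec<(&'static str, usize)> {
       let mut buffer_index = 0;
       let mut out = Vec::new();
       for (i, field) in fields.iter().enumerate() {
           if projection.contains(&i) {
               out.push((field.name, buffer_index));
           }
           // Always advance, so later fields start at their own buffers.
           buffer_index += field.buffer_count;
       }
       out
   }
   ```
   
   With `projection = [1]`, `read_without_skipping(&example_fields(), &[1])` places `f1` at buffer 0, which belongs to `f0`, while `read_with_skipping` places it at buffer 2. Under the assumed layout, the first variant would read the values of `f0` as `Utf8` offsets, which may be what produces the `First offset 1` error shown below.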
   
   **To Reproduce**
   Steps to reproduce the behavior:
   ```
   [dependencies]
   arrow = "14.0.0"
   ```
   
   ```
   use std::fs::File;
   use std::sync::Arc;
   
   use arrow::array::*;
   use arrow::datatypes::*;
   use arrow::ipc::reader::*;
   use arrow::ipc::writer::*;
   use arrow::record_batch::RecordBatch;
   
   fn main() {
       let schema = Schema::new(vec![
           Field::new("f0", DataType::UInt32, false),
           Field::new("f1", DataType::Utf8, false),
           Field::new("f2", DataType::Boolean, false),
       ]);
       
       let array0 = UInt32Array::from(vec![1, 2, 3]);
       let array1 = StringArray::from(vec!["foo", "bar", "baz"]);
       let array2 = BooleanArray::from(vec![true, false, true]);
   
       let record_batch = RecordBatch::try_new(
           Arc::new(schema.clone()),
           vec![Arc::new(array0), Arc::new(array1), Arc::new(array2)]
       ).unwrap();
   
       {
           let file = File::create("./test.arrow_file").unwrap();
           let mut writer = FileWriter::try_new(file, &schema).unwrap();
   
           writer.write(&record_batch).unwrap();
           writer.finish().unwrap();
       }
   
       let projection = vec![1];
       let file = File::open("./test.arrow_file").unwrap();
       let arrow_reader = FileReader::try_new(file, Some(projection.clone()));
       let batch = arrow_reader.unwrap().next();
       println!("{:?}", batch);
   }
   ```
   
   The above code panics with:
   ```
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("First offset 1 in Utf8 is smaller than last offset 0")', /home/yupeng/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-14.0.0/src/ipc/reader.rs:287:29
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   **Expected behavior**
   Output:
   
   ```
   Some(Ok(RecordBatch { schema: Schema { fields: [Field { name: "f1", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [StringArray
   [
     "foo",
     "bar",
     "baz",
   ]], row_count: 3 }))
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya closed issue #1735: IPC reader may break on projection

Posted by GitBox <gi...@apache.org>.
viirya closed issue #1735: IPC reader may break on projection
URL: https://github.com/apache/arrow-rs/issues/1735
