Posted to github@arrow.apache.org by "emcake (via GitHub)" <gi...@apache.org> on 2023/06/12 09:30:51 UTC

[GitHub] [arrow-rs] emcake opened a new issue, #4397: arrow::compute::filter_record_batch drops timezone

emcake opened a new issue, #4397:
URL: https://github.com/apache/arrow-rs/issues/4397

   **Describe the bug**
   Calling `filter_record_batch` on a `RecordBatch` that has a timestamp column with a timezone returns a batch whose column type has lost the timezone.
   
   **To Reproduce**
   A test that replicates this:
   
   ```rust
   #[test]
   fn filter_record_batch_maintains_timezones() -> Result<(), arrow::error::ArrowError> {
       let fields = vec![arrow::datatypes::Field::new(
           "timestamp",
           arrow::datatypes::DataType::Timestamp(
               arrow::datatypes::TimeUnit::Nanosecond,
               Some("UTC".to_owned().into()),
           ),
           false,
       )];
   
       let field_builders: Vec<Box<dyn arrow::array::ArrayBuilder>> =
           vec![Box::new(arrow::array::TimestampNanosecondBuilder::new())];
   
       let mut sa = arrow::array::StructBuilder::new(fields, field_builders);
   
       for i in 0..100 {
           sa.field_builder::<arrow::array::TimestampNanosecondBuilder>(0)
               .unwrap()
               .append_value(i);
           sa.append(true);
       }
   
       let struct_array = sa.finish();
   
       let rec: arrow::record_batch::RecordBatch = (&struct_array).into();
   
       let schema = rec.schema();
   
       let dt = schema.field(0);
       assert_eq!(
           &arrow::datatypes::DataType::Timestamp(
               arrow::datatypes::TimeUnit::Nanosecond,
               Some("UTC".to_owned().into())
           ),
           dt.data_type()
       );
   
       let filter: arrow::array::BooleanArray = vec![true; 100].into();
   
       let filtered = arrow::compute::filter_record_batch(&rec, &filter)?;
   
       let filtered_schema = filtered.schema();
   
       let filtered_dt = filtered_schema.field(0);
       assert_eq!(
           &arrow::datatypes::DataType::Timestamp(
               arrow::datatypes::TimeUnit::Nanosecond,
               Some("UTC".to_owned().into())
           ),
           filtered_dt.data_type()
       );
   
       Ok(())
   }
   ```
   
   **Expected behavior**
   Test should pass.
   
   **Additional context**
   Tested on arrow `40` and `41`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4397: arrow::compute::filter_record_batch drops timezone

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4397:
URL: https://github.com/apache/arrow-rs/issues/4397#issuecomment-1586981286

   I think this is actually a bug in StructBuilder: the field declares a timezone, but the child array produced by the builder does not carry one.
   
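   ```rust
   use arrow::array::{ArrayBuilder, StructBuilder, TimestampNanosecondBuilder};
   use arrow::datatypes::{DataType, Field, TimeUnit};

   #[test]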
   fn filter_record_batch_maintains_timezones() {
       let fields = vec![Field::new(
           "timestamp",
           DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".to_owned().into())),
           false,
       )];
   
       let field_builders: Vec<Box<dyn ArrayBuilder>> =
           vec![Box::new(TimestampNanosecondBuilder::new())];
   
       let mut sa = StructBuilder::new(fields, field_builders);
   
       for i in 0..100 {
           sa.field_builder::<TimestampNanosecondBuilder>(0)
               .unwrap()
               .append_value(i);
           sa.append(true);
       }
   
       let struct_array = sa.finish();
       assert_eq!(
           struct_array.fields()[0].data_type(),
           &DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".to_owned().into())),
       );
       // FAILS: the child array built by TimestampNanosecondBuilder::new() has no timezone
       assert_eq!(
           struct_array.column(0).data_type(),
           &DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".to_owned().into()))
       );
   }
   ```
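
The mismatch shown above (a field that declares a timezone paired with a child builder that produces none) can be illustrated with a small standalone sketch of the kind of check the retitled issue asks for. This is a toy model under stated assumptions: `ToyType`, `ToyField`, and `validate_builders` are hypothetical names invented for illustration, and none of this is arrow's API or its actual fix.

```rust
// Toy model only: these types are hypothetical and are NOT part of arrow.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    // Timestamp with an optional timezone, loosely mirroring
    // DataType::Timestamp(_, Option<tz>).
    Timestamp(Option<String>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
}

/// Check that each child builder's type equals the type declared by the
/// corresponding field, returning a description of the first mismatch.
fn validate_builders(fields: &[ToyField], builder_types: &[ToyType]) -> Result<(), String> {
    if fields.len() != builder_types.len() {
        return Err(format!(
            "expected {} builders, got {}",
            fields.len(),
            builder_types.len()
        ));
    }
    for (field, actual) in fields.iter().zip(builder_types) {
        if &field.data_type != actual {
            return Err(format!(
                "field '{}' declares {:?} but its builder produces {:?}",
                field.name, field.data_type, actual
            ));
        }
    }
    Ok(())
}
```

In this model, a field declared as `ToyType::Timestamp(Some("UTC".to_string()))` paired with a builder of type `ToyType::Timestamp(None)` is rejected up front, instead of yielding a struct whose schema and child array disagree. In arrow itself, the analogous user-side step would be to give the child builder the same data type as the field; recent releases appear to expose `PrimitiveBuilder::with_data_type` for this, but check the documentation for your version.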




[GitHub] [arrow-rs] tustvold closed issue #4397: StructBuilder::new Doesn't Validate Builder DataTypes

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4397: StructBuilder::new Doesn't Validate Builder DataTypes
URL: https://github.com/apache/arrow-rs/issues/4397




[GitHub] [arrow-rs] alamb commented on issue #4397: StructBuilder::new Doesn't Validate Builder DataTypes

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4397:
URL: https://github.com/apache/arrow-rs/issues/4397#issuecomment-1595001661

   `label_issue.py` automatically added labels {'arrow'} from #4400

