Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/18 22:40:48 UTC

[GitHub] [arrow] nevi-me opened a new pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

nevi-me opened a new pull request #9253:
URL: https://github.com/apache/arrow/pull/9253


   The Int96 timestamp converter was not using the specialised timestamp builder that takes the timezone as a parameter.
   This changes that to use the builder that preserves timezones.
   
   I tested this change with the test file provided in the JIRA.
   It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a test case.
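
A minimal, stdlib-only sketch of the idea described above: the converter carries an optional timezone and attaches it to the array it produces instead of dropping it. `TimestampNanos` and `Int96Converter` are hypothetical stand-ins invented for illustration, not types from the arrow or parquet crates. The sketch also assumes the input is already in milliseconds since the Unix epoch, which is what the `* 1000000` factor in the diff suggests.

```rust
/// Hypothetical stand-in for a nanosecond timestamp array: the values plus
/// the optional timezone that travels with them.
#[derive(Debug, PartialEq)]
pub struct TimestampNanos {
    pub values: Vec<Option<i64>>,
    pub timezone: Option<String>,
}

/// Hypothetical converter mirroring the shape of the struct in the diff:
/// it now holds the timezone read from the schema.
pub struct Int96Converter {
    pub timezone: Option<String>,
}

impl Int96Converter {
    /// Converts millisecond values (assumed input unit) to nanoseconds and
    /// keeps the timezone on the result. Nulls stay null.
    pub fn convert(&self, source: Vec<Option<i64>>) -> TimestampNanos {
        TimestampNanos {
            values: source
                .into_iter()
                .map(|v| v.map(|millis| millis * 1_000_000))
                .collect(),
            timezone: self.timezone.clone(),
        }
    }
}
```

With `timezone: None` the sketch behaves like the old converter, so only files that declare a timezone would see a different result.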


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] maxburke commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
maxburke commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-763138559


   I guess this will be merged after 3.0 is released?





[GitHub] [arrow] sunchao commented on a change in pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#discussion_r559942620



##########
File path: rust/parquet/src/arrow/converter.rs
##########
@@ -168,19 +167,19 @@ impl Converter<Vec<Option<FixedLenByteArray>>, IntervalDayTimeArray>
     }
 }
 
-pub struct Int96ArrayConverter {}
+pub struct Int96ArrayConverter {
+    pub timezone: Option<String>,
+}
 
 impl Converter<Vec<Option<Int96>>, TimestampNanosecondArray> for Int96ArrayConverter {
     fn convert(&self, source: Vec<Option<Int96>>) -> Result<TimestampNanosecondArray> {
-        let mut builder = TimestampNanosecondBuilder::new(source.len());
-        for v in source {
-            match v {
-                Some(array) => builder.append_value(array.to_i64() * 1000000),
-                None => builder.append_null(),
-            }?
-        }
-
-        Ok(builder.finish())
+        Ok(TimestampNanosecondArray::from_opt_vec(

Review comment:
       Cool. Good to know. Thanks.







[GitHub] [arrow] nevi-me commented on a change in pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#discussion_r559940749



##########
File path: rust/parquet/src/arrow/converter.rs
##########
@@ -168,19 +167,19 @@ impl Converter<Vec<Option<FixedLenByteArray>>, IntervalDayTimeArray>
     }
 }
 
-pub struct Int96ArrayConverter {}
+pub struct Int96ArrayConverter {
+    pub timezone: Option<String>,
+}
 
 impl Converter<Vec<Option<Int96>>, TimestampNanosecondArray> for Int96ArrayConverter {
     fn convert(&self, source: Vec<Option<Int96>>) -> Result<TimestampNanosecondArray> {
-        let mut builder = TimestampNanosecondBuilder::new(source.len());
-        for v in source {
-            match v {
-                Some(array) => builder.append_value(array.to_i64() * 1000000),
-                None => builder.append_null(),
-            }?
-        }
-
-        Ok(builder.finish())
+        Ok(TimestampNanosecondArray::from_opt_vec(

Review comment:
       We've found the builder pattern to be slower than allocating vecs. There's a `FromIterator` implementation for `PrimitiveArray`, but no iterator-based equivalent of `TimestampArray::from_opt_vec`, which also takes the timezone. I've filed ARROW-11312 to address this.
   
   For the other field types we don't use an array builder; we use `ArrayData::builder(timestamp_with_timezone)`, so they don't suffer from the same limitation.
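
The two shapes compared in this review thread can be sketched side by side with the standard library only. `NanosBuilder` is a hypothetical, heavily simplified builder made up for illustration (real Arrow builders manage buffers and validity bitmaps), and the millisecond input unit is an assumption taken from the `* 1000000` factor in the diff. The sketch shows only that both shapes produce the same values; it makes no claim about their relative speed.

```rust
/// Hypothetical, simplified builder used only for this comparison.
pub struct NanosBuilder {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl NanosBuilder {
    pub fn new(capacity: usize) -> Self {
        NanosBuilder {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    pub fn append_value(&mut self, v: i64) -> Result<(), String> {
        self.values.push(v);
        self.validity.push(true);
        Ok(())
    }

    pub fn append_null(&mut self) -> Result<(), String> {
        self.values.push(0); // placeholder slot for a null entry
        self.validity.push(false);
        Ok(())
    }

    pub fn finish(self) -> Vec<Option<i64>> {
        self.values
            .into_iter()
            .zip(self.validity)
            .map(|(v, valid)| if valid { Some(v) } else { None })
            .collect()
    }
}

/// Old shape: one fallible append call per element.
pub fn convert_with_builder(source: Vec<Option<i64>>) -> Result<Vec<Option<i64>>, String> {
    let mut builder = NanosBuilder::new(source.len());
    for v in source {
        match v {
            Some(millis) => builder.append_value(millis * 1_000_000),
            None => builder.append_null(),
        }?;
    }
    Ok(builder.finish())
}

/// New shape: map every element and collect into one intermediate Vec,
/// which would then be handed to a `from_opt_vec`-style constructor.
pub fn convert_with_vec(source: Vec<Option<i64>>) -> Vec<Option<i64>> {
    source
        .into_iter()
        .map(|v| v.map(|millis| millis * 1_000_000))
        .collect()
}
```

The second function allocates the intermediate `Vec` that the review question asks about; whether that cost outweighs the per-element append calls is something only a benchmark on the real crates can settle.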







[GitHub] [arrow] codecov-io commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-762503781


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=h1) Report
   > Merging [#9253](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=desc) (5938e70) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **increase** coverage by `0.00%`.
   > The diff coverage is `95.12%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9253/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9253   +/-   ##
   =======================================
     Coverage   81.61%   81.61%           
   =======================================
     Files         215      215           
     Lines       51867    51896   +29     
   =======================================
   + Hits        42329    42357   +28     
   - Misses       9538     9539    +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9253/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `71.54% <83.33%> (+0.14%)` | :arrow_up: |
   | [rust/parquet/src/arrow/converter.rs](https://codecov.io/gh/apache/arrow/pull/9253/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9jb252ZXJ0ZXIucnM=) | `73.04% <83.33%> (ø)` | |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9253/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.66% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9253/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9253/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `84.21% <0.00%> (+5.26%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=footer). Last update [69a9a1c...5938e70](https://codecov.io/gh/apache/arrow/pull/9253?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] github-actions[bot] commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-762500339


   https://issues.apache.org/jira/browse/ARROW-11269





[GitHub] [arrow] nevi-me closed pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #9253:
URL: https://github.com/apache/arrow/pull/9253


   





[GitHub] [arrow] sunchao commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-762647178


   @nevi-me yes agreed - I think we shouldn't support writing to int96 from arrow for the reasons you listed. 





[GitHub] [arrow] nevi-me commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-762632150


   Thanks @sunchao. Is it worthwhile to support writing int96 with the arrow writers? It's deprecated, and version 2.6.0 of the format introduces TIMESTAMP_NANOS, so once we support 2.6.0, users will be able to write nanosecond timestamps that way.





[GitHub] [arrow] nevi-me commented on pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#issuecomment-762500784


   CC @maxburke @mcassels 





[GitHub] [arrow] sunchao commented on a change in pull request #9253: ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9253:
URL: https://github.com/apache/arrow/pull/9253#discussion_r559938124



##########
File path: rust/parquet/src/arrow/converter.rs
##########
@@ -168,19 +167,19 @@ impl Converter<Vec<Option<FixedLenByteArray>>, IntervalDayTimeArray>
     }
 }
 
-pub struct Int96ArrayConverter {}
+pub struct Int96ArrayConverter {
+    pub timezone: Option<String>,
+}
 
 impl Converter<Vec<Option<Int96>>, TimestampNanosecondArray> for Int96ArrayConverter {
     fn convert(&self, source: Vec<Option<Int96>>) -> Result<TimestampNanosecondArray> {
-        let mut builder = TimestampNanosecondBuilder::new(source.len());
-        for v in source {
-            match v {
-                Some(array) => builder.append_value(array.to_i64() * 1000000),
-                None => builder.append_null(),
-            }?
-        }
-
-        Ok(builder.finish())
+        Ok(TimestampNanosecondArray::from_opt_vec(

Review comment:
       Are we going to change the other converters to use this pattern as well? Also, I'm not sure what the performance looks like with this new approach: it seems to need extra memory for the intermediate `Vec`?



