Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/31 08:55:37 UTC

[GitHub] [arrow] ritchie46 opened a new pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

ritchie46 opened a new pull request #9379:
URL: https://github.com/apache/arrow/pull/9379


   The `Date64` data type encodes both date and time information, in milliseconds. The current CSV writer writes only the date, so a value cannot be serialized and then deserialized at the same precision.
   
   This PR proposes writing both the date and the time information to CSV for `Date64` columns.
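
The precision loss described above can be illustrated with a small standard-library-only sketch. It is not the Arrow writer itself: the function names (`civil_from_days`, `format_date_only`, `format_date_time`) are hypothetical, the calendar arithmetic is a commonly used days-to-civil conversion, and the output layouts are only assumed to resemble a date-only and a date-plus-time format.

```rust
/// Milliseconds in one day; a `Date64` value counts milliseconds since 1970-01-01.
const MS_PER_DAY: i64 = 86_400_000;

/// Converts days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar (hypothetical helper, not an Arrow or chrono function).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // number of 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Date-only rendering: the time of day is discarded.
fn format_date_only(ms: i64) -> String {
    let (y, m, d) = civil_from_days(ms.div_euclid(MS_PER_DAY));
    format!("{:04}-{:02}-{:02}", y, m, d)
}

/// Date and time rendering: keeps the millisecond-of-day part as well.
fn format_date_time(ms: i64) -> String {
    let rem = ms.rem_euclid(MS_PER_DAY); // milliseconds since midnight
    let (h, min, s, milli) = (rem / 3_600_000, rem / 60_000 % 60, rem / 1_000 % 60, rem % 1_000);
    format!("{}T{:02}:{:02}:{:02}.{:03}", format_date_only(ms), h, min, s, milli)
}
```

With these helpers, every millisecond value that falls on the same day yields the same `format_date_only` string, which is why a date-only CSV cell cannot be read back into the original `Date64` value.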
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ritchie46 commented on a change in pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#discussion_r567402067



##########
File path: rust/arrow/src/csv/writer.rs
##########
@@ -76,6 +76,7 @@ use crate::{array::*, util::serialization::lexical_to_string};
 const DEFAULT_DATE_FORMAT: &str = "%F";
 const DEFAULT_TIME_FORMAT: &str = "%T";
 const DEFAULT_TIMESTAMP_FORMAT: &str = "%FT%H:%M:%S.%9f";
+const DEFAULT_DATETIME_FORMAT: &str = "%+"; //  	ISO 8601 / RFC 3339 date & time format.

Review comment:
       Yes, good point. This seems overly expensive indeed.







[GitHub] [arrow] github-actions[bot] commented on pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#issuecomment-770349085


   https://issues.apache.org/jira/browse/ARROW-11443





[GitHub] [arrow] codecov-io commented on pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#issuecomment-770363765


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=h1) Report
   > Merging [#9379](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=desc) (5a74b50) into [master](https://codecov.io/gh/apache/arrow/commit/f05b49bb08c0a4cc0cbfcfb07103dcf374c7fd38?el=desc) (f05b49b) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9379      +/-   ##
   ==========================================
   + Coverage   81.94%   81.98%   +0.03%     
   ==========================================
     Files         231      230       -1     
     Lines       53374    53391      +17     
   ==========================================
   + Hits        43739    43773      +34     
   + Misses       9635     9618      -17     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/csv/writer.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3dyaXRlci5ycw==) | `84.12% <100.00%> (+4.84%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `83.52% <0.00%> (-0.15%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vYnVpbGRlci5ycw==) | `88.20% <0.00%> (-0.07%)` | :arrow_down: |
   | [rust/datafusion/src/optimizer/optimizer.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvb3B0aW1pemVyLnJz) | | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.17% <0.00%> (+0.01%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.78% <0.00%> (+0.14%)` | :arrow_up: |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `89.21% <0.00%> (+0.49%)` | :arrow_up: |
   | [rust/datafusion/tests/user\_defined\_plan.rs](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3VzZXJfZGVmaW5lZF9wbGFuLnJz) | `81.64% <0.00%> (+1.99%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow/pull/9379/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=footer). Last update [f05b49b...5a74b50](https://codecov.io/gh/apache/arrow/pull/9379?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] alamb commented on a change in pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#discussion_r571418161



##########
File path: rust/arrow/src/csv/writer.rs
##########
@@ -97,6 +97,8 @@ pub struct Writer<W: Write> {
     has_headers: bool,
     /// The date format for date arrays
     date_format: String,

Review comment:
       It seems like these fields are somewhat confusingly named: `date_format` could perhaps be called `date32_format`, and `datetime_format` could be called `date64_format`. Perhaps the root of the confusion, though, is that `Date32` is always in units of days and `Date64` is in units of milliseconds.
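
The unit difference mentioned in this comment can be made concrete with a short sketch. The function names are hypothetical and the code uses plain integers rather than Arrow array types; it only assumes that both types count from 1970-01-01, `Date32` in days and `Date64` in milliseconds.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Widens a day count (Date32-style) to milliseconds (Date64-style).
/// The result always falls on midnight.
fn date32_to_date64(days: i32) -> i64 {
    i64::from(days) * MS_PER_DAY
}

/// Splits a millisecond count (Date64-style) into whole days and the
/// milliseconds left over within that day. Euclidean division keeps the
/// remainder non-negative for values before 1970.
/// The `as i32` cast assumes the day count fits in 32 bits.
fn date64_to_date32(ms: i64) -> (i32, i64) {
    (ms.div_euclid(MS_PER_DAY) as i32, ms.rem_euclid(MS_PER_DAY))
}
```

The second element returned by `date64_to_date32` is exactly the information a date-only format drops, which may be why two separately named format fields are useful.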







[GitHub] [arrow] alamb commented on pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#issuecomment-774459432


   Thanks @ritchie46 for the contribution -- we really appreciate your contributions to the project.





[GitHub] [arrow] ritchie46 commented on pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#issuecomment-770357628


   > LGTM. Do you think it is worth adding a test?
   
   It is probably worth testing the goal of this PR, namely writing a `Date64` to a CSV buffer, then parsing the CSV and obtaining the same `Date64` values. I will add a test for that.
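
A round-trip check of this kind might look roughly like the sketch below. It is not the test added in the PR: it uses only the standard library, a single-column CSV held in a `String`, and hypothetical helpers (`write_csv`, `read_csv`, `format_date64`, `parse_date64`). The parser assumes fixed-width fields, years 0000 to 9999, and does not validate separators.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Days since 1970-01-01 to (year, month, day), proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let (era, doe) = (z.div_euclid(146_097), z.rem_euclid(146_097));
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    (yoe + era * 400 + i64::from(month <= 2), month, doy - (153 * mp + 2) / 5 + 1)
}

/// Inverse of `civil_from_days`.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Renders milliseconds since the epoch as `YYYY-MM-DDTHH:MM:SS.mmm`.
fn format_date64(ms: i64) -> String {
    let (y, m, d) = civil_from_days(ms.div_euclid(MS_PER_DAY));
    let r = ms.rem_euclid(MS_PER_DAY);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:03}",
        y, m, d, r / 3_600_000, r / 60_000 % 60, r / 1_000 % 60, r % 1_000
    )
}

/// Parses one fixed-width numeric field out of a cell.
fn field(s: &str, range: std::ops::Range<usize>) -> Option<i64> {
    s.get(range)?.parse().ok()
}

/// Parses either a date-only cell (10 characters, midnight assumed) or a
/// full date-time cell as produced by `format_date64`.
fn parse_date64(s: &str) -> Option<i64> {
    let days = days_from_civil(field(s, 0..4)?, field(s, 5..7)?, field(s, 8..10)?);
    if s.len() == 10 {
        return Some(days * MS_PER_DAY);
    }
    let (h, min, sec) = (field(s, 11..13)?, field(s, 14..16)?, field(s, 17..19)?);
    Some(days * MS_PER_DAY + ((h * 60 + min) * 60 + sec) * 1_000 + field(s, 20..23)?)
}

/// Writes a one-column CSV with a header; `with_time = false` mimics a
/// date-only writer by keeping just the first 10 characters of each cell.
fn write_csv(values: &[i64], with_time: bool) -> String {
    let mut buf = String::from("c1\n");
    for &ms in values {
        let text = format_date64(ms);
        buf.push_str(if with_time { text.as_str() } else { &text[..10] });
        buf.push('\n');
    }
    buf
}

/// Reads the column back, skipping the header line.
fn read_csv(buf: &str) -> Option<Vec<i64>> {
    buf.lines().skip(1).map(parse_date64).collect()
}
```

In this sketch, `read_csv(&write_csv(&values, true))` returns the original values, while the `with_time = false` variant returns each value truncated to midnight, which is the behaviour the test is meant to guard against.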





[GitHub] [arrow] Dandandan commented on a change in pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9379:
URL: https://github.com/apache/arrow/pull/9379#discussion_r567395307



##########
File path: rust/arrow/src/csv/writer.rs
##########
@@ -76,6 +76,7 @@ use crate::{array::*, util::serialization::lexical_to_string};
 const DEFAULT_DATE_FORMAT: &str = "%F";
 const DEFAULT_TIME_FORMAT: &str = "%T";
 const DEFAULT_TIMESTAMP_FORMAT: &str = "%FT%H:%M:%S.%9f";
+const DEFAULT_DATETIME_FORMAT: &str = "%+"; //  	ISO 8601 / RFC 3339 date & time format.

Review comment:
       Not necessarily for this PR, but I believe we can get better performance by using the `Display` implementation of `NaiveDateTime` when no format is specified in the writer, rather than specifying (and parsing) a format string for each row.
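
The idea in this comment can be sketched without any external crate. The `Time` type, `format_with_spec`, and `write_column` below are hypothetical stand-ins (not chrono's `NaiveDateTime` or the Arrow writer): when no format is requested, each row goes through a fixed `Display` implementation, and only a user-supplied spec pays the cost of being re-scanned per row.

```rust
use std::fmt;

/// Hypothetical stand-in for a time value stored in a column.
struct Time {
    hour: u32,
    minute: u32,
    second: u32,
}

/// Fixed default layout: no format string has to be interpreted.
impl fmt::Display for Time {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:02}:{:02}:{:02}", self.hour, self.minute, self.second)
    }
}

/// Interprets a tiny strftime-like spec (only %H, %M and %S are understood).
/// The spec is scanned again on every call, i.e. once per row.
fn format_with_spec(t: &Time, spec: &str) -> String {
    let mut out = String::new();
    let mut chars = spec.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('H') => out.push_str(&format!("{:02}", t.hour)),
            Some('M') => out.push_str(&format!("{:02}", t.minute)),
            Some('S') => out.push_str(&format!("{:02}", t.second)),
            // Unknown specifiers are copied through unchanged.
            Some(other) => {
                out.push('%');
                out.push(other);
            }
            None => out.push('%'),
        }
    }
    out
}

/// Formats a column: the default path uses `Display`, the custom path
/// falls back to interpreting the spec for each row.
fn write_column(rows: &[Time], spec: Option<&str>) -> Vec<String> {
    match spec {
        None => rows.iter().map(|t| t.to_string()).collect(),
        Some(spec) => rows.iter().map(|t| format_with_spec(t, spec)).collect(),
    }
}
```

Whether the saving is significant in the real writer would need measuring; the sketch only shows the shape of the two code paths.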







[GitHub] [arrow] alamb closed pull request #9379: ARROW-11443: [Rust] Write datetime information for Date64 Type in csv writer

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9379:
URL: https://github.com/apache/arrow/pull/9379


   

