Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/05 16:22:10 UTC

[GitHub] [arrow-rs] rjnanderson opened a new issue #1275: parquet doesn't preserve the time in Datatype::Date64

rjnanderson opened a new issue #1275:
URL: https://github.com/apache/arrow-rs/issues/1275


   **Describe the bug**
   When a RecordBatch is stored in a parquet file and then read back, the time portion of DataType::Date64 values is reset to 0.
   
   **To Reproduce**
   With this schema:
   Field::new("item", DataType::Utf8, false),
   Field::new("timestamp", DataType::Date64, false)
   
   1. Read the csv1 data below into batch1
   2. Write batch1 to csv1a
   3. Compare csv1 to csv1a — they match
   4. Write batch1 to a parquet file
   5. Read batch2 from the same parquet file
   6. Write batch2 to csv2
   7. Compare csv1 to csv2 — they don’t match because in csv2 the times are all 00:00:00.000000000
   
   csv1: 
   item,timestamp
   1,1998-10-28T19:10:30.056000000
   2,1998-10-30T11:10:10.623000000
   3,1999-01-23T17:10:31.006000000
   
   csv2: 
   item,timestamp
   1,1998-10-28T00:00:00.000000000
   2,1998-10-30T00:00:00.000000000
   3,1999-01-23T00:00:00.000000000
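The difference between csv1 and csv2 can be illustrated with plain integer arithmetic. The sketch below is not the parquet writer's code: it assumes, without confirmation from this report, that the value passes through a whole-day representation somewhere in the round trip. The helper name `split_date64` is hypothetical, and the epoch value was computed by hand for the first csv1 row, treating it as UTC.

```rust
// Milliseconds in one day, the unit boundary mentioned in ARROW-10925.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: split a Date64-style value (ms since the Unix epoch)
/// into the millisecond value of that day's midnight and the ms elapsed
/// within the day.
fn split_date64(ms: i64) -> (i64, i64) {
    let day = ms.div_euclid(MILLIS_PER_DAY);
    (day * MILLIS_PER_DAY, ms.rem_euclid(MILLIS_PER_DAY))
}

fn main() {
    // 1998-10-28T19:10:30.056 (first row of csv1), assumed UTC.
    let ts: i64 = 909_601_830_056;
    let (midnight, time_of_day) = split_date64(ts);

    // midnight    = 909_532_800_000 (1998-10-28T00:00:00.000, as in csv2)
    // time_of_day = 69_030_056      (19:10:30.056, the part that is lost)
    println!("midnight = {midnight}, time_of_day = {time_of_day}");
}
```

If only the `midnight` component survives the round trip, the output matches csv2 exactly.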
   
   **Expected behavior**
   The time portion of the DataType::Date64 value should be preserved in parquet just as it is in csv.
   
   **Additional context**
   Version 8.0.0
   
   It looks like this unit test needs to include some non-zero times:
   
       #[test]
       fn date64_single_column() {
           // Date64 must be a multiple of 86400000, see ARROW-10925
           required_and_optional::<Date64Array, _>(
               (0..(SMALL_SIZE as i64 * 86400000)).step_by(86400000),
           );
       }
   
   According to ARROW-10925 a valid *time* is in the range 0..86400000 milliseconds.
   Here DataType::Date64 is defined to be in milliseconds: https://arrow.apache.org/docs/cpp/api/datatype.html
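As a rough sketch of what "non-zero times" could mean for the test inputs, the following standalone snippet contrasts the existing midnight-only range with values that carry a time of day. It does not use the crate's `required_and_optional` helper. `SAMPLE_SIZE` is a hypothetical stand-in for `SMALL_SIZE`, and the function names and the offset constant are made up for illustration.

```rust
const MILLIS_PER_DAY: i64 = 86_400_000;
// Hypothetical stand-in for the test module's SMALL_SIZE.
const SAMPLE_SIZE: usize = 7;

/// Same shape as the existing test input: one value per day, always midnight.
fn midnight_only(n: usize) -> Vec<i64> {
    (0..(n as i64 * MILLIS_PER_DAY))
        .step_by(MILLIS_PER_DAY as usize)
        .collect()
}

/// One value per day with a non-zero time of day added.
/// The offset is always in 1..MILLIS_PER_DAY, so it never lands on midnight.
fn with_time_of_day(n: usize) -> Vec<i64> {
    (0..n as i64)
        .map(|day| {
            let offset = 1 + (day * 3_723_004) % (MILLIS_PER_DAY - 1);
            day * MILLIS_PER_DAY + offset
        })
        .collect()
}

fn main() {
    for (a, b) in midnight_only(SAMPLE_SIZE)
        .iter()
        .zip(with_time_of_day(SAMPLE_SIZE).iter())
    {
        println!("midnight-only: {a:>12}   with time: {b:>12}");
    }
}
```

Inputs like the second set would make a round-trip test fail whenever the time portion is dropped, which the midnight-only inputs cannot detect.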
   
   
   
   

