Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/09/06 08:33:17 UTC

[GitHub] lvhuyen edited a comment on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

URL: https://github.com/apache/flink/pull/6483#issuecomment-418967608
 
 
   @HuangZhenQiu 
   Here is the schema of that parquet file, printed in Zeppelin.
   > root
   >  |-- metrics_date: timestamp (nullable = true)
   >  |-- counter: long (nullable = true)
   >  |-- meter: double (nullable = true)
   >  |-- customer_id: string (nullable = true)
   I have also attached that sample file here: https://github.com/lvhuyen/flink/blob/parquet_input_format(7243)/flink-formats/flink-parquet/src/test/resources/test.parquet
   
   I debugged this in IntelliJ: that column is in fact stored as the primitive type int96 (not int64), and according to Apache's ColumnReaderImpl (https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderImpl.java), int96 is treated as a Binary (line 274). In your current implementation of the RowConverter class, a Binary is converted into a String using UTF-8, which seems to be irreversible and leads to data loss. My data has metrics_date = 2018-09-01 15:02:55.0, which was read as a byte array of [0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0] and then converted to a string of length 12 that has the same character at the 3rd, 4th, 9th, and 10th positions.
   Would it be possible to modify the method RowConverter.RowPrimitiveConverter.addBinary() to handle String / BigInteger differently?
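To make the data loss concrete, here is a minimal standalone sketch (not code from the PR) that decodes the 12 bytes quoted above as UTF-8 and encodes them back. The class name `Utf8RoundTripDemo` and its helper are hypothetical, introduced only for illustration. The bytes at the 3rd, 4th, 9th, and 10th positions are not valid UTF-8, so Java substitutes the replacement character U+FFFD for each of them and the original values cannot be recovered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTripDemo {
    // The 12 raw int96 bytes reported for metrics_date in this comment.
    static final byte[] RAW = {0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0};

    /** Decodes the bytes as UTF-8 and encodes the resulting String back to bytes. */
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = new String(RAW, StandardCharsets.UTF_8);
        System.out.println("decoded length: " + decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            // U+FFFD marks a byte that could not be decoded as UTF-8.
            if (decoded.charAt(i) == '\uFFFD') {
                System.out.println("replacement character at index " + i);
            }
        }
        // The invalid bytes were replaced, so the round trip does not restore RAW.
        System.out.println("round trip equal: " + Arrays.equals(RAW, roundTrip(RAW)));
    }
}
```
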
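As one possible direction, here is a rough sketch of decoding the raw bytes directly instead of going through a String. It assumes the column follows the common INT96 timestamp convention (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian). The class `Int96TimestampSketch` and its `decode` method are hypothetical names, not part of the PR or of parquet-mr, and time zone handling is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

final class Int96TimestampSketch {
    // Julian day number that corresponds to 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

    /** Decodes a 12-byte INT96 value, assuming the nanos-of-day + Julian-day layout. */
    static LocalDateTime decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 needs 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // first 8 bytes
        int julianDay = buf.getInt();    // last 4 bytes
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        byte[] raw = {0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0};
        System.out.println(decode(raw)); // prints 2018-09-01T15:02:55
    }
}
```

With the bytes from my file this yields 2018-09-01 15:02:55, which matches the value shown for metrics_date, so the information is intact as long as the Binary is not turned into a String first.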
