Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/10/19 07:35:19 UTC

[GitHub] [parquet-mr] anantdamle opened a new pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle opened a new pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831


   Make sure you have checked _all_ steps below.
   
   Reading Parquet files in Apache Beam with ParquetIO goes through `AvroParquetReader`, which throws `IllegalArgumentException("INT96 not implemented and is deprecated")` when a file contains an INT96 column.
   
   Customers have large datasets that cannot be reprocessed to convert these columns into a supported type. A simpler approach is to expose each value as a 12-byte array, which developers can then interpret however they need.
   
   This patch maps the Parquet INT96 type to a 12-byte Avro FIXED schema. The developer can then decode the value into a timestamp or treat it as plain bytes.
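Because the patch leaves the interpretation of the 12 bytes to the caller, here is a minimal Java sketch of one such interpretation. It assumes the layout commonly written by Impala, Hive and Spark: bytes 0-7 hold the nanoseconds within the day and bytes 8-11 hold the Julian day number, both little-endian, with the result read as UTC. The class and method names (`Int96Timestamps`, `toInstant`, `fromInstant`) are hypothetical and are not part of parquet-avro.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class Int96Timestamps {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a 12-byte INT96 value, ASSUMING the Impala/Hive/Spark layout:
    // bytes 0-7 = nanoseconds within the day, bytes 8-11 = Julian day,
    // both little-endian, interpreted as UTC.
    public static Instant toInstant(byte[] fixed12) {
        if (fixed12.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes, got " + fixed12.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(fixed12).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
        // ofEpochSecond normalizes a nanosecond adjustment larger than one second.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }

    // Inverse of toInstant under the same layout assumption; handy for building test values.
    public static byte[] fromInstant(Instant instant) {
        long epochDay = Math.floorDiv(instant.getEpochSecond(), SECONDS_PER_DAY);
        long secondOfDay = Math.floorMod(instant.getEpochSecond(), SECONDS_PER_DAY);
        long nanosOfDay = secondOfDay * 1_000_000_000L + instant.getNano();
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt((int) (epochDay + JULIAN_DAY_OF_EPOCH))
            .array();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2020-10-19T07:35:19Z");
        System.out.println(toInstant(fromInstant(t))); // 2020-10-19T07:35:19Z
    }
}
```

Whether the bytes really follow this layout depends on the writer that produced the file, which is exactly why exposing raw bytes rather than a fixed timestamp conversion leaves that choice to the developer.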
   
   - [x] My PR adds the following unit test: `testParquetInt96AsFixed12AvroType`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] anantdamle commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-737194101


   Gentle bump.





[GitHub] [parquet-mr] anantdamle commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-739620994


   Thanks @gszadovszky. A quick request: please squash and merge, as there are two throwaway commits that only undo my IDE's automatic changes to the LICENSE comments.





[GitHub] [parquet-mr] anantdamle commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-738487283


   Thanks for the review. I have incorporated the requested changes.





[GitHub] [parquet-mr] gszadovszky merged pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

gszadovszky merged pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831


   





[GitHub] [parquet-mr] anantdamle commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-724684358


   Adding @rdblue and @tomwhite for review.
   
   
   





[GitHub] [parquet-mr] anantdamle commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

anantdamle commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-738864269


   @gszadovszky, thanks for the approval.
   Sorry for the beginner question: what happens next? How does this PR get merged into master?





[GitHub] [parquet-mr] gszadovszky commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

gszadovszky commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-738890468


   @anantdamle, thank you for the contribution!
   I will merge it. I usually wait 24 hours (a bit longer this time because of the weekend) to give others a chance to comment before merging.





[GitHub] [parquet-mr] gszadovszky commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

gszadovszky commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-737864597


   The Parquet community was against adding INT96 support so as not to encourage our clients to use it, although I understand the need to read data already written with this type. (At the same time, since parquet-avro has never supported INT96, this change would also make it possible to build new functionality that depends on the deprecated INT96.)
   Anyway, I am fine with this change, but I do not really like that it is enabled by default. What do you think about keeping the original behavior by default and introducing a configuration flag to switch it on? (See `writeParquetUUID` as an example.) This way we still do not encourage clients to use INT96, but they have the option to do so when necessary.
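The suggested opt-in behavior can be sketched as follows. This is a minimal, dependency-free illustration, not the code that was merged: the configuration is modeled as a plain `Map<String, String>`, the key `example.readInt96AsFixed` is a placeholder (the property name parquet-avro actually uses may differ), and the Avro schema is returned as a JSON string instead of an `org.apache.avro.Schema` object. By default it keeps the original behavior and throws; only when the flag is set does it map INT96 to a 12-byte Avro `fixed` type.

```java
import java.util.Map;

public class Int96SchemaMapping {
    // PLACEHOLDER key for illustration only; not necessarily the real parquet-avro property.
    static final String READ_INT96_AS_FIXED = "example.readInt96AsFixed";

    // Returns the Avro schema (as JSON) for a Parquet INT96 field.
    // Default (flag absent or false): keep the original behavior and reject INT96.
    // Opt-in (flag true): map INT96 to an Avro fixed type of 12 bytes.
    static String int96ToAvroSchemaJson(String fieldName, Map<String, String> conf) {
        boolean readAsFixed = Boolean.parseBoolean(conf.getOrDefault(READ_INT96_AS_FIXED, "false"));
        if (!readAsFixed) {
            throw new IllegalArgumentException("INT96 not implemented and is deprecated");
        }
        return "{\"type\":\"fixed\",\"name\":\"" + fieldName + "\",\"size\":12}";
    }

    public static void main(String[] args) {
        // Opted in: prints {"type":"fixed","name":"ts","size":12}
        System.out.println(int96ToAvroSchemaJson("ts", Map.of(READ_INT96_AS_FIXED, "true")));
        try {
            int96ToAvroSchemaJson("ts", Map.of());
        } catch (IllegalArgumentException e) {
            // Default: prints "default: INT96 not implemented and is deprecated"
            System.out.println("default: " + e.getMessage());
        }
    }
}
```

Keeping the failure as the default means existing readers behave exactly as before, and only users who explicitly set the flag start receiving raw 12-byte values.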





[GitHub] [parquet-mr] gszadovszky commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

gszadovszky commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-739802895


   @anantdamle, our usual process is to squash all the changes related to one JIRA issue before merging. Thanks a lot for your contribution!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org