Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/04/27 12:48:29 UTC

[GitHub] [parquet-mr] srinivasst opened a new pull request #789: PARQUET-1850: Fix dictionaryPageOffset flag setting in toParquetMetadata method

srinivasst opened a new pull request #789:
URL: https://github.com/apache/parquet-mr/pull/789


   ### Issue
   
   The toParquetMetadata method converts an org.apache.parquet.hadoop.metadata.ParquetMetadata object to an org.apache.parquet.format.FileMetaData object, but it does not set the dictionary page offset bit in the resulting FileMetaData.
   
   When a FileMetaData object is serialized into the file footer and later deserialized, the dictionary page offset is lost because the dictionary page offset bit was never set.
   
   ### Fix
   
   The flag is set to true when a dictionary page is used for encoding.
   
   ### Tests
   
   A ParquetMetadata object is created with PLAIN_DICTIONARY encoding, and its dictionaryPageOffset is set to a nonzero value.
   
   The ParquetMetadata object is converted to FileMetaData using the toParquetMetadata method.
   The FileMetaData object is then serialized, deserialized back to FileMetaData, and converted to ParquetMetadata using the fromParquetMetadata method.
   
   The new ParquetMetadata should have the same dictionaryPageOffset as the original ParquetMetadata object.
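
The round trip described above can be sketched without Parquet or Thrift on the classpath. The class below is a hand-written stand-in, not the real FileMetaData or the real test in this PR: `ColumnMeta`, `serialize`, `deserialize`, and `roundTrip` are hypothetical names, and the "wire format" is just a map. It only assumes that an optional field is written when its "is set" flag is true, which is the behaviour the issue describes.

```java
import java.util.HashMap;
import java.util.Map;

public class IsSetRoundTrip {

    /** Hypothetical stand-in for a generated struct with one optional field. */
    static class ColumnMeta {
        long dictionaryPageOffset;         // the value itself
        boolean dictionaryPageOffsetIsSet; // tracked separately from the value
    }

    /** Writes the optional field only when its flag is set. */
    static Map<String, Long> serialize(ColumnMeta meta) {
        Map<String, Long> wire = new HashMap<>();
        if (meta.dictionaryPageOffsetIsSet) {
            wire.put("dictionary_page_offset", meta.dictionaryPageOffset);
        }
        return wire;
    }

    /** Restores the field (and its flag) only if it was present on the wire. */
    static ColumnMeta deserialize(Map<String, Long> wire) {
        ColumnMeta meta = new ColumnMeta();
        Long offset = wire.get("dictionary_page_offset");
        if (offset != null) {
            meta.dictionaryPageOffset = offset;
            meta.dictionaryPageOffsetIsSet = true;
        }
        return meta;
    }

    /** Serializes and deserializes one offset, with or without the flag. */
    static long roundTrip(long offset, boolean setFlag) {
        ColumnMeta meta = new ColumnMeta();
        meta.dictionaryPageOffset = offset;
        meta.dictionaryPageOffsetIsSet = setFlag;
        return deserialize(serialize(meta)).dictionaryPageOffset;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1234L, false)); // prints 0: the offset is lost
        System.out.println(roundTrip(1234L, true));  // prints 1234: the offset survives
    }
}
```

A test in the spirit of the one described above would assert that the offset read back equals the offset written, which holds only in the second case.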


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] prakharjain09 commented on a change in pull request #789: PARQUET-1850: Fix dictionaryPageOffset flag setting in toParquetMetadata method

Posted by GitBox <gi...@apache.org>.
prakharjain09 commented on a change in pull request #789:
URL: https://github.com/apache/parquet-mr/pull/789#discussion_r416463221



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##########
@@ -480,6 +480,10 @@ private void addRowGroup(ParquetMetadata parquetMetadata, List<RowGroup> rowGrou
           columnMetaData.getTotalSize(),
           columnMetaData.getFirstDataPageOffset());
       columnChunk.meta_data.dictionary_page_offset = columnMetaData.getDictionaryPageOffset();

Review comment:
       Use `setDictionary_page_offset` instead of assigning `columnChunk.meta_data.dictionary_page_offset =` directly. The setter automatically invokes `setDictionary_page_offsetIsSet`, so we don't need to call it explicitly.
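
To illustrate the difference the review comment points at, here is a minimal sketch of a class that follows the setter and "is set" naming used in the comment. `ColumnMetaDataSketch` and the two helper methods are hypothetical and written by hand for this example; the sketch is not the generated ColumnMetaData class and makes no claim about its internals beyond what the comment states.

```java
public class SetterSketch {

    /** Hypothetical stand-in that mirrors the method names from the review comment. */
    static class ColumnMetaDataSketch {
        public long dictionary_page_offset; // public field, so direct assignment compiles
        private boolean dictionaryPageOffsetIsSet;

        /** Stores the value and marks the field as set in one call. */
        public ColumnMetaDataSketch setDictionary_page_offset(long offset) {
            this.dictionary_page_offset = offset;
            setDictionary_page_offsetIsSet(true);
            return this;
        }

        public void setDictionary_page_offsetIsSet(boolean value) {
            this.dictionaryPageOffsetIsSet = value;
        }

        public boolean isSetDictionary_page_offset() {
            return dictionaryPageOffsetIsSet;
        }
    }

    /** Direct field assignment: the value changes, the flag does not. */
    static boolean flagAfterFieldAssignment(long offset) {
        ColumnMetaDataSketch meta = new ColumnMetaDataSketch();
        meta.dictionary_page_offset = offset;
        return meta.isSetDictionary_page_offset();
    }

    /** Setter call: the value and the flag are updated together. */
    static boolean flagAfterSetter(long offset) {
        ColumnMetaDataSketch meta = new ColumnMetaDataSketch();
        meta.setDictionary_page_offset(offset);
        return meta.isSetDictionary_page_offset();
    }

    public static void main(String[] args) {
        System.out.println(flagAfterFieldAssignment(42L)); // prints false
        System.out.println(flagAfterSetter(42L));          // prints true
    }
}
```

Going through the setter keeps the value and its flag in sync at a single call site, which is why a separate explicit call to set the flag becomes unnecessary.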







[GitHub] [parquet-mr] srinivasst commented on pull request #789: PARQUET-1850: Fix dictionaryPageOffset flag setting in toParquetMetadata method

Posted by GitBox <gi...@apache.org>.
srinivasst commented on pull request #789:
URL: https://github.com/apache/parquet-mr/pull/789#issuecomment-619973869


   @julienledem @rdblue @belugabehr please review this PR





[GitHub] [parquet-mr] asfgit closed pull request #789: PARQUET-1850: Fix dictionaryPageOffset flag setting in toParquetMetadata method

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #789:
URL: https://github.com/apache/parquet-mr/pull/789


   

