Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2021/02/10 17:15:36 UTC

[GitHub] [parquet-mr] gszadovszky opened a new pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

gszadovszky opened a new pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869


   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-XXX
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869#discussion_r576649370



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
##########
@@ -157,14 +157,14 @@ public static ColumnChunkMetaData get(
           totalUncompressedSize);
     }
   }
-  
+
   // In sensitive columns, the ColumnMetaData structure is encrypted (with column-specific keys), making the fields like Statistics invisible.
   // Decryption is not performed pro-actively, due to performance and authorization reasons.
   // This method creates an a shell ColumnChunkMetaData object that keeps the encrypted metadata and the decryption tools.
   // These tools will activated later - when/if the column is projected.
-  public static ColumnChunkMetaData getWithEncryptedMetadata(ParquetMetadataConverter parquetMetadataConverter, ColumnPath path, 
+  public static ColumnChunkMetaData getWithEncryptedMetadata(ParquetMetadataConverter parquetMetadataConverter, ColumnPath path,

Review comment:
       Oh, I hadn't noticed. My IDE settings probably made these changes. I'll undo them.







[GitHub] [parquet-mr] gszadovszky merged pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
gszadovszky merged pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869


   





[GitHub] [parquet-mr] gszadovszky commented on pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869#issuecomment-779102808


   @chenjunjiedada, @shangxinli, could you please check this out?





[GitHub] [parquet-mr] chenjunjiedada commented on pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869#issuecomment-779779278


   @gszadovszky, Thanks for fixing this! 
   
   It looks correct to me. Just one minor thing: could you add a unit test that checks that the bloom filter offset is null when there is no bloom filter?
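
For readers following along, the behaviour being asked about can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `BloomFilterOffsetSketch`, `ChunkFooter`, and `writeFooter` are hypothetical names and do not exist in parquet-mr. The sketch assumes the offset is an optional footer field that should stay unset when no bloom filter was written for the column chunk.

```java
import java.util.OptionalLong;

// Hypothetical, simplified model of a column chunk's footer entry.
// This is NOT parquet-mr's ColumnChunkMetaData; all names are illustrative.
public class BloomFilterOffsetSketch {

    public static final class ChunkFooter {
        // Empty means "field not set", standing in for an unset optional field.
        private OptionalLong bloomFilterOffset = OptionalLong.empty();

        public void setBloomFilterOffset(long offset) {
            this.bloomFilterOffset = OptionalLong.of(offset);
        }

        public OptionalLong getBloomFilterOffset() {
            return bloomFilterOffset;
        }
    }

    // Builds the footer entry for one column chunk. The offset is recorded
    // only when a bloom filter was actually written for that chunk.
    public static ChunkFooter writeFooter(boolean hasBloomFilter, long filePosition) {
        ChunkFooter footer = new ChunkFooter();
        if (hasBloomFilter) {
            footer.setBloomFilterOffset(filePosition);
        }
        return footer;
    }

    public static void main(String[] args) {
        ChunkFooter withoutFilter = writeFooter(false, 4096L);
        ChunkFooter withFilter = writeFooter(true, 4096L);

        // The property a unit test would assert: no filter, no offset.
        System.out.println(withoutFilter.getBloomFilterOffset().isPresent()); // prints "false"
        System.out.println(withFilter.getBloomFilterOffset().getAsLong());    // prints "4096"
    }
}
```

A test in the real code base would presumably write a file without bloom filters, read the footer back, and assert the same property on the actual metadata objects.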





[GitHub] [parquet-mr] gszadovszky commented on pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869#issuecomment-779922579


   > @gszadovszky, Thanks for fixing this!
   > 
   > It looks correct to me. Just one minor thing: could you add a unit test that checks that the bloom filter offset is null when there is no bloom filter?
   
   Thanks, @chenjunjiedada. You're right that a unit test was missing. I've added one; please check.





[GitHub] [parquet-mr] shangxinli commented on a change in pull request #869: PARQUET-1979: bloom_filter_offset is filled if there are no bloom filters

Posted by GitBox <gi...@apache.org>.
shangxinli commented on a change in pull request #869:
URL: https://github.com/apache/parquet-mr/pull/869#discussion_r576383236



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
##########
@@ -157,14 +157,14 @@ public static ColumnChunkMetaData get(
           totalUncompressedSize);
     }
   }
-  
+
   // In sensitive columns, the ColumnMetaData structure is encrypted (with column-specific keys), making the fields like Statistics invisible.
   // Decryption is not performed pro-actively, due to performance and authorization reasons.
   // This method creates an a shell ColumnChunkMetaData object that keeps the encrypted metadata and the decryption tools.
   // These tools will activated later - when/if the column is projected.
-  public static ColumnChunkMetaData getWithEncryptedMetadata(ParquetMetadataConverter parquetMetadataConverter, ColumnPath path, 
+  public static ColumnChunkMetaData getWithEncryptedMetadata(ParquetMetadataConverter parquetMetadataConverter, ColumnPath path,

Review comment:
       Is this change intentional? I see multiple places changed like this. These kinds of changes could cause conflicts with other pending PRs.



