
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2020/04/21 21:12:47 UTC

[GitHub] [parquet-mr] garawalid commented on a change in pull request #781: PARQUET-1826: Document Hadoop configuration options

garawalid commented on a change in pull request #781:
URL: https://github.com/apache/parquet-mr/pull/781#discussion_r412495171



##########
File path: parquet-hadoop/README.md
##########
@@ -230,23 +236,28 @@ conf.set("parquet.bloom.filter.expected.ndv#column.path", 200)
 ## Class: ParquetInputFormat
 
 **Property:** `parquet.read.support.class`  
-**Description:** The read support class.
+**Description:** The read support class that is used in
+ParquetInputFormat to materialize records. It should be a descendant class of `org.apache.parquet.hadoop.api.ReadSupport`.
 
 ---
 
 **Property:** `parquet.read.filter`  
-**Description:** **Todo**
+**Description:** The filter class name that implements `org.apache.parquet.filter.UnboundRecordFilter`. This class belongs to the old filter API in the package `org.apache.parquet.filter`; it filters records during record assembly.
 
 ---
 
-**Property:** `parquet.strict.typing`  
-**Description:** Whether to enable type checking for conflicting schema.  
-**Default value:** `true`
+ **Property:** `parquet.private.read.filter.predicate`  
+ **Description:** The filter class used in the new filter API in the package `org.apache.parquet.filter2.predicate`.
+ Note that this class should implement `org.apache.parquet.filter2.predicate.FilterPredicate`, and the value of this property should be a gzip-compressed, base64-encoded, Java-serialized object.  
+ The new filter API can filter individual records or skip entire row groups of records without reading them at all.
+
+**Note:** Users should use either the old filter API (`parquet.read.filter`) or the new one (`parquet.private.read.filter.predicate`).

Review comment:
       I agree! 
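
To make the "gzip compressed base64 encoded java serialized object" wording in the diff concrete, here is a minimal sketch of that encoding using only the JDK. The class and method names (`PredicateEncodingSketch`, `encode`, `decode`) are hypothetical and are not part of parquet-mr; a plain `String` stands in for a real `FilterPredicate`. In practice the value is normally written by parquet-mr itself (as far as I know via `ParquetInputFormat.setFilterPredicate`), so this only illustrates the format and is not a recommended way to set the property.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not parquet-mr code: shows the encoding steps only.
final class PredicateEncodingSketch {

    // Java-serialize the object, gzip the bytes, then base64-encode them.
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The streams are closed here, so the gzip trailer has been written.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the steps: base64-decode, gunzip, then Java-deserialize.
    static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(raw)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A string produced by `encode` is the kind of value the property would hold, and `decode` shows why the stored object has to be `Serializable`.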



