Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/13 17:08:00 UTC

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

    [ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553716#comment-17553716 ] 

ASF GitHub Bot commented on PARQUET-2157:
-----------------------------------------

dongjoon-hyun commented on code in PR #975:
URL: https://github.com/apache/parquet-mr/pull/975#discussion_r895949925


##########
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java:
##########
@@ -39,13 +39,17 @@
 import java.io.File;
 import java.io.IOException;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Map;
+import java.util.Set;
 import java.util.concurrent.Callable;
 
 import net.openhft.hashing.LongHashFunction;
+import org.apache.commons.lang3.RandomStringUtils;

Review Comment:
   To avoid CI failure, please add this as a test dependency to `parquet-hadoop/pom.xml`.
   ```
       <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-lang3</artifactId>
         <version>3.9</version>
         <scope>test</scope>
       </dependency>
   ```





> Add BloomFilter fpp config
> --------------------------
>
>                 Key: PARQUET-2157
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2157
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Huaxin Gao
>            Priority: Major
>
> Currently, parquet-mr hardcodes the bloom filter fpp (false positive probability) to 0.01. We should add a config that lets users specify the fpp.
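
For context on why a configurable fpp is useful, here is a minimal sketch of how the target fpp drives filter size. It uses the textbook Bloom filter sizing formula, m = -n * ln(p) / (ln 2)^2, which is an assumption for illustration and not necessarily the formula parquet-mr applies to its block-split filters. The class and method names (`BloomSizing`, `optimalNumBits`) are hypothetical and do not exist in parquet-mr.

```java
// Textbook Bloom filter sizing, for illustration only.
// This is NOT parquet-mr's implementation; all names here are hypothetical.
final class BloomSizing {
    private BloomSizing() {}

    /**
     * Number of bits needed to hold n distinct values at a target
     * false positive probability p, per m = -n * ln(p) / (ln 2)^2.
     */
    static long optimalNumBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed number of distinct values in a column
        for (double p : new double[] {0.05, 0.01, 0.001}) {
            long bits = optimalNumBits(n, p);
            System.out.printf("fpp=%.3f -> %d bits (%.1f bits per value)%n",
                p, bits, (double) bits / n);
        }
    }
}
```

Under this formula, a lower fpp always costs more bits per value, so a fixed 0.01 forces one size/accuracy trade-off on every column. A per-writer or per-column setting would let users pick a different point on that curve.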



--
This message was sent by Atlassian Jira
(v8.20.7#820007)