You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by GitBox <gi...@apache.org> on 2022/11/29 15:24:45 UTC

[GitHub] [orc] pavibhai commented on a diff in pull request #1314: ORC-1310: Allowlist Support for plugin filter

pavibhai commented on code in PR #1314:
URL: https://github.com/apache/orc/pull/1314#discussion_r1034875806


##########
java/core/src/java/org/apache/orc/OrcConf.java:
##########
@@ -190,6 +193,14 @@ public enum OrcConf {
                       + "determined, they are combined using AND. The order of application is "
                       + "non-deterministic and the filter functionality should not depend on the "
                       + "order of application."),
+
+  PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist",
+                          "orc.filter.plugin.allowlist",
+                          "*",
+                          "A list of comma-separated class names. It is used to specify the "
+                          + "className which will be loaded as PluginFilterService. "
+                          + "Classes that not in the list will be ignored."),

Review Comment:
   Would this be acceptable?
   
   "A list of comma-separated class names. If specified it restricts the PluginFilters to just these classes as discovered by the PluginFilterService."
   
   IMO it feels better that we leave the default as "" indicating allow all discovered classes.



##########
java/core/src/java/org/apache/orc/impl/filter/FilterFactory.java:
##########
@@ -184,4 +186,32 @@ static List<BatchFilter> findPluginFilters(String filePath, Configuration conf)
     }
     return filters;
   }
+
+  /**
+   * Filter BatchFilter which is in the allowList.
+   *
+   * @param filters whole BatchFilter list we load from class path.
+   * @param allowList a Class-Name list that we want to load in.
+   */
+  static List<BatchFilter> getAllowedFilters(List<BatchFilter> filters, List<String> allowList) {
+    List<BatchFilter> allowBatchFilters = new ArrayList<>();

Review Comment:
   *nit maybe move this after
   
   ```java
       if (allowList.contains("*")) {
         return filters;
       }
   ```



##########
java/core/src/java/org/apache/orc/OrcConf.java:
##########
@@ -329,6 +340,14 @@ public String getString(Configuration conf) {
     return getString(null, conf);
   }
 
+  public List<String> getStringAsList(Configuration conf) {
+    String confStr = getString(null, conf);
+    if (confStr == null || confStr.length() == 0) {

Review Comment:
   *nit `confStr.isEmpty()`



##########
java/core/src/java/org/apache/orc/OrcConf.java:
##########
@@ -190,6 +193,14 @@ public enum OrcConf {
                       + "determined, they are combined using AND. The order of application is "
                       + "non-deterministic and the filter functionality should not depend on the "
                       + "order of application."),
+
+  PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist",
+                          "orc.filter.plugin.allowlist",
+                          "*",
+                          "A list of comma-separated class names. It is used to specify the "
+                          + "className which will be loaded as PluginFilterService. "
+                          + "Classes that not in the list will be ignored."),

Review Comment:
   Would this be acceptable?
   
   "A list of comma-separated class names. If specified it restricts the PluginFilters to just these classes as discovered by the PluginFilterService. The default of * allows all discovered classes and an empty string would not allow any plugins to be applied."



##########
java/core/src/java/org/apache/orc/impl/filter/FilterFactory.java:
##########
@@ -71,9 +71,11 @@ public static BatchFilter createBatchFilter(Reader.Options opts,
     // 2. Process PluginFilter
     if (opts.allowPluginFilters()) {
       List<BatchFilter> pluginFilters = findPluginFilters(filePath, conf);
-      if (!pluginFilters.isEmpty()) {
-        LOG.debug("Added plugin filters {} to the read", pluginFilters);
-        filters.addAll(pluginFilters);
+      List<BatchFilter> allowFilters =

Review Comment:
   Maybe we can move this logic into `findPluginFilters`?



##########
java/core/src/test/org/apache/orc/impl/filter/TestPluginFilterService.java:
##########
@@ -56,4 +60,43 @@ public void testErrorCreatingFilter() {
   public void testMissingFilter() {
     assertTrue(FilterFactory.findPluginFilters("file://db/table11/file1", conf).isEmpty());
   }
+
+  @Test
+  public void testHitAllowListFilter() {
+    conf.set("my.filter.name", "my_str_i_eq");
+    // Hit the allowlist.
+    List<String> allowListHit = new ArrayList<>();
+    allowListHit.add("org.apache.orc.impl.filter.BatchFilterFactory$BatchFilterImpl");
+
+    List<BatchFilter> pluginFilters = FilterFactory.findPluginFilters("file://db/table1/file1", conf);
+    List<BatchFilter> allowListFilter = FilterFactory.getAllowedFilters(pluginFilters, allowListHit);
+
+    assertEquals(1, allowListFilter.size());
+  }
+
+  @Test
+  public void testAllowListFilterAllowAll() {
+    conf.set("my.filter.name", "my_str_i_eq");
+    // Hit the allowlist.
+    List<String> allowListHit = new ArrayList<>();
+    allowListHit.add("*");
+
+    List<BatchFilter> pluginFilters = FilterFactory.findPluginFilters("file://db/table1/file1", conf);
+    List<BatchFilter> allowListFilter = FilterFactory.getAllowedFilters(pluginFilters, allowListHit);
+
+    assertEquals(1, allowListFilter.size());
+  }
+
+  @Test
+  public void testAllowListFilterDisallowAll() {
+    conf.set("my.filter.name", "my_str_i_eq");
+    // Hit the allowlist.
+    List<String> allowListHit = new ArrayList<>();
+    allowListHit.add("");

Review Comment:
   If the configuration is set to empty, we will receive an empty List. I would remove this and test with just the empty list. Please do ensure trim of the configuration.



##########
java/core/src/java/org/apache/orc/impl/filter/FilterFactory.java:
##########
@@ -184,4 +186,32 @@ static List<BatchFilter> findPluginFilters(String filePath, Configuration conf)
     }
     return filters;
   }
+
+  /**
+   * Filter BatchFilter which is in the allowList.
+   *
+   * @param filters whole BatchFilter list we load from class path.
+   * @param allowList a Class-Name list that we want to load in.
+   */
+  static List<BatchFilter> getAllowedFilters(List<BatchFilter> filters, List<String> allowList) {
+    List<BatchFilter> allowBatchFilters = new ArrayList<>();
+
+    if (allowList == null || allowList.size() == 0 || filters == null) {
+      LOG.debug("Disable all BatchFilter.");

Review Comment:
   I think it is better to use PluginFilter instead of BatchFilter in the message as that is used elsewhere and this does not disable that use.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org