Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/08/02 20:36:15 UTC

[GitHub] [orc] omalley commented on a change in pull request #716: ORC-743: Added conversion of SArg into filters to take advantage of t…

omalley commented on a change in pull request #716:
URL: https://github.com/apache/orc/pull/716#discussion_r681236104



##########
File path: java/core/src/gen/filters/string_eq.txt
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr;
+import org.apache.orc.filter.impl.LeafFilter;
+
+import java.nio.charset.StandardCharsets;
+
+// This is generated from string_eq.txt
+public class <ClassName> extends LeafFilter {

Review comment:
       All of the leaf classes should be package-private (no `public` modifier).

##########
File path: java/core/pom.xml
##########
@@ -114,6 +119,10 @@
     <dependency>
       <groupId>net.bytebuddy</groupId>
       <artifactId>byte-buddy</artifactId>
+    </dependency>

Review comment:
       I think this is not required by the patch.

##########
File path: java/core/src/gen/filters/type_in.txt
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.<LeafVector>;
+import org.apache.orc.filter.impl.LeafFilter;
+
+import java.util.Arrays;
+import java.util.List;
+
+// This is generated from type_in.txt
+public class <ClassName> extends LeafFilter {
+  public final <LeafType>[] inValues;

Review comment:
       All of the fields should be private.

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       I'm not convinced that the generator is worth the added complexity. I found myself generating the code and reviewing that instead.
       If we do keep the generator, I'd suggest a much more specific name like "filter-codegen".

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       Another advantage of keeping the generated code itself, rather than the generator, is that refactoring and style-check tools work on it.

##########
File path: java/core/src/java/org/apache/orc/filter/impl/OrFilter.java
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.VectorFilter;
+
+public class OrFilter implements VectorFilter {
+
+  public final VectorFilter[] filters;
+  private final Selected orOut = new Selected();
+
+  public OrFilter(VectorFilter[] filters) {
+    this.filters = filters;
+  }
+
+  public static void merge(Selected src, Selected tgt) {

Review comment:
       This should be moved to Selected and renamed to unionDistinct.
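
       To illustrate the suggestion, here is a minimal sketch of what a unionDistinct method on Selected might look like. It assumes a selection is a sorted, duplicate-free int array plus a size; the field names (`sel`, `selSize`), the constructor and `toArray` are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;

public class SelectedSketch {
  /** Hypothetical stand-in for Selected: sorted, duplicate-free row indexes. */
  static final class Selected {
    int[] sel;
    int selSize;

    Selected(int... rows) {
      this.sel = rows.clone();
      this.selSize = rows.length;
    }

    /** Merge the rows of src into this selection, keeping it sorted and distinct. */
    void unionDistinct(Selected src) {
      int[] merged = new int[selSize + src.selSize];
      int i = 0;
      int j = 0;
      int n = 0;
      // Classic two-pointer merge of two sorted arrays.
      while (i < selSize && j < src.selSize) {
        if (sel[i] < src.sel[j]) {
          merged[n++] = sel[i++];
        } else if (sel[i] > src.sel[j]) {
          merged[n++] = src.sel[j++];
        } else {
          // Same row in both inputs: emit it once.
          merged[n++] = sel[i++];
          j++;
        }
      }
      while (i < selSize) {
        merged[n++] = sel[i++];
      }
      while (j < src.selSize) {
        merged[n++] = src.sel[j++];
      }
      sel = merged;
      selSize = n;
    }

    int[] toArray() {
      return Arrays.copyOf(sel, selSize);
    }
  }

  public static void main(String[] args) {
    Selected tgt = new Selected(1, 4, 7);
    tgt.unionDistinct(new Selected(2, 4, 9));
    System.out.println(Arrays.toString(tgt.toArray())); // prints [1, 2, 4, 7, 9]
  }
}
```

       Keeping the merge on Selected would let OrFilter call it as an instance method instead of owning a static helper. A real implementation would probably reuse buffers rather than allocate a new array per call.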

##########
File path: java/core/src/java/org/apache/orc/filter/VectorFilter.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter;
+
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.impl.Selected;
+
+/**
+ * A filter that operates on the supplied
+ * {@link org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch} and updates the selections.
+ *
+ * This is the interface that is the basis of both the leaf filters such as Equals, In and logical
+ * filters such as And, Or and Not
+ */
+public interface VectorFilter {
+
+  /**
+   * Filter the vectorized row batch that is wrapped into the FilterContext.
+   * @param fc     The filter context that contains the VectorizedRowBatch
+   * @param bound  The bound of the scan
+   * @param selIn  The current selection
+   * @param selOut The result selection
+   */
+  void filter(OrcFilterContext fc, Selected bound, Selected selIn, Selected selOut);

Review comment:
       I'd propose that we join bound and selIn into a single parameter holding the rows that should be checked. The documentation should make it clear that bound must not be modified. Furthermore, we should document that items in selOut that are not in bound must be retained, and that the selOut vector must be sorted.

##########
File path: java/core/src/java/org/apache/orc/filter/impl/LeafFilter.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.filter.impl;
+
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.orc.OrcFilterContext;
+import org.apache.orc.filter.VectorFilter;
+
+public abstract class LeafFilter implements VectorFilter {

Review comment:
       I'd suggest adding a boolean field to LeafFilter that records whether the filter is negated. It can be applied to the result of each call to accept at very low cost.
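
       A rough sketch of the idea follows, using simplified stand-in types rather than the real LeafFilter, ColumnVector or Selected classes. The `accept` signature, the `LongEquals` class and the array-based `filter` method are assumptions made for illustration only, and null handling is left out.

```java
public class NegatedLeafSketch {
  /** Hypothetical, simplified leaf filter over a long column. */
  abstract static class LeafSketch {
    private final boolean negated;

    LeafSketch(boolean negated) {
      this.negated = negated;
    }

    /** Per-row predicate; the real signature is not shown in this email. */
    protected abstract boolean accept(long[] values, int row);

    /**
     * Check only the rows listed in bound and write the survivors to selOut.
     * Returns the number of rows written. bound is read, never modified.
     */
    int filter(long[] values, int[] bound, int boundSize, int[] selOut) {
      int n = 0;
      for (int i = 0; i < boundSize; i++) {
        int row = bound[i];
        // One boolean comparison per row flips the predicate when negated.
        if (accept(values, row) != negated) {
          selOut[n++] = row;
        }
      }
      return n;
    }
  }

  static final class LongEquals extends LeafSketch {
    private final long literal;

    LongEquals(long literal, boolean negated) {
      super(negated);
      this.literal = literal;
    }

    @Override
    protected boolean accept(long[] values, int row) {
      return values[row] == literal;
    }
  }

  public static void main(String[] args) {
    long[] values = {5, 3, 5, 8};
    int[] bound = {0, 1, 2, 3};
    int[] out = new int[bound.length];

    int n = new LongEquals(5, false).filter(values, bound, bound.length, out);
    System.out.println(n + " rows match x = 5");   // rows 0 and 2

    n = new LongEquals(5, true).filter(values, bound, bound.length, out);
    System.out.println(n + " rows match x != 5");  // rows 1 and 3
  }
}
```

       With the flag on the leaf, a separate Not filter wrapping each leaf may become unnecessary, though that depends on how the SArg conversion pushes negation down.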

##########
File path: java/core/pom.xml
##########
@@ -37,6 +37,11 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-shims</artifactId>
     </dependency>
+    <dependency>

Review comment:
       I pushed the results of removing the code generation in a [fork](https://github.com/omalley/orc/tree/orc-743) of this branch.



