Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/05/21 17:10:06 UTC

[GitHub] [carbondata] ajantha-bhat opened a new pull request #3771: [WIP] pushdown array_contains filter to carbon

ajantha-bhat opened a new pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519286



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       Please add a test scenario with null data, e.g. array(null).







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519687



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       This is temporary WIP code; this POC code cannot be merged. Why review it now?







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r447457038



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {
+              case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
       I want to reuse the existing EqualTo code; I don't see any advantage in creating a new expression.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -152,13 +152,25 @@ object CarbonFilters {
     }
 
     def getCarbonExpression(name: String) = {

Review comment:
       I want to reuse the existing EqualTo code; I don't see any advantage in creating a new expression.
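
Illustrative sketch (not code from this PR): the comment above argues that array_contains can reuse an equality filter instead of a new expression. The Java below shows that idea in isolation, assuming a hypothetical element-level `equalTo` predicate applied to each array element in turn. The class and method names are invented for illustration and are not CarbonData's filter API. Null elements are simply treated as non-matching, which is a simplification of Spark's null semantics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical names throughout; this is not CarbonData's filter API.
class ArrayContainsAsEqualTo {

    // Stand-in for an existing element-level "equal to" filter.
    static Predicate<String> equalTo(String literal) {
        return element -> element != null && element.equals(literal);
    }

    // array_contains(arr, literal) expressed with that filter:
    // the row matches as soon as any element passes it.
    static boolean arrayContains(List<String> arr, Predicate<String> elementFilter) {
        if (arr == null) {
            return false;
        }
        for (String element : arr) {
            if (elementFilter.test(element)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same rows as the test table quoted in this thread.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("as"),
            Arrays.asList("sd", "df", "gh"),
            Arrays.asList("rt", "ew", "rtyu", "jk", null),
            Arrays.asList("ghsf", "dbv", "", "ty"),
            Arrays.asList("hjsd", "fggb", "nhj", "sd", "asd"));
        Predicate<String> filter = equalTo("sd");
        for (List<String> row : rows) {
            System.out.println(row + " -> " + arrayContains(row, filter));
        }
    }
}
```

With the rows from the test table, only the second and fifth rows contain 'sd', so only those two match.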







[GitHub] [carbondata] QiangCai commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633044918


   If the query has only one simple filter (without and/or), maybe we can try to push the "limit" down to the filter,
   so that the filter does not need to read all values of all rows.
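
Illustrative sketch (not code from this PR): the suggestion above amounts to stopping the row scan once enough matching rows have been found. The Java below shows this early exit under the stated assumption of a single simple filter. `LimitAwareFilterSketch`, `ScanResult` and `filterWithLimit` are hypothetical names, and the row-by-row loop is a simplification of how a columnar reader would actually scan pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and structure are not taken from CarbonData.
class LimitAwareFilterSketch {

    // Matching row ids plus how many rows had to be examined to find them.
    static final class ScanResult {
        final List<Integer> matchedRowIds;
        final int rowsExamined;

        ScanResult(List<Integer> matchedRowIds, int rowsExamined) {
            this.matchedRowIds = matchedRowIds;
            this.rowsExamined = rowsExamined;
        }
    }

    // Applies array_contains(arr, literal) row by row and stops
    // as soon as `limit` matching rows have been collected.
    static ScanResult filterWithLimit(List<List<String>> rows, String literal, int limit) {
        List<Integer> matched = new ArrayList<>();
        int examined = 0;
        for (int rowId = 0; rowId < rows.size() && matched.size() < limit; rowId++) {
            examined++;
            List<String> arr = rows.get(rowId);
            if (arr != null && arr.contains(literal)) {
                matched.add(rowId);
            }
        }
        return new ScanResult(matched, examined);
    }

    public static void main(String[] args) {
        // Same rows as the test table quoted in this thread.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("as"),
            Arrays.asList("sd", "df", "gh"),
            Arrays.asList("rt", "ew", "rtyu", "jk", null),
            Arrays.asList("ghsf", "dbv", "", "ty"),
            Arrays.asList("hjsd", "fggb", "nhj", "sd", "asd"));
        ScanResult limited = filterWithLimit(rows, "sd", 1);
        ScanResult full = filterWithLimit(rows, "sd", Integer.MAX_VALUE);
        System.out.println("limit 1:  matched " + limited.matchedRowIds
            + ", examined " + limited.rowsExamined);
        System.out.println("no limit: matched " + full.matchedRowIds
            + ", examined " + full.rowsExamined);
    }
}
```

The early exit is only safe when the limit applies directly to this one filter's output. With combined and/or filters or an ordering in the query, the rows skipped here could still be needed.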





[GitHub] [carbondata] asfgit closed pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771


   





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456176045



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
       The code will be hard to read once we add more if conditions.







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r437867873



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
       val supportBatch =
         supportBatchedDataSource(relation.relation.sqlContext,
           updateRequestedColumns) && extraRdd.getOrElse((null, true))._2
-      if (!vectorPushRowFilters && !supportBatch && !implicitExisted) {
+      if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty &&

Review comment:
       This is for a count(*) query with array_contains(). Here the existing code was reverting the array_contains() pushdown, so I avoided that.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632599089


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3054/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658688900


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1652/
   





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456167613



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       Can you add a test case for a query like the one below?
   
   select * from complex1 where arr[0] = 'sd'
   
   Can we push down this filter too?







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-649431612


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1490/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075708



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       Currently, Carbon doesn't support pushdown of arr[0] = 'sd', because that pushdown would be based on the array index.
   It needs separate handling; I have yet to analyze the required changes.
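
Illustrative sketch (not code from this PR): the distinction drawn above is that array_contains matches a value at any position, while arr[0] = 'sd' matches only at one index. The Java below contrasts the two predicates on rows from the test table. `IndexVersusContainsSketch` and its methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch contrasting the two predicates discussed above.
class IndexVersusContainsSketch {

    // array_contains(arr, literal): true if any element equals the literal.
    static boolean arrayContains(List<String> arr, String literal) {
        return arr != null && arr.contains(literal);
    }

    // arr[index] = literal: true only if the element at that position equals the literal.
    static boolean elementAtEquals(List<String> arr, int index, String literal) {
        return arr != null
            && index >= 0
            && index < arr.size()
            && literal.equals(arr.get(index));
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("sd", "df", "gh");
        List<String> second = Arrays.asList("hjsd", "fggb", "nhj", "sd", "asd");
        for (List<String> row : Arrays.asList(first, second)) {
            System.out.println(row
                + " contains=" + arrayContains(row, "sd")
                + " arr[0]=" + elementAtEquals(row, 0, "sd"));
        }
    }
}
```

Both rows satisfy array_contains, but only the first satisfies arr[0] = 'sd'. An index-based filter therefore has to track each element's position within its row, which an any-element equality check does not need.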







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638746696


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1409/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632298941


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3050/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648838314


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3211/
   





[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642476599


   retest this please





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r444824720



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
       @QiangCai: can you please tell me why a new expression is required? Why is equalTo not enough?







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457070775



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
-      case others => None
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {

Review comment:
       OK, moved.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639416717









[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658610416


   retest this please





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632972849


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1335/
   





[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648763485


   retest this please





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633064918


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3056/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-634563208


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3074/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639547711


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3140/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075875



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##########
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
       This PR is only for UDF pushdown







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638650363


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3132/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-633063889


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1336/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-660848087


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1690/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-634561752


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1353/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r445361657



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
       I think that by using the EqualTo expression I can reuse most of the code. What do you think, @QiangCai?







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548570


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3145/
   





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r437812719



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
       1. It would be better to add a new Expression, such as ArrayContainsExpression.
       2. How about taking the filter BitSetPipeLine into account?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
     // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being
     // explicitly added by spark and pushed. That also has to be handled and pushed back to
     // Spark for handling.
-    val predicatesWithoutComplex = predicates.filter(predicate =>
+    // allow array_contains() push down
+    val filteredPredicates = predicates.filter(predicate =>

Review comment:
       use '{' instead of '('

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
       val supportBatch =
         supportBatchedDataSource(relation.relation.sqlContext,
           updateRequestedColumns) && extraRdd.getOrElse((null, true))._2
-      if (!vectorPushRowFilters && !supportBatch && !implicitExisted) {
+      if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty &&

Review comment:
       Why does this need to change?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##########
@@ -152,13 +152,25 @@ object CarbonFilters {
     }
 
     def getCarbonExpression(name: String) = {

Review comment:
       In the 'createFilter' method, convert the CarbonArrayContains filter to ArrayContainsExpression.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {
+              case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
       How about using a new filter: CarbonArrayContains?
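
   The BitSetPipeLine question raised in the first comment above can be illustrated with a minimal Java sketch. It is not CarbonData code: `BitSetPipelineSketch`, `applyFilter`, and `rowMatches` are hypothetical names, and the fallback to a full scan for a null or empty previous bitset only mirrors the condition visible in the removed lines of the hunk.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical illustration of reusing the row ids selected by an earlier filter. */
final class BitSetPipelineSketch {
  private BitSetPipelineSketch() { }

  /**
   * Evaluates rowMatches only for rows that survived the previous filter.
   * If there is no usable previous bitset, every row of the page is evaluated.
   */
  static BitSet applyFilter(int rowCount, BitSet previous, IntPredicate rowMatches) {
    BitSet result = new BitSet(rowCount);
    if (previous == null || previous.isEmpty()) {
      // Older flow: no earlier selection to reuse, so scan all rows.
      for (int row = 0; row < rowCount; row++) {
        if (rowMatches.test(row)) {
          result.set(row);
        }
      }
      return result;
    }
    // Pipeline flow: visit only the row ids already set by the previous filter.
    for (int row = previous.nextSetBit(0);
         row >= 0 && row < rowCount;
         row = previous.nextSetBit(row + 1)) {
      if (rowMatches.test(row)) {
        result.set(row);
      }
    }
    return result;
  }
}
```

   The benefit is that an expensive per-row check, such as scanning the children of an array column, runs only for rows that an earlier filter has not already rejected.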







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-648837271


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1484/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632972441


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3055/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661122494


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1698/
   





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456175101



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
-      case others => None
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {

Review comment:
       How about extracting the match block into a method, isPrimitiveDataType, and moving it into a util class?
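
   A minimal Java sketch of what such a helper could look like. Everything here is an assumption made for illustration: `DataTypeSketch` and its `ElementType` enum are hypothetical stand-ins for Spark's `DataType` hierarchy, and the set of types treated as primitive is not taken from the PR.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical util class; real code would match on Spark's DataType instead. */
final class DataTypeSketch {
  /** Assumed element types, listed only for illustration. */
  enum ElementType { STRING, INT, LONG, DOUBLE, BOOLEAN, DATE, TIMESTAMP, ARRAY, STRUCT, MAP }

  // Complex (nested) types are the ones for which pushdown would be skipped.
  private static final Set<ElementType> COMPLEX =
      EnumSet.of(ElementType.ARRAY, ElementType.STRUCT, ElementType.MAP);

  private DataTypeSketch() { }

  /** Returns true when the element type is not a nested type. */
  static boolean isPrimitiveDataType(ElementType type) {
    return !COMPLEX.contains(type);
  }
}
```

   With a helper like this, the nested match in the strategy collapses to a single guard on the array's element type.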







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519286



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       Please add a test scenario with null data, e.g. array(null).
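
   Why the null case deserves its own test can be sketched in Java. This is an illustration of three-valued SQL logic as I understand Spark's array_contains to behave (an assumption worth verifying against the Spark documentation), not code from this PR; `ArrayContainsNullSketch` and its methods are hypothetical names, and the searched value is assumed to be non-null.

```java
import java.util.List;

/** Hypothetical illustration of null handling for an array_contains style predicate. */
final class ArrayContainsNullSketch {
  private ArrayContainsNullSketch() { }

  /** Returns TRUE, FALSE, or null (unknown); value is assumed to be non-null. */
  static Boolean arrayContains(List<String> array, String value) {
    if (array == null) {
      return null;
    }
    boolean sawNull = false;
    for (String element : array) {
      if (element == null) {
        sawNull = true;           // a null element might have been the value
      } else if (element.equals(value)) {
        return Boolean.TRUE;      // a definite match wins over any nulls
      }
    }
    return sawNull ? null : Boolean.FALSE;
  }

  /** A WHERE clause keeps a row only when the predicate is definitely true. */
  static boolean passesFilter(List<String> array, String value) {
    return Boolean.TRUE.equals(arrayContains(array, value));
  }
}
```

   A pushed-down filter has to drop the array(null) row just as the non-pushed evaluation would, which is what a test with null data checks.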







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632299427


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1330/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658682568


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3392/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429519687



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestCompactionComplexType.scala
##########
@@ -47,6 +47,33 @@ class TestCompactionComplexType extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE IF EXISTS compactComplex")
   }
 
+  test("complex issue") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<String>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk','sder') union all " +
+        "select array('ghsf','dbv','fg','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+

Review comment:
       This is temporary WIP; this PoC code cannot be merged. Why review it now?







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-660848132


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3432/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638746185


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3133/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r444826666



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##########
@@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
     // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being
     // explicitly added by spark and pushed. That also has to be handled and pushed back to
     // Spark for handling.
-    val predicatesWithoutComplex = predicates.filter(predicate =>
+    // allow array_contains() push down
+    val filteredPredicates = predicates.filter(predicate =>

Review comment:
       ok







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548932


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1421/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r435036918



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       In BlockletScannedResult, dimensionColumnPages[][] 







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-632597680


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1334/
   





[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r429538025



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       How about caching the page, so that it does not need to be decoded again for each row?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =
+                (ArrayQueryType) complexDimensionInfoMap.get(dimensionChunkIndex[i]);
+            int[] numberOfChild = complexType
+                .getNumberOfChild(rawBlockletColumnChunks.getDimensionRawColumnChunks(), null,

Review comment:
       How about getting all the child counts once?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =

Review comment:
       Move this outside the for loop.
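
   The two loop-related suggestions above (fetch the child counts once, keep loop-invariant lookups outside the row loop) can be sketched in Java. This is a simplified model, not the CarbonData implementation: `ArrayFilterLoopSketch` and `filterPage` are hypothetical names, and the flattened `childCounts`/`children` layout is an assumption made to keep the example self-contained.

```java
import java.util.BitSet;

/** Hypothetical row-level array_contains evaluation over one page. */
final class ArrayFilterLoopSketch {
  private ArrayFilterLoopSketch() { }

  /**
   * childCounts[row] is the number of elements in that row's array;
   * children holds the elements of all rows back to back.
   * Both are obtained once per page, before the row loop starts.
   */
  static BitSet filterPage(int[] childCounts, String[] children, String target) {
    BitSet matched = new BitSet(childCounts.length);
    int childStart = 0;  // running offset into the flattened children
    for (int row = 0; row < childCounts.length; row++) {
      int childEnd = childStart + childCounts[row];
      for (int c = childStart; c < childEnd; c++) {
        if (target.equals(children[c])) {  // null children never match
          matched.set(row);
          break;                            // one hit is enough for this row
        }
      }
      childStart = childEnd;
    }
    return matched;
  }
}
```

   For the five rows inserted by the test in this PR and the target 'sd', the second and fifth rows are the ones that contain the value.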







[GitHub] [carbondata] QiangCai commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661559605


   LGTM





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r435034670



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
##########
@@ -67,4 +67,18 @@ private DimensionColumnPage getDecodedDimensionPage(DimensionColumnPage[][] dime
     }
     return dimensionColumnPages[columnIndex][pageNumber];
   }
+
+  /**
+   * Method will copy the block chunk holder data and return the cloned value.
+   * This method is also used by child.
+   */
+  protected byte[] copyBlockDataChunkWithoutClone(DimensionRawColumnChunk[] rawColumnChunks,
+      DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) {
+    byte[] data =
+        getDecodedDimensionPage(dimensionColumnPages, rawColumnChunks[columnIndex], pageNumber)

Review comment:
       I have debugged this; the cache is already there. The argument of this method, `DimensionColumnPage[][] dimensionColumnPages`, is itself a cache indexed by column index.
   
   See `ComplexQueryType#getDecodedDimensionPage` for details.
   
   I also observed that decodeColumnPage is called only once for that page; the rest of the calls use the cache.
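
   The caching behaviour described above can be modelled with a minimal Java sketch. It is a simplified stand-in, not the actual `ComplexQueryType` code: `PageCacheSketch`, `getDecodedPage`, and the `decoder` callback are hypothetical names, and a one-dimensional array of pages is assumed in place of the real two-dimensional one.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in for a lazily filled per-page decode cache. */
final class PageCacheSketch {
  private final byte[][] decodedPages;        // one slot per page, null until decoded
  private final IntFunction<byte[]> decoder;  // assumed decode routine
  private int decodeCalls = 0;

  PageCacheSketch(int pageCount, IntFunction<byte[]> decoder) {
    this.decodedPages = new byte[pageCount][];
    this.decoder = decoder;
  }

  /** Decodes the page on first access and returns the cached copy afterwards. */
  byte[] getDecodedPage(int pageNumber) {
    if (decodedPages[pageNumber] == null) {
      decodedPages[pageNumber] = decoder.apply(pageNumber);
      decodeCalls++;
    }
    return decodedPages[pageNumber];
  }

  /** Number of times the decoder actually ran. */
  int getDecodeCalls() {
    return decodeCalls;
  }
}
```

   Calling `getDecodedPage` once per row therefore costs a single decode per page, which matches the observation that decodeColumnPage runs only once.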







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638649864


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1408/
   





[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-638651321


   @QiangCai: Pushing down limit to carbon needs supporting changes in Spark. So I think it cannot be done here, as we use open source Spark.
   
   I want to implement array_contains pushdown for all primitive types, not just the string type. I will finish it today.
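
As a rough illustration of what a type-agnostic array_contains filter has to compute, here is a minimal sketch. It is not the code in this PR: `ArrayContainsSketch` and `arrayContains` are hypothetical names, rows are modelled as plain Java lists rather than Carbon column pages, and the result is a `BitSet` of matching row ids in the style of the row-level filter executers.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

final class ArrayContainsSketch {
  // Sets bit i when the array stored in row i contains the literal.
  // Works for any element type with a meaningful equals(), e.g. String, Integer, Long.
  static <T> BitSet arrayContains(List<List<T>> rows, T literal) {
    Objects.requireNonNull(literal, "literal");
    BitSet matches = new BitSet(rows.size());
    for (int row = 0; row < rows.size(); row++) {
      List<T> elements = rows.get(row);
      if (elements == null) {
        continue; // a null array never matches in this sketch
      }
      for (T element : elements) {
        if (literal.equals(element)) {
          matches.set(row);
          break; // one hit is enough for this row
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<List<String>> strings =
        Arrays.asList(Arrays.asList("as"), Arrays.asList("sd", "df", "gh"));
    System.out.println(arrayContains(strings, "df"));

    List<List<Integer>> ints =
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(2, 5));
    System.out.println(arrayContains(ints, 2));
  }
}
```

The same method serves string and integer arrays because it relies only on `equals`; a real pushdown would instead compare against the encoded child column data.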





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-661117575


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3440/
   





[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [WIP] pushdown array_contains filter to carbon

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r435038194



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##########
@@ -222,49 +224,90 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
-        for (int index = 0; index < numberOfRows[i]; index++) {
-          createRow(rawBlockletColumnChunks, row, i, index);
-          Boolean rslt = false;
-          try {
-            rslt = exp.evaluate(row).getBoolean();
-          }
-          // Any invalid member while evaluation shall be ignored, system will log the
-          // error only once since all rows the evaluation happens so inorder to avoid
-          // too much log inforation only once the log will be printed.
-          catch (FilterIllegalMemberException e) {
-            FilterUtil.logError(e, false);
-          }
-          if (null != rslt && rslt) {
-            set.set(index);
+
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]) {
+      // fill default value here
+      DimColumnResolvedFilterInfo dimColumnEvaluatorInfo = dimColEvaluatorInfoList.get(0);
+      // if filter dimension is not present in the current add its default value
+      if (dimColumnEvaluatorInfo.getDimension().getDataType().isComplexType()) {
+        for (int i = 0; i < pageNumbers; i++) {
+          BitSet set = new BitSet(numberOfRows[i]);
+          RowIntf row = new RowImpl();
+          for (int index = 0; index < numberOfRows[i]; index++) {
+            ArrayQueryType complexType =
+                (ArrayQueryType) complexDimensionInfoMap.get(dimensionChunkIndex[i]);
+            int[] numberOfChild = complexType
+                .getNumberOfChild(rawBlockletColumnChunks.getDimensionRawColumnChunks(), null,

Review comment:
       done







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-649430423


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3217/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-639548729


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1416/
   

