You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/05/28 06:25:55 UTC

[GitHub] [carbondata] Indhumathi27 opened a new pull request #3778: [WIP] Support array with SI

Indhumathi27 opened a new pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480920128



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
   /**
    * This method will prepare the data from raw object that will take part in sorting
    */
-  private Object[] prepareRowObjectForSorting(Object[] row) {
+  private Object[] prepareRowObjectForSorting(Object[] row,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+      throws SecondaryIndexException {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
-    // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+    byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
 
     List<CarbonDimension> dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    Map<Integer, Object[]> complexDataMap = new HashMap<>();
 
     int noDictionaryIndex = 0;
     int dictionaryIndex = 0;
+    int complexIndex = 0;
     int i = 0;
     // loop excluding last dimension as last one is implicit column.
     for (; i < dimensions.size() - 1; i++) {
       CarbonDimension dims = dimensions.get(i);
-      if (dims.hasEncoding(Encoding.DICTIONARY)) {
+      boolean isComplexColumn = false;
+      // check if dimension is a complex data type
+      if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {

Review comment:
       Can we use dim.isComplex to do the same check?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661763876






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458509321



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,35 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
+    Object[] data = fillData(dataBuffer, false);
     if (data == null) {
       return null;
     }
     return DataTypeUtil.getDataTypeConverter().wrapWithGenericArrayData(data);
   }
 
-  protected Object[] fillData(ByteBuffer dataBuffer) {
+  @Override
+  public Object[] getObjectArrayDataBasedOnDataType(ByteBuffer dataBuffer) {
+    Object[] data = fillData(dataBuffer, true);
+    if (data == null) {
+      return null;
+    }
+    return data;

Review comment:
       return fillData(dataBuffer, true);

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -281,12 +310,45 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
       }
     }
 
+    if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+      // In case of complex array type, flatten the data and add for sorting
+      // TODO: Handle for nested array and other complex types
+      for (int k = 0; k < wrapper.getComplexTypesKeys().length; k++) {
+        byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(k);
+        ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+        GenericQueryType genericQueryType =
+            complexDimensionInfoMap.get(complexColumnParentBlockIndexes[k]);
+        short length = byteArrayInput.getShort(2);
+        // get flattened array data
+        Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+        if (length != 1) {
+          for (int j = 1; j < length; j++) {
+            preparedRow[i] = getData(data, j);
+            preparedRow[i + 1] = implicitColumnByteArray;
+            addRowForSorting(preparedRow.clone());
+          }
+          // add first row
+          preparedRow[i] = getData(data, 0);
+        } else {
+          preparedRow[i] = getData(data, 0);
+        }

Review comment:
       }
   preparedRow[i] = getData(data, 0);

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -226,8 +239,16 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
     for (CarbonIterator<RowBatch> detailQueryIterator : detailQueryResultIteratorList) {
       while (detailQueryIterator.hasNext()) {
         RowBatch batchResult = detailQueryIterator.next();
+        DetailQueryResultIterator queryIterator = (DetailQueryResultIterator) detailQueryIterator;
+        BlockExecutionInfo blockExecutionInfo = queryIterator.getBlockExecutionInfo();
+        // get complex dimension info map from block execution info
+        Map<Integer, GenericQueryType> complexDimensionInfoMap =
+            blockExecutionInfo.getComplexDimensionInfoMap();
+        int[] complexColumnParentBlockIndexes =
+            blockExecutionInfo.getComplexColumnParentBlockIndexes();

Review comment:
       move to outside of while loop




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708488


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3818/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671499205


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1948/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674787907


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2004/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677642207


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3810/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365673



##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##########
@@ -221,4 +221,9 @@ public int numberOfNodes() {
   public List<TableBlockInfo> getBlockInfos() {

Review comment:
       removed getBlockInfos method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673324648


   please rebase


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r467978296



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+    sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+    sql("drop index if exists index_1 on complextable")
+    sql("drop table if exists complextable")
+  }
+
+  test("test array<string> on secondary index") {

Review comment:
       currenlty, for d), for query having more than one array contains filter will not SI. query will hit main table only. added testcase for c)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r469772276



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
 
   @Override
   public void addChildren(GenericQueryType children) {
-    if (this.getName().equals(children.getParentName())) {
+    if (null == this.getName() || this.getName().equals(children.getParentName())) {

Review comment:
       When the name can be null?

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
    * property which defines the insert stage flow
    */
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

Review comment:
       better to move all constants of SI together to one place of this class.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
    * map that contains the mapping of block id to the valid blocklets in that block which contain
    * the data as per the applied filter
    */
-  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+  private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List<Expression> implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());

Review comment:
       move LOGGER to be a static field of this class

##########
File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##########
@@ -221,4 +221,9 @@ public int numberOfNodes() {
   public List<TableBlockInfo> getBlockInfos() {

Review comment:
       after we added getTableBlockInfo(), can we remove this method?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
    * map that contains the mapping of block id to the valid blocklets in that block which contain
    * the data as per the applied filter
    */
-  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+  private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List<Expression> implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
     // initialize map with half the size of filter list as one block id can contain
     // multiple blocklets
     blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
     for (Expression value : implicitFilterList) {
       String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString();
       addBlockEntry(blockletPath);
     }
+    int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+    isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+    if (isThresholdReached) {
+      LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: "
+          + complexFilterThreshold);
+    }
   }
 
-  public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+  public ImplicitExpression(Map<String, Set<String>> blockIdToBlockletIdMapping) {
     this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
   }
 
   private void addBlockEntry(String blockletPath) {

Review comment:
       The logic of this method is hard to understand.
   Can we add a flag into ImplicitExpression when it is created?
   if it is blocklet level, we addBlockletEntry.
   if it is row level, we addRowEntry.
   
   

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
       how about to keep the old method and add a new method getObjectDataBasedOnDataType?
   It will not need this boolean parameter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661191872


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1701/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684727237


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3942/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660967562


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3438/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458547213



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+    sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+    sql("drop index if exists index_1 on complextable")
+    sql("drop table if exists complextable")
+  }
+
+  test("test array<string> on secondary index") {

Review comment:
       a) I feel we should handle the SI support for all array of primitive not just string.
   b) Better to discuss the row level SI solution in community once ?
   c) Also if multiple SI created for one primitive and one complex column. query gives 0 rows. Need to handle it.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-680679287


   @QiangCai @ajantha-bhat Reverted  row level position reference code changes. Please review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661187440


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3443/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662267686


   Agree with @ajantha-bhat 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480960162



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
   /**
    * This method will prepare the data from raw object that will take part in sorting
    */
-  private Object[] prepareRowObjectForSorting(Object[] row) {
+  private Object[] prepareRowObjectForSorting(Object[] row,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+      throws SecondaryIndexException {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
-    // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+    byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
 
     List<CarbonDimension> dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    Map<Integer, Object[]> complexDataMap = new HashMap<>();
 
     int noDictionaryIndex = 0;
     int dictionaryIndex = 0;
+    int complexIndex = 0;
     int i = 0;
     // loop excluding last dimension as last one is implicit column.
     for (; i < dimensions.size() - 1; i++) {
       CarbonDimension dims = dimensions.get(i);
-      if (dims.hasEncoding(Encoding.DICTIONARY)) {
+      boolean isComplexColumn = false;
+      // check if dimension is a complex data type
+      if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+        for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+          if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+            isComplexColumn = true;
+            break;
+          }
+        }
+      }
+      if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
         // dictionary
         preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
       } else {
-        // no dictionary dims
-        byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
-        // no dictionary primitive columns are expected to be in original data while loading,
-        // so convert it to original data
-        if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
-          Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
-              noDictionaryKeyByIndex, dims.getDataType());
-          if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
-            dataFromBytes = (long) dataFromBytes / 1000L;
+        if (isComplexColumn) {
+          byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+          ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+          GenericQueryType genericQueryType =
+              complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+          int complexDataLength = byteArrayInput.getShort(2);
+          // In case, if array is empty
+          if (complexDataLength == 0) {
+            complexDataLength = complexDataLength + 1;
+          }
+          // get flattened array data
+          Object[] complexFlattenedData = new Object[complexDataLength];
+          Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+          for (int index = 0; index < complexDataLength; index++) {
+            complexFlattenedData[index] =
+                getData(data, index, dims.getColumnSchema().getDataType());
           }
-          preparedRow[i] = dataFromBytes;
+          complexDataMap.put(i, complexFlattenedData);
         } else {
-          preparedRow[i] = noDictionaryKeyByIndex;
+          // no dictionary dims
+          byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+          // no dictionary primitive columns are expected to be in original data while loading,
+          // so convert it to original data
+          if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+            Object dataFromBytes = DataTypeUtil
+                .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+                    dims.getDataType());
+            if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+              dataFromBytes = (long) dataFromBytes / 1000L;
+            }
+            preparedRow[i] = dataFromBytes;
+          } else {
+            preparedRow[i] = noDictionaryKeyByIndex;
+          }
         }
       }
     }
 
     // at last add implicit column position reference(PID)
+    preparedRow[i] = implicitColumnByteArray;
 
-    preparedRow[i] = wrapper.getImplicitColumnByteArray();
+    // In case of complex array type, flatten the data and add for sorting
+    // TODO: Handle for nested array and other complex types

Review comment:
       added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480922354



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
   /**
    * This method will prepare the data from raw object that will take part in sorting
    */
-  private Object[] prepareRowObjectForSorting(Object[] row) {
+  private Object[] prepareRowObjectForSorting(Object[] row,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+      throws SecondaryIndexException {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
-    // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+    byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
 
     List<CarbonDimension> dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    Map<Integer, Object[]> complexDataMap = new HashMap<>();
 
     int noDictionaryIndex = 0;
     int dictionaryIndex = 0;
+    int complexIndex = 0;
     int i = 0;
     // loop excluding last dimension as last one is implicit column.
     for (; i < dimensions.size() - 1; i++) {
       CarbonDimension dims = dimensions.get(i);
-      if (dims.hasEncoding(Encoding.DICTIONARY)) {
+      boolean isComplexColumn = false;
+      // check if dimension is a complex data type
+      if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+        for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+          if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+            isComplexColumn = true;
+            break;
+          }
+        }
+      }
+      if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
         // dictionary
         preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
       } else {
-        // no dictionary dims
-        byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
-        // no dictionary primitive columns are expected to be in original data while loading,
-        // so convert it to original data
-        if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
-          Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
-              noDictionaryKeyByIndex, dims.getDataType());
-          if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
-            dataFromBytes = (long) dataFromBytes / 1000L;
+        if (isComplexColumn) {
+          byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+          ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+          GenericQueryType genericQueryType =
+              complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+          int complexDataLength = byteArrayInput.getShort(2);
+          // In case, if array is empty
+          if (complexDataLength == 0) {
+            complexDataLength = complexDataLength + 1;
+          }
+          // get flattened array data
+          Object[] complexFlattenedData = new Object[complexDataLength];
+          Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+          for (int index = 0; index < complexDataLength; index++) {
+            complexFlattenedData[index] =
+                getData(data, index, dims.getColumnSchema().getDataType());
           }
-          preparedRow[i] = dataFromBytes;
+          complexDataMap.put(i, complexFlattenedData);
         } else {
-          preparedRow[i] = noDictionaryKeyByIndex;
+          // no dictionary dims
+          byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+          // no dictionary primitive columns are expected to be in original data while loading,
+          // so convert it to original data
+          if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+            Object dataFromBytes = DataTypeUtil
+                .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+                    dims.getDataType());
+            if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+              dataFromBytes = (long) dataFromBytes / 1000L;
+            }
+            preparedRow[i] = dataFromBytes;
+          } else {
+            preparedRow[i] = noDictionaryKeyByIndex;
+          }
         }
       }
     }
 
     // at last add implicit column position reference(PID)
+    preparedRow[i] = implicitColumnByteArray;
 
-    preparedRow[i] = wrapper.getImplicitColumnByteArray();
+    // In case of complex array type, flatten the data and add for sorting
+    // TODO: Handle for nested array and other complex types

Review comment:
       Please mention in some doc, that nexted complex is not supported

##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
   /**
    * This method will prepare the data from raw object that will take part in sorting
    */
-  private Object[] prepareRowObjectForSorting(Object[] row) {
+  private Object[] prepareRowObjectForSorting(Object[] row,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+      throws SecondaryIndexException {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
-    // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+    byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
 
     List<CarbonDimension> dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    Map<Integer, Object[]> complexDataMap = new HashMap<>();
 
     int noDictionaryIndex = 0;
     int dictionaryIndex = 0;
+    int complexIndex = 0;
     int i = 0;
     // loop excluding last dimension as last one is implicit column.
     for (; i < dimensions.size() - 1; i++) {
       CarbonDimension dims = dimensions.get(i);
-      if (dims.hasEncoding(Encoding.DICTIONARY)) {
+      boolean isComplexColumn = false;
+      // check if dimension is a complex data type
+      if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {
+        for (GenericQueryType queryType : complexDimensionInfoMap.values()) {
+          if (queryType.getName().equalsIgnoreCase(dims.getColName())) {
+            isComplexColumn = true;
+            break;
+          }
+        }
+      }
+      if (dims.hasEncoding(Encoding.DICTIONARY) && !isComplexColumn) {
         // dictionary
         preparedRow[i] = wrapper.getDictionaryKeyByIndex(dictionaryIndex++);
       } else {
-        // no dictionary dims
-        byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
-        // no dictionary primitive columns are expected to be in original data while loading,
-        // so convert it to original data
-        if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
-          Object dataFromBytes = DataTypeUtil.getDataBasedOnDataTypeForNoDictionaryColumn(
-              noDictionaryKeyByIndex, dims.getDataType());
-          if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
-            dataFromBytes = (long) dataFromBytes / 1000L;
+        if (isComplexColumn) {
+          byte[] complexKeyByIndex = wrapper.getComplexKeyByIndex(complexIndex);
+          ByteBuffer byteArrayInput = ByteBuffer.wrap(complexKeyByIndex);
+          GenericQueryType genericQueryType =
+              complexDimensionInfoMap.get(complexColumnParentBlockIndexes[complexIndex++]);
+          int complexDataLength = byteArrayInput.getShort(2);
+          // In case, if array is empty
+          if (complexDataLength == 0) {
+            complexDataLength = complexDataLength + 1;
+          }
+          // get flattened array data
+          Object[] complexFlattenedData = new Object[complexDataLength];
+          Object[] data = genericQueryType.getObjectArrayDataBasedOnDataType(byteArrayInput);
+          for (int index = 0; index < complexDataLength; index++) {
+            complexFlattenedData[index] =
+                getData(data, index, dims.getColumnSchema().getDataType());
           }
-          preparedRow[i] = dataFromBytes;
+          complexDataMap.put(i, complexFlattenedData);
         } else {
-          preparedRow[i] = noDictionaryKeyByIndex;
+          // no dictionary dims
+          byte[] noDictionaryKeyByIndex = wrapper.getNoDictionaryKeyByIndex(noDictionaryIndex++);
+          // no dictionary primitive columns are expected to be in original data while loading,
+          // so convert it to original data
+          if (DataTypeUtil.isPrimitiveColumn(dims.getDataType())) {
+            Object dataFromBytes = DataTypeUtil
+                .getDataBasedOnDataTypeForNoDictionaryColumn(noDictionaryKeyByIndex,
+                    dims.getDataType());
+            if (null != dataFromBytes && dims.getDataType() == DataTypes.TIMESTAMP) {
+              dataFromBytes = (long) dataFromBytes / 1000L;
+            }
+            preparedRow[i] = dataFromBytes;
+          } else {
+            preparedRow[i] = noDictionaryKeyByIndex;
+          }
         }
       }
     }
 
     // at last add implicit column position reference(PID)
+    preparedRow[i] = implicitColumnByteArray;
 
-    preparedRow[i] = wrapper.getImplicitColumnByteArray();
+    // In case of complex array type, flatten the data and add for sorting
+    // TODO: Handle for nested array and other complex types

Review comment:
       Please mention in some doc, that nested complex is not supported




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671341729


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1943/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471459506



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
       handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660968353


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1696/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677789319


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3820/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660048799


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3422/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600323



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
       In the community, we concluded that no need of row level position reference. So, why this is required ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673649958


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1974/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r460795333



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+    sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+    sql("drop index if exists index_1 on complextable")
+    sql("drop table if exists complextable")
+  }
+
+  test("test array<string> on secondary index") {

Review comment:
       d) If two array_contains() present with AND in query. When it is pushed down as equal to filter in SI. It will give 0 rows as SI is flattened and it cannot find two values in one row. Need to handle that also




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684965465


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2212/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671340388


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3682/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660869540


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1689/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635224307


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677647032


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2069/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682146723


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2155/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r480961113



##########
File path: integration/spark/src/main/java/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java
##########
@@ -243,46 +258,115 @@ private void processResult(List<CarbonIterator<RowBatch>> detailQueryResultItera
   /**
    * This method will prepare the data from raw object that will take part in sorting
    */
-  private Object[] prepareRowObjectForSorting(Object[] row) {
+  private Object[] prepareRowObjectForSorting(Object[] row,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap, int[] complexColumnParentBlockIndexes)
+      throws SecondaryIndexException {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
-    // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
+    byte[] implicitColumnByteArray = wrapper.getImplicitColumnByteArray();
 
     List<CarbonDimension> dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    Map<Integer, Object[]> complexDataMap = new HashMap<>();
 
     int noDictionaryIndex = 0;
     int dictionaryIndex = 0;
+    int complexIndex = 0;
     int i = 0;
     // loop excluding last dimension as last one is implicit column.
     for (; i < dimensions.size() - 1; i++) {
       CarbonDimension dims = dimensions.get(i);
-      if (dims.hasEncoding(Encoding.DICTIONARY)) {
+      boolean isComplexColumn = false;
+      // check if dimension is a complex data type
+      if (!complexDimensionInfoMap.isEmpty() && complexColumnParentBlockIndexes.length > 0) {

Review comment:
       dim is actually the index table dimension, where complex type of main table is stored as its primitive type in SI. Hence, we cannot check isComplex from dimension.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365424



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
 
   @Override
   public void addChildren(GenericQueryType children) {
-    if (this.getName().equals(children.getParentName())) {
+    if (null == this.getName() || this.getName().equals(children.getParentName())) {

Review comment:
       removed this check

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
    * property which defines the insert stage flow
    */
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

Review comment:
       handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660867814


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3431/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] brijoobopanna commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
brijoobopanna commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682087451


   retest this please
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677790243


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2079/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659944291


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1676/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671194649


   please rebase it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684734917


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2202/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473620842



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
       cc: @kunal642 , @QiangCai , @ravipesala 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-685454195


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635205017


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3081/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661078111


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1700/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674854838


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2005/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471254176



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
       why not call getObjectDataBasedOnDataType?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661065769


   please rebase it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471242699



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##########
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
       Already added a new method getObjectDataBasedOnDataType. this boolen is still required, as in filldata() method, complex children getDataBasedOnDataType will be called




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-660051811


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1681/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-665812800






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662251770


   a) I feel we should handle the SI support for all array<primitive> not just string.
   b) Better to discuss the row level SI solution in community once ?
   
   @QiangCai , @kunal642 @Indhumathi27 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-671496551


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3687/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] asfgit closed pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659945334


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3418/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-635205935


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1360/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat edited a comment on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-662251770


   a) I feel we should handle the SI support for all array of primitive not just string.
   b) Better to discuss the row level SI solution in community once ?
   
   @QiangCai , @kunal642 @Indhumathi27 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r458561948



##########
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+    sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+    sql("drop index if exists index_1 on complextable")
+    sql("drop table if exists complextable")
+  }
+
+  test("test array<string> on secondary index") {

Review comment:
       also add a test case for c)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473601787



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
       CC: @kunal642 , @QiangCai 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682145902


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3896/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473600574



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
       I think above 2 properties also not needed if it is not row level position reference 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684738151


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473599181



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       databaseName: String, tableName: String, indexTableName: String,
       absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
     var schemaOrdinal = -1
-    var allColumns = indexModel.columnNames.map { indexCol =>
-      val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+    val complexDimensions = carbonTable.getAllDimensions.asScala
+      .filter(dim => dim.getDataType.isComplexType &&
+                     indexModel.columnNames.asJava.contains(dim.getColName))
+    if (complexDimensions.size > 1) {
+      throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+    }
+    var allColumns = List[ColumnSchema]()

Review comment:
       Consider the scenario where one SI table contains (complex, primitive1, primitive2), we need to maintain the same order. But now it will become primitive1,primitive2,complex1. which is wrong.
   
   So, suggest to keep the user-specified order.
   Can refer below code
   
   var allColumns = List[ColumnSchema]()
   indexModel.columnNames.foreach { indexCol =>
     val dimension = carbonTable.getDimensionByName(tableName, indexCol)
     val colSchema = dimension.getColumnSchema
     schemaOrdinal += 1
     allColumns = allColumns :+ cloneColumnSchema(colSchema, schemaOrdinal)
   }
   complexDimensions.foreach { complexDim =>
     if (complexDim.getNumberOfChild > 0) {
       if (complexDim.getListOfChildDimensions.asScala
         .exists(col => DataTypes.isArrayType(col.getDataType))) {
         throw new ErrorMessage("SI creation with nested array complex type is not supported yet");
       }
     }
   }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674861462


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3746/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684963426


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3952/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673883264


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3716/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673940891






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708011


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2077/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-661075813


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3442/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473601787



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2400,6 +2400,23 @@ private CarbonCommonConstants() {
    */
   public static final String CARBON_SI_SEGMENT_MERGE_DEFAULT = "false";
 
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";
+
+  /**
+   * Maximum value for complex filter threshold
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD_DEFAULT = "32000";
+
+  /**
+   * Property to decide if position id till row level or not
+   */
+  public static final String IS_TUPLE_ID_TILL_ROW_FOR_SI_COMPLEX =

Review comment:
       CC: @kunal642 , @QiangCai 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673637872


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3713/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r473873670



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
##########
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
       databaseName: String, tableName: String, indexTableName: String,
       absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
     var schemaOrdinal = -1
-    var allColumns = indexModel.columnNames.map { indexCol =>
-      val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+    val complexDimensions = carbonTable.getAllDimensions.asScala
+      .filter(dim => dim.getDataType.isComplexType &&
+                     indexModel.columnNames.asJava.contains(dim.getColName))
+    if (complexDimensions.size > 1) {
+      throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+    }
+    var allColumns = List[ColumnSchema]()

Review comment:
       handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471366000



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
    * map that contains the mapping of block id to the valid blocklets in that block which contain
    * the data as per the applied filter
    */
-  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+  private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List<Expression> implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
     // initialize map with half the size of filter list as one block id can contain
     // multiple blocklets
     blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
     for (Expression value : implicitFilterList) {
       String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString();
       addBlockEntry(blockletPath);
     }
+    int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+    isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+    if (isThresholdReached) {
+      LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: "
+          + complexFilterThreshold);
+    }
   }
 
-  public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) {
+  public ImplicitExpression(Map<String, Set<String>> blockIdToBlockletIdMapping) {
     this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
   }
 
   private void addBlockEntry(String blockletPath) {

Review comment:
       handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-684847720


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3948/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659830702


   please describe the PR and fix the failure.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-673882872


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1977/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365947



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##########
@@ -41,39 +44,62 @@
    * map that contains the mapping of block id to the valid blocklets in that block which contain
    * the data as per the applied filter
    */
-  private Map<String, Set<Integer>> blockIdToBlockletIdMapping;
+  private final Map<String, Set<String>> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List<Expression> implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());

Review comment:
       moved




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org