You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/03/19 09:47:36 UTC

[GitHub] [carbondata] VenuReddy2103 opened a new pull request #4110: [WIP]Secondary Index as a coarse grain datamap

VenuReddy2103 opened a new pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110

### Why is this PR needed?
At present, secondary indexes are leveraged for query pruning via spark plan modification. This approach is tightly coupled with spark because the plan modification is specific to spark engine. In order to use secondary indexes for Presto or Hive queries, it is not feasible to modify the query plans as we desire in the current approach. Thus need arises for an engine agnostic approach to use secondary indexes in query pruning.

### What changes were proposed in this PR?
1. Added Secondary Index datamap as a coarse grain datamap
2. Secondary Index datamap prune fires the spark sql query on the identified secondary index table within the particular segment to get the position references for the matching filters of datamap and in turn forms the blocklets. Note: Spark sql query is fired to take the advantage of spark distributed computing to filter and read the secondary index table in the distributed manner. Since the secondary index datamap fires the spark sql, it is prerequisite to enable distributed pruning and the Index Server must be up and running.
3. Have added a CarbonInputFormat level property to control the use of newly added secondary index datamap or not in query pruning. This property is set only when query is triggered from Presto. So, Secondary index datamap is used only for Presto queries. And queries from spark continue to use the existing approach of plan modification at optimizer/execution phases.
4. Upon Index Server get splits, if secondary index prune is applicable, prune and get extended blocklets directly on the index server driver instead of using existing DistributedPruneRDD which prunes on index server executors. This is because secondary index datamap pruning essentially fires a spark sql query and it require spark session/context.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839154


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5220/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-802779583


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5604/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808462123


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3347/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609676098



##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -265,6 +265,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-816791095


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3396/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-817907446


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5158/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813821344


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609300785



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607816175



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -128,6 +129,16 @@ class CarbonScanRDD[T: ClassTag](
       } else {
         prepareInputFormatForDriver(job.getConfiguration)
       }
+      if (segmentIds.isDefined) {
+        CarbonInputFormat.setQuerySegment(job.getConfiguration, segmentIds.get)

Review comment:
       we already have `segmentsToAccess`, we need one more variable?

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/CarbonCreateIndexCommand.scala
##########
@@ -225,7 +225,9 @@ case class CarbonCreateIndexCommand(
               new IndexTableInfo(parentTable.getDatabaseName, indexModel.indexName,
                 indexSchema.getProperties),
               false)
-            val enabledIndexInfo = IndexTableInfo.enableIndex(indexInfo, indexModel.indexName)
+            val enabledIndexInfo = IndexTableInfo.setIndexStatus(indexInfo,

Review comment:
       For the existing SI table, what will be the behavior? can be queried from presto?
   better to update the document on this.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,
+      indexName: String,
+      indexType: IndexType,
+      status: IndexStatus,
+      needLock: Boolean = true,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName

Review comment:
       we cannot pass the single table to the above batch API and update only that, why do we need similar functionality again ?
   If both needed please extract common code and reuse

##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
     for (TableIndex tableIndex : tableIndices) {
       // register it only if it is not lazy index, for lazy index, user
       // will rebuild the index manually

Review comment:
       please update the comments

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       agree. Maybe we should support this in the refresh SI command with the option to set this for existing tables. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821145251


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5201/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299248



##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617318018



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       Yes. As discussed offline, have created a subtask under same jira.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821273953


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5202/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607445465



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =

Review comment:
       please update this in the document also (.md file) and add a comment to explain the property 

##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {
+    Map<String, List<String>> indexes = new HashMap();
+    String indexMeta =
+        tableInfo.getFactTable().getTableProperties().get(tableInfo.getFactTable().getTableId());
+    IndexMetadata indexMetadata = null;
+    if (null != indexMeta) {
+      try {
+        indexMetadata = IndexMetadata.deserialize(indexMeta);
+      } catch (IOException e) {
+        e.printStackTrace();

Review comment:
       add a error log instead of printing the trace.

##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
       why this is compared with hardcoded value ? please explain the logic in comments 

##########
File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -265,6 +265,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       same as above for prestosql also.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
       please make synchronize object as final.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexFactory.java
##########
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.MalformedIndexCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.index.IndexFilter;
+import org.apache.carbondata.core.index.IndexInputSplit;
+import org.apache.carbondata.core.index.IndexMeta;
+import org.apache.carbondata.core.index.Segment;
+import org.apache.carbondata.core.index.dev.IndexBuilder;
+import org.apache.carbondata.core.index.dev.IndexWriter;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndexFactory;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.IndexSchema;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.events.Event;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Index Factory for Secondary Index.
+ */
+@InterfaceAudience.Internal
+public class SecondaryIndexFactory extends CoarseGrainIndexFactory {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndexFactory.class.getName());
+  private IndexMeta indexMeta;
+
+  public SecondaryIndexFactory(CarbonTable carbonTable, IndexSchema indexSchema)
+      throws MalformedIndexCommandException {
+    super(carbonTable, indexSchema);
+    List<ExpressionType> operations = new ArrayList(Arrays.asList(ExpressionType.values()));
+    indexMeta = new IndexMeta(indexSchema.getIndexName(),
+        carbonTable.getIndexedColumns(indexSchema.getIndexColumns()), operations);
+    LOGGER.info("Created Secondary Index Factory instance for " + indexSchema.getIndexName());
+  }
+
+  @Override
+  public IndexWriter createWriter(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexBuilder createBuilder(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexMeta getMeta() {
+    return indexMeta;
+  }
+
+  private Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    Map<Segment, List<CoarseGrainIndex>> indexes = new HashMap<>();
+    List<String> allSegmentIds =
+        segments.stream().map(Segment::getSegmentNo).collect(Collectors.toList());
+    for (Segment segment : segments) {
+      indexes.put(segment, this.getIndexes(segment, allSegmentIds, positionReferenceInfo));
+    }
+    return indexes;
+  }
+
+  private List<CoarseGrainIndex> getIndexes(Segment segment, List<String> allSegmentIds,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    List<CoarseGrainIndex> indexes = new ArrayList<>();
+    SecondaryIndex secondaryIndex = new SecondaryIndex();
+    secondaryIndex.init(
+        new SecondaryIndexModel(getIndexSchema().getIndexName(), segment.getSegmentNo(),
+            allSegmentIds, positionReferenceInfo, segment.getConfiguration()));
+    indexes.add(secondaryIndex);
+    return indexes;
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments, IndexFilter filter)
+      throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      Set<Path> partitionLocations, IndexFilter filter) throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public List<CoarseGrainIndex> getIndexes(Segment segment) throws IOException {
+    List<String> allSegmentIds = new ArrayList();

Review comment:
       In all the newly added code please check and modify ArrayList() -> ArrayList<>()
   and HashMap() -> HashMap<>() to avoid IDE warnings

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {

Review comment:
       If two  SI query is in progress concurrently with a different filter expression, I think it will give wrong result  ? as segment id is same, we need to check same expression or not also ??

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name

Review comment:
       please keep the variables private and reformat the comments

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";
+
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT = "true";

Review comment:
       a) cluster test with index server and presto with SI is completed ?
   b) with spark and index server testing is done with this `property = false` ?

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/LiteralExpression.java
##########
@@ -58,7 +64,28 @@ public String getString() {
 
   @Override
   public String getStatement() {
-    return value == null ? null : value.toString();
+    boolean quoteString = false;
+    Object val = value;
+    if (val != null) {
+      if (dataType == DataTypes.STRING || val instanceof String) {
+        quoteString = true;
+      } else if (dataType == DataTypes.TIMESTAMP) {
+        SimpleDateFormat parser =
+            new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
+        val = parser.format(new Date((Long) val / 1000L));
+        quoteString = true;

Review comment:
       I can see duplicate code in ExpressionResult#getStatement(), can we extract a common code and use in both ?

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;

Review comment:
       These class members can be private.

##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/indextable/IndexTableInfo.java
##########
@@ -157,15 +157,19 @@ public static String updateIndexColumns(String oldGsonData, String columnToBeUpd
     return toGson(indexTableInfos);
   }
 
-  public static String enableIndex(String oldIndexIno, String indexName) {
-    IndexTableInfo[] indexTableInfos = fromGson(oldIndexIno);
+  public static void setIndexStatus(IndexTableInfo[] indexTableInfos, String indexName,
+      IndexStatus status) {
     for (IndexTableInfo indexTableInfo : indexTableInfos) {
       if (indexTableInfo.tableName.equalsIgnoreCase(indexName)) {
-        Map<String, String> oldIndexProperties = indexTableInfo.indexProperties;
-        oldIndexProperties.put(CarbonCommonConstants.INDEX_STATUS, IndexStatus.ENABLED.name());
-        indexTableInfo.setIndexProperties(oldIndexProperties);
+        Map<String, String> indexProperties = indexTableInfo.indexProperties;
+        indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

Review comment:
       may be no need to declare variable.
   
   indexTableInfo.indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       Sometimes SI may not give an advantage based on the data distribution.
   From presto if they don't want to use SI, is NI UDF is supported?
   Else use that carbon property here also to support disable from presto.

##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java
##########
@@ -95,13 +94,10 @@ private IndexStoreManager() {
         .getIndexesMap().entrySet()) {
       for (Map.Entry<String, Map<String, String>> indexEntry : providerEntry.getValue()
           .entrySet()) {
-        if (!indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER)
-            .equalsIgnoreCase(IndexType.SI.getIndexProviderName())) {
-          IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),
-              indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER));
-          indexSchema.setProperties(indexEntry.getValue());
-          indexes.add(getIndex(carbonTable, indexSchema));
-        }
+        IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),

Review comment:
       Here we need to send SI as CG index only when SI PlanReWrite is false right ? where is that check ?

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(
+        CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX + "." + dbName + "."
+            + tableName);
+    if (configuredValue == null) {
+      configuredValue = getProperty(CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX,
+          CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT);

Review comment:
       need to use dbname and table name here also while looking up ?

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {
+        Object[] rows = IndexUtil.getPositionReferences(String
+            .format("select distinct positionReference from %s.%s where insegment('%s') and %s",
+                databaseName, indexName, String.join(",", validSegmentIds),
+                expression.getStatement()));
+        for (Object row : rows) {
+          String positionReference = (String) row;
+          int blockPathIndex = positionReference.indexOf("/");
+          String segmentId = positionReference.substring(0, blockPathIndex);
+          String blockPath = positionReference.substring(blockPathIndex + 1);
+          Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences
+              .computeIfAbsent(segmentId, k -> new HashSet<String>());
+          blockPaths.add(blockPath);
+        }
+        positionReferenceInfo.fetched = true;
+      }
+    }
+    Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences.get(currentSegmentId);
+    return blockPaths != null ? blockPaths : new HashSet<String>();
+  }
+
+  @Override
+  public List<Blocklet> prune(FilterResolverIntf filterExp, SegmentProperties segmentProperties,
+      FilterExecutor filterExecutor, CarbonTable carbonTable) {
+    Set<String> positionReferences = getPositionReferences(carbonTable.getDatabaseName(), indexName,
+        filterExp.getFilterExpression());
+    List<Blocklet> blocklets = new ArrayList();
+    for (String blockPath : positionReferences) {

Review comment:
       can we extract common code and reuse here ?

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name
+  String currentSegmentId; // Segment Id to prune
+  List<String> validSegmentIds; // All the valid segment Ids for Secondary Index pruning
+  PositionReferenceInfo positionReferenceInfo; // Position reference information
+
+  public SecondaryIndexModel(String indexName, String currentSegmentId,
+      List<String> validSegmentIds, PositionReferenceInfo positionReferenceInfo,
+      Configuration configuration) {
+    super(null, configuration);
+    this.indexName = indexName;
+    this.currentSegmentId = currentSegmentId;
+    this.validSegmentIds = validSegmentIds;
+    this.positionReferenceInfo = positionReferenceInfo;
+  }
+
+  /**
+   * Position Reference information
+   */
+  public static class PositionReferenceInfo {

Review comment:
       As mentioned above, need to keep per expression , per segment may be instead of just per segment. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r608636650



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexStoreManager.java
##########
@@ -95,13 +94,10 @@ private IndexStoreManager() {
         .getIndexesMap().entrySet()) {
       for (Map.Entry<String, Map<String, String>> indexEntry : providerEntry.getValue()
           .entrySet()) {
-        if (!indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER)
-            .equalsIgnoreCase(IndexType.SI.getIndexProviderName())) {
-          IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),
-              indexEntry.getValue().get(CarbonCommonConstants.INDEX_PROVIDER));
-          indexSchema.setProperties(indexEntry.getValue());
-          indexes.add(getIndex(carbonTable, indexSchema));
-        }
+        IndexSchema indexSchema = new IndexSchema(indexEntry.getKey(),

Review comment:
       Not here. Only in the query process. we pass it to IndexChooser constructor.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617319168



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,
+      indexName: String,
+      indexType: IndexType,
+      status: IndexStatus,
+      needLock: Boolean = true,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName

Review comment:
       replied above




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292856



##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(
+        CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX + "." + dbName + "."
+            + tableName);
+    if (configuredValue == null) {
+      configuredValue = getProperty(CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX,
+          CarbonCommonConstants.CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT);

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809273056


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3350/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808462627


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5098/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813362851


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5124/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609300934



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;

Review comment:
       Added detailed comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-819797153


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5184/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298600



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexFactory.java
##########
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.MalformedIndexCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.index.IndexFilter;
+import org.apache.carbondata.core.index.IndexInputSplit;
+import org.apache.carbondata.core.index.IndexMeta;
+import org.apache.carbondata.core.index.Segment;
+import org.apache.carbondata.core.index.dev.IndexBuilder;
+import org.apache.carbondata.core.index.dev.IndexWriter;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndexFactory;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.IndexSchema;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.events.Event;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.Logger;
+
+/**
+ * Index Factory for Secondary Index.
+ */
+@InterfaceAudience.Internal
+public class SecondaryIndexFactory extends CoarseGrainIndexFactory {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndexFactory.class.getName());
+  private IndexMeta indexMeta;
+
+  public SecondaryIndexFactory(CarbonTable carbonTable, IndexSchema indexSchema)
+      throws MalformedIndexCommandException {
+    super(carbonTable, indexSchema);
+    List<ExpressionType> operations = new ArrayList(Arrays.asList(ExpressionType.values()));
+    indexMeta = new IndexMeta(indexSchema.getIndexName(),
+        carbonTable.getIndexedColumns(indexSchema.getIndexColumns()), operations);
+    LOGGER.info("Created Secondary Index Factory instance for " + indexSchema.getIndexName());
+  }
+
+  @Override
+  public IndexWriter createWriter(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexBuilder createBuilder(Segment segment, String shardName,
+      SegmentProperties segmentProperties) throws IOException {
+    throw new UnsupportedOperationException("Not supported for Secondary Index");
+  }
+
+  @Override
+  public IndexMeta getMeta() {
+    return indexMeta;
+  }
+
+  private Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    Map<Segment, List<CoarseGrainIndex>> indexes = new HashMap<>();
+    List<String> allSegmentIds =
+        segments.stream().map(Segment::getSegmentNo).collect(Collectors.toList());
+    for (Segment segment : segments) {
+      indexes.put(segment, this.getIndexes(segment, allSegmentIds, positionReferenceInfo));
+    }
+    return indexes;
+  }
+
+  private List<CoarseGrainIndex> getIndexes(Segment segment, List<String> allSegmentIds,
+      PositionReferenceInfo positionReferenceInfo) throws IOException {
+    List<CoarseGrainIndex> indexes = new ArrayList<>();
+    SecondaryIndex secondaryIndex = new SecondaryIndex();
+    secondaryIndex.init(
+        new SecondaryIndexModel(getIndexSchema().getIndexName(), segment.getSegmentNo(),
+            allSegmentIds, positionReferenceInfo, segment.getConfiguration()));
+    indexes.add(secondaryIndex);
+    return indexes;
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments, IndexFilter filter)
+      throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public Map<Segment, List<CoarseGrainIndex>> getIndexes(List<Segment> segments,
+      Set<Path> partitionLocations, IndexFilter filter) throws IOException {
+    return getIndexes(segments, new PositionReferenceInfo());
+  }
+
+  @Override
+  public List<CoarseGrainIndex> getIndexes(Segment segment) throws IOException {
+    List<String> allSegmentIds = new ArrayList();

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-816786183


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5147/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824096942


   @VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one  SI tables also for filter queries. Is it supported in SI as datamap flow ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617590710



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       ok. we can track this
   https://issues.apache.org/jira/browse/CARBONDATA-4169




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815539289


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815293905


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3390/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815295082


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5141/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812364945


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5115/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815871042


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3394/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815871764


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5145/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815604435


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5144/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821141585


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3449/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] asfgit closed pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

asfgit closed pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299017



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824087364


   I executed secondary index flows in the presto cluster with this code. No issue found and flows are working fine as per design and discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-821283487


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3450/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609301165



##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
     for (TableIndex tableIndex : tableIndices) {
       // register it only if it is not lazy index, for lazy index, user
       // will rebuild the index manually

Review comment:
       done

##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
     for (TableIndex tableIndex : tableIndices) {
       // register it only if it is not lazy index, for lazy index, user
       // will rebuild the index manually

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r610671534



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {
+        Object[] rows = IndexUtil.getPositionReferences(String
+            .format("select distinct positionReference from %s.%s where insegment('%s') and %s",
+                databaseName, indexName, String.join(",", validSegmentIds),
+                expression.getStatement()));
+        for (Object row : rows) {
+          String positionReference = (String) row;
+          int blockPathIndex = positionReference.indexOf("/");
+          String segmentId = positionReference.substring(0, blockPathIndex);
+          String blockPath = positionReference.substring(blockPathIndex + 1);
+          Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences
+              .computeIfAbsent(segmentId, k -> new HashSet<String>());
+          blockPaths.add(blockPath);
+        }
+        positionReferenceInfo.fetched = true;
+      }
+    }
+    Set<String> blockPaths = positionReferenceInfo.segmentToPosReferences.get(currentSegmentId);
+    return blockPaths != null ? blockPaths : new HashSet<String>();
+  }
+
+  @Override
+  public List<Blocklet> prune(FilterResolverIntf filterExp, SegmentProperties segmentProperties,
+      FilterExecutor filterExecutor, CarbonTable carbonTable) {
+    Set<String> positionReferences = getPositionReferences(carbonTable.getDatabaseName(), indexName,
+        filterExp.getFilterExpression());
+    List<Blocklet> blocklets = new ArrayList();
+    for (String blockPath : positionReferences) {

Review comment:
       done. Have referred other places and modified.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617292539



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";
+
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX_DEFAULT = "true";

Review comment:
       Yes cluster testing is done with both presto, spark with index server combination. Also, there is no performance impact observed with spark + index server when property is false 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609354461



##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(

Review comment:
       Modified. Actually, was setting the property only for spark session. So the property name(rewrite plan) was meaningful only for spark. 
   Now changed the property to `carbon.coarse.grain.secondary.index` so that it can be used for both presto and spark. Default value in Presto and Spark session are `true` and `false` respectively.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-811958406


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3361/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609675717



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       Agreed. NI UDF is not supported. Using the same property for Presto also. By default, property is true for Presto. If user do not wish to use the secondary indexes in the query pruning, need to explicitly configure this property to false.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r610493428



##########
File path: core/src/main/java/org/apache/carbondata/core/scan/expression/LiteralExpression.java
##########
@@ -58,7 +64,28 @@ public String getString() {
 
   @Override
   public String getStatement() {
-    return value == null ? null : value.toString();
+    boolean quoteString = false;
+    Object val = value;
+    if (val != null) {
+      if (dataType == DataTypes.STRING || val instanceof String) {
+        quoteString = true;
+      } else if (dataType == DataTypes.TIMESTAMP) {
+        SimpleDateFormat parser =
+            new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
+        val = parser.format(new Date((Long) val / 1000L));
+        quoteString = true;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824115250


   LGTM, 
   we should work on CARBONDATA-4169 to support querying already existing tables also.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809122015


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5099/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803623919


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5056/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298969



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name

Review comment:
       done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name

Review comment:
       done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndexModel.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.core.index.dev.IndexModel;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Secondary Index Model
+ */
+public class SecondaryIndexModel extends IndexModel {
+  String indexName; // Secondary Index name
+  String currentSegmentId; // Segment Id to prune
+  List<String> validSegmentIds; // All the valid segment Ids for Secondary Index pruning
+  PositionReferenceInfo positionReferenceInfo; // Position reference information
+
+  public SecondaryIndexModel(String indexName, String currentSegmentId,
+      List<String> validSegmentIds, PositionReferenceInfo positionReferenceInfo,
+      Configuration configuration) {
+    super(null, configuration);
+    this.indexName = indexName;
+    this.currentSegmentId = currentSegmentId;
+    this.validSegmentIds = validSegmentIds;
+    this.positionReferenceInfo = positionReferenceInfo;
+  }
+
+  /**
+   * Position Reference information
+   */
+  public static class PositionReferenceInfo {

Review comment:
       Have replied above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823839382


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3472/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609299417



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -394,4 +397,35 @@ object DistributedRDDUtils {
       }
     }
   }
+
+  def pruneOnDriver(request: IndexInputFormat): ExtendedBlockletWrapper = {
+    val job = CarbonSparkUtil.createHadoopJob()
+    val conf = job.getConfiguration
+    val tableInfo = request.getCarbonTable.getTableInfo
+    val identifier = request.getCarbonTable.getAbsoluteTableIdentifier
+    val indexFilter = new IndexFilter(request.getCarbonTable,
+      request.getFilterResolverIntf.getFilterExpression)
+    CarbonInputFormat.setTableInfo(conf, tableInfo)
+    CarbonInputFormat.setFilterPredicates(conf, indexFilter)
+    CarbonInputFormat.setDatabaseName(conf, tableInfo.getDatabaseName)
+    CarbonInputFormat.setTableName(conf, tableInfo.getFactTable.getTableName)
+    CarbonInputFormat.setPartitionsToPrune(conf, request.getPartitions)
+    CarbonInputFormat.setTransactionalTable(conf, tableInfo.isTransactionalTable)
+    CarbonInputFormat.setTablePath(conf,
+      identifier.appendWithLocalPrefix(identifier.getTablePath))
+    CarbonInputFormat.setQuerySegment(conf, identifier)
+    CarbonInputFormat.setColumnProjection(conf, Array("positionId"))
+    CarbonInputFormat.setReadCommittedScope(conf, request.getReadCommittedScope)
+    CarbonInputFormat.setSegmentsToAccess(conf, request.getValidSegments)
+    CarbonInputFormat.setValidateSegmentsToAccess(conf, false)
+    CarbonInputFormat.setSecondaryIndexPruning(conf, request.isSIPruningEnabled)
+    val blocklets = new CarbonTableInputFormat[Object].getPrunedBlocklets(job,

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815613829


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3393/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-814288498


   @VenuReddy2103 : Do we have a performance report of querying a SI table with spark vs with presto?  First without SI we can run from spark and presto and then same query can be benchmarked with SI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813362446


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3373/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812213423


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5114/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815528351


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5143/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809125024


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3348/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-815533035


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3392/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609294215



##########
File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/indextable/IndexTableInfo.java
##########
@@ -157,15 +157,19 @@ public static String updateIndexColumns(String oldGsonData, String columnToBeUpd
     return toGson(indexTableInfos);
   }
 
-  public static String enableIndex(String oldIndexIno, String indexName) {
-    IndexTableInfo[] indexTableInfos = fromGson(oldIndexIno);
+  public static void setIndexStatus(IndexTableInfo[] indexTableInfos, String indexName,
+      IndexStatus status) {
     for (IndexTableInfo indexTableInfo : indexTableInfos) {
       if (indexTableInfo.tableName.equalsIgnoreCase(indexName)) {
-        Map<String, String> oldIndexProperties = indexTableInfo.indexProperties;
-        oldIndexProperties.put(CarbonCommonConstants.INDEX_STATUS, IndexStatus.ENABLED.name());
-        indexTableInfo.setIndexProperties(oldIndexProperties);
+        Map<String, String> indexProperties = indexTableInfo.indexProperties;
+        indexProperties.put(CarbonCommonConstants.INDEX_STATUS, status.name());

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] kunal642 commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

kunal642 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824560735


   LGTM, 
   @maheshrajus can you please start working on CARBONDATA-4169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292286



##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
       done

##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292293



##########
File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
##########
@@ -217,7 +218,9 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu
           }
         } else if (key.startsWith(CarbonCommonConstants.CARBON_INDEX_VISIBLE)) {
           isValid = true;
-        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL)) {
+        } else if (key.startsWith(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL) || (
+            key.startsWith(CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX)
+                && key.split("\\.").length == 8)) {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-820230509


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3435/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292432



##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =

Review comment:
       added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries

Posted by GitBox <gi...@apache.org>.

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-824230350


   > @VenuReddy2103 , @maheshrajus : when SI is not a datamap flow, we support matching to more than one SI tables also for filter queries. Is it supported in SI as datamap flow ?
   
   @ajantha-bhat the behaviour is the same in both cases[SI as datamap and SI as non-datamap(plan-rewrite)]
   I checked the below cases.
   1) SI created in one column and query with a combination of Si index and non-si index., Then we are getting output based on filters[AND, OR etc] correctly.
   2) SI created in two different columns and query with the combination of both SI index columns then we are getting output based on filters[AND, OR etc] correctly.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812216965


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3363/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-802776615


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3838/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808456818


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609292526



##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {
+    Map<String, List<String>> indexes = new HashMap();
+    String indexMeta =
+        tableInfo.getFactTable().getTableProperties().get(tableInfo.getFactTable().getTableId());
+    IndexMetadata indexMetadata = null;
+    if (null != indexMeta) {
+      try {
+        indexMetadata = IndexMetadata.deserialize(indexMeta);
+      } catch (IOException e) {
+        e.printStackTrace();

Review comment:
       added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-813822392


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5127/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609301168



##########
File path: processing/src/main/java/org/apache/carbondata/processing/index/IndexWriterListener.java
##########
@@ -75,7 +76,8 @@ public void registerAllWriter(CarbonTable carbonTable, String segmentId,
     for (TableIndex tableIndex : tableIndices) {
       // register it only if it is not lazy index, for lazy index, user
       // will rebuild the index manually

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812327923


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-819858513


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3432/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-812363048


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3364/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808449842


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5097/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r617293398



##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609294436



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609303441



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -128,6 +129,16 @@ class CarbonScanRDD[T: ClassTag](
       } else {
         prepareInputFormatForDriver(job.getConfiguration)
       }
+      if (segmentIds.isDefined) {
+        CarbonInputFormat.setQuerySegment(job.getConfiguration, segmentIds.get)

Review comment:
        We have segment Ids from insegment() udf. So had passed them. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609304145



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -265,8 +272,6 @@ object IndexServer extends ServerInterface {
           server.stop()
         }
       })
-      CarbonProperties.getInstance().addProperty(CarbonCommonConstants

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-816611656


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5146/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] maheshrajus commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

maheshrajus commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-820168994


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-820235081


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5187/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803526629


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5607/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809073737


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-808449555


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3346/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809275416


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5101/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803625683


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3304/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] brijoobopanna commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

brijoobopanna commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-823794397


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-803527532


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3841/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-817913592


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3406/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on pull request #4110: [WIP]Secondary Index as a coarse grain datamap

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-809203807


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609295437



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
       have removed the synchronization block now with another review comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-816615512


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3395/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

VenuReddy2103 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r609298135



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {
+      /* If the position references are not obtained yet(i.e., prune happening for the first valid
+      segment), then get them from the given index table with the given filter from all the valid
+      segments at once and store them as map of segmentId to set of position references in that
+      particular segment. Upon the subsequent prune for other segments, return the position
+      references for the respective segment from the map directly */
+      if (!positionReferenceInfo.fetched) {

Review comment:
       They do not share the same position reference across queries. Have added more comments to `PositionReferenceInfo` class now. 
   One instance of `PositionReferenceInfo` is shared across all the `SecondaryIndex` instances for the particular query pruning with the given index filter. This ensures to run a single sql query to get position references from the valid segments of the given secondary index table with given index filter and populate them in map. First secondary index segment prune in the query will run the sql query for position references and store them in map. And subsequent segments prune in the same query can avoid the individual sql for position references within the respective segment and return position references from the map directly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] kunal642 commented on a change in pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

kunal642 commented on a change in pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#discussion_r607040648



##########
File path: core/src/main/java/org/apache/carbondata/core/index/secondaryindex/CarbonCostBasedOptimizer.java
##########
@@ -15,15 +15,49 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.secondaryindex.optimizer;
+package org.apache.carbondata.core.index.secondaryindex;
 
+import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.index.IndexType;
+import org.apache.carbondata.core.metadata.schema.indextable.IndexMetadata;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+
 public class CarbonCostBasedOptimizer {
+  public static Map<String, List<String>> getSecondaryIndexes(TableInfo tableInfo) {

Review comment:
       this method should be called identifyRequiredTables(Set<String> filterAttributes, TableInfo tableInfo)... internally you can extract the table info string and pass to the other method to get the valid SI list

##########
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2187,6 +2187,24 @@ public static boolean isFilterReorderingEnabled() {
     );
   }
 
+  /**
+   * Check whether plan rewrite with secondary index is enabled or not
+   */
+  public boolean isRewritePlanWithSecondaryIndex(String dbName, String tableName) {
+    String configuredValue = getSessionPropertyValue(

Review comment:
       I feel this should be based on whether the query is from a non spark engine

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2316,6 +2316,12 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_ENABLE_INDEX_SERVER = "carbon.enable.index.server";
 
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_REWRITE_PLAN_WITH_SECONDARY_INDEX =
+      "carbon.rewrite.plan.with.secondary.index";

Review comment:
       add this new property in docs

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;
+
+  @Override
+  public void init(IndexModel indexModel) {
+    assert (indexModel instanceof SecondaryIndexModel);
+    SecondaryIndexModel model = (SecondaryIndexModel) indexModel;
+    indexName = model.indexName;
+    currentSegmentId = model.currentSegmentId;
+    validSegmentIds = model.validSegmentIds;
+    positionReferenceInfo = model.positionReferenceInfo;
+  }
+
+  private Set<String> getPositionReferences(String databaseName, String indexName,
+      Expression expression) {
+    synchronized (positionReferenceInfo) {

Review comment:
       there is no need for this as this will be a sequential call

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -394,4 +397,35 @@ object DistributedRDDUtils {
       }
     }
   }
+
+  def pruneOnDriver(request: IndexInputFormat): ExtendedBlockletWrapper = {
+    val job = CarbonSparkUtil.createHadoopJob()
+    val conf = job.getConfiguration
+    val tableInfo = request.getCarbonTable.getTableInfo
+    val identifier = request.getCarbonTable.getAbsoluteTableIdentifier
+    val indexFilter = new IndexFilter(request.getCarbonTable,
+      request.getFilterResolverIntf.getFilterExpression)
+    CarbonInputFormat.setTableInfo(conf, tableInfo)
+    CarbonInputFormat.setFilterPredicates(conf, indexFilter)
+    CarbonInputFormat.setDatabaseName(conf, tableInfo.getDatabaseName)
+    CarbonInputFormat.setTableName(conf, tableInfo.getFactTable.getTableName)
+    CarbonInputFormat.setPartitionsToPrune(conf, request.getPartitions)
+    CarbonInputFormat.setTransactionalTable(conf, tableInfo.isTransactionalTable)
+    CarbonInputFormat.setTablePath(conf,
+      identifier.appendWithLocalPrefix(identifier.getTablePath))
+    CarbonInputFormat.setQuerySegment(conf, identifier)
+    CarbonInputFormat.setColumnProjection(conf, Array("positionId"))
+    CarbonInputFormat.setReadCommittedScope(conf, request.getReadCommittedScope)
+    CarbonInputFormat.setSegmentsToAccess(conf, request.getValidSegments)
+    CarbonInputFormat.setValidateSegmentsToAccess(conf, false)
+    CarbonInputFormat.setSecondaryIndexPruning(conf, request.isSIPruningEnabled)
+    val blocklets = new CarbonTableInputFormat[Object].getPrunedBlocklets(job,

Review comment:
       set useIndexServer as false in job and getPrunedBlocklets should check for "useIndexServer" to decide whether to go for indexServer or not. 
   
   Otherwise if user sets indexServerEnable to true in carbon.properties for indexserver application then this flow can go in a infinite loop.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/index/secondary/SecondaryIndex.java
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.index.secondary;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.index.IndexUtil;
+import org.apache.carbondata.core.index.dev.IndexModel;
+import org.apache.carbondata.core.index.dev.cgindex.CoarseGrainIndex;
+import org.apache.carbondata.core.indexstore.Blocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.index.secondary.SecondaryIndexModel.PositionReferenceInfo;
+
+import org.apache.log4j.Logger;
+
+import static org.apache.carbondata.core.util.path.CarbonTablePath.BATCH_PREFIX;
+
+/**
+ * Secondary Index to prune at blocklet level.
+ */
+public class SecondaryIndex extends CoarseGrainIndex {
+  private static final Logger LOGGER =
+      LogServiceFactory.getLogService(SecondaryIndex.class.getName());
+  String indexName;
+  String currentSegmentId;
+  List<String> validSegmentIds;
+  PositionReferenceInfo positionReferenceInfo;

Review comment:
       Can you add details for why positionReferenceInfo is same across the SecondaryIndex instances for each segment.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##########
@@ -265,8 +272,6 @@ object IndexServer extends ServerInterface {
           server.stop()
         }
       })
-      CarbonProperties.getInstance().addProperty(CarbonCommonConstants

Review comment:
       please revert this

##########
File path: integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##########
@@ -270,6 +270,7 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
     config.set("query.id", queryId);
     CarbonInputFormat.setTransactionalTable(config, carbonTable.isTransactionalTable());
     CarbonInputFormat.setTableInfo(config, carbonTable.getTableInfo());
+    CarbonInputFormat.checkAndSetSecondaryIndexPruning(carbonTable.getTableInfo(), filters, config);

Review comment:
       Later we can optimize the CarbonInputSplit to return only the required objects incase query performance is bad... Please add a TODO for same

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##########
@@ -396,6 +398,88 @@ object CarbonIndexUtil {
     }
   }
 
+  def updateIndexStatusInBatch(carbonTable: CarbonTable,
+      indexTables: List[CarbonTable],
+      indexType: IndexType,
+      status: IndexStatus,
+      sparkSession: SparkSession): Unit = {
+    val dbName = carbonTable.getDatabaseName
+    val tableName = carbonTable.getTableName
+    val metadataLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
+      LockUsage.METADATA_LOCK)
+    try {
+      if (metadataLock.lockWithRetries()) {
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        val table = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession)
+        val indexTableInfos = IndexTableInfo.fromGson(table.getIndexInfo)
+        val indexMetadata = table.getIndexMetadata
+        indexTables.foreach { indexTable =>
+          IndexTableInfo.setIndexStatus(indexTableInfos, indexTable.getTableName, status)
+          indexMetadata.updateIndexStatus(indexType.getIndexProviderName,
+            indexTable.getTableName,
+            status.name())
+        }
+        table.getTableInfo
+          .getFactTable
+          .getTableProperties
+          .put(table.getCarbonTableIdentifier.getTableId, indexMetadata.serialize)
+        sparkSession.sql(s"""ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES ('indexInfo' = '${
+          IndexTableInfo.toGson(indexTableInfos)
+        }')""".stripMargin).collect()
+        CarbonHiveIndexMetadataUtil.refreshTable(dbName, tableName, sparkSession)
+        CarbonMetadata.getInstance.removeTable(dbName, tableName)
+        CarbonMetadata.getInstance.loadTableMetadata(table.getTableInfo)
+      }
+    } finally {
+      metadataLock.unlock()
+    }
+  }
+
+  def updateIndexStatus(carbonTable: CarbonTable,

Review comment:
       If SI is already created on a table and its in sync, then IndexStatus would not be there as this is a new DataMap Implementation for us.
   
   We need to find some way to update this property for old SI tables




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4110: [CARBONDATA-4158]Secondary Index as a coarse-grain datamap and use them for Presto queries

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4110:
URL: https://github.com/apache/carbondata/pull/4110#issuecomment-811956693


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5112/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org