Posted to commits@carbondata.apache.org by ku...@apache.org on 2019/08/02 07:30:12 UTC

[carbondata] branch branch-1.6 updated (917e041 -> 80438f7)

This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a change to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git.


    from 917e041  [HOTFIX] CLI test case failed during release because of space differences
     new 2ebc041  [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation
     new 575b711  [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
     new 80438f7  [HOTFIX] Removed the hive-exec and commons dependency from hive module

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../carbondata/core/datamap/TableDataMap.java      | 12 +++--
 integration/spark-common/pom.xml                   | 10 ++++
 .../AlterTableColumnRenameTestCase.scala           | 54 ++++++++++++++++++++++
 .../merger/CarbonCompactionExecutor.java           |  9 +++-
 4 files changed, 80 insertions(+), 5 deletions(-)


[carbondata] 03/03: [HOTFIX] Removed the hive-exec and commons dependency from hive module

Posted by ku...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 80438f75379cd3754cb31a42a372aeb36e4d61e7
Author: ravipesala <ra...@gmail.com>
AuthorDate: Fri Aug 2 11:15:05 2019 +0530

    [HOTFIX] Removed the hive-exec and commons dependency from hive module
    
    Removed the hive-exec and commons dependencies from the hive module, since Spark
    ships its own hive-exec. The external hive-exec dependency was causing some tests
    to fail.
    
    This closes #3347
---
 integration/spark-common/pom.xml | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/integration/spark-common/pom.xml b/integration/spark-common/pom.xml
index df683e0..a12992d 100644
--- a/integration/spark-common/pom.xml
+++ b/integration/spark-common/pom.xml
@@ -39,6 +39,16 @@
       <groupId>org.apache.carbondata</groupId>
       <artifactId>carbondata-hive</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.commons</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-exec</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.carbondata</groupId>


[carbondata] 01/03: [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation

Posted by ku...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 2ebc0413ee03645659e49b8c4d41969ee444b9aa
Author: Indhumathi27 <in...@gmail.com>
AuthorDate: Fri Jul 26 16:51:32 2019 +0530

    [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation
    
    Problem:
    After an alter add, drop, or rename operation, restructuredBlockExists is true.
    Currently, to pick the RawResultIterator for a block, we check whether the block
    has column drift by comparing its SegmentProperties against the column-drift
    columns. The SegmentProperties are built based on restructuredBlockExists: if it
    is true, the current column schema is used; otherwise, the column schema from the
    data file footer is used.

    In the example given in CARBONDATA-3478, restructuredBlockExists is true, so the
    current column schema is used to build the SegmentProperties of both blocks. As a
    result, block 1 is iterated with a RawResultIterator instead of a
    ColumnDriftRawResultIterator and throws an ArrayIndexOutOfBound exception.

    Solution:
    Use the schema from the data file footer of each block to check whether it has
    column drift.
    
    This closes #3337
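
A minimal sketch of the fixed selection logic in
CarbonCompactionExecutor.getRawResultIterator, condensed from the diff below (the
`footer` local variable and its DataFileFooter type are shorthand inferred from
getDataFileFooter(), not lines quoted verbatim from the commit):

    // Build segment properties from the block's own data file footer, so the
    // column-drift check reflects the schema the block was actually written with.
    DataFileFooter footer = tableBlockInfoList.get(0).getDataFileFooter();
    SegmentProperties sourceSegmentProperties = new SegmentProperties(
        footer.getColumnInTable(),
        footer.getSegmentInfo().getColumnCardinality());
    boolean hasColumnDrift = carbonTable.hasColumnDrift()
        && RestructureUtil.hasColumnDriftOnSegment(carbonTable, sourceSegmentProperties);
    if (hasColumnDrift) {
      // The footer schema shows drifted columns: use the drift-aware iterator.
      return new ColumnDriftRawResultIterator(
          executeBlockList(tableBlockInfoList, segmentId, task, configuration),
          sourceSegmentProperties, destinationSegProperties);
    } else {
      if (restructuredBlockExists) {
        // Only the plain iterator path falls back to the restructured
        // (current) schema.
        sourceSegmentProperties = getSourceSegmentProperties(
            Collections.singletonList(footer));
      }
      return new RawResultIterator(
          executeBlockList(tableBlockInfoList, segmentId, task, configuration),
          sourceSegmentProperties, destinationSegProperties, true);
    }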
---
 .../AlterTableColumnRenameTestCase.scala           | 54 ++++++++++++++++++++++
 .../merger/CarbonCompactionExecutor.java           |  9 +++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
index d927724..dd1fa0f 100644
--- a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
+++ b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
@@ -320,12 +320,66 @@ class AlterTableColumnRenameTestCase extends Spark2QueryTest with BeforeAndAfter
     }
   }
 
+  test("test compaction after table rename and alter set tblproerties") {
+    sql("DROP TABLE IF EXISTS test_rename")
+    sql("DROP TABLE IF EXISTS test_rename_compact")
+    sql(
+      "CREATE TABLE test_rename (empno int, empname String, designation String, doj Timestamp, " +
+      "workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, " +
+      "projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int," +
+      "utilization int,salary int) STORED BY 'org.apache.carbondata.format'")
+    sql(
+      s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_rename OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+    sql("alter table test_rename rename to test_rename_compact")
+    sql("alter table test_rename_compact set tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')")
+    sql(
+      s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_rename_compact OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+    val res1 = sql("select * from test_rename_compact")
+    sql("alter table test_rename_compact compact 'major'")
+    val res2 = sql("select * from test_rename_compact")
+    assert(res1.collectAsList().containsAll(res2.collectAsList()))
+    checkExistence(sql("show segments for table test_rename_compact"), true, "Compacted")
+    sql("DROP TABLE IF EXISTS test_rename")
+    sql("DROP TABLE IF EXISTS test_rename_compact")
+  }
+
+  test("test compaction after alter set tblproerties- add and drop") {
+    sql("DROP TABLE IF EXISTS test_alter")
+    sql(
+      "CREATE TABLE test_alter (empno int, empname String, designation String, doj Timestamp, " +
+      "workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, " +
+      "projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int," +
+      "utilization int,salary int) STORED BY 'org.apache.carbondata.format'")
+    sql(
+      s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_alter OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+    sql("alter table test_alter set tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')")
+    sql("alter table test_alter drop columns(deptno)")
+    sql(
+      s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_alter OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+    sql("alter table test_alter add columns(deptno int)")
+    sql(
+      s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_alter OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+    val res1 = sql("select * from test_alter")
+    sql("alter table test_alter compact 'major'")
+    val res2 = sql("select * from test_alter")
+    assert(res1.collectAsList().containsAll(res2.collectAsList()))
+    sql("DROP TABLE IF EXISTS test_alter")
+  }
+
   override def afterAll(): Unit = {
     dropTable()
   }
 
   def dropTable(): Unit = {
     sql("DROP TABLE IF EXISTS RENAME")
+    sql("DROP TABLE IF EXISTS test_rename")
+    sql("DROP TABLE IF EXISTS test_rename_compact")
+    sql("DROP TABLE IF EXISTS test_alter")
   }
 
   def createTableAndLoad(): Unit = {
diff --git a/processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionExecutor.java b/processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionExecutor.java
index 28f1cf4..d7769bb 100644
--- a/processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionExecutor.java
+++ b/processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionExecutor.java
@@ -177,8 +177,9 @@ public class CarbonCompactionExecutor {
   private RawResultIterator getRawResultIterator(Configuration configuration, String segmentId,
       String task, List<TableBlockInfo> tableBlockInfoList)
       throws QueryExecutionException, IOException {
-    SegmentProperties sourceSegmentProperties = getSourceSegmentProperties(
-        Collections.singletonList(tableBlockInfoList.get(0).getDataFileFooter()));
+    SegmentProperties sourceSegmentProperties =
+        new SegmentProperties(tableBlockInfoList.get(0).getDataFileFooter().getColumnInTable(),
+            tableBlockInfoList.get(0).getDataFileFooter().getSegmentInfo().getColumnCardinality());
     boolean hasColumnDrift = carbonTable.hasColumnDrift() &&
         RestructureUtil.hasColumnDriftOnSegment(carbonTable, sourceSegmentProperties);
     if (hasColumnDrift) {
@@ -186,6 +187,10 @@ public class CarbonCompactionExecutor {
           executeBlockList(tableBlockInfoList, segmentId, task, configuration),
           sourceSegmentProperties, destinationSegProperties);
     } else {
+      if (restructuredBlockExists) {
+        sourceSegmentProperties = getSourceSegmentProperties(
+            Collections.singletonList(tableBlockInfoList.get(0).getDataFileFooter()));
+      }
       return new RawResultIterator(
           executeBlockList(tableBlockInfoList, segmentId, task, configuration),
           sourceSegmentProperties, destinationSegProperties, true);


[carbondata] 02/03: [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning

Posted by ku...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 575b7116e5cc0a7c25e17794a462a6ecdf4afb24
Author: ajantha-bhat <aj...@gmail.com>
AuthorDate: Thu Jul 25 18:50:19 2019 +0530

    [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
    
    Cause: When the datamap count is close to numOfThreadsForPruning, the '>='
    check in the distribution logic can leave the last thread with no datamaps to
    prune, so an array-index-out-of-bounds exception is thrown in this scenario.
    There is no issue with a higher number of datamaps.
    
    Solution: In this scenario, launch threads based on the actual distribution
    size, not on the hardcoded value (see the sketch below).
    
    This closes #3336
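
A minimal sketch of the adjusted thread-launch logic in TableDataMap, condensed
from the diff below (the grouping loop that fills datamapListForEachThread is
elided):

    // Configured upper bound on the number of pruning threads.
    int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
    // ... distribute the datamaps into datamapListForEachThread (elided) ...
    if (datamapListForEachThread.size() < numOfThreadsForPruning) {
      // The datamaps fit into fewer groups than the configured thread count,
      // so launch only as many threads as there are groups.
      numOfThreadsForPruning = datamapListForEachThread.size();
    }
    LOG.info("Number of threads selected for multi-thread block pruning is "
        + numOfThreadsForPruning + ". total files: " + totalFiles
        + ". total segments: " + segments.size());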
---
 .../org/apache/carbondata/core/datamap/TableDataMap.java     | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
index 33fc3b1..ecdd586 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
@@ -207,9 +207,6 @@ public final class TableDataMap extends OperationEventListener {
      */
 
     int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
-    LOG.info(
-        "Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning
-            + ". total files: " + totalFiles + ". total segments: " + segments.size());
     int filesPerEachThread = totalFiles / numOfThreadsForPruning;
     int prev;
     int filesCount = 0;
@@ -254,6 +251,15 @@ public final class TableDataMap extends OperationEventListener {
       // this should not happen
       throw new RuntimeException(" not all the files processed ");
     }
+    if (datamapListForEachThread.size() < numOfThreadsForPruning) {
+      // If the total datamaps fitted in lesser number of threads than numOfThreadsForPruning.
+      // Launch only that many threads where datamaps are fitted while grouping.
+      LOG.info("Datamaps is distributed in " + datamapListForEachThread.size() + " threads");
+      numOfThreadsForPruning = datamapListForEachThread.size();
+    }
+    LOG.info(
+        "Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning
+            + ". total files: " + totalFiles + ". total segments: " + segments.size());
     List<Future<Void>> results = new ArrayList<>(numOfThreadsForPruning);
     final Map<Segment, List<ExtendedBlocklet>> prunedBlockletMap =
         new ConcurrentHashMap<>(segments.size());