Posted to issues@carbondata.apache.org by QiangCai <gi...@git.apache.org> on 2017/06/07 04:06:43 UTC

[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

GitHub user QiangCai opened a pull request:

    https://github.com/apache/carbondata/pull/1002

    [CARBONDATA-1136] Fix compaction bug for the partition table

    After a partition table is compacted, select queries return no data.
    
    **Analysis**
    During compaction, the partition id of the table is lost.
    
    **Solution**
    Continue to use the old partition id in CarbonMergerRDD.scala.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata fixCompactionIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1002
    
----
commit e05c696900920ed5b98e608305d49c17d192fb5b
Author: QiangCai <da...@gmail.com>
Date:   2017-06-07T03:51:08Z

    fix compact bug for partition table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2787/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/412/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/410/

    Failed Tests: 1

    carbondata-pr-spark-1.6/org.apache.carbondata:carbondata-spark-common-test: 1
      org.apache.carbondata.spark.testsuite.allqueries.InsertIntoCarbonTableTestCase.insert into carbon table from carbon table union query




[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1002#discussion_r120927111
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -405,11 +411,16 @@ class CarbonMergerRDD[K, V](
               NodeInfo(splitsPerNode.getTaskId, splitsPerNode.getCarbonInputSplitList.size()))
     
             if (blockletCount != 0) {
    +          val taskInfo = splitInfo.asInstanceOf[CarbonInputSplitTaskInfo]
               val multiBlockSplit = new CarbonMultiBlockSplit(absoluteTableIdentifier,
    -            splitInfo.asInstanceOf[CarbonInputSplitTaskInfo].getCarbonInputSplitList,
    +            taskInfo.getCarbonInputSplitList,
                 Array(nodeName))
    -          result.add(new CarbonSparkPartition(id, partitionNo, multiBlockSplit))
    -          partitionNo += 1
    +          if (isPartitionTable) {
    --- End diff --
    
    This handling will not be sufficient.
    When the number of partitions (for example, 100) is not equal to the number of nodes (for example, 5), getPartitions divides the total blocks among the available nodes, so each node gets more than one taskno/partitionNo to handle.
    The compute function in the executor just merges all the given btrees (segid+taskid) into one task, so multiple taskids/partitions are merged into one. This disturbs the partition mapping.
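
The concern in this comment can be illustrated with a small, self-contained sketch. It does not use any CarbonData or Spark classes: `Block`, `assignToNodes`, and `partitionIdsMergedPerNode` are hypothetical names, and the round-robin distribution is an assumption chosen for simplicity, not necessarily how getPartitions spreads blocks. It only shows that when 100 partition ids are divided among 5 nodes and each node merges everything it receives, many partition ids end up in the same merged task.

```scala
object NodeLevelMergeSketch {
  // Hypothetical stand-in for a data block that belongs to one table partition.
  final case class Block(partitionId: Int, name: String)

  // Assumption: blocks are spread round-robin over the available nodes.
  def assignToNodes(blocks: Seq[Block], nodeCount: Int): Map[Int, Seq[Block]] =
    blocks.zipWithIndex
      .groupBy { case (_, index) => index % nodeCount }
      .map { case (node, indexed) => node -> indexed.map(_._1) }

  // If each node produces a single merged output, these are the distinct
  // partition ids that would be combined into that one output.
  def partitionIdsMergedPerNode(perNode: Map[Int, Seq[Block]]): Map[Int, Set[Int]] =
    perNode.map { case (node, blocks) => node -> blocks.map(_.partitionId).toSet }

  def main(args: Array[String]): Unit = {
    val blocks = (0 until 100).map(id => Block(id, s"part-$id"))
    val merged = partitionIdsMergedPerNode(assignToNodes(blocks, nodeCount = 5))
    merged.toSeq.sortBy(_._1).foreach { case (node, ids) =>
      println(s"node $node would merge ${ids.size} partition ids into one task")
    }
  }
}
```

Under these assumptions every node receives 20 distinct partition ids, so a per-node merge could not preserve a one-to-one mapping between partitions and output tasks.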



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    retest this please



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    retest this please



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/356/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2247/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2524/




[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1002



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/721/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/208/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2522/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    LGTM



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/carbondata-pr-spark-1.6/120/




[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2472/




[GitHub] carbondata pull request #1002: [CARBONDATA-1136] Fix compaction bug for the ...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1002#discussion_r121036463
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -405,11 +411,16 @@ class CarbonMergerRDD[K, V](
               NodeInfo(splitsPerNode.getTaskId, splitsPerNode.getCarbonInputSplitList.size()))
     
             if (blockletCount != 0) {
    +          val taskInfo = splitInfo.asInstanceOf[CarbonInputSplitTaskInfo]
               val multiBlockSplit = new CarbonMultiBlockSplit(absoluteTableIdentifier,
    -            splitInfo.asInstanceOf[CarbonInputSplitTaskInfo].getCarbonInputSplitList,
    +            taskInfo.getCarbonInputSplitList,
                 Array(nodeName))
    -          result.add(new CarbonSparkPartition(id, partitionNo, multiBlockSplit))
    -          partitionNo += 1
    +          if (isPartitionTable) {
    --- End diff --
    
    @gvramana Right, each node will get more than one taskno/partitionNo to handle, but one Spark task handles just one partitionNo/taskNo. A CarbonInputSplitTaskInfo represents one taskNo, so different taskNos go to different Spark tasks.
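
The reply can be sketched as follows. The quoted diff is cut off at `if (isPartitionTable) {`, so this is not the code from the pull request: `TaskInfo` and `SparkPartition` are hypothetical stand-ins for CarbonInputSplitTaskInfo and CarbonSparkPartition, and carrying the old task number alongside a running index is an assumption based only on the PR description ("continue to use the old partition id"). The point it illustrates is that each task info becomes its own Spark partition, so splits from different task numbers never share a Spark task.

```scala
object PartitionPerTaskSketch {
  // Hypothetical stand-in for CarbonInputSplitTaskInfo: one task number plus its splits.
  final case class TaskInfo(taskNo: Int, splits: Seq[String])

  // Hypothetical stand-in for CarbonSparkPartition: `index` is the position of
  // the Spark task, `taskNo` is the task/partition number written on merge.
  final case class SparkPartition(index: Int, taskNo: Int, splits: Seq[String])

  // One Spark partition per non-empty TaskInfo. For a partition table the
  // original task number is kept (assumption); otherwise tasks are renumbered
  // with a running counter, as the removed `partitionNo += 1` line suggests.
  def toSparkPartitions(tasks: Seq[TaskInfo], isPartitionTable: Boolean): Seq[SparkPartition] =
    tasks.filter(_.splits.nonEmpty).zipWithIndex.map { case (task, counter) =>
      val taskNo = if (isPartitionTable) task.taskNo else counter
      SparkPartition(counter, taskNo, task.splits)
    }

  def main(args: Array[String]): Unit = {
    val tasks = Seq(TaskInfo(7, Seq("a", "b")), TaskInfo(3, Seq("c")), TaskInfo(9, Seq.empty))
    toSparkPartitions(tasks, isPartitionTable = true).foreach(println)
    toSparkPartitions(tasks, isPartitionTable = false).foreach(println)
  }
}
```

In this sketch the empty task is skipped, loosely mirroring the `blockletCount != 0` guard in the diff, and the two remaining tasks keep task numbers 7 and 3 for a partition table but are renumbered 0 and 1 otherwise.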



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    retest this please



[GitHub] carbondata issue #1002: [CARBONDATA-1136] Fix compaction bug for the partiti...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/1002
  
    @gvramana I will raise another PR to optimize compaction for normal tables.

