Posted to dev@carbondata.apache.org by ravikiran23 <gi...@git.apache.org> on 2016/09/16 19:23:48 UTC

[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

GitHub user ravikiran23 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/161

    [CARBONDATA-246] Compaction is wrong if the last segment is not assigned to an executor.

    PROBLEM:
    
    If, during compaction of 4 loads, an executor is assigned tasks for only the first 3 loads, the column cardinality calculation, which is based on the last segment's info, will be wrong, because that executor treats load 3 as the last segment.
    
    In this case, the cardinality will be wrong for that executor.
    
    Solution:
    
    Pass the segment properties info from the driver, computed from the last segment of the compaction.
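    A minimal Scala sketch of the problem and the fix. It assumes that each segment carries a per-column cardinality, that segments are ordered by load id, and that the last segment's cardinality covers every earlier one. `Segment`, `Task` and both helper functions are hypothetical illustrations, not CarbonData APIs.

```scala
// Hypothetical model of CARBONDATA-246; not CarbonData code.
object CompactionCardinalitySketch {

  // A loaded segment and the per-column cardinality recorded for it.
  final case class Segment(loadId: Int, colCardinality: Array[Int])

  // The subset of the compacted segments that one executor's task received.
  final case class Task(executor: String, segments: Seq[Segment])

  // Buggy behaviour: the executor treats the last segment it was assigned
  // as the last segment of the whole compaction.
  def executorLocalCardinality(task: Task): Array[Int] =
    task.segments.maxBy(_.loadId).colCardinality

  // Fixed behaviour: the driver, which sees every segment being compacted,
  // picks the true last segment and ships its cardinality to all executors.
  def driverCardinality(allSegments: Seq[Segment]): Array[Int] =
    allSegments.maxBy(_.loadId).colCardinality

  def main(args: Array[String]): Unit = {
    // loads 0..3 with cardinalities (10,5), (11,5), (12,5), (13,5)
    val segments = (0 to 3).map(i => Segment(i, Array(10 + i, 5)))
    // this executor only got tasks for the first 3 loads
    val task = Task("executor-1", segments.take(3))
    println(executorLocalCardinality(task).mkString(","))
    println(driverCardinality(segments).mkString(","))
  }
}
```

    Running `main` prints `12,5` for the executor-local value and `13,5` for the driver-side value, which is exactly the mismatch the fix removes.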


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravikiran23/incubator-carbondata blockDistributionProblem

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/161.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #161
    
----
commit 9a0dbc0df8ae091dff0d02133295ce2344cf6734
Author: ravikiran <ra...@gmail.com>
Date:   2016-09-16T19:16:24Z

    If, during compaction of 4 loads, an executor is assigned tasks for only the first 3 loads, the column cardinality calculation, which is based on the last segment's info, will be wrong.
    
    In this case, the cardinality will be wrong for that executor.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by ravikiran23 <gi...@git.apache.org>.
Github user ravikiran23 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79290197
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -102,6 +102,11 @@ class CarbonMergerRDD[K, V](
             var dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
             val carbonSparkPartition = theSplit.asInstanceOf[CarbonSparkPartition]
     
    +        // get destination segment properties as sent from driver which is of last segment.
    +
    +        val segmentProperties = new SegmentProperties(carbonMergerMapping.columnSchemaList.asJava,
    --- End diff --
    
    fixed


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by ravikiran23 <gi...@git.apache.org>.
Github user ravikiran23 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79290299
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -259,6 +253,9 @@ class CarbonMergerRDD[K, V](
             )
           )
     
    +      // keep on assigning till last one is reached.
    +      blocksOfLastSegment = blocksOfOneSegment.asJava
    --- End diff --
    
    fixed


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/161


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289289
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -102,6 +102,11 @@ class CarbonMergerRDD[K, V](
             var dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
             val carbonSparkPartition = theSplit.asInstanceOf[CarbonSparkPartition]
     
    +        // get destination segment properties as sent from driver which is of last segment.
    +
    +        val segmentProperties = new SegmentProperties(carbonMergerMapping.columnSchemaList.asJava,
    --- End diff --
    
    Rename them to maxSegmentcolumnSchemaList and maxSegmentColCardinality, and write a comment where they are declared.
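    For illustration only, here is how the renamed fields might look with the requested declaration-site comments (using consistent camelCase for the first name). `MergerMappingSketch` is a hypothetical stand-in for the object that carries these values from the driver to the executors, and `String` stands in for the real column-schema type.

```scala
// Hypothetical stand-in for the mapping passed from driver to executors.
final case class MergerMappingSketch(
    validSegments: Array[String],
    // Column schema list of the last (max) segment in this compaction; every
    // executor uses it to build the destination segment properties.
    var maxSegmentColumnSchemaList: List[String],
    // Column cardinality of the same last segment, filled in on the driver
    // before the compaction tasks run.
    var maxSegmentColCardinality: Array[Int]
)
```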


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289428
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -259,6 +253,9 @@ class CarbonMergerRDD[K, V](
             )
           )
     
    +      // keep on assigning till last one is reached.
    +      blocksOfLastSegment = blocksOfOneSegment.asJava
    --- End diff --
    
    Add a null check and a size check.
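    One plausible reading of this request, sketched below: keep reassigning while iterating over the segments, but only when the current segment's block list is non-null and non-empty, so a missing or empty list never replaces a valid one. The object, the method and the use of `String` block paths are hypothetical.

```scala
// Hypothetical sketch; block lists are java.util.List[String] of file paths.
object LastSegmentBlocksSketch {

  def lastSegmentBlocks(
      segmentBlocks: Seq[java.util.List[String]]): java.util.List[String] = {
    var blocksOfLastSegment: java.util.List[String] = null
    for (blocksOfOneSegment <- segmentBlocks) {
      // keep on assigning till the last usable segment is reached,
      // skipping null or empty block lists
      if (blocksOfOneSegment != null && blocksOfOneSegment.size() > 0) {
        blocksOfLastSegment = blocksOfOneSegment
      }
    }
    // still null if no segment had any blocks; callers must handle that
    blocksOfLastSegment
  }
}
```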


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79282917
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/Compactor.scala ---
    @@ -69,7 +69,9 @@ object Compactor {
           schemaName,
           factTableName,
           validSegments,
    -      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId
    +      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId,
    +      colCardinality = Array[Int](0),
    --- End diff --
    
    Instead of creating a placeholder array and reassigning it, you can declare a reference of type Array[Int], like:
    var colCardinality: Array[Int] = null
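    The difference is easy to miss: `Array[Int](0)` builds a one-element array holding `0`, not an empty one, so a placeholder can look like real data. A small hypothetical sketch of the suggested null-reference approach (`ColCardinalityHolder` is an illustrative name, not from the PR):

```scala
// Hypothetical holder illustrating the review suggestion.
object ColCardinalityHolder {

  // Placeholder style from the original diff: a real one-element array.
  val placeholder: Array[Int] = Array[Int](0)

  // Suggested style: an Array[Int] reference that stays null until the
  // driver has computed the real cardinality.
  var colCardinality: Array[Int] = null

  def isComputed: Boolean = colCardinality != null
}
```

    In pure Scala code an `Option[Array[Int]]` would often be preferred; a null reference keeps the style consistent with the Java-facing code around it.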


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289444
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -280,6 +277,25 @@ class CarbonMergerRDD[K, V](
               taskInfoList.add(new TableTaskInfo(entry._1, entry._2).asInstanceOf[Distributable])
           )
         }
    +
    +    // prepare the details required to extract the segment properties using last segment.
    +
    --- End diff --
    
    Add empty and null checks.
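    A hedged sketch of such a guard on the driver side, placed before deriving the destination cardinality from the last segment. It assumes, purely for illustration, that every block of a segment carries that segment's column cardinality, so the first block is enough; `BlockInfo` and the method name are hypothetical, and failing fast is only one option.

```scala
// Hypothetical sketch of a guarded driver-side preparation step.
object LastSegmentPropertiesSketch {

  final case class BlockInfo(path: String, colCardinality: Array[Int])

  def cardinalityOfLastSegment(
      blocksOfLastSegment: java.util.List[BlockInfo]): Array[Int] = {
    // null and empty checks: fail fast with a clear message instead of a
    // NullPointerException or IndexOutOfBoundsException further down
    if (blocksOfLastSegment == null || blocksOfLastSegment.isEmpty) {
      throw new IllegalStateException(
        "No blocks found for the last segment of the compaction")
    }
    // assumption: any block of the segment holds the segment's cardinality
    blocksOfLastSegment.get(0).colCardinality
  }
}
```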


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by ravikiran23 <gi...@git.apache.org>.
Github user ravikiran23 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79283767
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/Compactor.scala ---
    @@ -69,7 +69,9 @@ object Compactor {
           schemaName,
           factTableName,
           validSegments,
    -      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId
    +      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId,
    +      colCardinality = Array[Int](0),
    --- End diff --
    
    Fixed; now using null.


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

Posted by ravikiran23 <gi...@git.apache.org>.
Github user ravikiran23 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79290314
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala ---
    @@ -280,6 +277,25 @@ class CarbonMergerRDD[K, V](
               taskInfoList.add(new TableTaskInfo(entry._1, entry._2).asInstanceOf[Distributable])
           )
         }
    +
    +    // prepare the details required to extract the segment properties using last segment.
    +
    --- End diff --
    
    fixed
