Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/09/17 11:49:17 UTC

[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4219: [WIP] Indexserverfix

vikramahuja1001 opened a new pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922741953


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4193/
   





[GitHub] [carbondata] kunal642 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

kunal642 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922814470


   LGTM





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922740680


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5939/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922847853


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4194/
   





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711895647



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
     }
   }
 
+  def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+    request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+      .equalsIgnoreCase(segment)) match {
+      case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)

Review comment:
       Need to handle the INSERT_OVERWRITE_IN_PROGRESS segment status here as well.
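A minimal sketch of the check this review comment asks for, treating both INSERT_IN_PROGRESS and INSERT_OVERWRITE_IN_PROGRESS as "in progress". It is not the PR's code: `SegmentStatus` and `LoadDetail` below are hypothetical stand-ins for CarbonData's segment status enum and load metadata, the segment list is passed in directly instead of being read through `IndexInputFormat.getReadCommittedScope`, and returning `false` for an unknown segment is an assumption, since the `None` branch is not visible in the quoted hunk.

```scala
// Hypothetical stand-ins for CarbonData's SegmentStatus enum and load metadata;
// the real types live in CarbonData's core module.
object SegmentStatus extends Enumeration {
  val SUCCESS, INSERT_IN_PROGRESS, INSERT_OVERWRITE_IN_PROGRESS, MARKED_FOR_DELETE = Value
}

final case class LoadDetail(loadName: String, status: SegmentStatus.Value)

object InProgressCheck {
  // Statuses meaning the segment is still being written and should not be cached.
  private val inProgressStatuses: Set[SegmentStatus.Value] =
    Set(SegmentStatus.INSERT_IN_PROGRESS, SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS)

  // True only if the named segment exists and is in one of the in-progress states.
  // Returning false for an unknown segment is an assumption of this sketch.
  def isSegmentInProgress(segments: Seq[LoadDetail], segment: String): Boolean =
    segments.find(_.loadName.equalsIgnoreCase(segment)) match {
      case Some(detail) => inProgressStatuses.contains(detail.status)
      case None => false
    }
}
```

Keeping the in-progress statuses in one Set means any further in-progress status only has to be added in a single place.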







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

vikramahuja1001 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711988975



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -313,31 +340,34 @@ object DistributedRDDUtils {
           case None => throw new RuntimeException("Could not find any alive executors.")
         }
       }
-      val existingExecutorMapping = executorToCacheSizeMapping.get(newHost)
-      if (existingExecutorMapping != null) {
-        val existingSize = existingExecutorMapping.get(newExecutor)
-        if (existingSize != null) {
-          existingExecutorMapping.put(newExecutor, existingSize + segment.getIndexSize
-            .toInt)
-        } else {
-          existingExecutorMapping.put(newExecutor, segment.getIndexSize
-            .toInt)
-        }
+      tableToExecutorMapping.putIfAbsent(tableUniqueName, new ConcurrentHashMap[String, String]())
+      val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
+      val oldMapping = existingSegmentMapping.putIfAbsent(segment.getSegmentNo,
+        s"${ newHost }_$newExecutor")
+      if (oldMapping == null) {
+        updateCacheSize(newHost, newExecutor, segment)
+        s"executor_${newHost}_$newExecutor"
       } else {
-        val newExecutorMapping = new ConcurrentHashMap[String, Long]()
-        newExecutorMapping.put(newExecutor, segment.getIndexSize)
-        executorToCacheSizeMapping.put(newHost, newExecutorMapping)
+        s"executor_$oldMapping"
       }
-      val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
-      if (existingSegmentMapping == null) {
-        val newSegmentMapping = new ConcurrentHashMap[String, String]()
-        newSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
-        tableToExecutorMapping.putIfAbsent(tableUniqueName, newSegmentMapping)
+    }
+  }
+
+  private def updateCacheSize(host: String, executor: String, segment: Segment) = {
+    val existingExecutorMapping = executorToCacheSizeMapping.get(host)
+    if (existingExecutorMapping != null) {
+      val existingSize = existingExecutorMapping.get(executor)
+      if (existingSize != null) {
+        existingExecutorMapping.put(executor, existingSize + segment.getIndexSize

Review comment:
       done







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711931261



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -313,31 +340,34 @@ object DistributedRDDUtils {
           case None => throw new RuntimeException("Could not find any alive executors.")
         }
       }
-      val existingExecutorMapping = executorToCacheSizeMapping.get(newHost)
-      if (existingExecutorMapping != null) {
-        val existingSize = existingExecutorMapping.get(newExecutor)
-        if (existingSize != null) {
-          existingExecutorMapping.put(newExecutor, existingSize + segment.getIndexSize
-            .toInt)
-        } else {
-          existingExecutorMapping.put(newExecutor, segment.getIndexSize
-            .toInt)
-        }
+      tableToExecutorMapping.putIfAbsent(tableUniqueName, new ConcurrentHashMap[String, String]())
+      val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
+      val oldMapping = existingSegmentMapping.putIfAbsent(segment.getSegmentNo,
+        s"${ newHost }_$newExecutor")
+      if (oldMapping == null) {
+        updateCacheSize(newHost, newExecutor, segment)
+        s"executor_${newHost}_$newExecutor"
       } else {
-        val newExecutorMapping = new ConcurrentHashMap[String, Long]()
-        newExecutorMapping.put(newExecutor, segment.getIndexSize)
-        executorToCacheSizeMapping.put(newHost, newExecutorMapping)
+        s"executor_$oldMapping"
       }
-      val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
-      if (existingSegmentMapping == null) {
-        val newSegmentMapping = new ConcurrentHashMap[String, String]()
-        newSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
-        tableToExecutorMapping.putIfAbsent(tableUniqueName, newSegmentMapping)
+    }
+  }
+
+  private def updateCacheSize(host: String, executor: String, segment: Segment) = {
+    val existingExecutorMapping = executorToCacheSizeMapping.get(host)
+    if (existingExecutorMapping != null) {
+      val existingSize = existingExecutorMapping.get(executor)
+      if (existingSize != null) {
+        existingExecutorMapping.put(executor, existingSize + segment.getIndexSize

Review comment:
       You can compute the size from the existing size first, and then move the common code "existingExecutorMapping.put(executor, size)" out of the if/else, below it.
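One way to apply this suggestion, sketched under assumptions: the map is keyed host -> executor -> size as in the quoted diff, but the value type is `java.lang.Long` so that the `null` check for a missing entry is explicit, the index size is passed in as a plain `Long` instead of being read from a CarbonData `Segment`, the per-host map is created with `putIfAbsent` (mirroring how the PR initializes `tableToExecutorMapping`), and `CacheSizeTracker` is a hypothetical wrapper object. The size is computed first and then written back with a single `put`.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical wrapper object; in the PR this state lives in DistributedRDDUtils.
object CacheSizeTracker {
  // host -> (executorId -> accumulated index size)
  val executorToCacheSizeMapping =
    new ConcurrentHashMap[String, ConcurrentHashMap[String, java.lang.Long]]()

  // Simplification: the size is passed in directly instead of read from segment.getIndexSize.
  def updateCacheSize(host: String, executor: String, indexSize: Long): Unit = {
    // Create the per-host map on first use instead of in a separate else branch.
    executorToCacheSizeMapping.putIfAbsent(host, new ConcurrentHashMap[String, java.lang.Long]())
    val existingExecutorMapping = executorToCacheSizeMapping.get(host)
    val existingSize = existingExecutorMapping.get(executor)
    // Compute the new size first ...
    val size: Long = if (existingSize != null) existingSize + indexSize else indexSize
    // ... then write it back with a single, shared put.
    existingExecutorMapping.put(executor, size)
  }
}
```

The get-then-put above is not atomic; if two threads can update the same executor's size concurrently, `ConcurrentHashMap.merge` would perform the read-modify-write atomically.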







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

vikramahuja1001 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711920184



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -108,7 +108,20 @@ object DistributedRDDUtils {
           val wrapper: IndexInputSplit = legacySegment
             .asInstanceOf[IndexInputSplitWrapper].getDistributable
           val executor = validExecutorIds(index % validExecutorIds.length)
-          wrapper.setLocations(Array("executor_" + executor))
+          tableToExecutorMapping.putIfAbsent(tableUniqueName,

Review comment:
       done

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
     }
   }
 
+  def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+    request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+      .equalsIgnoreCase(segment)) match {
+      case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)

Review comment:
       done







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921877338


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/337/
   





[GitHub] [carbondata] Indhumathi27 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

Indhumathi27 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922749835


   LGTM





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922841899


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5940/
   





[GitHub] [carbondata] akashrn5 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

akashrn5 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922802502


   LGTM





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711895647



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
     }
   }
 
+  def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+    request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+      .equalsIgnoreCase(segment)) match {
+      case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)

Review comment:
       Maybe we also need to handle the INSERT_OVERWRITE_IN_PROGRESS segment status here.







[GitHub] [carbondata] asfgit closed pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

asfgit closed pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219


   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922828389


   Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/341/
   





[GitHub] [carbondata] akashrn5 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.

akashrn5 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711899757



##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -108,7 +108,20 @@ object DistributedRDDUtils {
           val wrapper: IndexInputSplit = legacySegment
             .asInstanceOf[IndexInputSplitWrapper].getDistributable
           val executor = validExecutorIds(index % validExecutorIds.length)
-          wrapper.setLocations(Array("executor_" + executor))
+          tableToExecutorMapping.putIfAbsent(tableUniqueName,

Review comment:
       Please add a detailed comment explaining this logic, so that if any issues come up later, developers will be aware of the earlier fix.
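The kind of explanatory comment requested here could look like the following sketch, modeled on the segment-to-executor mapping shown in the `@@ -313,31 +340,34 @@` hunk quoted earlier in this thread (the legacy-segment hunk under review is truncated, so its exact code is not reproduced). `SegmentAssignment`, `assignSegment`, and the `onNewAssignment` callback (standing in for `updateCacheSize`) are hypothetical names, and reading `putIfAbsent` as a guard against assigning one segment to two executors is an interpretation of the diff, not a statement from the PR author.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical wrapper object; in the PR this state lives in DistributedRDDUtils.
object SegmentAssignment {
  // tableUniqueName -> (segmentNo -> "host_executorId")
  val tableToExecutorMapping =
    new ConcurrentHashMap[String, ConcurrentHashMap[String, String]]()

  /**
   * Returns the location ("executor_<host>_<executorId>") whose cache should hold
   * the index of `segmentNo`.
   *
   * Segment-to-executor assignment is first-writer-wins: putIfAbsent only stores
   * the new executor if no mapping exists yet. If another request already assigned
   * this segment, that existing executor is returned instead, so the segment's
   * index is not cached on a second executor. `onNewAssignment` (a hypothetical
   * stand-in for updateCacheSize) runs only when this call created the mapping,
   * so cache-size bookkeeping happens once per segment.
   */
  def assignSegment(tableUniqueName: String, segmentNo: String,
      newHost: String, newExecutor: String)(onNewAssignment: () => Unit): String = {
    tableToExecutorMapping.putIfAbsent(tableUniqueName, new ConcurrentHashMap[String, String]())
    val segmentMapping = tableToExecutorMapping.get(tableUniqueName)
    val oldMapping = segmentMapping.putIfAbsent(segmentNo, s"${newHost}_$newExecutor")
    if (oldMapping == null) {
      onNewAssignment()
      s"executor_${newHost}_$newExecutor"
    } else {
      s"executor_$oldMapping"
    }
  }
}
```

A second request for the same segment gets the first executor back, and the bookkeeping callback fires only once.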







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921865232


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5935/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix

CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921868583


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4190/
   

