Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/09/17 11:49:17 UTC
[GitHub] [carbondata] vikramahuja1001 opened a new pull request #4219: [WIP] Indexserverfix
vikramahuja1001 opened a new pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922741953
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4193/
[GitHub] [carbondata] kunal642 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922814470
LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922740680
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5939/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922847853
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4194/
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711895647
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
}
}
+ def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+ request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+ .equalsIgnoreCase(segment)) match {
+ case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)
Review comment:
Segments in INSERT_OVERWRITE_IN_PROGRESS status also need to be handled here.
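
A minimal sketch of what the suggested check could look like, treating both in-progress statuses the same way. It is self-contained for illustration only: `SegmentStatus` and `LoadDetails` below are hypothetical stand-ins rather than the CarbonData classes, and returning false for a segment that is not found is an assumption, since the hunk above is cut off before that case.

```scala
object SegmentProgressSketch {
  // Stand-in for CarbonData's SegmentStatus; only the values needed here.
  sealed trait SegmentStatus
  case object Success extends SegmentStatus
  case object InsertInProgress extends SegmentStatus
  case object InsertOverwriteInProgress extends SegmentStatus

  // Hypothetical simplification of one entry in the segment list.
  final case class LoadDetails(loadName: String, status: SegmentStatus)

  // Every status that counts as "still being written".
  private val inProgressStatuses: Set[SegmentStatus] =
    Set(InsertInProgress, InsertOverwriteInProgress)

  // True when the named segment exists and is in one of the in-progress states.
  // Assumption: an unknown segment is reported as not in progress.
  def isSegmentInProgress(segments: Seq[LoadDetails], segment: String): Boolean =
    segments
      .find(_.loadName.equalsIgnoreCase(segment))
      .exists(details => inProgressStatuses.contains(details.status))
}
```

Keeping the statuses in one set means a further in-progress state could be added later without touching the lookup itself.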
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711988975
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -313,31 +340,34 @@ object DistributedRDDUtils {
case None => throw new RuntimeException("Could not find any alive executors.")
}
}
- val existingExecutorMapping = executorToCacheSizeMapping.get(newHost)
- if (existingExecutorMapping != null) {
- val existingSize = existingExecutorMapping.get(newExecutor)
- if (existingSize != null) {
- existingExecutorMapping.put(newExecutor, existingSize + segment.getIndexSize
- .toInt)
- } else {
- existingExecutorMapping.put(newExecutor, segment.getIndexSize
- .toInt)
- }
+ tableToExecutorMapping.putIfAbsent(tableUniqueName, new ConcurrentHashMap[String, String]())
+ val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
+ val oldMapping = existingSegmentMapping.putIfAbsent(segment.getSegmentNo,
+ s"${ newHost }_$newExecutor")
+ if (oldMapping == null) {
+ updateCacheSize(newHost, newExecutor, segment)
+ s"executor_${newHost}_$newExecutor"
} else {
- val newExecutorMapping = new ConcurrentHashMap[String, Long]()
- newExecutorMapping.put(newExecutor, segment.getIndexSize)
- executorToCacheSizeMapping.put(newHost, newExecutorMapping)
+ s"executor_$oldMapping"
}
- val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
- if (existingSegmentMapping == null) {
- val newSegmentMapping = new ConcurrentHashMap[String, String]()
- newSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
- tableToExecutorMapping.putIfAbsent(tableUniqueName, newSegmentMapping)
+ }
+ }
+
+ private def updateCacheSize(host: String, executor: String, segment: Segment) = {
+ val existingExecutorMapping = executorToCacheSizeMapping.get(host)
+ if (existingExecutorMapping != null) {
+ val existingSize = existingExecutorMapping.get(executor)
+ if (existingSize != null) {
+ existingExecutorMapping.put(executor, existingSize + segment.getIndexSize
Review comment:
done
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711931261
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -313,31 +340,34 @@ object DistributedRDDUtils {
case None => throw new RuntimeException("Could not find any alive executors.")
}
}
- val existingExecutorMapping = executorToCacheSizeMapping.get(newHost)
- if (existingExecutorMapping != null) {
- val existingSize = existingExecutorMapping.get(newExecutor)
- if (existingSize != null) {
- existingExecutorMapping.put(newExecutor, existingSize + segment.getIndexSize
- .toInt)
- } else {
- existingExecutorMapping.put(newExecutor, segment.getIndexSize
- .toInt)
- }
+ tableToExecutorMapping.putIfAbsent(tableUniqueName, new ConcurrentHashMap[String, String]())
+ val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
+ val oldMapping = existingSegmentMapping.putIfAbsent(segment.getSegmentNo,
+ s"${ newHost }_$newExecutor")
+ if (oldMapping == null) {
+ updateCacheSize(newHost, newExecutor, segment)
+ s"executor_${newHost}_$newExecutor"
} else {
- val newExecutorMapping = new ConcurrentHashMap[String, Long]()
- newExecutorMapping.put(newExecutor, segment.getIndexSize)
- executorToCacheSizeMapping.put(newHost, newExecutorMapping)
+ s"executor_$oldMapping"
}
- val existingSegmentMapping = tableToExecutorMapping.get(tableUniqueName)
- if (existingSegmentMapping == null) {
- val newSegmentMapping = new ConcurrentHashMap[String, String]()
- newSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
- tableToExecutorMapping.putIfAbsent(tableUniqueName, newSegmentMapping)
+ }
+ }
+
+ private def updateCacheSize(host: String, executor: String, segment: Segment) = {
+ val existingExecutorMapping = executorToCacheSizeMapping.get(host)
+ if (existingExecutorMapping != null) {
+ val existingSize = existingExecutorMapping.get(executor)
+ if (existingSize != null) {
+ existingExecutorMapping.put(executor, existingSize + segment.getIndexSize
Review comment:
The new size can be computed from the existing size first, and the common code "existingExecutorMapping.put(executor, size)" can then be moved down, below the if/else.
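
One way the suggested restructuring might look, as a sketch only. The map name follows the diff above, but the object wrapper, the plain `indexSize: Long` parameter (instead of a `Segment`), and the `java.lang.Long` value type are assumptions made so the example runs on its own.

```scala
import java.util.concurrent.ConcurrentHashMap

object CacheSizeSketch {
  // host -> (executor -> cached index size). java.lang.Long is used so that a
  // missing entry is visible as null instead of being unboxed to 0.
  val executorToCacheSizeMapping =
    new ConcurrentHashMap[String, ConcurrentHashMap[String, java.lang.Long]]()

  def updateCacheSize(host: String, executor: String, indexSize: Long): Unit = {
    // Create the per-host map on first use, then always work on the stored one.
    executorToCacheSizeMapping.putIfAbsent(
      host, new ConcurrentHashMap[String, java.lang.Long]())
    val existingExecutorMapping = executorToCacheSizeMapping.get(host)

    // Compute the size first, based on whether an entry already exists ...
    val existingSize = existingExecutorMapping.get(executor)
    val size =
      if (existingSize != null) existingSize.longValue + indexSize else indexSize

    // ... so that the put is written only once.
    existingExecutorMapping.put(executor, java.lang.Long.valueOf(size))
  }
}
```

The get followed by put is not atomic; if concurrent updates for the same executor are possible, `ConcurrentHashMap.merge` would be one alternative to consider.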
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711920184
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -108,7 +108,20 @@ object DistributedRDDUtils {
val wrapper: IndexInputSplit = legacySegment
.asInstanceOf[IndexInputSplitWrapper].getDistributable
val executor = validExecutorIds(index % validExecutorIds.length)
- wrapper.setLocations(Array("executor_" + executor))
+ tableToExecutorMapping.putIfAbsent(tableUniqueName,
Review comment:
done
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
}
}
+ def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+ request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+ .equalsIgnoreCase(segment)) match {
+ case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)
Review comment:
done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921877338
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/337/
[GitHub] [carbondata] Indhumathi27 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922749835
LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922841899
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5940/
[GitHub] [carbondata] akashrn5 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922802502
LGTM
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711895647
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -164,6 +177,14 @@ object DistributedRDDUtils {
}
}
+ def isSegmentInProgress(request: IndexInputFormat, segment: String): Boolean = {
+ request.getReadCommittedScope.getSegmentList.find(_.getLoadName
+ .equalsIgnoreCase(segment)) match {
+ case Some(value) => value.getSegmentStatus.equals(SegmentStatus.INSERT_IN_PROGRESS)
Review comment:
Segments in INSERT_OVERWRITE_IN_PROGRESS status may also need to be handled here.
[GitHub] [carbondata] asfgit closed pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-922828389
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/341/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4219: [CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Server caching mechanism.
Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#discussion_r711899757
##########
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
##########
@@ -108,7 +108,20 @@ object DistributedRDDUtils {
val wrapper: IndexInputSplit = legacySegment
.asInstanceOf[IndexInputSplitWrapper].getDistributable
val executor = validExecutorIds(index % validExecutorIds.length)
- wrapper.setLocations(Array("executor_" + executor))
+ tableToExecutorMapping.putIfAbsent(tableUniqueName,
Review comment:
Please add a detailed comment explaining this logic, so that if an issue comes up here again, the developer will be aware of the earlier fix.
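
For context, a commented sketch of the putIfAbsent pattern in question. It is modeled on the non-legacy hunk of this PR (the one that introduces `oldMapping`), not on the truncated line above, so the legacy path may differ. The object wrapper and the `assign` method name are hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap

object StickyAssignmentSketch {
  // table -> (segment number -> "host_executor")
  val tableToExecutorMapping =
    new ConcurrentHashMap[String, ConcurrentHashMap[String, String]]()

  // Returns the location a segment should be sent to. The first caller to
  // register a segment wins; later callers get the already stored executor,
  // so one segment is never cached on two executors.
  def assign(table: String, segmentNo: String, host: String, executor: String): String = {
    // putIfAbsent never replaces an existing per-table map, so entries written
    // by another thread are not lost.
    tableToExecutorMapping.putIfAbsent(table, new ConcurrentHashMap[String, String]())
    val existingSegmentMapping = tableToExecutorMapping.get(table)

    // null means this call stored the mapping; otherwise the old value is kept.
    val oldMapping = existingSegmentMapping.putIfAbsent(segmentNo, s"${host}_$executor")
    if (oldMapping == null) s"executor_${host}_$executor" else s"executor_$oldMapping"
  }
}
```

In this sketch a second request for the same segment with a different candidate executor still resolves to the first one, which is the behaviour a code comment at this spot would presumably need to document.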
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921865232
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5935/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4219: [WIP] Indexserverfix
Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4219:
URL: https://github.com/apache/carbondata/pull/4219#issuecomment-921868583
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4190/