You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/17 06:37:48 UTC
[GitHub] [spark] techaddict opened a new pull request, #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
techaddict opened a new pull request, #39104:
URL: https://github.com/apache/spark/pull/39104
### What changes were proposed in this pull request?
Add Protobuf serializer for RDDStorageInfoWrapper
### Why are the changes needed?
Support fast and compact serialization/deserialization for RDDStorageInfoWrapper over RocksDB.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
New UT
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357097123
@techaddict Should change the pr title like `[SPARK-41425][CORE] Protobuf serializer for ...`, other pr titles need the same correction
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357101405
@LuciferYang I'm basing these PRs on #39048 #39096, which use this convention
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052719961
##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
+import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo, ResourceProfileInfo, RuntimeInfo}
Review Comment:
Let's change this to `oorg.apache.spark.status.api.v1._` to save future troubles.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052719752
##########
core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala:
##########
@@ -17,7 +17,7 @@
package org.apache.spark.status.protobuf
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
Review Comment:
Let's change this to `org.apache.spark.status._` to save future troubles.
##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
Review Comment:
Let's change this to `org.apache.spark.status._` to save future troubles.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358568558
@gengliangwang addressed comments
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] codecov-commenter commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1356769648
# [Codecov](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#39104](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d95fb4c) into [master](https://codecov.io/gh/apache/spark/commit/1e6fcba09d5c26a4aceff37af1f36efa25240d3e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1e6fcba) will **decrease** coverage by `49.27%`.
> The diff coverage is `n/a`.
> :exclamation: Current head d95fb4c differs from pull request most recent head 654099f. Consider uploading reports for the commit 654099f to get more accurate results
```diff
@@ Coverage Diff @@
## master #39104 +/- ##
===========================================
- Coverage 91.55% 42.28% -49.28%
===========================================
Files 305 165 -140
Lines 67432 38053 -29379
Branches 10214 6198 -4016
===========================================
- Hits 61740 16091 -45649
- Misses 4302 21262 +16960
+ Partials 1390 700 -690
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | `42.28% <ø> (-49.26%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [python/pyspark/pandas/frame.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2ZyYW1lLnB5) | `10.64% <0.00%> (-86.15%)` | :arrow_down: |
| [python/pyspark/pandas/accessors.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2FjY2Vzc29ycy5weQ==) | `9.50% <0.00%> (-82.65%)` | :arrow_down: |
| [python/pyspark/join.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvam9pbi5weQ==) | `12.12% <0.00%> (-81.82%)` | :arrow_down: |
| [python/pyspark/pandas/groupby.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2dyb3VwYnkucHk=) | `15.46% <0.00%> (-80.04%)` | :arrow_down: |
| [python/pyspark/sql/pandas/typehints.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvc3FsL3BhbmRhcy90eXBlaGludHMucHk=) | `17.50% <0.00%> (-77.50%)` | :arrow_down: |
| [python/pyspark/pandas/indexing.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2luZGV4aW5nLnB5) | `12.37% <0.00%> (-76.53%)` | :arrow_down: |
| [python/pyspark/pandas/categorical.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2NhdGVnb3JpY2FsLnB5) | `17.96% <0.00%> (-76.09%)` | :arrow_down: |
| [python/pyspark/sql/udf.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvc3FsL3VkZi5weQ==) | `16.58% <0.00%> (-75.78%)` | :arrow_down: |
| [python/pyspark/pandas/series.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3Nlcmllcy5weQ==) | `19.26% <0.00%> (-75.55%)` | :arrow_down: |
| [python/pyspark/pandas/indexes/base.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2luZGV4ZXMvYmFzZS5weQ==) | `21.01% <0.00%> (-75.42%)` | :arrow_down: |
| ... and [306 more](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052723424
##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
+import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo, ResourceProfileInfo, RuntimeInfo}
Review Comment:
done
##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358143826
@techaddict could you resolve the conflict? I am merging your PRs one by one.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358850152
Thanks, merging to master
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
URL: https://github.com/apache/spark/pull/39104
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358522194
@gengliangwang merged latest master 👍🏼
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1051389905
##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+ def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+ val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+ builder.setInfo(serializeRDDStorageInfo(input.info))
+ builder.build().toByteArray
+ }
+
+ def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+ val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+ new RDDStorageInfoWrapper(
+ info = deserializeRDDStorageInfo(wrapper.getInfo)
+ )
+ }
+
+ private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+ val builder = StoreTypes.RDDStorageInfo.newBuilder()
+ builder.setId(info.id)
+ builder.setName(info.name)
+ builder.setNumPartitions(info.numPartitions)
+ builder.setNumCachedPartitions(info.numCachedPartitions)
+ builder.setStorageLevel(info.storageLevel)
+ builder.setMemoryUsed(info.memoryUsed)
+ builder.setDiskUsed(info.diskUsed)
+
+ if (info.dataDistribution.isDefined) {
+ info.dataDistribution.get.foreach { dd =>
+ val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+ dataDistributionBuilder.setAddress(dd.address)
+ dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+ dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+ dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+ dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+ dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+ dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+ dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+ builder.addDataDistribution(dataDistributionBuilder.build())
+ }
+ }
+
+ if (info.partitions.isDefined) {
+ info.partitions.get.foreach { p =>
+ val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+ partitionsBuilder.setBlockName(p.blockName)
+ partitionsBuilder.setStorageLevel(p.storageLevel)
+ partitionsBuilder.setMemoryUsed(p.memoryUsed)
+ partitionsBuilder.setDiskUsed(p.diskUsed)
+ p.executors.foreach(partitionsBuilder.addExecutors)
+ builder.addPartitions(partitionsBuilder.build())
+ }
+ }
+
+ builder.build()
+ }
+
+ private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+ new RDDStorageInfo(
+ id = info.getId,
+ name = info.getName,
+ numPartitions = info.getNumPartitions,
+ numCachedPartitions = info.getNumCachedPartitions,
+ storageLevel = info.getStorageLevel,
+ memoryUsed = info.getMemoryUsed,
+ diskUsed = info.getDiskUsed,
+ dataDistribution =
+ if (info.getDataDistributionList.isEmpty) {
Review Comment:
this assignment is based on https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L645
dataDistribution is None for empty seq, and partitions is always Some() even when underlying collection is empty
##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+ def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+ val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+ builder.setInfo(serializeRDDStorageInfo(input.info))
+ builder.build().toByteArray
+ }
+
+ def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+ val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+ new RDDStorageInfoWrapper(
+ info = deserializeRDDStorageInfo(wrapper.getInfo)
+ )
+ }
+
+ private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+ val builder = StoreTypes.RDDStorageInfo.newBuilder()
+ builder.setId(info.id)
+ builder.setName(info.name)
+ builder.setNumPartitions(info.numPartitions)
+ builder.setNumCachedPartitions(info.numCachedPartitions)
+ builder.setStorageLevel(info.storageLevel)
+ builder.setMemoryUsed(info.memoryUsed)
+ builder.setDiskUsed(info.diskUsed)
+
+ if (info.dataDistribution.isDefined) {
+ info.dataDistribution.get.foreach { dd =>
+ val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+ dataDistributionBuilder.setAddress(dd.address)
+ dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+ dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+ dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+ dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+ dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+ dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+ dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+ builder.addDataDistribution(dataDistributionBuilder.build())
+ }
+ }
+
+ if (info.partitions.isDefined) {
+ info.partitions.get.foreach { p =>
+ val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+ partitionsBuilder.setBlockName(p.blockName)
+ partitionsBuilder.setStorageLevel(p.storageLevel)
+ partitionsBuilder.setMemoryUsed(p.memoryUsed)
+ partitionsBuilder.setDiskUsed(p.diskUsed)
+ p.executors.foreach(partitionsBuilder.addExecutors)
+ builder.addPartitions(partitionsBuilder.build())
+ }
+ }
+
+ builder.build()
+ }
+
+ private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+ new RDDStorageInfo(
+ id = info.getId,
+ name = info.getName,
+ numPartitions = info.getNumPartitions,
+ numCachedPartitions = info.getNumCachedPartitions,
+ storageLevel = info.getStorageLevel,
+ memoryUsed = info.getMemoryUsed,
+ diskUsed = info.getDiskUsed,
+ dataDistribution =
+ if (info.getDataDistributionList.isEmpty) {
Review Comment:
this implementation is based on https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L645
dataDistribution is None for empty seq, and partitions is always Some() even when underlying collection is empty
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357385626
If so, we should add `[UI]`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052742075
##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+ def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+ val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+ builder.setInfo(serializeRDDStorageInfo(input.info))
+ builder.build().toByteArray
+ }
+
+ def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+ val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+ new RDDStorageInfoWrapper(
+ info = deserializeRDDStorageInfo(wrapper.getInfo)
+ )
+ }
+
+ private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+ val builder = StoreTypes.RDDStorageInfo.newBuilder()
+ builder.setId(info.id)
+ builder.setName(info.name)
+ builder.setNumPartitions(info.numPartitions)
+ builder.setNumCachedPartitions(info.numCachedPartitions)
+ builder.setStorageLevel(info.storageLevel)
+ builder.setMemoryUsed(info.memoryUsed)
+ builder.setDiskUsed(info.diskUsed)
+
+ if (info.dataDistribution.isDefined) {
+ info.dataDistribution.get.foreach { dd =>
+ val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+ dataDistributionBuilder.setAddress(dd.address)
+ dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+ dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+ dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+ dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+ dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+ dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+ dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+ builder.addDataDistribution(dataDistributionBuilder.build())
+ }
+ }
+
+ if (info.partitions.isDefined) {
+ info.partitions.get.foreach { p =>
+ val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+ partitionsBuilder.setBlockName(p.blockName)
+ partitionsBuilder.setStorageLevel(p.storageLevel)
+ partitionsBuilder.setMemoryUsed(p.memoryUsed)
+ partitionsBuilder.setDiskUsed(p.diskUsed)
+ p.executors.foreach(partitionsBuilder.addExecutors)
+ builder.addPartitions(partitionsBuilder.build())
+ }
+ }
+
+ builder.build()
+ }
+
+ private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+ new RDDStorageInfo(
+ id = info.getId,
+ name = info.getName,
+ numPartitions = info.getNumPartitions,
+ numCachedPartitions = info.getNumCachedPartitions,
+ storageLevel = info.getStorageLevel,
+ memoryUsed = info.getMemoryUsed,
+ diskUsed = info.getDiskUsed,
+ dataDistribution =
+ if (info.getDataDistributionList.isEmpty) {
Review Comment:
Yeah the optional sequence here is tricky...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052723531
##########
core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala:
##########
@@ -17,7 +17,7 @@
package org.apache.spark.status.protobuf
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org