You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/17 06:37:48 UTC

[GitHub] [spark] techaddict opened a new pull request, #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

techaddict opened a new pull request, #39104:
URL: https://github.com/apache/spark/pull/39104

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for RDDStorageInfoWrapper
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for RDDStorageInfoWrapper over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357097123

   @techaddict Should change the pr title like `[SPARK-41425][CORE] Protobuf serializer for ...`, other pr titles need the same correction
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357101405

   @LuciferYang I'm basing these PRs on #39048 #39096, which use this convention


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052719961


##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
 
 import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
 import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
+import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo, ResourceProfileInfo, RuntimeInfo}

Review Comment:
   Let's change this to `oorg.apache.spark.status.api.v1._` to save future troubles.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052719752


##########
core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala:
##########
@@ -17,7 +17,7 @@
 
 package org.apache.spark.status.protobuf
 
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}

Review Comment:
   Let's change this to `org.apache.spark.status._` to save future troubles.



##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
 
 import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
 import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}

Review Comment:
   Let's change this to `org.apache.spark.status._` to save future troubles.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358568558

   @gengliangwang addressed comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] codecov-commenter commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1356769648

   # [Codecov](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#39104](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d95fb4c) into [master](https://codecov.io/gh/apache/spark/commit/1e6fcba09d5c26a4aceff37af1f36efa25240d3e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1e6fcba) will **decrease** coverage by `49.27%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head d95fb4c differs from pull request most recent head 654099f. Consider uploading reports for the commit 654099f to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #39104       +/-   ##
   ===========================================
   - Coverage   91.55%   42.28%   -49.28%     
   ===========================================
     Files         305      165      -140     
     Lines       67432    38053    -29379     
     Branches    10214     6198     -4016     
   ===========================================
   - Hits        61740    16091    -45649     
   - Misses       4302    21262    +16960     
   + Partials     1390      700      -690     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `42.28% <ø> (-49.26%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/spark/pull/39104?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/pyspark/pandas/frame.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2ZyYW1lLnB5) | `10.64% <0.00%> (-86.15%)` | :arrow_down: |
   | [python/pyspark/pandas/accessors.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2FjY2Vzc29ycy5weQ==) | `9.50% <0.00%> (-82.65%)` | :arrow_down: |
   | [python/pyspark/join.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvam9pbi5weQ==) | `12.12% <0.00%> (-81.82%)` | :arrow_down: |
   | [python/pyspark/pandas/groupby.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2dyb3VwYnkucHk=) | `15.46% <0.00%> (-80.04%)` | :arrow_down: |
   | [python/pyspark/sql/pandas/typehints.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvc3FsL3BhbmRhcy90eXBlaGludHMucHk=) | `17.50% <0.00%> (-77.50%)` | :arrow_down: |
   | [python/pyspark/pandas/indexing.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2luZGV4aW5nLnB5) | `12.37% <0.00%> (-76.53%)` | :arrow_down: |
   | [python/pyspark/pandas/categorical.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2NhdGVnb3JpY2FsLnB5) | `17.96% <0.00%> (-76.09%)` | :arrow_down: |
   | [python/pyspark/sql/udf.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvc3FsL3VkZi5weQ==) | `16.58% <0.00%> (-75.78%)` | :arrow_down: |
   | [python/pyspark/pandas/series.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3Nlcmllcy5weQ==) | `19.26% <0.00%> (-75.55%)` | :arrow_down: |
   | [python/pyspark/pandas/indexes/base.py](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2luZGV4ZXMvYmFzZS5weQ==) | `21.01% <0.00%> (-75.42%)` | :arrow_down: |
   | ... and [306 more](https://codecov.io/gh/apache/spark/pull/39104/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052723424


##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
 
 import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
 import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}
+import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo, ResourceProfileInfo, RuntimeInfo}

Review Comment:
   done



##########
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##########
@@ -21,8 +21,8 @@ import java.util.Date
 
 import org.apache.spark.{JobExecutionStatus, SparkFunSuite}
 import org.apache.spark.resource.{ExecutorResourceRequest, TaskResourceRequest}
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
-import org.apache.spark.status.api.v1.{AccumulableInfo, ApplicationAttemptInfo, ApplicationEnvironmentInfo, ApplicationInfo, JobData, ResourceProfileInfo, RuntimeInfo}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358143826

   @techaddict could you resolve the conflict? I am merging your PRs one by one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358850152

   Thanks, merging to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper
URL: https://github.com/apache/spark/pull/39104


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1358522194

   @gengliangwang merged latest master 👍🏼 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1051389905


##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+  def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+    val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+    builder.setInfo(serializeRDDStorageInfo(input.info))
+    builder.build().toByteArray
+  }
+
+  def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+    val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+    new RDDStorageInfoWrapper(
+      info = deserializeRDDStorageInfo(wrapper.getInfo)
+    )
+  }
+
+  private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+    val builder = StoreTypes.RDDStorageInfo.newBuilder()
+    builder.setId(info.id)
+    builder.setName(info.name)
+    builder.setNumPartitions(info.numPartitions)
+    builder.setNumCachedPartitions(info.numCachedPartitions)
+    builder.setStorageLevel(info.storageLevel)
+    builder.setMemoryUsed(info.memoryUsed)
+    builder.setDiskUsed(info.diskUsed)
+
+    if (info.dataDistribution.isDefined) {
+      info.dataDistribution.get.foreach { dd =>
+        val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+        dataDistributionBuilder.setAddress(dd.address)
+        dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+        dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+        dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+        dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+        dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+        dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+        dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+        builder.addDataDistribution(dataDistributionBuilder.build())
+      }
+    }
+
+    if (info.partitions.isDefined) {
+      info.partitions.get.foreach { p =>
+        val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+        partitionsBuilder.setBlockName(p.blockName)
+        partitionsBuilder.setStorageLevel(p.storageLevel)
+        partitionsBuilder.setMemoryUsed(p.memoryUsed)
+        partitionsBuilder.setDiskUsed(p.diskUsed)
+        p.executors.foreach(partitionsBuilder.addExecutors)
+        builder.addPartitions(partitionsBuilder.build())
+      }
+    }
+
+    builder.build()
+  }
+
+  private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+    new RDDStorageInfo(
+      id = info.getId,
+      name = info.getName,
+      numPartitions = info.getNumPartitions,
+      numCachedPartitions = info.getNumCachedPartitions,
+      storageLevel = info.getStorageLevel,
+      memoryUsed = info.getMemoryUsed,
+      diskUsed = info.getDiskUsed,
+      dataDistribution =
+        if (info.getDataDistributionList.isEmpty) {

Review Comment:
   this assignment is based on https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L645
   dataDistribution is None for empty seq, and partitions is always Some() even when underlying collection is empty



##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+  def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+    val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+    builder.setInfo(serializeRDDStorageInfo(input.info))
+    builder.build().toByteArray
+  }
+
+  def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+    val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+    new RDDStorageInfoWrapper(
+      info = deserializeRDDStorageInfo(wrapper.getInfo)
+    )
+  }
+
+  private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+    val builder = StoreTypes.RDDStorageInfo.newBuilder()
+    builder.setId(info.id)
+    builder.setName(info.name)
+    builder.setNumPartitions(info.numPartitions)
+    builder.setNumCachedPartitions(info.numCachedPartitions)
+    builder.setStorageLevel(info.storageLevel)
+    builder.setMemoryUsed(info.memoryUsed)
+    builder.setDiskUsed(info.diskUsed)
+
+    if (info.dataDistribution.isDefined) {
+      info.dataDistribution.get.foreach { dd =>
+        val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+        dataDistributionBuilder.setAddress(dd.address)
+        dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+        dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+        dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+        dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+        dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+        dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+        dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+        builder.addDataDistribution(dataDistributionBuilder.build())
+      }
+    }
+
+    if (info.partitions.isDefined) {
+      info.partitions.get.foreach { p =>
+        val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+        partitionsBuilder.setBlockName(p.blockName)
+        partitionsBuilder.setStorageLevel(p.storageLevel)
+        partitionsBuilder.setMemoryUsed(p.memoryUsed)
+        partitionsBuilder.setDiskUsed(p.diskUsed)
+        p.executors.foreach(partitionsBuilder.addExecutors)
+        builder.addPartitions(partitionsBuilder.build())
+      }
+    }
+
+    builder.build()
+  }
+
+  private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+    new RDDStorageInfo(
+      id = info.getId,
+      name = info.getName,
+      numPartitions = info.getNumPartitions,
+      numCachedPartitions = info.getNumCachedPartitions,
+      storageLevel = info.getStorageLevel,
+      memoryUsed = info.getMemoryUsed,
+      diskUsed = info.getDiskUsed,
+      dataDistribution =
+        if (info.getDataDistributionList.isEmpty) {

Review Comment:
   this implementation is based on https://github.com/apache/spark/blob/d95fb4c33f6f061190fae091868117d182659147/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L645
   dataDistribution is None for empty seq, and partitions is always Some() even when underlying collection is empty



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #39104:
URL: https://github.com/apache/spark/pull/39104#issuecomment-1357385626

   If so, we should add `[UI]`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052742075


##########
core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala:
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.status.RDDStorageInfoWrapper
+import org.apache.spark.status.api.v1.{RDDDataDistribution, RDDPartitionInfo, RDDStorageInfo}
+import org.apache.spark.status.protobuf.Utils.getOptional
+
+object RDDStorageInfoWrapperSerializer {
+  def serialize(input: RDDStorageInfoWrapper): Array[Byte] = {
+    val builder = StoreTypes.RDDStorageInfoWrapper.newBuilder()
+    builder.setInfo(serializeRDDStorageInfo(input.info))
+    builder.build().toByteArray
+  }
+
+  def deserialize(bytes: Array[Byte]): RDDStorageInfoWrapper = {
+    val wrapper = StoreTypes.RDDStorageInfoWrapper.parseFrom(bytes)
+    new RDDStorageInfoWrapper(
+      info = deserializeRDDStorageInfo(wrapper.getInfo)
+    )
+  }
+
+  private def serializeRDDStorageInfo(info: RDDStorageInfo): StoreTypes.RDDStorageInfo = {
+    val builder = StoreTypes.RDDStorageInfo.newBuilder()
+    builder.setId(info.id)
+    builder.setName(info.name)
+    builder.setNumPartitions(info.numPartitions)
+    builder.setNumCachedPartitions(info.numCachedPartitions)
+    builder.setStorageLevel(info.storageLevel)
+    builder.setMemoryUsed(info.memoryUsed)
+    builder.setDiskUsed(info.diskUsed)
+
+    if (info.dataDistribution.isDefined) {
+      info.dataDistribution.get.foreach { dd =>
+        val dataDistributionBuilder = StoreTypes.RDDDataDistribution.newBuilder()
+        dataDistributionBuilder.setAddress(dd.address)
+        dataDistributionBuilder.setMemoryUsed(dd.memoryUsed)
+        dataDistributionBuilder.setMemoryRemaining(dd.memoryRemaining)
+        dataDistributionBuilder.setDiskUsed(dd.diskUsed)
+        dd.onHeapMemoryUsed.foreach(dataDistributionBuilder.setOnHeapMemoryUsed)
+        dd.offHeapMemoryUsed.foreach(dataDistributionBuilder.setOffHeapMemoryUsed)
+        dd.onHeapMemoryRemaining.foreach(dataDistributionBuilder.setOnHeapMemoryRemaining)
+        dd.offHeapMemoryRemaining.foreach(dataDistributionBuilder.setOffHeapMemoryRemaining)
+        builder.addDataDistribution(dataDistributionBuilder.build())
+      }
+    }
+
+    if (info.partitions.isDefined) {
+      info.partitions.get.foreach { p =>
+        val partitionsBuilder = StoreTypes.RDDPartitionInfo.newBuilder()
+        partitionsBuilder.setBlockName(p.blockName)
+        partitionsBuilder.setStorageLevel(p.storageLevel)
+        partitionsBuilder.setMemoryUsed(p.memoryUsed)
+        partitionsBuilder.setDiskUsed(p.diskUsed)
+        p.executors.foreach(partitionsBuilder.addExecutors)
+        builder.addPartitions(partitionsBuilder.build())
+      }
+    }
+
+    builder.build()
+  }
+
+  private def deserializeRDDStorageInfo(info: StoreTypes.RDDStorageInfo): RDDStorageInfo = {
+    new RDDStorageInfo(
+      id = info.getId,
+      name = info.getName,
+      numPartitions = info.getNumPartitions,
+      numCachedPartitions = info.getNumCachedPartitions,
+      storageLevel = info.getStorageLevel,
+      memoryUsed = info.getMemoryUsed,
+      diskUsed = info.getDiskUsed,
+      dataDistribution =
+        if (info.getDataDistributionList.isEmpty) {

Review Comment:
   Yeah the optional sequence here is tricky...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39104:
URL: https://github.com/apache/spark/pull/39104#discussion_r1052723531


##########
core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala:
##########
@@ -17,7 +17,7 @@
 
 package org.apache.spark.status.protobuf
 
-import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, TaskDataWrapper}
+import org.apache.spark.status.{ApplicationEnvironmentInfoWrapper, ApplicationInfoWrapper, JobDataWrapper, RDDStorageInfoWrapper, TaskDataWrapper}

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org