You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/16 15:23:14 UTC

[GitHub] [spark] grundprinzip commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

grundprinzip commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1024147641


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -272,7 +274,14 @@ class SparkConnectPlanner(session: SparkSession) {
 
   private def transformLocalRelation(rel: proto.LocalRelation): LogicalPlan = {
     val attributes = rel.getAttributesList.asScala.map(transformAttribute(_)).toSeq
-    new org.apache.spark.sql.catalyst.plans.logical.LocalRelation(attributes)
+
+    val rows = ArrowConverters.fromBatchIterator(
+      rel.getDataList.asScala.map(_.toByteArray).iterator,

Review Comment:
   If `data` is a regular byte array, it becomes a ByteString that you can simply extract here.



##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectPlannerSuite.scala:
##########
@@ -55,10 +61,18 @@ trait SparkConnectPlanTest extends SharedSparkSession {
    * equivalent in Catalyst and can be easily used for planner testing.
    *
    * @param attrs
+   *   the attributes of LocalRelation
+   * @param literals
+   *   the data of LocalRelation
    * @return
    */
-  def createLocalRelationProto(attrs: Seq[AttributeReference]): proto.Relation = {
+  def createLocalRelationProto(
+      attrs: Seq[AttributeReference],
+      literals: Seq[Array[Byte]]): proto.Relation = {

Review Comment:
   `literals` -> `data`?



##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -213,7 +213,7 @@ message Deduplicate {
 
 message LocalRelation {
   repeated Expression.QualifiedAttribute attributes = 1;
-  // TODO: support local data.
+  repeated bytes data = 2;

Review Comment:
   I'm not sure this needs to be `repeated bytes` here because `bytes` itself is binary data of "arbitrary" length.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org