Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/29 16:04:48 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

grundprinzip opened a new pull request, #39291:
URL: https://github.com/apache/spark/pull/39291

   ### What changes were proposed in this pull request?
   
   This PR adds an extension mechanism to the Spark Connect protocol to support custom Relation and Expression types. This is necessary to support custom extensions in Spark like Delta or custom plugins in Catalyst.
   
   This is achieved by adding `protobuf.Any` fields in both `Relation` and `Expression`. To load the extension, this PR adds two configuration flags to indicate which classes should be loaded.
   
     * `spark.connect.extensions.relation.classes`
     * `spark.connect.extensions.expression.classes`
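   
   As a rough illustration of how these flags might be set, the sketch below assembles `--conf` arguments for a launcher command line. The plugin class names under `com.example` are hypothetical, and the assumption that each flag takes a comma-separated list of fully qualified class names is inferred from the `.classes` suffix rather than stated in this PR.
   
   ```scala
   object ExtensionConfSketch {
     // Hypothetical plugin class names; replace with your own implementations.
     val relationPlugins: Seq[String] = Seq("com.example.ExampleRelationPlugin")
     val expressionPlugins: Seq[String] = Seq("com.example.ExampleExpressionPlugin")
   
     // Assumption: each flag accepts a comma-separated list of class names.
     val confs: Seq[(String, String)] = Seq(
       "spark.connect.extensions.relation.classes" -> relationPlugins.mkString(","),
       "spark.connect.extensions.expression.classes" -> expressionPlugins.mkString(","))
   
     // Render the settings as `--conf key=value` arguments.
     def toCliArgs: Seq[String] =
       confs.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
   }
   ```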
   
   To add a new plugin, consumers implement either `RelationPlugin` or `ExpressionPlugin` and its `transform()` method. If the plugin does not support transforming the given input, it must return `None`.
   
   Below is a simplified example of an expression and relation plugin.
   
   First, define the custom message types that are necessary for the particular input.
   
   ```
   message ExamplePluginRelation {
     Relation input = 1;
     string custom_field = 2;
   }
   
   message ExamplePluginExpression {
     Expression child = 1;
     string custom_field = 2;
   }
   ```
   
   Second, define the necessary `RelationPlugin` and `ExpressionPlugin` implementations.
   
   ```
   class ExampleRelationPlugin extends RelationPlugin {
     override def transform(
         relation: protobuf.Any,
         planner: SparkConnectPlanner): Option[LogicalPlan] = {
       // Decline any message that is not the type this plugin handles.
       if (!relation.is(classOf[proto.ExamplePluginRelation])) {
         return None
       }
       // Unpack the custom message and let the planner transform its input.
       val plugin = relation.unpack(classOf[proto.ExamplePluginRelation])
       Some(planner.transformRelation(plugin.getInput))
     }
   }
   
   class ExampleExpressionPlugin extends ExpressionPlugin {
     override def transform(
         relation: protobuf.Any,
         planner: SparkConnectPlanner): Option[Expression] = {
       // Decline any message that is not the type this plugin handles.
       if (!relation.is(classOf[proto.ExamplePluginExpression])) {
         return None
       }
       // Unpack the custom message and alias the child with the custom field.
       val exp = relation.unpack(classOf[proto.ExamplePluginExpression])
       Some(
         Alias(planner.transformExpression(exp.getChild), exp.getCustomField)(
           explicitMetadata = None))
     }
   }
   ```
   
   Now, on the client side, the new extensions simply have to be encoded into the `protobuf.Any` value; they become available once the plugins are loaded. Below is an example of wrapping the custom message type into a standard `Relation` with a `Range` child.
   
   ```
   // Wrap the custom message in the `extension` field of a standard Relation.
   proto.Relation
     .newBuilder()
     .setExtension(
       protobuf.Any.pack(
         proto.ExamplePluginRelation
           .newBuilder()
           .setInput(
             proto.Relation
               .newBuilder()
               .setRange(proto.Range
                 .newBuilder()
                 .setStart(0)
                 .setEnd(10)
                 .setStep(1)))
           .build()))
     .build()
   ```
   
   When the plan is transformed, the custom extensions will behave like any other built-in functionality of Spark.
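   
   To make the `None` contract concrete, here is a minimal, dependency-free sketch of how a planner might consult several registered plugins. This PR description does not spell out the dispatch order, so the "first plugin that returns `Some` wins" rule is an assumption, and `Packed`, `Plugin`, `ExamplePlugin`, `OtherPlugin`, and `dispatch` are hypothetical stand-ins rather than Spark classes.
   
   ```scala
   object PluginDispatchSketch {
     // Stand-in for a packed `protobuf.Any`: a type name plus an opaque payload.
     final case class Packed(typeName: String, payload: String)
   
     // Stand-in for `RelationPlugin`: return None when the input is not recognized.
     trait Plugin {
       def transform(input: Packed): Option[String]
     }
   
     object ExamplePlugin extends Plugin {
       def transform(input: Packed): Option[String] =
         if (input.typeName != "ExamplePluginRelation") None
         else Some(s"Plan(${input.payload})")
     }
   
     object OtherPlugin extends Plugin {
       def transform(input: Packed): Option[String] =
         if (input.typeName != "OtherRelation") None
         else Some(s"OtherPlan(${input.payload})")
     }
   
     // Assumed dispatch rule: ask each plugin in order and keep the first Some.
     // The iterator is lazy, so later plugins are not called after a match.
     def dispatch(plugins: Seq[Plugin], input: Packed): Option[String] =
       plugins.iterator.map(_.transform(input)).collectFirst { case Some(plan) => plan }
   }
   ```
   
   Under this rule a plugin that returns `None` only signals that the message is not its own, so several plugins can be registered side by side without interfering with each other.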
   
   
   ### Why are the changes needed?
   Extensibility
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, adds a new extension mechanism to the Spark Connect protocol.
   
   
   ### How was this patch tested?
   Added test coverage for the plugin registry and for end-to-end transformation and execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1367734984

   Added the missing generated Python files.


[GitHub] [spark] zhengruifeng commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1367808269

   LGTM, merged into master


[GitHub] [spark] grundprinzip commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1367436858

   R: @amaliujia 


[GitHub] [spark] zhengruifeng closed pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression
URL: https://github.com/apache/spark/pull/39291


[GitHub] [spark] grundprinzip commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1367434361

   R: @cloud-fan @hvanhovell @HyukjinKwon @zhengruifeng 


[GitHub] [spark] grundprinzip commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1695727004

   @sthagedorn, can you post this to the dev mailing list? Thanks!


[GitHub] [spark] grundprinzip commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1367443914

   Will add support for `Command` in the same patch.


[GitHub] [spark] sthagedorn commented on pull request #39291: [SPARK-41629][CONNECT] Support for Protocol Extensions in Relation and Expression

Posted by "sthagedorn (via GitHub)" <gi...@apache.org>.
sthagedorn commented on PR #39291:
URL: https://github.com/apache/spark/pull/39291#issuecomment-1695718748

   I tried to use this in my own application on Spark 3.4.1, but I encountered differences between the API in the source and the API in the shipped class files. I posted to the mailing list but have not received a response so far: https://lists.apache.org/thread/7sfy15ck7c2q8x5p9y5t73d10f1tojzs  
   
   I thought I might try it here again :)

