Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/05/17 14:51:14 UTC

[GitHub] [incubator-seatunnel] smokeriu opened a new issue, #1901: [Feature][spark-transform-code] Transform via user define code

smokeriu opened a new issue, #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901

   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   Allow users to upload a custom Java code file and call it like a UDF.
   
   ### Usage Scenario
   
   - Users upload their `code.java` with `--files`.
   - Define a class named `CodeInvokeExpress` in `package org.apache.spark.sql.expression`.
     - This is because I need to use some private methods/classes.
   - Use `CodeGenerator` to compile the user's `code.java`.
     - Spark already implements a similar CodeGenerator via Janino, so we don't need to add additional dependencies.
     - For the time being, I plan to use the CodeGenerator that Spark has implemented. It has been fully tested.
   - Use `Expression.eval` to invoke the compiled result.
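
A minimal sketch of the compile-then-invoke flow listed above, not the proposed implementation. It assumes the uploaded file declares a public class with a public static method; the names `UserCode`, `eval`, and `UserCodeInvoker` are hypothetical. To stay self-contained it uses the JDK's `javax.tools` compiler as a stand-in for Spark's Janino-based CodeGenerator, and plain reflection where the proposal would use a Spark `Expression`.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: compiles a user-supplied Java source at runtime and calls it like a UDF.
// Assumption: javax.tools stands in here for Spark's Janino-based CodeGenerator.
final class UserCodeInvoker {

    /**
     * Compiles `source` (which must declare public class `className` in the default package)
     * and invokes its public static `methodName(Object)` with `arg`.
     */
    static Object compileAndInvoke(String className, String source, String methodName, Object arg) {
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found; a JDK is required");
            }
            // Write the source to a temporary directory and compile it in place.
            Path dir = Files.createTempDirectory("user-code");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalArgumentException("Compilation failed for " + className);
            }
            // Load the compiled class from the temporary directory and call the method reflectively.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {dir.toUri().toURL()}, UserCodeInvoker.class.getClassLoader())) {
                Class<?> clazz = Class.forName(className, true, loader);
                Method method = clazz.getMethod(methodName, Object.class);
                return method.invoke(null, arg);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new RuntimeException("Could not compile or invoke user code", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of an uploaded code.java file.
        String source =
            "public class UserCode {\n"
            + "    public static Object eval(Object input) {\n"
            + "        return String.valueOf(input).toUpperCase();\n"
            + "    }\n"
            + "}\n";
        System.out.println(compileAndInvoke("UserCode", source, "eval", "seatunnel"));
    }
}
```

Running `main` on a JDK compiles the inline source and prints the upper-cased input. In a Spark transform, the compiled class would presumably be wrapped in an expression and evaluated per row rather than invoked once like this.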
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] smokeriu commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
smokeriu commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129545829

   > > * For the time being, I plan to use the CodeGenerator that Spark has implemented. It has been fully tested.
   > 
   > If we use this, maybe we should create a shaded dependency, because SeaTunnel core logic should not depend on engine code.
   
   As I envision it now, it's just a Spark Transform, so we can start with the dependencies that Spark already has and use some of the methods/tools that Spark already implements.
   Flink or a generic implementation may need more discussion, as I haven't worked with Flink before.




[GitHub] [incubator-seatunnel] smokeriu commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
smokeriu commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129619464

   > If so, I would like to split the transforms out of our distribution, like source/sink. Users would only need to add `seatunnel-api-xx` to their new transform plugin and put the plugin into the transform directory, and SeaTunnel will load it automatically.
   
   That is a good idea. Users can implement their own algorithms by extending `BaseTransform`, etc.
   But for this issue, do you think there is still a need to implement it?
   The difference is that the user only needs a single code.java instead of packaging the algorithm. I think it will be more useful in test scenarios and simple scenarios.




[GitHub] [incubator-seatunnel] BenJFan commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
BenJFan commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129621900

   > > Why upload a Java file rather than a jar? In a jar, you have all the dependencies you need. With a Java file, that is hard to do.
   > 
   > Sometimes it is difficult to do some work through SQL alone, but it becomes easier in Java. Sometimes the user just needs simple code to get the job done. In that case, I think uploading a jar adds work, because with Java code the user does not need to do packaging and so on. Of course, the disadvantage is that the user can only use the dependencies that already exist in our app. However, in the future we can provide a `--jars` option, and then the user will be able to use other dependencies in the code.
   
   In my view, if users start writing code, they will be using an IDE like IDEA or Eclipse, so packaging isn't a big problem. With only a Java file, the user cannot know whether the code will run successfully before submitting it.
   




[GitHub] [incubator-seatunnel] BenJFan commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
BenJFan commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129497603

   Why upload a Java file rather than a jar? In a jar, you have all the dependencies you need. With a Java file, that is hard to do.




[GitHub] [incubator-seatunnel] BenJFan commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
BenJFan commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129498780

   > * For the time being, I plan to use the CodeGenerator that Spark has implemented. It has been fully tested.
   
   If we use this, maybe we should create a shaded dependency, because SeaTunnel core logic should not depend on engine code.




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129586111

   > > Why upload a Java file rather than a jar? In a jar, you have all the dependencies you need. With a Java file, that is hard to do.
   > 
   > Sometimes it is difficult to do some work through SQL alone, but it becomes easier in Java. Sometimes the user just needs simple code to get the job done. In that case, I think uploading a jar adds work, because with Java code the user does not need to do packaging and so on. Of course, the disadvantage is that the user can only use the dependencies that already exist in our app. However, in the future we can provide a `--jars` option, and then the user will be able to use other dependencies in the code.
   
   If so, I would like to split the transforms out of our distribution, like source/sink. Users would only need to add `seatunnel-api-xx` to their new transform plugin and put the plugin into the transform directory, and SeaTunnel will load it automatically.
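
The plugin-directory idea could look roughly like the following sketch. It assumes plugins are packaged as jars that register their implementations through the standard `java.util.ServiceLoader` mechanism. `TransformPlugin` and `TransformPluginLoader` are hypothetical names, not SeaTunnel APIs, and the real plugin contract and discovery logic may differ.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical plugin contract; SeaTunnel's real transform API may look different.
interface TransformPlugin {
    String name();
}

final class TransformPluginLoader {

    /** Discovers TransformPlugin implementations in every *.jar directly under pluginDir. */
    static List<TransformPlugin> load(Path pluginDir) {
        List<URL> jarUrls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                jarUrls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read plugin directory " + pluginDir, e);
        }
        // The class loader is left open on purpose: plugin classes may be loaded lazily later.
        URLClassLoader loader = new URLClassLoader(
                jarUrls.toArray(new URL[0]), TransformPluginLoader.class.getClassLoader());
        List<TransformPlugin> plugins = new ArrayList<>();
        // ServiceLoader reads META-INF/services entries from the jars on the loader's path.
        for (TransformPlugin plugin : ServiceLoader.load(TransformPlugin.class, loader)) {
            plugins.add(plugin);
        }
        return plugins;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("transform-plugins");
        System.out.println(load(dir).size() + " plugin(s) found");
    }
}
```

With this approach a user would only compile against the API module and drop the resulting jar into the directory; nothing in the core distribution needs to change per plugin.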




[GitHub] [incubator-seatunnel] ruanwenjun commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129490665

   Sounds good.




[GitHub] [incubator-seatunnel] smokeriu commented on issue #1901: [Feature][spark-transform-code] Transform via user define code

Posted by GitBox <gi...@apache.org>.
smokeriu commented on issue #1901:
URL: https://github.com/apache/incubator-seatunnel/issues/1901#issuecomment-1129539308

   > Why upload a Java file rather than a jar? In a jar, you have all the dependencies you need. With a Java file, that is hard to do.
   
   Sometimes it is difficult to do some work through SQL alone, but it becomes easier in Java.
   Sometimes the user just needs simple code to get the job done. In that case, I think uploading a jar adds work,
   because with Java code the user does not need to do packaging and so on.
   Of course, the disadvantage is that the user can only use the dependencies that already exist in our app. However, in the future we can provide a `--jars` option, and then the user will be able to use other dependencies in the code.

