Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/09/25 01:33:36 UTC

[GitHub] [flink] sunjincheng121 edited a comment on issue #9748: [FLINK-14016][python][flink-table-planner] Introduce FlinkLogicalPythonScalarFunctionExec and DataStreamPythonScalarFunctionExec for Python function execution

URL: https://github.com/apache/flink/pull/9748#issuecomment-534481069
 
 
   Thanks a lot for your contribution @dianfu. I have gone through the PR. My main comments are below:
   - For the logical node, I think we don't need to create a new node for Python; instead, we can reuse the current `FlinkLogicalCalc` node. In this way, not only can we reuse most of the code, but it also matches the semantics: it is still a Calc.
   - For the physical node, we can create a `DataStreamCalcBase` node and let both `DataStreamPythonCalc` and `DataStreamCalc` extend it (see the sketch right after this list).
   - For the split rule, maybe we can split the calc a bit further, i.e., avoid mixing Java and Python UDFs in one calc instead of splitting them later in translateToPlan. Doing all of the splitting in the split rule is clearer (a toy sketch of the splitting idea follows below this list).
   - For the filter, we currently evaluate the condition UDFs together with the other UDFs and apply the filter afterwards. I think we can optimize this a bit, i.e., evaluate the conditions first and then decide whether the other UDFs need to be called at all. This can be easily achieved in the SplitRule.
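   
   To make the second point more concrete, here is a minimal sketch of the proposed hierarchy. It only illustrates the inheritance idea; the real nodes would extend Calcite's `Calc` and carry the planner's cluster, traits and `RexProgram`, so the simplified signatures below are placeholders rather than the actual API:
   
   ```scala
   // Illustrative sketch only: shows the proposed inheritance, not the real planner types.
   abstract class DataStreamCalcBase {
     // common logic shared by both calc implementations, e.g. schema derivation
     def deriveRowType(): String
     // each concrete node translates itself into a DataStream transformation
     def translateToPlan(): Unit
   }
   
   // existing calc node for Java/Scala expressions and UDFs
   class DataStreamCalc extends DataStreamCalcBase {
     override def deriveRowType(): String = "java-calc-row-type"
     override def translateToPlan(): Unit =
       println("generate code for the Java/Scala expressions")
   }
   
   // new calc node that wraps the projections containing Python UDFs
   class DataStreamPythonCalc extends DataStreamCalcBase {
     override def deriveRowType(): String = "python-calc-row-type"
     override def translateToPlan(): Unit =
       println("emit a Python function operator for the Python UDFs")
   }
   ```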
   
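   For the third and fourth points, a toy sketch of the splitting idea. It uses a made-up expression model instead of the planner's `RexNode` types, so all names here are purely illustrative:
   
   ```scala
   // Toy expression model; the real rule would work on Calcite RexNodes.
   sealed trait Expr
   case class FieldRef(name: String) extends Expr
   case class JavaCall(udf: String, args: List[Expr]) extends Expr
   case class PythonCall(udf: String, args: List[Expr]) extends Expr
   
   object SplitSketch {
     // does the expression (transitively) contain a Python UDF call?
     def containsPython(e: Expr): Boolean = e match {
       case PythonCall(_, _)  => true
       case JavaCall(_, args) => args.exists(containsPython)
       case FieldRef(_)       => false
     }
   
     // Partition the projections of one calc so that Python and Java UDFs
     // end up in separate, chained calc nodes. The conditions would be pushed
     // into a calc below both, so the filter is evaluated before the remaining
     // UDFs are called. The point is that all of this happens in the rule,
     // not later in translateToPlan.
     def split(projections: List[Expr]): (List[Expr], List[Expr]) =
       projections.partition(containsPython)
   
     def main(args: Array[String]): Unit = {
       val projections = List(
         PythonCall("py_udf", List(FieldRef("a"))),
         JavaCall("java_udf", List(FieldRef("b"))))
       println(split(projections))
     }
   }
   ```
   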
   Last, for the new optimization phase, I agree with the current design. Another choice is using the current logical optimization phase, but I think it would bring some side effects:
   a) The PythonSplitRule would conflict with the CalcMerge rules, and we would need some hacks in the merge rules.
   b) Adding all rules to the logical phase makes the optimization take longer and longer. This is the reason that we have a lot of rule-based optimization phases in Blink (a toy sketch of running separate phases follows below).
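   
   As a toy illustration of the phase argument: each rule-based phase owns a small, dedicated rule set and runs after the previous one, so the Python split rules neither conflict with the calc-merge rules nor enlarge the logical phase. The names below are made up and not the actual Blink planner API:
   
   ```scala
   // Made-up sketch: each optimization phase owns its own small rule set,
   // and phases run one after another instead of one ever-growing rule set.
   object PhaseSketch {
     type Plan = String
     type Rule = Plan => Plan
   
     final case class Phase(name: String, rules: List[Rule]) {
       def run(plan: Plan): Plan = rules.foldLeft(plan)((p, r) => r(p))
     }
   
     val logical     = Phase("logical", List(p => p + " -> calc-merged"))
     val pythonSplit = Phase("python-split", List(p => p + " -> python-calc-split"))
   
     def main(args: Array[String]): Unit = {
       val result = List(logical, pythonSplit).foldLeft("scan")((p, ph) => ph.run(p))
       println(result) // scan -> calc-merged -> python-calc-split
     }
   }
   ```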
   
   What do you think? @dianfu @twalthr :)
