Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/07/06 19:16:10 UTC

[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

    [ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364923#comment-15364923 ] 

Rohini Palaniswamy commented on PIG-3764:
-----------------------------------------

Looked into bytecode generation this weekend while working out how best to fix PIG-3000 instead of just rewriting the plan.

At a very high level:
   - The idea is to generate and compile the input plans of POForeach (nested ones as well) and POFilter into a single class with all the code fully inlined, and to replace them in the plan with the newly generated class using a separate optimizer.
   - Create variables named after operator keys as much as possible for easy debugging.
   - Provide an interface for UDFs to also supply simplified versions of their code that avoid wrapping values in a Tuple and DataBag and instead accept an array or ArrayList directly, so that we can do tight loops. Inline that as well, instead of making method calls, if possible.
   - Add methods to the existing operators that will be used to generate bytecode, instead of adding a new class like GeneratedPigExpression/GeneratedPigExpressionGenerator with methods and code for all operations. If that becomes more complicated, I will go with the separate-classes idea in Julien's prototype.
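
To make the first two points concrete, here is a rough sketch in plain Java of the difference between walking an operator tree per row and running a single fused class. Nothing here is Pig's actual API: Expr, Column, Const, Add, GreaterThan, GeneratedFilter_scope_12 and the "scope_N" variable names are hypothetical stand-ins, and the fused class is written by hand to show roughly what a generator might emit for a filter like "$0 + $1 > 10". It does not generate any bytecode itself.

```java
public class InlinedFilterSketch {

    // Interpreted style: one object per operator and one virtual call
    // per operator per row. (Hypothetical stand-in for a physical plan.)
    interface Expr {
        Object eval(Object[] row);
    }

    static final class Column implements Expr {
        private final int index;
        Column(int index) { this.index = index; }
        public Object eval(Object[] row) { return row[index]; }
    }

    static final class Const implements Expr {
        private final Object value;
        Const(Object value) { this.value = value; }
        public Object eval(Object[] row) { return value; }
    }

    static final class Add implements Expr {
        private final Expr left;
        private final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public Object eval(Object[] row) {
            Object l = left.eval(row);
            Object r = right.eval(row);
            if (l == null || r == null) return null; // null in, null out
            return (Integer) l + (Integer) r;
        }
    }

    static final class GreaterThan implements Expr {
        private final Expr left;
        private final Expr right;
        GreaterThan(Expr left, Expr right) { this.left = left; this.right = right; }
        public Object eval(Object[] row) {
            Object l = left.eval(row);
            Object r = right.eval(row);
            if (l == null || r == null) return null;
            return (Integer) l > (Integer) r;
        }
    }

    // Builds the tree for "$0 + $1 > 10".
    static Expr buildPlan() {
        return new GreaterThan(new Add(new Column(0), new Column(1)), new Const(10));
    }

    // A filter keeps a row only when the predicate evaluates to true.
    static boolean interpretedAccept(Expr predicate, Object[] row) {
        return Boolean.TRUE.equals(predicate.eval(row));
    }

    // Fused style: the same predicate as straight-line code in one class.
    // Local variables are named after (made-up) operator keys so that a
    // stack trace or debugger session can be mapped back to the plan.
    static final class GeneratedFilter_scope_12 {
        static boolean accept(Object[] row) {
            Object col_scope_8 = row[0];
            Object col_scope_9 = row[1];
            if (col_scope_8 == null || col_scope_9 == null) return false;
            int add_scope_10 = (Integer) col_scope_8 + (Integer) col_scope_9;
            return add_scope_10 > 10;
        }
    }

    public static void main(String[] args) {
        Expr plan = buildPlan();
        Object[][] rows = { {3, 4}, {7, 8}, {null, 20} };
        for (Object[] row : rows) {
            System.out.println(java.util.Arrays.toString(row)
                    + " interpreted=" + interpretedAccept(plan, row)
                    + " inlined=" + GeneratedFilter_scope_12.accept(row));
        }
    }
}
```

Both paths should agree on every row, including rows with nulls. The fused version avoids the per-operator virtual calls and the boxing of intermediate results, which is where the hoped-for speedup would come from.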

Took a look at ByteBuddy, which seems to provide a good high-level abstraction over ASM, but none of the other Hadoop projects seem to have used it. If it does not work out, we can use ASM directly. Will try to do a prototype later in the month.

[~julienledem],
   Thoughts?

> Compile physical operators to bytecode
> --------------------------------------
>
>                 Key: PIG-3764
>                 URL: https://issues.apache.org/jira/browse/PIG-3764
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Julien Le Dem
>              Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)