Posted to dev@pig.apache.org by Praveen R <pr...@sigmoidanalytics.com> on 2014/09/30 16:01:26 UTC

Pig on Spark - Suggestions in handling code changes out of Spark

*Hi Everyone,*

Earlier, we made some changes on
https://github.com/sigmoidanalytics/spork/tree/spork-pig-12 to achieve
complete e2e coverage, but we did not restrict ourselves from changing the
core Pig codebase, as we found that slightly easier to do.

We are now merging these changes into
https://github.com/apache/pig/tree/spark and have to revisit each of them,
either finding a workaround or proposing the change on trunk.

Below is the gist of the code changes made outside the Spark-specific code;
the related code can be found here <http://goo.gl/nRgldU>


   1. Had to comment out PigStatsUtil.addNativeJobStats(PigStats.get(), this,
      true); to get native (mapred) operator working
   2. Changes in PigRecordReader to identify endOfAllInput
   3. POUserFunc - made properties attribute public
   4. POCollectedGroup - getNextTuple modified to identify the end of all input
   5. POFRJoin - made LRs attribute public to use it during FR join
   6. POMergeJoin - made LRs attribute public to use it during merge join
   7. POStream - problem with identifying endOfAllInput, made some changes
   8. JsonLoader - made properties public to use from JsonStorage
   9. JsonStorage - uses properties from JsonLoader
   10. PigStorage - mRequiredColumns attribute
   11. BinSedesTuple, BinSedesTupleFactory - made the class serializable
   12. SchemaTupleBackend - changes to initialize stbInstance when null



I would like to seek suggestions upfront before I submit the related patches
and take the discussion on a per-issue basis.

BTW, below are the JIRA issues related to the above changes, which I will be
working on. Anyone interested in taking them up, please feel free to comment
on the relevant issue.

PIG-4193, PIG-4189, PIG-4190, PIG-4192, PIG-4200, PIG-4207, PIG-4208,
PIG-4209

Thanks,
Praveen R

Re: Pig on Spark - Suggestions in handling code changes out of Spark

Posted by Rohini Palaniswamy <ro...@gmail.com>.
Do you want to submit all the changes in one single separate Jira? It will be
easier for us to review.
