You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@pig.apache.org by prateek vaishnav <pr...@inmobi.com> on 2016/02/23 13:29:01 UTC

Review Request 43875: Fixed TestEvalPipelineLocal test suite.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/
-----------------------------------------------------------

Review request for pig and Pallavi Rao.


Repository: pig-git


Description
-------

https://issues.apache.org/jira/browse/PIG-4807

Following test cases have been fixed -
1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF

1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads  UDFContext.getUDFProperties() to return NULL.
 
public Properties getUDFProperties(Class c) {
    UDFContextKey k = generateKey(c, null);
    Properties p = udfConfs.get(k);
    if (p == null) {
        p = new Properties();
        udfConfs.put(k, p);
    }
    return p;
}

Here, udfConfs remains empty even when it was set while processing the pig query.
udf configuration in jobConf is getting lost while running the job.
In the code udf configuration is meant to be saved by serializing them in jobConf.

Currently, serialization is done before loading configuration in jobConf.
It is done in 'newJobConf(PigContext pigContext)'
It needs to be done after loading configuration.

JobConf jobConf = SparkUtil.newJobConf(pigContext);
configureLoader(physicalPlan, op, jobConf);
UDFContext.getUDFContext().serialize(jobConf);
            
2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.

3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator. 

JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
                sortOperator.new SortComparator(), true);

It should be using mComparator stored in POSort class. I have changed it to following

JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
                sortOperator.getMComparator(), true);


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc 
  src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29 
  test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c 

Diff: https://reviews.apache.org/r/43875/diff/


Testing
-------

Successfully ran TestEvalPipelineLocal in spark/mr/local mode.


Thanks,

prateek vaishnav

Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.

Posted by Pallavi Rao <pa...@inmobi.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120467
-----------------------------------------------------------


Ship it!




Ship It!

- Pallavi Rao


On Feb. 24, 2016, 7:47 a.m., prateek vaishnav wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2016, 7:47 a.m.)
> 
> 
> Review request for pig and Pallavi Rao.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/PIG-4807
> 
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
> 
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads  UDFContext.getUDFProperties() to return NULL.
>  
> public Properties getUDFProperties(Class c) {
>     UDFContextKey k = generateKey(c, null);
>     Properties p = udfConfs.get(k);
>     if (p == null) {
>         p = new Properties();
>         udfConfs.put(k, p);
>     }
>     return p;
> }
> 
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
> 
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
> 
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>             
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
> 
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator. 
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.new SortComparator(), true);
> 
> It should be using mComparator stored in POSort class. I have changed it to following
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.getMComparator(), true);
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29 
>   test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c 
> 
> Diff: https://reviews.apache.org/r/43875/diff/
> 
> 
> Testing
> -------
> 
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
> 
> 
> Thanks,
> 
> prateek vaishnav
> 
>

Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.

Posted by prateek vaishnav <pr...@inmobi.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/
-----------------------------------------------------------

(Updated Feb. 24, 2016, 7:47 a.m.)


Review request for pig and Pallavi Rao.


Changes
-------

Addressed review comments


Repository: pig-git


Description
-------

https://issues.apache.org/jira/browse/PIG-4807

Following test cases have been fixed -
1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF

1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads  UDFContext.getUDFProperties() to return NULL.
 
public Properties getUDFProperties(Class c) {
    UDFContextKey k = generateKey(c, null);
    Properties p = udfConfs.get(k);
    if (p == null) {
        p = new Properties();
        udfConfs.put(k, p);
    }
    return p;
}

Here, udfConfs remains empty even when it was set while processing the pig query.
udf configuration in jobConf is getting lost while running the job.
In the code udf configuration is meant to be saved by serializing them in jobConf.

Currently, serialization is done before loading configuration in jobConf.
It is done in 'newJobConf(PigContext pigContext)'
It needs to be done after loading configuration.

JobConf jobConf = SparkUtil.newJobConf(pigContext);
configureLoader(physicalPlan, op, jobConf);
UDFContext.getUDFContext().serialize(jobConf);
            
2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.

3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator. 

JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
                sortOperator.new SortComparator(), true);

It should be using mComparator stored in POSort class. I have changed it to following

JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
                sortOperator.getMComparator(), true);


Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc 
  src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29 
  test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c 

Diff: https://reviews.apache.org/r/43875/diff/


Testing
-------

Successfully ran TestEvalPipelineLocal in spark/mr/local mode.


Thanks,

prateek vaishnav

Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.

Posted by prateek vaishnav <pr...@inmobi.com>.


> On Feb. 24, 2016, 6:12 a.m., Pallavi Rao wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, line 559
> > <https://reviews.apache.org/r/43875/diff/1/?file=1265339#file1265339line559>
> >
> >     Please create a JIRA for XML format support.

https://issues.apache.org/jira/browse/PIG-4815


- prateek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120456
-----------------------------------------------------------


On Feb. 23, 2016, 12:29 p.m., prateek vaishnav wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2016, 12:29 p.m.)
> 
> 
> Review request for pig and Pallavi Rao.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/PIG-4807
> 
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
> 
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads  UDFContext.getUDFProperties() to return NULL.
>  
> public Properties getUDFProperties(Class c) {
>     UDFContextKey k = generateKey(c, null);
>     Properties p = udfConfs.get(k);
>     if (p == null) {
>         p = new Properties();
>         udfConfs.put(k, p);
>     }
>     return p;
> }
> 
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
> 
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
> 
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>             
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
> 
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator. 
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.new SortComparator(), true);
> 
> It should be using mComparator stored in POSort class. I have changed it to following
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.getMComparator(), true);
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29 
>   test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c 
> 
> Diff: https://reviews.apache.org/r/43875/diff/
> 
> 
> Testing
> -------
> 
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
> 
> 
> Thanks,
> 
> prateek vaishnav
> 
>

Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.

Posted by Pallavi Rao <pa...@inmobi.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120456
-----------------------------------------------------------




src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 100)
<https://reviews.apache.org/r/43875/#comment181910>

    In general, wildcards are not preferred in import.



src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 555)
<https://reviews.apache.org/r/43875/#comment181911>

    Please create a JIRA for XML format support.



src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 74)
<https://reviews.apache.org/r/43875/#comment181912>

    Shouldn't this be return op.getName() ?



src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 130)
<https://reviews.apache.org/r/43875/#comment181913>

    Nit : naming of variable sparkp. Should follow camel case. May be sparkInnnerOp?


- Pallavi Rao


On Feb. 23, 2016, 12:29 p.m., prateek vaishnav wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2016, 12:29 p.m.)
> 
> 
> Review request for pig and Pallavi Rao.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/PIG-4807
> 
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
> 
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads  UDFContext.getUDFProperties() to return NULL.
>  
> public Properties getUDFProperties(Class c) {
>     UDFContextKey k = generateKey(c, null);
>     Properties p = udfConfs.get(k);
>     if (p == null) {
>         p = new Properties();
>         udfConfs.put(k, p);
>     }
>     return p;
> }
> 
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
> 
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
> 
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>             
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
> 
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator. 
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.new SortComparator(), true);
> 
> It should be using mComparator stored in POSort class. I have changed it to following
> 
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
>                 sortOperator.getMComparator(), true);
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29 
>   test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c 
> 
> Diff: https://reviews.apache.org/r/43875/diff/
> 
> 
> Testing
> -------
> 
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
> 
> 
> Thanks,
> 
> prateek vaishnav
> 
>