You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by prateek vaishnav <pr...@inmobi.com> on 2016/02/23 13:29:01 UTC
Review Request 43875: Fixed TestEvalPipelineLocal test suite.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/
-----------------------------------------------------------
Review request for pig and Pallavi Rao.
Repository: pig-git
Description
-------
https://issues.apache.org/jira/browse/PIG-4807
Following test cases have been fixed -
1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads UDFContext.getUDFProperties() to return NULL.
public Properties getUDFProperties(Class c) {
UDFContextKey k = generateKey(c, null);
Properties p = udfConfs.get(k);
if (p == null) {
p = new Properties();
udfConfs.put(k, p);
}
return p;
}
Here, udfConfs remains empty even when it was set while processing the pig query.
udf configuration in jobConf is getting lost while running the job.
In the code udf configuration is meant to be saved by serializing them in jobConf.
Currently, serialization is done before loading configuration in jobConf.
It is done in 'newJobConf(PigContext pigContext)'
It needs to be done after loading configuration.
JobConf jobConf = SparkUtil.newJobConf(pigContext);
configureLoader(physicalPlan, op, jobConf);
UDFContext.getUDFContext().serialize(jobConf);
2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator.
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
sortOperator.new SortComparator(), true);
It should be using mComparator stored in POSort class. I have changed it to following
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
sortOperator.getMComparator(), true);
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29
test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c
Diff: https://reviews.apache.org/r/43875/diff/
Testing
-------
Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
Thanks,
prateek vaishnav
Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.
Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120467
-----------------------------------------------------------
Ship it!
Ship It!
- Pallavi Rao
On Feb. 24, 2016, 7:47 a.m., prateek vaishnav wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2016, 7:47 a.m.)
>
>
> Review request for pig and Pallavi Rao.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/PIG-4807
>
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
>
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads UDFContext.getUDFProperties() to return NULL.
>
> public Properties getUDFProperties(Class c) {
> UDFContextKey k = generateKey(c, null);
> Properties p = udfConfs.get(k);
> if (p == null) {
> p = new Properties();
> udfConfs.put(k, p);
> }
> return p;
> }
>
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
>
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
>
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
>
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator.
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.new SortComparator(), true);
>
> It should be using mComparator stored in POSort class. I have changed it to following
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.getMComparator(), true);
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc
> src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29
> test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c
>
> Diff: https://reviews.apache.org/r/43875/diff/
>
>
> Testing
> -------
>
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
>
>
> Thanks,
>
> prateek vaishnav
>
>
Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.
Posted by prateek vaishnav <pr...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/
-----------------------------------------------------------
(Updated Feb. 24, 2016, 7:47 a.m.)
Review request for pig and Pallavi Rao.
Changes
-------
Addressed review comments
Repository: pig-git
Description
-------
https://issues.apache.org/jira/browse/PIG-4807
Following test cases have been fixed -
1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads UDFContext.getUDFProperties() to return NULL.
public Properties getUDFProperties(Class c) {
UDFContextKey k = generateKey(c, null);
Properties p = udfConfs.get(k);
if (p == null) {
p = new Properties();
udfConfs.put(k, p);
}
return p;
}
Here, udfConfs remains empty even when it was set while processing the pig query.
udf configuration in jobConf is getting lost while running the job.
In the code udf configuration is meant to be saved by serializing them in jobConf.
Currently, serialization is done before loading configuration in jobConf.
It is done in 'newJobConf(PigContext pigContext)'
It needs to be done after loading configuration.
JobConf jobConf = SparkUtil.newJobConf(pigContext);
configureLoader(physicalPlan, op, jobConf);
UDFContext.getUDFContext().serialize(jobConf);
2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator.
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
sortOperator.new SortComparator(), true);
It should be using mComparator stored in POSort class. I have changed it to following
JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
sortOperator.getMComparator(), true);
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29
test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c
Diff: https://reviews.apache.org/r/43875/diff/
Testing
-------
Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
Thanks,
prateek vaishnav
Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.
Posted by prateek vaishnav <pr...@inmobi.com>.
> On Feb. 24, 2016, 6:12 a.m., Pallavi Rao wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, line 559
> > <https://reviews.apache.org/r/43875/diff/1/?file=1265339#file1265339line559>
> >
> > Please create a JIRA for XML format support.
https://issues.apache.org/jira/browse/PIG-4815
- prateek
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120456
-----------------------------------------------------------
On Feb. 23, 2016, 12:29 p.m., prateek vaishnav wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
>
> (Updated Feb. 23, 2016, 12:29 p.m.)
>
>
> Review request for pig and Pallavi Rao.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/PIG-4807
>
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
>
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads UDFContext.getUDFProperties() to return NULL.
>
> public Properties getUDFProperties(Class c) {
> UDFContextKey k = generateKey(c, null);
> Properties p = udfConfs.get(k);
> if (p == null) {
> p = new Properties();
> udfConfs.put(k, p);
> }
> return p;
> }
>
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
>
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
>
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
>
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator.
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.new SortComparator(), true);
>
> It should be using mComparator stored in POSort class. I have changed it to following
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.getMComparator(), true);
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc
> src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29
> test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c
>
> Diff: https://reviews.apache.org/r/43875/diff/
>
>
> Testing
> -------
>
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
>
>
> Thanks,
>
> prateek vaishnav
>
>
Re: Review Request 43875: Fixed TestEvalPipelineLocal test suite.
Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43875/#review120456
-----------------------------------------------------------
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 100)
<https://reviews.apache.org/r/43875/#comment181910>
In general, wildcards are not preferred in import.
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java (line 555)
<https://reviews.apache.org/r/43875/#comment181911>
Please create a JIRA for XML format support.
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 74)
<https://reviews.apache.org/r/43875/#comment181912>
Shouldn't this be return op.getName() ?
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java (line 130)
<https://reviews.apache.org/r/43875/#comment181913>
Nit : naming of variable sparkp. Should follow camel case. May be sparkInnnerOp?
- Pallavi Rao
On Feb. 23, 2016, 12:29 p.m., prateek vaishnav wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43875/
> -----------------------------------------------------------
>
> (Updated Feb. 23, 2016, 12:29 p.m.)
>
>
> Review request for pig and Pallavi Rao.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/PIG-4807
>
> Following test cases have been fixed -
> 1. org.apache.pig.test.TestEvalPipelineLocal.testSetLocationCalledInFE
> 2. org.apache.pig.test.TestEvalPipelineLocal.testExplainInDotGraph
> 3. org.apache.pig.test.TestEvalPipelineLocal.testSortWithUDF
>
> 1 was failing because of not saving UDF_CONTEXT configuration in jobConf. This leads UDFContext.getUDFProperties() to return NULL.
>
> public Properties getUDFProperties(Class c) {
> UDFContextKey k = generateKey(c, null);
> Properties p = udfConfs.get(k);
> if (p == null) {
> p = new Properties();
> udfConfs.put(k, p);
> }
> return p;
> }
>
> Here, udfConfs remains empty even when it was set while processing the pig query.
> udf configuration in jobConf is getting lost while running the job.
> In the code udf configuration is meant to be saved by serializing them in jobConf.
>
> Currently, serialization is done before loading configuration in jobConf.
> It is done in 'newJobConf(PigContext pigContext)'
> It needs to be done after loading configuration.
>
> JobConf jobConf = SparkUtil.newJobConf(pigContext);
> configureLoader(physicalPlan, op, jobConf);
> UDFContext.getUDFContext().serialize(jobConf);
>
> 2 was failing because of pig-spark not supporting 'explain' in dot format. I have added the DotSparkPrinter to fix the same.
>
> 3 was failing because instead of UDFSortComparator, SortConveter class was using SortComparator.
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.new SortComparator(), true);
>
> It should be using mComparator stored in POSort class. I have changed it to following
>
> JavaPairRDD<Tuple, Object> sorted = r.sortByKey(
> sortOperator.getMComparator(), true);
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java a759857
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b74977d
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java 90cff23
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java f54f8fc
> src/org/apache/pig/backend/hadoop/executionengine/spark/plan/DotSparkPrinter.java e69de29
> test/org/apache/pig/test/TestEvalPipelineLocal.java d73074c
>
> Diff: https://reviews.apache.org/r/43875/diff/
>
>
> Testing
> -------
>
> Successfully ran TestEvalPipelineLocal in spark/mr/local mode.
>
>
> Thanks,
>
> prateek vaishnav
>
>