Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/05/25 03:05:17 UTC
[jira] [Updated] (PIG-4283) Enable unit test "TestGrunt" for spark
[ https://issues.apache.org/jira/browse/PIG-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4283:
----------------------------------
Attachment: PIG-4283.patch
[~mohitsabharwal],[~xuefuz],[~praveenr019]:
PIG-4283.patch fixes the following four unit test failures:
TestGrunt#testAutoShipUDFContainingJar
TestGrunt#testKeepGoigFailed
TestScriptLanguage#testSysArguments
TestScriptLanguage#runParallelTest2
Changes in PIG-4283.patch are:
1. Change shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java to generate core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml in the hadoop2 env.
2. Fix org.apache.pig.test.TestGrunt#testAutoShipUDFContainingJar and org.apache.pig.test.TestGrunt#testKeepGoigFailed.
*Why did TestGrunt#testAutoShipUDFContainingJar fail?*
TestGrunt#testAutoShipUDFContainingJar failed in the previous code because it could not find table_testAutoShipUDFContainingJar. The table is located at hdfs://xxxx:/user/root/table_testAutoShipUDFContainingJar, but the test looked for it in the local file system instead of the hadoop file system because the hadoop env was not correct (the value of FS_DEFAULT_NAME_KEY in core-site.xml was "file:///", not "hdfs://xxxx:8020").
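To illustrate why the default filesystem value matters, here is a minimal sketch in plain Java. It is not Hadoop's actual resolution code; the class {{DefaultFsSketch}}, the method {{qualify}} and the local working directory "/tmp/work" are hypothetical names chosen for illustration. It only shows that the same relative table name ends up in a different filesystem depending on the configured default.

```java
import java.net.URI;

public class DefaultFsSketch {
    // Simplified stand-in (assumption): qualify a relative path against a
    // default filesystem URI and a working directory.
    static String qualify(String defaultFs, String workingDir, String relativePath) {
        URI fs = URI.create(defaultFs);
        String dir = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        // "file:///" has no authority; "hdfs://xxxx:8020" has "xxxx:8020".
        String authority = fs.getAuthority() == null ? "" : fs.getAuthority();
        return fs.getScheme() + "://" + authority + dir + relativePath;
    }

    public static void main(String[] args) {
        String table = "table_testAutoShipUDFContainingJar";
        // Wrong env: the relative name is looked up in the local file system.
        System.out.println(qualify("file:///", "/tmp/work", table));
        // Correct env: the relative name is looked up under the HDFS home directory.
        System.out.println(qualify("hdfs://xxxx:8020", "/user/root", table));
    }
}
```

With "file:///" the lookup never reaches HDFS, which matches the "can not find table" symptom described above.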
Even after fixing the hadoop2 env problem in shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java, TestGrunt#testAutoShipUDFContainingJar still fails at [{{assertTrue(found);}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1506]
That is because the jar-loaded info is not shown in [{{pri.stderrContent}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1495]. So I added SparkLauncher#addJarToSparkJobWorkingDirectory so that the "ADDED JAR xxxx" info appears in {{pri.stderrContent}}.
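The kind of check involved can be sketched roughly as follows. This is not the code in TestGrunt; the class {{StderrScanSketch}}, the method {{jarReported}} and the jar path "/tmp/testudf.jar" are hypothetical, and the marker text is simply the one quoted in this message.

```java
import java.util.Arrays;
import java.util.List;

public class StderrScanSketch {
    // Returns true if any captured stderr line contains both the marker
    // and the jar name (both supplied by the caller).
    static boolean jarReported(List<String> stderrLines, String marker, String jarName) {
        for (String line : stderrLines) {
            if (line.contains(marker) && line.contains(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical captured output; the real content comes from the test run.
        List<String> captured = Arrays.asList(
                "some unrelated log line",
                "ADDED JAR /tmp/testudf.jar");
        System.out.println(jarReported(captured, "ADDED JAR", "testudf.jar"));
    }
}
```

If the launcher never writes such a line to stderr, a scan like this finds nothing and an assertion on the result fails, whatever the actual state of the jar.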
*Why does TestGrunt#testKeepGoigFailed fail?*
The following script should fail because of "B = stream A through `false`;" ("file false not exists").
{code}
String strCmd =
"rmf bar;"
+"rmf foo;"
+"rmf baz;"
+"A = load 'passwd';"
+"B = foreach A generate 1;"
+"C = foreach A generate 0/0;"
+"store B into 'foo';"
+"store C into 'bar';"
+"A = load 'passwd';"
+"B = stream A through `false`;"
+"store B into 'baz';"
+"cat baz;";
{code}
STREAM grammar:
{code}
alias = STREAM alias [, alias …] THROUGH {'command' | cmd_alias } [AS schema] ;
{code}
When the script fails, the [exception|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L936] 'baz does not exist' should be thrown: in mr, when the job fails, the output directory "baz" is automatically deleted, so [{{caught}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L931] is true.
In spark, the output directory is not automatically deleted whether or not the job fails (see [SPARK-5836|https://issues.apache.org/jira/browse/SPARK-5836]), so {{caught}} is false. So in spark mode, the check that {{caught}} is true is skipped to make the unit test pass.
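The shape of that change, as a minimal sketch only: the class {{KeepGoingCheckSketch}}, its method names and the plain-string exec type are hypothetical and do not reproduce the patch, which uses Pig's own test utilities.

```java
public class KeepGoingCheckSketch {
    // Assumption taken from this message: only non-spark modes delete the
    // output directory of a failed job, so only they can expect the
    // "baz does not exist" exception.
    static boolean expectsOutputCleanup(String execType) {
        return !"spark".equalsIgnoreCase(execType);
    }

    // Fails only when the mode expects the exception and it was not caught.
    static void verify(String execType, boolean caught) {
        if (expectsOutputCleanup(execType) && !caught) {
            throw new AssertionError("expected 'baz does not exist' after the failed job");
        }
    }

    public static void main(String[] args) {
        verify("mapreduce", true); // mr: output dir deleted, exception caught
        verify("spark", false);    // spark: check skipped, no failure
        System.out.println("checks passed");
    }
}
```

The downside of skipping the check is that spark mode no longer verifies anything about the "baz" output after the failure.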
> Enable unit test "TestGrunt" for spark
> --------------------------------------
>
> Key: PIG-4283
> URL: https://issues.apache.org/jira/browse/PIG-4283
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4283.patch, TEST-org.apache.pig.test.TestGrunt.txt
>
>
> error log is attached
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)