Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/05/25 03:05:17 UTC
[jira] [Updated] (PIG-4283) Enable unit test "TestGrunt" for spark

     [ https://issues.apache.org/jira/browse/PIG-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4283:
----------------------------------
    Attachment: PIG-4283.patch

[~mohitsabharwal],[~xuefuz],[~praveenr019]:
PIG-4283.patch fixes the following four unit test failures:
TestGrunt#testAutoShipUDFContainingJar
TestGrunt#testKeepGoigFailed
TestScriptLanguage#testSysArguments
TestScriptLanguage#runParallelTest2

Changes in PIG-4283.patch are:
	1. Change shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java to generate core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml in the hadoop2 env.
	2. Fix org.apache.pig.test.TestGrunt#testAutoShipUDFContainingJar and org.apache.pig.test.TestGrunt#testKeepGoigFailed.
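
As a rough illustration of the first change, the sketch below renders a Hadoop-style site file (the {{<configuration>}}/{{<property>}} layout) from a map of settings. It is not the code in SparkMiniCluster.java: the class name {{SiteXmlSketch}}, its methods, and the host "namenode" are hypothetical, and only the {{fs.defaultFS}} property is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the code in SparkMiniCluster.java:
// renders key/value settings in the Hadoop site-file XML layout.
final class SiteXmlSketch {

    // Minimal escaping so that values cannot break the XML structure.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build the text of a *-site.xml file from the given properties.
    static String toSiteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> core = new LinkedHashMap<>();
        // "namenode" is a placeholder host, assumed for illustration only.
        core.put("fs.defaultFS", "hdfs://namenode:8020");
        System.out.print(toSiteXml(core));
    }
}
```

A real mini cluster would write one such file per configuration (core, hdfs, mapred, yarn) into a directory on the test classpath.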
	
*Why did TestGrunt#testAutoShipUDFContainingJar fail?*
TestGrunt#testAutoShipUDFContainingJar failed in the previous code because it could not find table_testAutoShipUDFContainingJar. The table is located at hdfs://xxxx:/user/root/table_testAutoShipUDFContainingJar, but the test looked for it in the local file system instead of the hadoop file system because the hadoop env was not correct (the value of FS_DEFAULT_NAME_KEY in core-site.xml was "file:///", not "hdfs://xxxx:8020").
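
To make the effect of the default file system setting concrete, here is a minimal sketch that uses only {{java.net.URI}} rather than the Hadoop API. The class {{DefaultFsSketch}}, the method {{qualify}}, and the host "namenode" are hypothetical; the sketch only mimics how a path without a scheme ends up on whichever file system the default points to.

```java
import java.net.URI;

// Hypothetical sketch (not Pig or Hadoop code): shows how a default
// file system URI decides where a scheme-less path is looked up.
final class DefaultFsSketch {

    // Qualify a path against the default file system URI.
    // Paths that already carry a scheme are returned unchanged.
    static URI qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p;
        }
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        String table = "/user/root/table_testAutoShipUDFContainingJar";
        // With the wrong setting the table is searched on the local disk.
        System.out.println(qualify("file:///", table));
        // With the corrected setting it is searched on HDFS.
        // "namenode" is a placeholder host, assumed for illustration only.
        System.out.println(qualify("hdfs://namenode:8020", table));
    }
}
```

The same table name therefore points at two different locations depending only on the value in core-site.xml, which matches the failure described above.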
	
Even after fixing the hadoop2 env problem in shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java, TestGrunt#testAutoShipUDFContainingJar still fails at [{{assertTrue(found);}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1506].
	
That is because the jar-loading info is not shown in [{{pri.stderrContent}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1495]. I therefore added SparkLauncher#addJarToSparkJobWorkingDirectory, which makes the "ADDED JAR xxxx" info appear in {{pri.stderrContent}}.
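
The kind of check that produces {{found}} can be sketched as a scan over the captured stderr text. This is not the code in TestGrunt.java: the class {{StderrScanSketch}}, the method name, and the exact wording and casing of the log message are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch, not the code in TestGrunt.java: scans captured
// stderr text for a line reporting that a given jar was added.
final class StderrScanSketch {

    // True if some line contains "added jar" (case-insensitive, an
    // assumed message format) together with the jar name.
    static boolean mentionsAddedJar(String stderrContent, String jarName) {
        return Arrays.stream(stderrContent.split("\\R"))
                .anyMatch(line ->
                        line.toLowerCase(Locale.ROOT).contains("added jar")
                                && line.contains(jarName));
    }

    public static void main(String[] args) {
        // Made-up log text, for illustration only.
        String stderr = "INFO starting job\nINFO Added jar /tmp/udf.jar\n";
        System.out.println(mentionsAddedJar(stderr, "udf.jar"));
    }
}
```

If the launcher never logs such a line, a scan like this cannot succeed even when the jar was shipped correctly, which is why the log output had to be added on the spark side.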

*Why does TestGrunt#testKeepGoigFailed fail?*
The following script should fail because of "B = stream A through `false`;" (the file "false" does not exist).
	{code}
	  String strCmd =
	            "rmf bar;"
	            +"rmf foo;"
	            +"rmf baz;"
	            +"A = load 'passwd';"
	            +"B = foreach A generate 1;"
	            +"C = foreach A generate 0/0;"
	            +"store B into 'foo';"
	            +"store C into 'bar';"
	            +"A = load 'passwd';"
	            +"B = stream A through `false`;"
	            +"store B into 'baz';"
	            +"cat baz;";
	{code}

stream grammar:
{code}
alias = STREAM alias [, alias …] THROUGH {'command' | cmd_alias } [AS schema] ;
{code}

When the script fails, the [exception|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L936] 'baz does not exist' should be thrown, because in mr a failed job automatically deletes the output directory "baz", and [{{caught}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L931] is true.

In spark, the output directory is not automatically deleted whether the job fails or not (see [SPARK-5836|https://issues.apache.org/jira/browse/SPARK-5836]), so {{caught}} is false. Therefore, in spark mode the check that {{caught}} is true is skipped to make the unit test pass.
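
The mode-dependent check can be sketched as follows. This is an illustration only, not the change in PIG-4283.patch: the class {{KeepGoingCheckSketch}}, its methods, and the exec type strings "spark" and "mapreduce" are hypothetical.

```java
// Hypothetical sketch, not the code in PIG-4283.patch: the
// "output directory is gone" check is applied only outside spark mode.
final class KeepGoingCheckSketch {

    // In spark mode the output directory survives a failed job, so the
    // 'baz does not exist' exception cannot be required there.
    static boolean shouldRequireMissingOutput(String execType) {
        return !"spark".equalsIgnoreCase(execType);
    }

    // Fails only when the check applies and the exception was not caught.
    static void verify(String execType, boolean caught) {
        if (shouldRequireMissingOutput(execType) && !caught) {
            throw new AssertionError("expected 'baz does not exist' after the failed job");
        }
    }

    public static void main(String[] args) {
        verify("spark", false);     // check skipped in spark mode
        verify("mapreduce", true);  // mr: exception was caught as expected
        System.out.println("checks passed");
    }
}
```

Skipping the check keeps the rest of the test meaningful in spark mode, at the cost of no longer verifying output cleanup there.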


> Enable unit test "TestGrunt" for spark
> --------------------------------------
>
>                 Key: PIG-4283
>                 URL: https://issues.apache.org/jira/browse/PIG-4283
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4283.patch, TEST-org.apache.pig.test.TestGrunt.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)