Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/06/18 05:25:00 UTC

[jira] [Commented] (PIG-4607) Enable "TestRank1","TestRank3" unit tests in spark mode

    [ https://issues.apache.org/jira/browse/PIG-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591176#comment-14591176 ] 

liyunzhang_intel commented on PIG-4607:
---------------------------------------

Here is an example that shows why the TestRank1 and TestRank3 unit tests fail:

rank01RowNumber.pig:
{code}
A = LOAD './rank01RowNumber.txt' AS (f1:chararray,f2:int,f3:chararray);
C = rank A;
store C into './rank01RowNumber.out';
{code}

cat rank01RowNumber.txt:
{code}
A	1	N
B	2	N
C	3	M
D	4	P
E	4	Q
E	4	Q
F	8	Q
F	7	Q
F	8	T
F	8	Q
G	10	V
{code}
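
For reference, here is a rough sketch of what this script should produce. It assumes that {{rank A}} with no BY clause simply prepends a 1-based row number to each tuple, and that the file is read as a single split in file order. The helper name rank_row_number is made up for illustration; it is not part of Pig.

```python
# Hypothetical model of `C = rank A;` (no BY clause): plain row numbering.
# Assumption: the input is read as one split, in file order.
ROWS = [
    ("A", 1, "N"),
    ("B", 2, "N"),
    ("C", 3, "M"),
    ("D", 4, "P"),
    ("E", 4, "Q"),
    ("E", 4, "Q"),
    ("F", 8, "Q"),
    ("F", 7, "Q"),
    ("F", 8, "T"),
    ("F", 8, "Q"),
    ("G", 10, "V"),
]


def rank_row_number(rows):
    """Prepend a 1-based row number to every tuple, keeping input order."""
    return [(i,) + row for i, row in enumerate(rows, start=1)]


if __name__ == "__main__":
    for t in rank_row_number(ROWS):
        print(t)
```

Duplicate rows such as the two (E,4,Q) lines still get distinct numbers, because the numbering depends only on position.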

The physical plan is:
{code}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
C: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-13
|
|---C: PORank[tuple] - scope-12
    |
    |---C: POCounter[tuple] - scope-11
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
            |   |
            |   Cast[chararray] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Cast[int] - scope-5
            |   |
            |   |---Project[bytearray][1] - scope-4
            |   |
            |   Cast[chararray] - scope-8
            |   |
            |   |---Project[bytearray][2] - scope-7
            |
            |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/rank01RowNumber.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}

The spark plan is:
{code}
scope-14
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-14
C: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-13
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[chararray] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/rank01RowNumber.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
{code}

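The root cause is that "POCounter" and "PORank", which are present in the physical plan, are missing from the spark plan: the Store (scope-13) reads directly from the ForEach (scope-10), so no rank is ever computed.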
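
To make clear what is lost without these two operators, here is a simplified model of how they cooperate, as I understand it: the POCounter step tags each tuple with a task id and a local counter and records how many tuples each task saw, and the PORank step adds the cumulative count of the earlier tasks to the local counter. This is only a sketch under that assumption; the function names counter_phase and rank_phase are hypothetical and this is not the actual Pig code.

```python
from itertools import accumulate


def counter_phase(partitions):
    """POCounter-like step (simplified): tag tuples and count per task.

    Returns a list of (task_id, local_count, row) and the per-task totals.
    """
    tagged = []
    totals = []
    for task_id, rows in enumerate(partitions):
        for local_count, row in enumerate(rows, start=1):
            tagged.append((task_id, local_count, row))
        totals.append(len(rows))
    return tagged, totals


def rank_phase(tagged, totals):
    """PORank-like step (simplified): global rank = offset + local count.

    The offset of a task is the number of tuples seen by all earlier tasks.
    """
    offsets = [0] + list(accumulate(totals))[:-1]
    return [
        (offsets[task_id] + local_count,) + row
        for task_id, local_count, row in tagged
    ]


if __name__ == "__main__":
    # Hypothetical split of the first rows of rank01RowNumber.txt over 3 tasks.
    parts = [
        [("A", 1, "N"), ("B", 2, "N")],
        [("C", 3, "M")],
        [("D", 4, "P"), ("E", 4, "Q")],
    ]
    tagged, totals = counter_phase(parts)
    for t in rank_phase(tagged, totals):
        print(t)
```

In this model, if both steps are dropped, the tuples go to the Store without any rank column, which matches the spark plan shown above.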

> Enable "TestRank1","TestRank3" unit tests in spark mode
> -------------------------------------------------------
>
>                 Key: PIG-4607
>                 URL: https://issues.apache.org/jira/browse/PIG-4607
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: kexianda
>             Fix For: spark-branch
>
>
>  In https://builds.apache.org/job/Pig-spark/216/#showFailuresLink, the following TestRank1 and TestRank3 unit tests fail:
> org.apache.pig.test.TestRank1.testRank02RowNumber
> org.apache.pig.test.TestRank1.testRank01RowNumber
> org.apache.pig.test.TestRank3.testRankWithSplitInMap
> org.apache.pig.test.TestRank3.testRankWithSplitInReduce
> org.apache.pig.test.TestRank3.testRankCascade



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)