Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/03/05 03:39:40 UTC

[jira] [Updated] (PIG-4269) Enable unit test "TestAccumulator" for spark

     [ https://issues.apache.org/jira/browse/PIG-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4269:
----------------------------------
    Attachment: PIG-4269_Jekins.png
                PIG-4269.patch

After applying PIG-4269.patch, all unit tests in TestAccumulator pass except testAccumWithSort, testAccumWithDistinct and testAccumAfterNestedOp (see PIG-4269_Jekins.png).

The reason why testAccumWithSort, testAccumWithDistinct and testAccumAfterNestedOp fail is illustrated by TestAccumulator#testAccumAfterNestedOp.
Its Pig script:
{code}
  A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
  B = group A by id;
  C = foreach B
      { o = order A by id;
        generate org.apache.pig.test.utils.AccumulatorBagCount(o);
      };
{code}
The Spark plan:
{code}
    C: Store(hdfs://localhost:52502/tmp/temp827450292/tmp-1280786869:org.apache.pig.impl.io.InterStorage) - scope-17
      |
      |---C: New For Each(false)[bag] - scope-16
          |   |
          |   POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-12
          |   |
          |   |---RelationToExpressionProject[bag][*] - scope-11
          |       |
          |       |---o: POSort[bag]() - scope-15
          |           |   |
          |           |   Project[int][0] - scope-14
          |           |
          |           |---Project[bag][1] - scope-13
          |
          |---B: Package(Packager)[tuple]{int} - scope-8
              |
              |---B: Global Rearrange[tuple] - scope-7

        B: Local Rearrange[tuple]{int}(false) - scope-9
        |   |
        |   Project[int][0] - scope-10
        |
        |---A: New For Each(false,false)[bag] - scope-6
            |   |
            |   Cast[int] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Project[bytearray][1] - scope-4
            |
            |---A: Load(hdfs://localhost:52502/user/root/AccumulatorInput1.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}
The MapReduce plan:
{code}
    #--------------------------------------------------
    # Map Reduce Plan
    #--------------------------------------------------
    MapReduce node scope-18
    Map Plan
    B: Local Rearrange[tuple]{int}(false) - scope-9
    |   |
    |   Project[int][0] - scope-10
    |
    |---A: New For Each(false,false)[bag] - scope-6
        |   |
        |   Cast[int] - scope-2
        |   |
        |   |---Project[bytearray][0] - scope-1
        |   |
        |   Project[bytearray][1] - scope-4
        |
        |---A: Load(hdfs://localhost:40299/user/root/AccumulatorInput1.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
    Reduce Plan
    C: Store(hdfs://localhost:40299/tmp/temp-493016342/tmp-1209478651:org.apache.pig.impl.io.InterStorage) - scope-17
    |
    |---C: New For Each(false)[bag] - scope-16
        |   |
        |   POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-12
        |   |
        |   |---RelationToExpressionProject[bag][*] - scope-11
        |       |
        |       |---Project[bag][1] - scope-13
        |
        |---B: Package(Packager)[tuple]{int} - scope-8--------
    Global sort: false
    ----------------
{code}
In Spark mode the script fails because the Spark plan contains a POSort (o: POSort[bag]() - scope-15) for the nested "order A by id", whereas the MR plan contains no POSort at all. When this POSort is generated, execution throws the exception "Caught error from UDF: org.apache.pig.test.utils.AccumulatorBagCount [exec() should not be called.]".
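To make the error message above concrete, here is a minimal, self-contained Java sketch of the accumulator contract it comes from. It is modeled loosely on Pig's org.apache.pig.Accumulator interface (accumulate, getValue, cleanup), but BagCountUdf, runAccumulative and the use of List<Integer> in place of Pig's Tuple/DataBag are hypothetical stand-ins, not Pig classes. The sketch assumes that, with the POSort in place, the foreach ends up calling exec() on the whole bag instead of feeding it through accumulate(); that is an inference from the exception text, not something the plans above show directly.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class AccumulatorSketch {

    // Hypothetical stand-in for an accumulator-only UDF such as AccumulatorBagCount:
    // it supports being fed a bag in batches, but not the whole bag at once.
    static class BagCountUdf {
        private int count = 0;

        // Accumulative mode: called once per batch of tuples from the grouped bag.
        void accumulate(List<Integer> batch) {
            count += batch.size();
        }

        // Returns the final result after all batches have been fed.
        Integer getValue() {
            return count;
        }

        // Resets state before the next group key is processed.
        void cleanup() {
            count = 0;
        }

        // Regular (non-accumulative) mode: the whole bag at once. Always rejected,
        // mirroring the "exec() should not be called." message in the error above.
        Integer exec(List<Integer> wholeBag) throws IOException {
            throw new IOException("exec() should not be called.");
        }
    }

    // Feeds one group's bag to the UDF in fixed-size batches (hypothetical driver).
    static int runAccumulative(BagCountUdf udf, List<Integer> bag, int batchSize) {
        udf.cleanup();
        for (int i = 0; i < bag.size(); i += batchSize) {
            udf.accumulate(bag.subList(i, Math.min(i + batchSize, bag.size())));
        }
        return udf.getValue();
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(1, 1, 1, 1, 1);  // ids of one group
        BagCountUdf udf = new BagCountUdf();
        System.out.println("accumulative count = " + runAccumulative(udf, bag, 2));
        try {
            udf.exec(bag);  // assumed path when the foreach is not run accumulatively
        } catch (IOException e) {
            System.out.println("exec failed: " + e.getMessage());
        }
    }
}
{code}

Running main prints "accumulative count = 5" followed by "exec failed: exec() should not be called.". The point of the design is that an accumulator-only UDF deliberately makes exec() fail, so any plan that falls back to non-accumulative evaluation surfaces immediately as this exception rather than silently computing a result.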
	


> Enable unit test "TestAccumulator" for spark
> --------------------------------------------
>
>                 Key: PIG-4269
>                 URL: https://issues.apache.org/jira/browse/PIG-4269
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4269.patch, PIG-4269_Jekins.png, TEST-org.apache.pig.test.TestAccumulator.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)