Posted to dev@pig.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2017/05/25 13:33:04 UTC
[jira] [Updated] (PIG-5240) Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats
[ https://issues.apache.org/jira/browse/PIG-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Szita updated PIG-5240:
----------------------------
Summary: Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats (was: Fix TestPigRunner in spark mode)
> Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats
> --------------------------------------------------------------------------
>
> Key: PIG-5240
> URL: https://issues.apache.org/jira/browse/PIG-5240
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
>
> In TestPigRunner#simpleMultiQueryTest3, the explain plan is:
> {code}
> #--------------------------------------------------
> # Spark Plan
> #--------------------------------------------------
> Spark node scope-53
> Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-54
> |
> |---A: New For Each(false,false,false)[bag] - scope-10
> | |
> | Cast[int] - scope-2
> | |
> | |---Project[bytearray][0] - scope-1
> | |
> | Cast[int] - scope-5
> | |
> | |---Project[bytearray][1] - scope-4
> | |
> | Cast[int] - scope-8
> | |
> | |---Project[bytearray][2] - scope-7
> |
> |---A: Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) - scope-0--------
> Spark node scope-55
> Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-56
> |
> |---C: Filter[bag] - scope-14
> | |
> | Less Than or Equal[boolean] - scope-17
> | |
> | |---Project[int][1] - scope-15
> | |
> | |---Constant(5) - scope-16
> |
> |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-10--------
> Spark node scope-57
> C: Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage) - scope-21
> |
> |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-14--------
> Spark node scope-65
> D: Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-52
> |
> |---D: FRJoinSpark[tuple] - scope-44
> | |
> | Project[int][0] - scope-41
> | |
> | Project[int][0] - scope-42
> | |
> | Project[int][0] - scope-43
> |
> |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-58
> |
> |---BroadcastSpark - scope-63
> | |
> | |---B: Filter[bag] - scope-26
> | | |
> | | Equal To[boolean] - scope-29
> | | |
> | | |---Project[int][0] - scope-27
> | | |
> | | |---Constant(3) - scope-28
> | |
> | |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-60
> |
> |---BroadcastSpark - scope-64
> |
> |---A1: New For Each(false,false,false)[bag] - scope-40
> | |
> | Cast[int] - scope-32
> | |
> | |---Project[bytearray][0] - scope-31
> | |
> | Cast[int] - scope-35
> | |
> | |---Project[bytearray][1] - scope-34
> | |
> | Cast[int] - scope-38
> | |
> | |---Project[bytearray][2] - scope-37
> |
> |---A1: Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-30--------
> {code}
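The plan above can be summarized as a map from each Spark node to the paths it loads. This is a minimal Java sketch that assumes those loads can be read directly off the explain output, including the two loads that feed the BroadcastSpark branches of scope-65. The class name SparkPlanLoads is hypothetical. The rule "single input means exactly one load" is also an assumption, not a description of how Pig actually sets singleInput.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Spark plan from the explain output: each Spark
// node is mapped to the paths it loads, directly or via a BroadcastSpark branch.
public class SparkPlanLoads {

    // Node -> loaded paths, transcribed by hand from the explain plan.
    static Map<String, List<String>> loadsPerNode() {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("scope-53", List.of("input"));
        plan.put("scope-55", List.of("tmp1818797386"));
        plan.put("scope-57", List.of("tmp-546700946"));
        // scope-65: one direct Load (scope-58) plus the loads behind
        // BroadcastSpark scope-63 (scope-60) and scope-64 (scope-30).
        plan.put("scope-65", List.of("tmp-546700946", "tmp1818797386", "input2"));
        return plan;
    }

    // Assumed rule for illustration only: a node is single-input
    // when it contains exactly one Load.
    static boolean isSingleInput(List<String> loads) {
        return loads.size() == 1;
    }

    public static void main(String[] args) {
        loadsPerNode().forEach((node, loads) ->
            System.out.println(node + " loads=" + loads.size()
                + " singleInput=" + isSingleInput(loads)));
    }
}
```

Under this model, only scope-65 has more than one input. That node is the one whose stats back inputStats.get(1).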
> assertEquals(30, inputStats.get(0).getBytes()) holds in spark mode, but assertEquals(18, inputStats.get(1).getBytes()) fails in spark mode because there are 3 loads in {{Spark node scope-65}}. [{{stats.get("BytesRead")}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L93] returns 49, presumably the sum of
> the three loads ({{input2}}, {{tmp1818797386}}, {{tmp-546700946}}). However, the current [{{bytesRead}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L91] is -1 because [{{singleInput}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L92] is false.
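This is a sketch of the arithmetic behind the failing assertion. It assumes that bytesRead falls back to -1 whenever singleInput is false, as described above. BytesReadSketch and bytesRead(...) are hypothetical names rather than Pig's code. The values 49, 30 and 18 come from the issue. The aggregated 49 bytes spans three loads, so it cannot be attributed to any single input. As a result, neither -1 nor 49 matches the 18 expected for inputStats.get(1).

```java
// Hypothetical sketch of the behaviour described in the issue; not Pig's
// actual SparkJobStats implementation.
public class BytesReadSketch {

    // The aggregated "BytesRead" metric is only reported when the Spark
    // node has a single input; otherwise -1 is reported.
    static long bytesRead(boolean singleInput, long bytesReadMetric) {
        return singleInput ? bytesReadMetric : -1L;
    }

    public static void main(String[] args) {
        // Spark node scope-65 has three loads, so singleInput is false.
        long metricForScope65 = 49L; // value observed for stats.get("BytesRead")
        long reported = bytesRead(false, metricForScope65);
        System.out.println("reported bytesRead = " + reported);
        // The test expects 18 bytes for inputStats.get(1).
        System.out.println("matches test expectation: " + (reported == 18L));
    }
}
```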
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)