Posted to dev@pig.apache.org by Nandor Kollar <nk...@cloudera.com> on 2017/04/11 15:20:38 UTC

Re: Review Request 57996: Aggregate warnings feature for Pig on Spark

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57996/#review171570
-----------------------------------------------------------


Ship it!


- Nandor Kollar


On March 28, 2017, 2:29 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57996/
> -----------------------------------------------------------
> 
> (Updated March 28, 2017, 2:29 p.m.)
> 
> 
> Review request for pig, Daniel Dai, liyun zhang, Rohini Palaniswamy, and Xuefu Zhang.
> 
> 
> Bugs: PIG-5186
>     https://issues.apache.org/jira/browse/PIG-5186
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Aggregate warnings were not yet supported in Spark mode (hence the failures of the e2e Warning test cases). I aim to enable them now.
> In MR/Tez we use counters, and in Spark we rely on Accumulators (a means to support distributed counters).
> Pig has some builtin warning enums in PigWarning, and also supports custom warnings for user defined functions.
> The latter is problematic with Spark because you cannot register new accumulators on the backend and read their values later in the driver.
> 
> My patch implements a workaround: we define Map-typed Accumulators (besides the Long-typed ones we already use), one for the builtin warnings and one for the custom ones. These are passed from the driver to the backend, where executors can create new entries in the maps or increment existing values.
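>
> A minimal, hypothetical Java sketch of the merge semantics such a Map-typed accumulator needs is given below. The class name WarningMapAccumulator and the warning keys are assumptions for illustration, not code from the patch, and the Spark-specific wiring (registering the accumulator with the SparkContext) is deliberately left out. Each task adds to its own copy, creating an entry on the first warning of a given kind, and the driver merges the partial maps.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Pig's actual class) of a Map-typed accumulator
 * value: warning name -> count. Spark registration is intentionally omitted.
 */
public class WarningMapAccumulator implements Serializable {
    private final Map<String, Long> counts = new HashMap<>();

    /** Executor side: create the entry on first use, otherwise increment it. */
    public void add(String warningName, long delta) {
        counts.merge(warningName, delta, Long::sum);
    }

    /** Driver side: fold in the partial map produced by one task. */
    public void merge(WarningMapAccumulator other) {
        other.counts.forEach((name, count) -> add(name, count));
    }

    /** Returns the aggregated count, or 0 if the warning never occurred. */
    public long get(String warningName) {
        return counts.getOrDefault(warningName, 0L);
    }

    public static void main(String[] args) {
        // One accumulator for builtin warnings, one for custom UDF warnings.
        WarningMapAccumulator builtin = new WarningMapAccumulator();
        WarningMapAccumulator custom = new WarningMapAccumulator();

        // Simulated per-task copies, as they would be updated on executors.
        WarningMapAccumulator task1 = new WarningMapAccumulator();
        WarningMapAccumulator task2 = new WarningMapAccumulator();
        task1.add("DIVIDE_BY_ZERO", 2);
        task2.add("DIVIDE_BY_ZERO", 1);

        // A custom warning key the driver never declared up front (hypothetical name).
        WarningMapAccumulator task2Custom = new WarningMapAccumulator();
        task2Custom.add("MyUdfWarning.BAD_RECORD", 1);

        // Driver merges the partial results.
        builtin.merge(task1);
        builtin.merge(task2);
        custom.merge(task2Custom);

        System.out.println(builtin.get("DIVIDE_BY_ZERO"));          // prints 3
        System.out.println(custom.get("MyUdfWarning.BAD_RECORD"));  // prints 1
    }
}
```

> Keying by warning name, rather than by a fixed set of pre-registered counters, is what lets a UDF report a warning type the driver did not know about in advance.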
> 
> I also upgraded DummyContextUDF, which will help fix the HiveUDF_7 e2e test case on Spark.
> It previously used org.apache.hadoop.mapred.Reporter; we have to update it to use PigHadoopLogger, which supports Spark as well.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/PigWarning.java fcda1145f4e7c16940a540222ac7cc5370e3db33 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigHadoopLogger.java 255650edb519acc452812a5d67f3ac2376c278c2 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 36813b27be1090b04d577829080e4b931c5eb950 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java 8cf6513d3e5425e974d27d74c92592a6f0ed2cf2 
>   src/org/apache/pig/tools/pigstats/PigStatusReporter.java 5396535301b0e90dc5d3be2064cfe0bdf488bf6a 
>   src/org/apache/pig/tools/pigstats/PigWarnCounterIncrementable.java PRE-CREATION 
>   src/org/apache/pig/tools/pigstats/spark/SparkCounter.java 2411f875ec996fedb870c1b709b99e949803ed50 
>   src/org/apache/pig/tools/pigstats/spark/SparkCounterGroup.java c23624dfcd2e11429fd8355497d184b155450c1f 
>   src/org/apache/pig/tools/pigstats/spark/SparkCounters.java 5ca077ca519ad766c8f6a23ef5b69cd02f3abe99 
>   src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 808c3deb47bc8d9a212a701c81e0c9c6abe88f37 
>   src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java 699219d30519c7db56a4b39c7690fa20d62df44d 
>   src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java 2945c80dba4a23b07ee3d8a613b6a7e9319622ba 
>   test/e2e/pig/udfs/java/org/apache/pig/test/udf/evalfunc/DummyContextUDF.java d5eb9ae660f94a444a17f2171107b6ff7e81819b 
> 
> 
> Diff: https://reviews.apache.org/r/57996/diff/1/
> 
> 
> Testing
> -------
> 
> After this patch, the Warning e2e tests pass on Spark.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>