Posted to dev@pig.apache.org by "Murshid Chalaev (JIRA)" <ji...@apache.org> on 2016/09/01 11:08:20 UTC

[jira] [Created] (PIG-5019) Pig generates tons of warnings for udf with enabled warnings aggregation

Murshid Chalaev created PIG-5019:
------------------------------------

             Summary: Pig generates tons of warnings for udf with enabled warnings aggregation
                 Key: PIG-5019
                 URL: https://issues.apache.org/jira/browse/PIG-5019
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs
    Affects Versions: 0.14.0
            Reporter: Murshid Chalaev


For a data set containing 9 lines, the following aggregated warning message is displayed:
{code}
2016-09-01 19:40:33,664 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_1 6 time(s).
{code}

but in the container logs I see a separate "Cannot extract group for input" log message for every non-matching value:
{code}
2016-09-01 19:40:28,115 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: b[10,4],b[-1,-1],extract_fields[17,17] C:  R: 
2016-09-01 19:40:28,122 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=1&v3=9
2016-09-01 19:40:28,124 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v2=3&v3=7
2016-09-01 19:40:28,124 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=4&v3=6
2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v2=5&v3=5
2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=8&v3=2
2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v3=9&v2=1
{code}

It does not log the warning messages in the task logs.

The patch for PIG-2207 was committed to Pig 0.13+.

In 0.12 we had a single counter for all UDF warnings, but in 0.13+ we have a separate counter and message for every unique warning log line.
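To make the difference concrete, here is a minimal hypothetical model in Java. It is not Pig's PigHadoopLogger code; the class, enum and method names are assumptions for illustration only. It feeds the six non-matching inputs from the log above through two strategies: one counter keyed by the warning enum (as described for 0.12) and one counter keyed by the full message text (as described for 0.13+).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of the two aggregation strategies; names are NOT from Pig.
public class WarningAggregationSketch {

    enum Warning { UDF_WARNING_1 }

    // 0.12-style (as described): one counter per warning enum, message text ignored.
    static Map<Warning, Integer> countPerEnum(List<String> messages) {
        Map<Warning, Integer> counters = new LinkedHashMap<>();
        for (String ignored : messages) {
            counters.merge(Warning.UDF_WARNING_1, 1, Integer::sum);
        }
        return counters;
    }

    // 0.13+-style (as described): one counter per unique warning message line.
    static Map<String, Integer> countPerMessage(List<String> messages) {
        Map<String, Integer> counters = new LinkedHashMap<>();
        for (String msg : messages) {
            counters.merge(msg, 1, Integer::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        // The six inputs that failed to match in the container log.
        List<String> inputs = List.of(
            "/v1=1&v3=9", "/v2=3&v3=7", "/v1=4&v3=6",
            "/v2=5&v3=5", "/v1=8&v3=2", "/v3=9&v2=1");
        List<String> messages = inputs.stream()
            .map(in -> "RegexExtract : Cannot extract group for input " + in)
            .collect(Collectors.toList());
        System.out.println("per enum:    " + countPerEnum(messages));
        System.out.println("per message: " + countPerMessage(messages).size() + " distinct counters");
    }
}
```

With per-enum counting all six warnings collapse into a single counter ({UDF_WARNING_1=6}); with per-message counting each input yields its own counter, because the input value is embedded in the message text and no two of these messages are identical.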

The two lines below are unique:
/v2=3&v3=7
/v1=4&v3=6

That's why Pig prints both of them to the console.

Printing a separate log message for every data line slows down the overall performance as well.
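For comparison, here is a minimal sketch of the behaviour one would expect on the task side when aggregation is enabled, assuming a simple in-memory counter per warning enum: with aggregation on, warn() only bumps a counter and writes no per-record line; with it off, every warning becomes its own log line. Everything here (class name, warn signature, use of java.util.logging) is hypothetical and is not taken from Pig's source.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of aggregated vs. per-record warning handling; NOT Pig's API.
public class AggregatingWarnSketch {

    enum Warning { UDF_WARNING_1 }

    private static final Logger LOG = Logger.getLogger(AggregatingWarnSketch.class.getName());

    private final boolean aggregate;
    private final Map<Warning, Long> counters = new EnumMap<>(Warning.class);
    private int linesLogged = 0;

    AggregatingWarnSketch(boolean aggregate) {
        this.aggregate = aggregate;
    }

    void warn(String msg, Warning w) {
        if (aggregate) {
            // Aggregation on: only increment the counter, no per-record log line.
            counters.merge(w, 1L, Long::sum);
        } else {
            // Aggregation off: every warning is written as its own log line.
            LOG.warning(w + ": " + msg);
            linesLogged++;
        }
    }

    long count(Warning w) {
        return counters.getOrDefault(w, 0L);
    }

    int linesLogged() {
        return linesLogged;
    }

    public static void main(String[] args) {
        AggregatingWarnSketch agg = new AggregatingWarnSketch(true);
        for (int i = 0; i < 6; i++) {
            agg.warn("Cannot extract group for input #" + i, Warning.UDF_WARNING_1);
        }
        // A single summary line, shaped like the console message quoted in this issue.
        System.out.println("Encountered Warning UDF_WARNING_1 " + agg.count(Warning.UDF_WARNING_1) + " time(s).");
        System.out.println("per-record lines logged: " + agg.linesLogged());
    }
}
```

Running main prints one "Encountered Warning UDF_WARNING_1 6 time(s)." summary and reports zero per-record lines, so the logging cost no longer grows with the number of non-matching input records.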



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)