Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2011/05/17 21:59:47 UTC

[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports

    [ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010 ] 

Richard Ding commented on PIG-2029:
-----------------------------------

Currently, Pig prints zero (0) when the max/min/avg map/reduce time cannot be obtained from Hadoop through the Hadoop client API. This is misleading, because a zero looks like a real measurement. I propose that we print 'n/a' for those values instead, as follows:

{code}
Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	Alias	Feature	Outputs
job_201104272229_434232	2	10	354	220	287	168	149	163	IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P	DISTINCT,MULTI_QUERY	
job_201104272229_434319	2	0	9	3	6	0	0	0	UNION5	MULTI_QUERY,MAP_ONLY	/user/rding/verifypigstats2-UNION5,
job_201104272229_434320	2	10	n/a	n/a	n/a	n/a	n/a	n/a	CNJOIN3,GNJOIN3,sampleNJOIN3	GROUP_BY,COMBINER	
job_201104272229_434321	1	10	5	5	5	23	9	17	CNJOIN25,GNJOIN25,sampleNJOIN25	GROUP_BY,COMBINER	
job_201104272229_434322	2	10	n/a	n/a	n/a	n/a	n/a	n/a	CNJOIN15,GNJOIN15,sampleNJOIN15	GROUP_BY,COMBINER	
job_201104272229_434323	2	10	n/a	n/a	n/a	n/a	n/a	n/a	CNJOIN19,GNJOIN19,sampleNJOIN19	GROUP_BY,COMBINER	
job_201104272229_434331	2	1	n/a	n/a	n/a	n/a	n/a	n/a	ONJOIN15	SAMPLER	
job_201104272229_434332	2	1	n/a	n/a	n/a	n/a	n/a	n/a	ONJOIN3	SAMPLER	
job_201104272229_434333	1	1	2	2	2	13	13	13	ONJOIN25	SAMPLER	
job_201104272229_434334	1	1	1	1	1	12	12	12	ONJOIN19	SAMPLER	
job_201104272229_434342	1	10	2	2	2	16	8	11	ONJOIN25	ORDER_BY,COMBINER	
{code}
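
For illustration only, the formatting rule proposed above could be sketched roughly as follows. This is not Pig's actual code: the class name JobStatsRow, the methods formatTime and formatRow, and the -1 sentinel for "value not available" are all assumptions made for this sketch.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Pig: every name here is illustrative.
final class JobStatsRow {
    // Assumed sentinel meaning "the Hadoop client API returned no value".
    static final long UNAVAILABLE = -1L;

    // Render a time in seconds, or "n/a" when the value is unavailable.
    static String formatTime(long seconds) {
        return seconds == UNAVAILABLE ? "n/a" : Long.toString(seconds);
    }

    // Build one tab-separated row: job id, task counts, then the time columns.
    static String formatRow(String jobId, int maps, int reduces, long[] times) {
        StringJoiner row = new StringJoiner("\t");
        row.add(jobId).add(Integer.toString(maps)).add(Integer.toString(reduces));
        for (long t : times) {
            row.add(formatTime(t));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        long[] missing = {UNAVAILABLE, UNAVAILABLE, UNAVAILABLE,
                          UNAVAILABLE, UNAVAILABLE, UNAVAILABLE};
        long[] known = {5, 5, 5, 23, 9, 17};
        System.out.println(formatRow("job_201104272229_434320", 2, 10, missing));
        System.out.println(formatRow("job_201104272229_434321", 1, 10, known));
    }
}
```

With a separate sentinel, a genuine zero (for example the reduce times of a map-only job) still prints as 0, and only values that could not be retrieved print as n/a.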

> Inconsistency in Pig Stats reports 
> -----------------------------------
>
>                 Key: PIG-2029
>                 URL: https://issues.apache.org/jira/browse/PIG-2029
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.10
>
>
> I have a Pig script that reports varying stats for the same M/R job (same inputs). Sometimes PigStats reports all the stats (Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime and AvgReduceTime) for the M/R job as 0; sometimes it reports them correctly.
> Enclosed are the stderr logs for two runs. In Run 1, job_201103091134_556600 has 0 in every column, whereas in Run 2, Hadoop job job_201104272229_75693 has some valid values.
> The actual JobTracker link shows that these values are non-empty. This points to a bug in the interaction between the PigStats module and the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	Alias	Feature	Outputs
> job_201103091134_556458	160	100	552	191	368	1257	371	392	IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P	DISTINCT,MULTI_QUERY	
> job_201103091134_556600	0	0	0	0	0	0	0	0	UNION5	    MULTI_QUERY,MAP_ONLY	/user/viraj/dir,,
> job_201103091134_556601	7	100	17	8	14	200	15	27	CNJOIN25,GNJOIN25,sampleNJOIN25	GROUP_BY,COMBINER	
> job_201103091134_556602	0	0	0	0	0	0	0	0	CNJOIN3,GNJOIN3,sampleNJOIN3	GROUP_BY,COMBINER	
> job_201103091134_556603	0	0	0	0	0	0	0	0	CNJOIN15,GNJOIN15,sampleNJOIN15	GROUP_BY,COMBINER	
> job_201103091134_556604	2	100	13	7	10	34	13	31	CNJOIN19,GNJOIN19,sampleNJOIN19	GROUP_BY,COMBINER	
> job_201103091134_556644	0	0	0	0	0	0	0	0	ONJOIN15	SAMPLER	
> job_201103091134_556645	0	0	0	0	0	0	0	0	ONJOIN25	SAMPLER	
> job_201103091134_556646	0	0	0	0	0	0	0	0	ONJOIN3	SAMPLER	
> job_201103091134_556654	0	0	0	0	0	0	0	0	ONJOIN19	SAMPLER	
> job_201103091134_556662	0	0	0	0	0	0	0	0	ONJOIN19	ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	Alias	Feature	Outputs
> job_201104272229_75503	159	100	484	192	353	396	308	321	IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P	DISTINCT,MULTI_QUERY	
> job_201104272229_75693	18	0	31	14	24	0	0	            UNION5	   MULTI_QUERY,MAP_ONLY	/user/viraj/dir,
> job_201104272229_75694	7	100	34	13	22	46	20	25	CNJOIN25,GNJOIN25,sampleNJOIN25	GROUP_BY,COMBINER	
> job_201104272229_75695	125	100	19	11	15	32	18	26	CNJOIN3,GNJOIN3,sampleNJOIN3	GROUP_BY,COMBINER	
> job_201104272229_75698	1	100	12	12	12	13	9	11	CNJOIN15,GNJOIN15,sampleNJOIN15	GROUP_BY,COMBINER	
> job_201104272229_75702	2	100	21	5	13	35	22	26	CNJOIN19,GNJOIN19,sampleNJOIN19	GROUP_BY,COMBINER	
> job_201104272229_75724	1	1	4	4	4	11	11	11	ONJOIN15	SAMPLER	
> job_201104272229_75725	0	0	0	0	0	0	0	            ONJOIN25	SAMPLER	
> job_201104272229_75726	6	1	8	6	8	24	24	24	ONJOIN3	SAMPLER	
> job_201104272229_75729	0	0	0	0	0	0	0	            ONJOIN19	SAMPLER	
> job_201104272229_75752	1	100	5	5	5	12	9	11	ONJOIN19	ORDER_BY,COMBINER
> ..
> {quote}
> Viraj

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira