Posted to user@pig.apache.org by Sam Joe <ga...@gmail.com> on 2015/11/13 17:28:52 UTC

Group By Eliminating Few Records

Hi,

I am trying to group a table (final) containing 10 records by the
column screen_name, using the following command.

final_by_sn = GROUP final BY screen_name;

When I dump the final_by_sn table, only 4 records are returned, as shown below:

grunt> dump final_by_sn;

(gwenshap,{(.@bigdata used this photo in his blog post and made me realize
how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)
})
(p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills
http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6
new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw
http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages
jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw
http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
(GuitartJosep,{(#BigData: What it can and can't do!
http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can
and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData:
What it can and can't do!
http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
(W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona.
#TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data
#Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch
http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix
AZ (#job) wanted in #Arizona. #TechFetch
http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})

dump final;

(RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big
Data, Smart Cities, Internet of Things & more! #TechNorth
http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000
2014,654395184428515332)
(#BigData: What it can and can't do!
http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000
2015,654395189595869184)
(.@bigdata used this photo in his blog post and made me realize how much I
miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15
20:49:39 +0000 2007,654395195581009920)
("Global Release [Big Data Book] Profit From Science" on @LinkedIn
http://t.co/WnJ2HwthYF Congrats to George
Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000
2012,654395207065034752)
(Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U
Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50
+0000 2015,654395216208752641)
(On Google Books, language, and the possible limits of big data
https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000
2012,654395216057659392)
(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw
http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000
2009,654395220373729280)
(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch
http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000
2014,654395236718911488)
(#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1
http://t.co/85P6vEJg08 #MarTech #automation
http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000
2014,654395243975065600)
(Best Cloud Hosting and CDN services for Web Developers
http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters
#websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000
2012,654395246025904128)
grunt>


Could you please help me understand why 6 records are eliminated during
the GROUP BY?
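
For reference, a quick way to check how many tuples land in each group (a
minimal sketch reusing the relation names above):

group_counts = FOREACH final_by_sn GENERATE group, COUNT(final);
dump group_counts;

If the six missing rows had formed groups of their own, they would show up
here with their counts.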

Thanks,
Joel

Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
Hi,

HDFS. I'm processing JSON data in HDFS.

For testing in local mode, I copied 2 rows to a text file.
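
For context, the load is of this general shape (a sketch only; the path and
the trimmed field list are made up, and Pig's built-in JsonLoader stands in
for whatever loader the actual script uses):

final = LOAD '/user/joel/tweets.json'
    USING JsonLoader('text:chararray, language:chararray, screen_name:chararray');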

Thanks,
Joel

On Tuesday, November 17, 2015, Andrew Oliver <ac...@gmail.com> wrote:

> What is your record source? Files or Hive or?
> On Nov 17, 2015 6:29 PM, "Sam Joe" <games2013.sam@gmail.com> wrote:
>
> > Hi Arvind,
> >
> > You are right. It works fine in local mode. No records eliminated.
> >
> > I now need to find out why some records are getting eliminated in
> > mapreduce mode.
> >
> > Any suggestions on troubleshooting steps for finding the root cause in
> > mapreduce mode? Which logs should be checked, etc.?
> >
> > Appreciate any help!
> >
> > Thanks,
> > Joel
> >
> > On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <arvind18352@gmail.com> wrote:
> >
> > > tested on Pig 0.15 using your data in local mode .. could not
> > > reproduce the issue ..
> > > ==================================================
> > > final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> > >
> > > (Ian_hoch,{(en,Ian_hoch)})
> > > (gwenshap,{(en,gwenshap)})
> > > (p2people,{(en,p2people)})
> > > (DoThisBest,{(en,DoThisBest)})
> > > (wesleyyuhn1,{(en,wesleyyuhn1)})
> > > (GuitartJosep,{(en,GuitartJosep)})
> > > (Komalmittal91,{(en,Komalmittal91)})
> > > (LornaGreenNWC,{(en,LornaGreenNWC)})
> > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
> > > (innovatesocialm,{(en,innovatesocialm)})
> > > ==================================================
> > > final_by_lsn_g = GROUP final_by_lsn BY language;
> > >
> > >
> > >
> > > (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
> > > ==================================================
> > >
> > > suggestions ..
> > > > try in local mode to reproduce the issue .. (if you have not already
> > > > done so)
> > > > close all old sessions and open a new one... (I know it's dumb, but it
> > > > helped me some times)
> > >
> > >
> > > *Cheers !!*
> > > Arvind
> > >
> > > On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <games2013.sam@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I reproduced the issue with fewer columns as well.
> > > >
> > > > dump final_by_lsn;
> > > >
> > > > (en,LornaGreenNWC)
> > > > (en,GuitartJosep)
> > > > (en,gwenshap)
> > > > (en,innovatesocialm)
> > > > (en,Komalmittal91)
> > > > (en,Ian_hoch)
> > > > (en,p2people)
> > > > (en,W4_Jobs_in_ARZ)
> > > > (en,wesleyyuhn1)
> > > > (en,DoThisBest)
> > > >
> > > > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> > > >
> > > >
> > > > grunt> dump final_by_lsn_g;
> > > >
> > > > (gwenshap,{(en,gwenshap)})
> > > > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> > > > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
> > > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
> > > >
> > > >
> > > > Steps I tried to find the root cause:
> > > > - Removing special characters from the data
> > > > - Setting the loglevel to 'Debug'
> > > > However, I couldn't find a clue about the problem.
> > > >
> > > >
> > > >
> > > > Can someone please help me troubleshoot the issue?
> > > >
> > > > Thanks,
> > > > Joel
> > > >
> > > > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <sterrell@oculus360.us> wrote:
> > > >
> > > > > Please try reproducing the problem with the smallest amount of data
> > > > > possible.  Use as few rows and the smallest strings possible that still
> > > > > demonstrate the discrepancy.  And then repost your problem.  In doing
> > > > > so, it will make your request easier for the readers of the group to
> > > > > digest, and you might even discover a problem in your original data if
> > > > > you cannot reproduce it on a smaller scale.
> > > > >
> > > > > Thanks,
> > > > >     Steve

Re: Group By Eliminating Few Records

Posted by Andrew Oliver <ac...@gmail.com>.
What is your record source? Files or Hive or?

Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
Hi Andrew, I tried that too. Every field has the correct data.
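
A sketch of the kind of check meant here (relation names as earlier in the
thread; TRIM is Pig's built-in string function):

grunt> describe final;
grunt> bad_sn = FILTER final BY (screen_name IS NULL) OR (TRIM(screen_name) == '');
grunt> dump bad_sn;

An empty bad_sn confirms every row has a populated grouping key.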

Thanks,
Joel

On Wed, Nov 18, 2015 at 12:55 AM, Andrew Oliver <ac...@gmail.com> wrote:

> Project just screen_name. If it is blank or empty you have your answer.
> On Nov 17, 2015 23:47, "Sam Joe" <ga...@gmail.com> wrote:
>
> > debug is on. verbose I have yet to try.
> >
> > Thx.
> >
> > On Tue, Nov 17, 2015 at 11:45 PM, Arvind S <ar...@gmail.com> wrote:
> >
> > > have you tried
> > > grunt> set debug on;
> > > grunt> set verbose on;
> > >
> > > this gives some counters which might help ..
> > >
> > >
> > > *Cheers !!*
> > > Arvind
> > >
> > > On Wed, Nov 18, 2015 at 9:51 AM, Sam Joe <ga...@gmail.com> wrote:
> > >
> > > > Hi Arvind,
> > > >
> > > > Thanks, but I ensured that each element is populated into its
> > > > respective field. I also ensured that the data is clean, since a record
> > > > that gets eliminated is processed fine when it is the only record
> > > > processed.
> > > >
> > > > How do I find the root cause? I am not getting anything from the server
> > > > logs or from the application logs. Is there any other place I should look?
> > > >
> > > >
> > > > Thanks,
> > > > Joel
> > > >
> > > > On Tue, Nov 17, 2015 at 11:06 PM, Arvind S <ar...@gmail.com> wrote:
> > > >
> > > > > Hi ..
> > > > > if you are reading JSON then ensure that the file content is parsed
> > > > > correctly by Pig before you do the grouping.
> > > > > A simple dump sometimes does not show whether the JSON was parsed into
> > > > > multiple columns or the entire line was read as one string into the 1st
> > > > > column only.
> > > > >
> > > > >
> > > > >
> > > > > *Cheers !!*
> > > > > Arvind

Re: Group By Eliminating Few Records

Posted by Andrew Oliver <ac...@gmail.com>.
Project just screen_name. If it is blank or empty you have your answer.
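
A minimal sketch of that projection, using the relation name from earlier in
the thread:

grunt> sn_only = FOREACH final GENERATE screen_name;
grunt> dump sn_only;

The dump shows exactly what the grouping key contains for each of the 10 rows.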

Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
debug is on. verbose I have yet to try.

Thx.


Re: Group By Eliminating Few Records

Posted by Arvind S <ar...@gmail.com>.
have you tried
grunt> set debug on;
grunt> set verbose on;

this gives some counters which might help ..
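
In the same spirit, the plan Pig compiles for the grouping can be inspected
(a sketch, using the relation name from earlier in the thread):

grunt> explain final_by_sn;

The output shows the map and reduce stages the GROUP BY turns into, which can
help when comparing a local run against a mapreduce run.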


*Cheers !!*
Arvind


Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
Hi Arvind,

Thanks, but I ensured that each element is populated into its respective
field. I also ensured that the data is clean, since the record which is
getting eliminated is processed fine when it is the only record being
processed.

How do I find the root cause? I am not getting anything from the server logs
or from the application logs. Is there any other place I should look?
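
For reference, this is roughly the isolation test I ran (a sketch; the filter
value is just one of the users that disappears from the grouped output):

grunt> one_user = FILTER final_by_lsn BY screen_name == 'LornaGreenNWC';
grunt> one_g = GROUP one_user BY screen_name;
grunt> dump one_g;
-- the record comes through fine when grouped on its own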


Thanks,
Joel


Re: Group By Eliminating Few Records

Posted by Arvind S <ar...@gmail.com>.
Hi ..
If you are reading JSON, then ensure that the file content is parsed
correctly by Pig before you do the grouping.
A simple dump sometimes does not show whether the JSON was parsed into
multiple columns or the entire line was read as one string into the first
column only.
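
Something along these lines (a minimal sketch; the loader, file name, and
schema are assumptions, so adjust them to your actual load statement):

grunt> tweets = LOAD 'tweets.json' USING JsonLoader('text:chararray, language:chararray, screen_name:chararray');
grunt> DESCRIBE tweets;
grunt> few = LIMIT tweets 2;
grunt> dump few;
-- each field should land in its own column, not one long string in the first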



*Cheers !!*
Arvind


Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
Hi Arvind,

You are right. It works fine in local mode; no records are eliminated.

I now need to find out why some records are getting eliminated in
mapreduce mode.

Any suggestions on troubleshooting steps for finding the root cause in
mapreduce mode? Which logs should be checked, etc.?
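
So far the only place I know to look is the aggregated task logs, along the
lines of (a sketch; the application id placeholder is the one Pig prints when
the job is submitted):

grunt> sh yarn logs -applicationId <application_id>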

Appreciate any help!

Thanks,
Joel


Re: Group By Eliminating Few Records

Posted by Arvind S <ar...@gmail.com>.
Tested on Pig 0.15 using your data in local mode .. could not reproduce the
issue ..
==================================================
final_by_lsn_g = GROUP final_by_lsn BY screen_name;

(Ian_hoch,{(en,Ian_hoch)})
(gwenshap,{(en,gwenshap)})
(p2people,{(en,p2people)})
(DoThisBest,{(en,DoThisBest)})
(wesleyyuhn1,{(en,wesleyyuhn1)})
(GuitartJosep,{(en,GuitartJosep)})
(Komalmittal91,{(en,Komalmittal91)})
(LornaGreenNWC,{(en,LornaGreenNWC)})
(W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
(innovatesocialm,{(en,innovatesocialm)})
==================================================
final_by_lsn_g = GROUP final_by_lsn BY language;

(en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
==================================================

Suggestions ..
- try in local mode to reproduce the issue, if you have not already done so
(see the sketch below)
- close all old sessions and open a new one... (I know it's dumb, but it has
helped me sometimes)
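
A minimal sketch of the local-mode run (the script name is just a placeholder
for your own script):

$ pig -x local group_test.pig

or start an interactive session with "pig -x local" and paste the same
statements into grunt.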


*Cheers !!*
Arvind


Re: Group By Eliminating Few Records

Posted by Sam Joe <ga...@gmail.com>.
Hi,

I reproduced the issue with fewer columns as well.

dump final_by_lsn;

(en,LornaGreenNWC)
(en,GuitartJosep)
(en,gwenshap)
(en,innovatesocialm)
(en,Komalmittal91)
(en,Ian_hoch)
(en,p2people)
(en,W4_Jobs_in_ARZ)
(en,wesleyyuhn1)
(en,DoThisBest)

grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;


grunt> dump final_by_lsn_g;

(gwenshap,{(en,gwenshap)})
(p2people,{(en,p2people),(en,p2people),(en,p2people)})
(GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
(W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})


Steps I tried to find the root cause:
- Removing special characters from the data (a sketch of that kind of check
follows below)
- Setting the log level to 'Debug'
However, neither gave me a clue about the problem.
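
For reference, a check like the following (a sketch) can surface stray
whitespace or hidden characters in the grouping key; SIZE returns the
character count of each value:

grunt> keys = FOREACH final_by_lsn GENERATE screen_name, SIZE(screen_name);
grunt> dump keys;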



Can someone please help me troubleshoot the issue?

Thanks,
Joel


Re: Group By Eliminating Few Records

Posted by Steve Terrell <st...@oculus360.us>.
Please try reproducing the problem with the smallest amount of data
possible.  Use as few rows and the smallest strings that still
demonstrate the discrepancy, and then repost your problem.  In doing so,
you will make your request easier for the readers of the group to digest,
and you might even discover a problem in your original data if you cannot
reproduce it on a smaller scale.

Thanks,
    Steve
