Posted to user@pig.apache.org by Sam Joe <ga...@gmail.com> on 2015/11/19 20:31:52 UTC

Group By Eliminating Few Records In MapReduce Mode (Works Fine In Local)

Hi,

Can anyone please help me in finding the root-cause of this issue?

Thanks,
Joel

On Wed, Nov 18, 2015 at 1:04 AM, Sam Joe <ga...@gmail.com> wrote:

> Hi Andrew, I tried that too. Every field has the correct data.
>
> Thanks,
> Joel
>
> On Wed, Nov 18, 2015 at 12:55 AM, Andrew Oliver <ac...@gmail.com> wrote:
>
>> Project just screen_name. If it is blank or empty, you have your answer.
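>>
>> A quick way to do that (a sketch using the reduced relation from the
>> thread) is:
>>
>> sn_only = FOREACH final_by_lsn GENERATE screen_name;
>> dump sn_only;
>>
>> Any blank or null keys should stand out in the output.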
>> On Nov 17, 2015 23:47, "Sam Joe" <ga...@gmail.com> wrote:
>>
>> > Debug is on; verbose I still have to try.
>> >
>> > Thx.
>> >
>> > On Tue, Nov 17, 2015 at 11:45 PM, Arvind S <ar...@gmail.com>
>> wrote:
>> >
>> > > Have you tried:
>> > > grunt> set debug on;
>> > > grunt> set verbose on;
>> > >
>> > > This gives some counters which might help.
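>> > >
>> > > Another thing worth a look (just a suggestion, not something you have
>> > > run yet) is the plan Pig builds for the grouping:
>> > >
>> > > grunt> explain final_by_lsn_g;
>> > >
>> > > which shows how the GROUP is compiled into map and reduce stages.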
>> > >
>> > >
>> > > *Cheers !!*
>> > > Arvind
>> > >
>> > > On Wed, Nov 18, 2015 at 9:51 AM, Sam Joe <ga...@gmail.com> wrote:
>> > >
>> > > > Hi Arvind,
>> > > >
>> > > > Thanks, but I ensured that each element is populated into its
>> > > > respective field. I also ensured that the data is clean, since the
>> > > > record that is getting eliminated is processed fine when it is the
>> > > > only record processed.
>> > > >
>> > > > How do I find the root cause? I am not getting anything from the
>> > > > server logs or the application logs. Is there any other place I
>> > > > should look?
>> > > >
>> > > >
>> > > > Thanks,
>> > > > Joel
>> > > >
>> > > > On Tue, Nov 17, 2015 at 11:06 PM, Arvind S <ar...@gmail.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > > If you are reading JSON, then ensure that the file content is
>> > > > > parsed correctly by Pig before you do the grouping. A simple dump
>> > > > > sometimes does not show whether the JSON was parsed into multiple
>> > > > > columns or the entire line was read as one string into the first
>> > > > > column only.
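>> > > > >
>> > > > > One quick way to check (a sketch, not output from this thread) is
>> > > > > to look at the schema and a sample tuple:
>> > > > >
>> > > > > grunt> describe final_by_lsn;
>> > > > > grunt> illustrate final_by_lsn;
>> > > > >
>> > > > > If the whole line landed in the first field, these will show it
>> > > > > immediately.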
>> > > > >
>> > > > >
>> > > > >
>> > > > > *Cheers !!*
>> > > > > Arvind
>> > > > >
>> > > > > On Wed, Nov 18, 2015 at 4:59 AM, Sam Joe <games2013.sam@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Arvind,
>> > > > > >
>> > > > > > You are right. It works fine in local mode; no records are
>> > > > > > eliminated.
>> > > > > >
>> > > > > > I now need to find out why some records are getting eliminated
>> > > > > > when using mapreduce mode.
>> > > > > >
>> > > > > > Any suggestions on troubleshooting steps for finding the root
>> > > > > > cause in mapreduce mode? Which logs should be checked, etc.?
>> > > > > >
>> > > > > > Appreciate any help!
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Joel
>> > > > > >
>> > > > > > On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <arvind18352@gmail.com> wrote:
>> > > > > >
>> > > > > > > Tested on Pig 0.15 using your data in local mode; could not
>> > > > > > > reproduce the issue.
>> > > > > > > ==================================================
>> > > > > > > final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>> > > > > > >
>> > > > > > > (Ian_hoch,{(en,Ian_hoch)})
>> > > > > > > (gwenshap,{(en,gwenshap)})
>> > > > > > > (p2people,{(en,p2people)})
>> > > > > > > (DoThisBest,{(en,DoThisBest)})
>> > > > > > > (wesleyyuhn1,{(en,wesleyyuhn1)})
>> > > > > > > (GuitartJosep,{(en,GuitartJosep)})
>> > > > > > > (Komalmittal91,{(en,Komalmittal91)})
>> > > > > > > (LornaGreenNWC,{(en,LornaGreenNWC)})
>> > > > > > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
>> > > > > > > (innovatesocialm,{(en,innovatesocialm)})
>> > > > > > > ==================================================
>> > > > > > > final_by_lsn_g = GROUP final_by_lsn BY language;
>> > > > > > >
>> > > > > > > (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
>> > > > > > > ==================================================
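>> > > > > > >
>> > > > > > > A quick sanity check you could also run (a sketch using the
>> > > > > > > same relation names) to see whether all input rows survive the
>> > > > > > > GROUP:
>> > > > > > >
>> > > > > > > counts = FOREACH final_by_lsn_g GENERATE group, COUNT(final_by_lsn);
>> > > > > > > dump counts;
>> > > > > > >
>> > > > > > > The per-group counts should add up to the number of input rows.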
>> > > > > > >
>> > > > > > > Suggestions:
>> > > > > > > > try in local mode to reproduce the issue (if you have not
>> > > > > > > > already done so)
>> > > > > > > > close all old sessions and open a new one (I know it's dumb,
>> > > > > > > > but it has helped me sometimes)
>> > > > > > >
>> > > > > > >
>> > > > > > > *Cheers !!*
>> > > > > > > Arvind
>> > > > > > >
>> > > > > > > On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <games2013.sam@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I reproduced the issue with fewer columns as well.
>> > > > > > > >
>> > > > > > > > dump final_by_lsn;
>> > > > > > > >
>> > > > > > > > (en,LornaGreenNWC)
>> > > > > > > > (en,GuitartJosep)
>> > > > > > > > (en,gwenshap)
>> > > > > > > > (en,innovatesocialm)
>> > > > > > > > (en,Komalmittal91)
>> > > > > > > > (en,Ian_hoch)
>> > > > > > > > (en,p2people)
>> > > > > > > > (en,W4_Jobs_in_ARZ)
>> > > > > > > > (en,wesleyyuhn1)
>> > > > > > > > (en,DoThisBest)
>> > > > > > > >
>> > > > > > > > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > grunt> dump final_by_lsn_g;
>> > > > > > > >
>> > > > > > > > (gwenshap,{(en,gwenshap)})
>> > > > > > > > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
>> > > > > > > > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
>> > > > > > > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Steps I tried to find the root cause:
>> > > > > > > > - Removing special characters from the data
>> > > > > > > > - Setting the log level to 'Debug'
>> > > > > > > > However, I couldn't find a clue about the problem.
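>> > > > > > > >
>> > > > > > > > One thing I notice in the output above is that each bag
>> > > > > > > > contains the same tuple three times. A check I could still
>> > > > > > > > run (a sketch) is whether the input itself is being read
>> > > > > > > > more than once:
>> > > > > > > >
>> > > > > > > > final_dist = DISTINCT final_by_lsn;
>> > > > > > > > dump final_dist;
>> > > > > > > >
>> > > > > > > > If the distinct output is smaller than the earlier dump, the
>> > > > > > > > load path may be matching duplicate files.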
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Can someone please help me troubleshoot the issue?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Joel
>> > > > > > > >
>> > > > > > > > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <sterrell@oculus360.us> wrote:
>> > > > > > > >
>> > > > > > > > > Please try reproducing the problem with the smallest
>> > > > > > > > > amount of data possible. Use as few rows and the smallest
>> > > > > > > > > strings possible that still demonstrate the discrepancy,
>> > > > > > > > > and then repost your problem. Doing so will make your
>> > > > > > > > > request easier for the readers of the group to digest, and
>> > > > > > > > > you might even discover a problem in your original data if
>> > > > > > > > > you cannot reproduce it on a smaller scale.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >     Steve
>> > > > > > > > >
>> > > > > > > > > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <games2013.sam@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi,
>> > > > > > > > > >
>> > > > > > > > > > I am trying to group a table (final) containing 10
>> > > > > > > > > > records by the column screen_name, using the following
>> > > > > > > > > > command:
>> > > > > > > > > >
>> > > > > > > > > > final_by_sn = GROUP final BY screen_name;
>> > > > > > > > > >
>> > > > > > > > > > When I dump the final_by_sn table, only 4 records are
>> > > > > > > > > > returned, as shown below:
>> > > > > > > > > >
>> > > > > > > > > > grunt> dump final_by_sn;
>> > > > > > > > > >
>> > > > > > > > > > (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
>> > > > > > > > > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
>> > > > > > > > > > (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
>> > > > > > > > > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
>> > > > > > > > > >
>> > > > > > > > > > dump final;
>> > > > > > > > > >
>> > > > > > > > > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
>> > > > > > > > > > (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
>> > > > > > > > > > (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
>> > > > > > > > > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
>> > > > > > > > > > (Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
>> > > > > > > > > > (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
>> > > > > > > > > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
>> > > > > > > > > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
>> > > > > > > > > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
>> > > > > > > > > > (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
>> > > > > > > > > > grunt>
>> > > > > > > > > >
>> > > > > > > > > > Could you please help me understand why 6 records are
>> > > > > > > > > > eliminated when doing a GROUP BY?
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Joel
>> > > > > > > > > >

Re: Group By Eliminating Few Records In MapReduce Mode (Works Fine In Local)

Posted by Arvind S <ar...@gmail.com>.
Do you have another environment or sandbox (e.g. the Hortonworks packaged
sandbox) where you can try this and check whether the issue is reproducible?
If yes, provide the data and code so that others can check.
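
For example, a minimal self-contained script that others could run against a
small extract of the data (a sketch; the file name and tab-delimited layout
are assumptions):

final_by_lsn = LOAD 'tweets_small.tsv'
    AS (language:chararray, screen_name:chararray);
final_by_lsn_g = GROUP final_by_lsn BY screen_name;
dump final_by_lsn_g;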

If it is not reproducible on the sandbox, then it is a problem in your
target environment, which might not be solvable over email.

*Cheers !!*
Arvind
