Posted to user@pig.apache.org by Srini <pi...@gmail.com> on 2012/12/24 06:24:13 UTC

Sequence File processing

Hi,

I am using SequenceFileLoader to load a sequence file:

A = load 'part-m-0000' using SequenceFileLoader() as
(key:long,value:chararray);

"value" is a chararray made up of 10 fields separated by a delimiter ("|"
here). How do I define a schema so that I can run further analysis on these
fields (such as FILTER and GROUP)?

Any help is appreciated.

Thanks,
Srini
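SequenceFileLoader is not a Pig builtin. Assuming it is the piggybank class, the script needs the piggybank jar registered before the LOAD; the jar path below is a placeholder for whatever your Pig installation provides. Running DESCRIBE right after the LOAD is a quick way to confirm the declared schema before adding further steps.

```pig
-- Placeholder path: point this at the piggybank jar of your Pig install
REGISTER piggybank.jar;

-- Fully qualified class name, so no DEFINE or import list is needed.
-- Assumes the sequence file holds LongWritable keys and Text values.
A = LOAD 'part-m-0000'
    USING org.apache.pig.piggybank.storage.SequenceFileLoader()
    AS (key:long, value:chararray);

-- Prints the declared schema of A
DESCRIBE A;
```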

Re: Sequence File processing

Posted by Srini <pi...@gmail.com>.
Thanks Cheolsoo.

On Mon, Dec 24, 2012 at 1:37 PM, Cheolsoo Park <ch...@cloudera.com> wrote:

-- 
Regards,
Srinivas
Srinivas@cloudwick.com

Re: Sequence File processing

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Please see the list of editor plugins in
https://cwiki.apache.org/confluence/display/PIG/PigTools

D
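Independently of editor support, a Pig script does not have to be typed line by line at the prompt: the Grunt shell's exec and run commands execute a script file, and both accept -param for parameter substitution. A minimal sketch under that assumption; split_fields.pig and its $input parameter are hypothetical names used only for illustration.

```pig
-- Run from the grunt> prompt. split_fields.pig is a hypothetical script
-- whose LOAD path is written as '$input'.

-- exec: batch mode; aliases defined in the script are not kept in the shell
exec -param input=part-m-0000 split_fields.pig

-- run: interactive mode; the script's aliases stay available afterwards
run -param input=part-m-0000 split_fields.pig
DESCRIBE B;  -- assumes the script defines an alias named B
```

Outside Grunt, the same file can be passed to the pig command itself, for example pig -param input=part-m-0000 split_fields.pig, which makes it straightforward to keep many scripts in files under version control.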


On Mon, Dec 24, 2012 at 9:42 PM, Kshiva Kps <ks...@gmail.com> wrote:

Re: Sequence File processing

Posted by Kshiva Kps <ks...@gmail.com>.
Hi,

Are there any Pig editors in which we can write 100 to 150 Pig scripts? I
believe that is not practical in CLI mode. Something like an IDE for Java, or
TOAD for SQL? Please advise, many thanks.


Thanks


On Tue, Dec 25, 2012 at 3:09 AM, Mohammad Tariq <do...@gmail.com> wrote:

Re: Sequence File processing

Posted by Mohammad Tariq <do...@gmail.com>.
+1

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Tue, Dec 25, 2012 at 3:07 AM, Cheolsoo Park <ch...@cloudera.com> wrote:

Re: Sequence File processing

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Srini,

You can use STRSPLIT to split your "value" chararray and define a schema in a
FOREACH. For example, if "value" consists of 3 integers (e.g. "1|2|3"):

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS
(key:long,value:chararray);
B = FOREACH A GENERATE key, FLATTEN( STRSPLIT(value,'\\|') ) AS (i:int,
j:int, k:int);
DESCRIBE B;
DUMP B;

This will return the following, where k stands for the record's key value:

B: {key: long,i: int,j: int,k: int}
(k,1,2,3)

Thanks,
Cheolsoo
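Srini's records have 10 fields, and he also asked about FILTER and GROUP. The following extends Cheolsoo's pattern to that case as a sketch, not a tested script: the field names f1 to f10, their types, and the filter and grouping conditions are hypothetical placeholders to be replaced with the real record layout. The optional third argument to STRSPLIT is a limit; with a positive value the result has at most that many fields and empty trailing fields are kept (with the default limit of 0 they are dropped).

```pig
-- Same LOAD as in the thread; SequenceFileLoader must already be resolvable
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- Split on a literal '|' (escaped: '|' is a regex metacharacter).
-- Limit 10: at most 10 fields, and empty trailing fields are kept.
-- f1..f10 and their types are hypothetical placeholders.
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|', 10))
    AS (f1:chararray, f2:int, f3:chararray, f4:chararray, f5:chararray,
        f6:chararray, f7:chararray, f8:chararray, f9:chararray, f10:double);

-- FILTER: keep rows whose f2 is positive (rows with a null f2 are dropped too)
C = FILTER B BY f2 > 0;

-- GROUP: per distinct f1, count the rows and sum f10
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT_STAR(C) AS n, SUM(C.f10) AS total_f10;

DESCRIBE E;
DUMP E;
```

A field that does not parse as its declared type typically becomes null (Pig logs a warning rather than failing the job), so the FILTER in this sketch also discards rows whose f2 is not a valid integer.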


On Sun, Dec 23, 2012 at 9:24 PM, Srini <pi...@gmail.com> wrote: