Posted to user@pig.apache.org by Emmett Shear <em...@justin.tv> on 2008/09/17 01:00:17 UTC

log file formats

I have a log file I'm writing explicitly to process in pig. The data for
each line is a set of key-value pairs, which seems pretty much perfect as a
fit for Pig, however I'm having some trouble. It's easy to parse a space
delimited list into a tuple with PigStorage(' '); is there an easy way to
parse "k=v k=v k=v" key value pairs into a map? I have total control over
the log format.

Thanks,
Emmett
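
[Editor's sketch] The parsing step asked about above could look roughly like the following in plain Java. This is only a minimal sketch, not Pig's API: the class name KeyValueLine and the method parse are hypothetical, and it assumes keys and values contain no whitespace and keys contain no '='. It deliberately leaves out how the result would be handed to Pig.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Pig: parses one "k=v k=v k=v" log line.
class KeyValueLine {
    // Splits the line on whitespace, then splits each token on its first '='.
    // Assumption: keys and values contain no spaces, and keys contain no '='.
    static Map<String, String> parse(String line) {
        // LinkedHashMap keeps the keys in the order they appeared on the line.
        Map<String, String> fields = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return fields;
        }
        for (String token : trimmed.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // skip tokens that have no '=' or an empty key
            }
            fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("user=emmett action=view bytes=512"));
    }
}
```

Splitting on the first '=' only means a value may itself contain '=', which may or may not be what a given log format wants.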

RE: log file formats

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Hi Emmett,

You might want to look at the existing functions to see if they shed
some light:

http://svn.apache.org/viewvc/incubator/pig/trunk/src/org/apache/pig/builtin/PigStorage.java?view=markup

Olga 

> -----Original Message-----
> From: Emmett Shear [mailto:emmett@justin.tv] 
> Sent: Tuesday, September 16, 2008 7:11 PM
> To: pig-user@incubator.apache.org
> Subject: Re: log file formats
> 
> Thanks Olga!
> 
> I've started writing my own KeyValueStorage class, and I'm 
> stuck trying to write getNext and putNext.
> 
> It seems like there's nowhere to store the key name 
> information I've extracted in the Tuple getNext creates, nor 
> is there any way to get the necessary key name information 
> out from the Tuple in putNext to write the key-value pairs 
> back out. A Tuple is literally just a fixed length list of 
> items, with no place to put the keys (as names for the 
> positions). Am I missing something? Or am I trying to do 
> something impossible?
> 
> Thanks,
> Emmett
> 
> On Tue, Sep 16, 2008 at 4:26 PM, Olga Natkovich 
> <ol...@yahoo-inc.com> wrote:
> 
> > You would need to write a custom storage function for this.
> >
> > Olga
> >
> > > -----Original Message-----
> > > From: Emmett Shear [mailto:emmett@justin.tv]
> > > Sent: Tuesday, September 16, 2008 4:00 PM
> > > To: pig-user@incubator.apache.org
> > > Subject: log file formats
> > >
> > > I have a log file I'm writing explicitly to process in pig.
> > > The data for each line is a set of key-value pairs, which seems 
> > > pretty much perfect as a fit for Pig, however I'm having some 
> > > trouble. It's easy to parse a space delimited list into a tuple with
> > > PigStorage(' '); is there an easy way to parse "k=v k=v k=v" key 
> > > value pairs into a map? I have total control over the log format.
> > >
> > > Thanks,
> > > Emmett
> > >
> >
> 

Re: log file formats

Posted by Emmett Shear <em...@justin.tv>.
Thanks Olga!

I've started writing my own KeyValueStorage class, and I'm stuck trying to
write getNext and putNext.

It seems like there's nowhere to store the key name information I've
extracted in the Tuple getNext creates, nor is there any way to get the
necessary key name information out from the Tuple in putNext to write the
key-value pairs back out. A Tuple is literally just a fixed length list of
items, with no place to put the keys (as names for the positions). Am I
missing something? Or am I trying to do something impossible?

Thanks,
Emmett
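
[Editor's sketch] One possible way around the problem described above is to stop treating keys as names for positions: Pig's data model includes a map type, so a record may be able to carry the whole key-value map as a single field. The plain-Java sketch below only illustrates that shape. It uses a List<Object> as a stand-in for a Tuple, the names KeyValueRecord, wrap and format are hypothetical, and how this maps onto the real getNext and putNext signatures is not shown here and would need to be checked against the Pig source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a storage function's read/write sides; not Pig's API.
class KeyValueRecord {
    // Read side: wraps the whole map as the single field of a positional
    // record, so key names travel with the data rather than with positions.
    static List<Object> wrap(Map<String, String> fields) {
        List<Object> record = new ArrayList<>();
        record.add(fields);
        return record;
    }

    // Write side: pulls the map back out of field 0 and renders "k=v k=v k=v".
    // Assumption: field 0 is always the map produced by wrap().
    static String format(List<Object> record) {
        Map<?, ?> fields = (Map<?, ?>) record.get(0);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<?, ?> entry : fields.entrySet()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "emmett");
        fields.put("action", "view");
        System.out.println(format(wrap(fields)));
    }
}
```

The trade-off in this shape is that every lookup goes through the map rather than a fixed position, in exchange for lines being free to carry different sets of keys.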

On Tue, Sep 16, 2008 at 4:26 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:

> You would need to write a custom storage function for this.
>
> Olga
>
> > -----Original Message-----
> > From: Emmett Shear [mailto:emmett@justin.tv]
> > Sent: Tuesday, September 16, 2008 4:00 PM
> > To: pig-user@incubator.apache.org
> > Subject: log file formats
> >
> > I have a log file I'm writing explicitly to process in pig.
> > The data for each line is a set of key-value pairs, which
> > seems pretty much perfect as a fit for Pig, however I'm
> > having some trouble. It's easy to parse a space delimited
> > list into a tuple with PigStorage(' '); is there an easy way
> > to parse "k=v k=v k=v" key value pairs into a map? I have
> > total control over the log format.
> >
> > Thanks,
> > Emmett
> >
>

RE: log file formats

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
You would need to write a custom storage function for this.

Olga 

> -----Original Message-----
> From: Emmett Shear [mailto:emmett@justin.tv] 
> Sent: Tuesday, September 16, 2008 4:00 PM
> To: pig-user@incubator.apache.org
> Subject: log file formats
> 
> I have a log file I'm writing explicitly to process in pig. 
> The data for each line is a set of key-value pairs, which 
> seems pretty much perfect as a fit for Pig, however I'm 
> having some trouble. It's easy to parse a space delimited 
> list into a tuple with PigStorage(' '); is there an easy way 
> to parse "k=v k=v k=v" key value pairs into a map? I have 
> total control over the log format.
> 
> Thanks,
> Emmett
>