Posted to user@pig.apache.org by Reza <re...@yahoo.com> on 2011/09/12 21:33:18 UTC

LoadFunc and schemas (pig 0.9)

Using Pig 0.9. My data is very dynamic, so I use a custom LoadFunc to parse it. The problem is that I can't figure out how to access the schema that is defined in the load statement. I am forced to do something like this:

A = LOAD '/test/loadfiles/*' USING com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)') AS (site:chararray,zone:chararray,pos:chararray);


I have to define my schema twice, once for my custom loader and once for Pig. I can see that there is a LoadCaster interface, but it's not clear to me how to use it in LoadFunc. All I need to do is get access to the schema (the text after 'AS') inside my LogStorage class. What's the proper way to load custom (non-uniform) data into a schema?

thanks

Re: LoadFunc and schemas (pig 0.9)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I've written a fair number of these; let me know if anything is unclear.

D

On Mon, Sep 12, 2011 at 1:44 PM, Reza <re...@yahoo.com> wrote:

> Sorry, I didn't fully understand what you said. I think this will work now.
>
> thanks
>
>
> ________________________________
> From: Reza <re...@yahoo.com>
> To: "user@pig.apache.org" <us...@pig.apache.org>
> Sent: Monday, September 12, 2011 4:31 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> That would work but it would overload the cluster since the tuples are
> roughly 1k of data each. Really need the ability to parse down data to the
> defined schema...
>
>
> ________________________________
> From: Dmitriy Ryaboy <dv...@gmail.com>
> To: user@pig.apache.org; Reza <re...@yahoo.com>
> Sent: Monday, September 12, 2011 4:18 PM
> Subject: Re: LoadFunc and schemas (pig 0.9)
>
> Don't provide an AS clause. Instead, implement the LoadMetadata interface
> and return the appropriate schema in getSchema().
>
> D
>
> On Mon, Sep 12, 2011 at 12:44 PM, Reza <re...@yahoo.com> wrote:
>
> > Using Pig 0.9. My data is very dynamic, so I use a custom LoadFunc to
> > parse it. The problem is that I can't figure out how to access the schema
> > that is defined in the load statement. I am forced to do something like
> > this:
> >
> > A = LOAD '/test/loadfiles/*' USING
> > com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)')
> > AS (site:chararray,zone:chararray,pos:chararray);
> >
> >
> > I have to define my schema twice, once for my custom loader and once for
> > Pig. I can see that there is a LoadCaster interface, but it's not clear to
> > me how to use it in LoadFunc. All I need to do is get access to the schema
> > (the text after 'AS') inside my LogStorage class. What's the proper way to
> > load custom (non-uniform) data into a schema?
> >
> > thanks
>

Re: LoadFunc and schemas (pig 0.9)

Posted by Reza <re...@yahoo.com>.
Sorry, I didn't fully understand what you said. I think this will work now.

thanks



Re: LoadFunc and schemas (pig 0.9)

Posted by Reza <re...@yahoo.com>.
That would work, but it would overload the cluster since the tuples are roughly 1k of data each. I really need the ability to parse the data down to the defined schema...



Re: LoadFunc and schemas (pig 0.9)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Don't provide an AS clause. Instead, implement the LoadMetadata interface
and return the appropriate schema in getSchema().

D
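
A minimal sketch of the idea in this reply, written without any Pig dependency so it runs on its own. The class name LogStorageSketch and its methods are hypothetical stand-ins, not Pig API: the loader parses the schema string it receives in its constructor once, reports it from a getSchema() method, and uses the same schema to trim each raw record down to the declared fields. In a real loader the schema would instead be returned as a ResourceSchema from LoadMetadata.getSchema(String location, Job job), and the parsing here assumes a flat list of name:type pairs with no nested types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Pig-free stand-in for a schema-aware loader; every name here is hypothetical. */
final class LogStorageSketch {
    /** Field name mapped to type name, kept in declaration order. */
    private final Map<String, String> schema = new LinkedHashMap<>();

    /** Parses a flat schema string such as "(site:chararray,zone:chararray)". */
    LogStorageSketch(String schemaString) {
        String s = schemaString.trim();
        // Strip the optional surrounding parentheses.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        for (String field : s.split(",")) {
            String[] parts = field.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected name:type but got '" + field + "'");
            }
            schema.put(parts[0].trim(), parts[1].trim());
        }
    }

    /** Stand-in for LoadMetadata.getSchema(): report the schema the loader already holds. */
    Map<String, String> getSchema() {
        return Collections.unmodifiableMap(schema);
    }

    /**
     * Stand-in for the per-record work done in getNext(): keep only the
     * declared fields, in schema order. Fields missing from the record become null.
     */
    List<String> project(Map<String, String> rawRecord) {
        List<String> tuple = new ArrayList<>(schema.size());
        for (String name : schema.keySet()) {
            tuple.add(rawRecord.get(name));
        }
        return tuple;
    }
}
```

With the loader reporting its own schema this way, the LOAD statement from the question would keep only the constructor argument and drop the AS clause:

A = LOAD '/test/loadfiles/*' USING com.custom.pig.LogStorage('(site:chararray,zone:chararray,pos:chararray)');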

