Posted to dev@pig.apache.org by Jeff Yuan <qu...@gmail.com> on 2013/02/22 02:53:41 UTC

Question about loader and storer

Hi,

I am new to the Pig community and have a couple of questions
regarding loaders/storers. Because I am writing code that is closely
coupled with Pig, I thought I'd ask on the dev mailing list, but
please let me know if this is more appropriate for the user list.

1) Is there a way to set the default Pig loader to something other
than PigStorage via configuration? What I mean is, by default, if a
loader is not specified, PigStorage is assumed. Can I change things so
that if no loader is specified in the load statement, another custom
loader is used?

2) Is there a way to output the entire query result to stdout? Is the
easiest way to do so to write a custom storer that doesn't actually
store to a file but instead just outputs to stdout?

If no specific answer is available, advice on where in the Pig code I
should look to discover the answers for myself would also be highly
appreciated.

Thanks,
Jeff

Re: Question about loader and storer

Posted by Jeff Yuan <qu...@gmail.com>.
Thanks to Johnny, Aniket, and Prashant for your help!

On Thu, Feb 21, 2013 at 7:05 PM, Prashant Kommireddi
<pr...@gmail.com> wrote:
> I have opened a JIRA https://issues.apache.org/jira/browse/PIG-3211
>
> On Thu, Feb 21, 2013 at 6:29 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
>
>> I agree. PigStorage is the default constructed by LogicalPlanBuilder and
>> it's not configurable.
>>
>> Jeff, can you open a JIRA? It would be a nice feature to add.
>>
>> -Prashant
>>
>>
>> On Thu, Feb 21, 2013 at 6:26 PM, Aniket Mokashi <an...@gmail.com> wrote:
>>
>>> I think default loader is hardcoded in the pig code. You can open a jira if
>>> you need such a feature.
>>>
>>> Thanks,
>>> Aniket
>>>
>>>
>>> On Thu, Feb 21, 2013 at 6:08 PM, Johnny Zhang <xi...@cloudera.com> wrote:
>>>
>>> > Jeff:
>>> > Basically, edit the pig.properties to
>>> > .....
>>> > pig.load.default.statements=/tmp/.temppigbootup
>>> > .....
>>> >
>>> > and in file /tmp/.temppigbootup, you have load statement
>>> > data = LOAD 'top_queries_input_data.txt' AS (query:CHARARRAY, count:INT);
>>> >
>>> > you can edit content to use other loader here.
>>> >
>>> > Hope it is helpful. This is different from what you want, and I am also
>>> > searching if we can define default loader other than PigStorage.
>>> >
>>> > Johnny
>>> >
>>> >
>>> > On Thu, Feb 21, 2013 at 6:03 PM, Johnny Zhang <xi...@cloudera.com> wrote:
>>> >
>>> > > Hi, Jeff Yuan:
>>> > >
>>> > > On Thu, Feb 21, 2013 at 5:53 PM, Jeff Yuan <qu...@gmail.com> wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> I am new to the pig community, and have a couple of questions
>>> > >> regarding loader/storers.  Because I am writing code that closely
>>> > >> couples with Pig, I thought I'd ask on the dev mailing list, but
>>> > >> please let me know if this is more appropriate for the user list.
>>> > >>
>>> > >> 1) Is there a way to set the default pig loader as something other
>>> > >> than PigStorage via configuration? What I mean is, by default if a
>>> > >> loader is not specified, PigStorage is assumed. Can I change things so
>>> > >> that if no loader is specified in the load statement, another custom
>>> > >> loader is used?
>>> > >>
>>> > > I am not sure if there is another way, but you can edit
>>> > > "pig.load.default.statements=" in pig.properties file. So in your Pig
>>> > > script you don't have to write load statement, but Pig will always load it
>>> > > for you (of course with the loader you specified)
>>> > >
>>> > >>
>>> > >> 2) Is there a way to output the entire query result to stdout? Is the
>>> > >> easiest way to do so by writing a custom storer that doesn't actually
>>> > >> store to a file but instead just output to stdout?
>>> > >>
>>> > > can you try 'DUMP' ?
>>> > >
>>> > >>
>>> > >> If no specific answer is available, advice on where in the pig code I
>>> > >> should look to discover answers for myself would also be highly
>>> > >> appreciated.
>>> > >>
>>> > >> Thanks,
>>> > >> Jeff
>>> > >>
>>> > >
>>> > > Johnny
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>>
>>
>>

Re: Question about loader and storer

Posted by Prashant Kommireddi <pr...@gmail.com>.
I have opened a JIRA https://issues.apache.org/jira/browse/PIG-3211


Re: Question about loader and storer

Posted by Prashant Kommireddi <pr...@gmail.com>.
I agree. PigStorage is the default constructed by LogicalPlanBuilder and
it's not configurable.
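
To illustrate the point above, a minimal sketch in Pig Latin: when the USING clause is omitted, Pig falls back to PigStorage with its default tab delimiter, so the two statements below should load the same data. The file name and schema are borrowed from the example elsewhere in this thread; the alias names are only illustrative.

```pig
-- No USING clause: Pig applies its built-in default loader, PigStorage.
a = LOAD 'top_queries_input_data.txt' AS (query:CHARARRAY, count:INT);

-- The same load with the default spelled out explicitly.
b = LOAD 'top_queries_input_data.txt' USING PigStorage('\t')
    AS (query:CHARARRAY, count:INT);
```

Until the default becomes configurable, naming the loader explicitly in each LOAD statement is the straightforward way to use anything other than PigStorage.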

Jeff, can you open a JIRA? It would be a nice feature to add.

-Prashant


Re: Question about loader and storer

Posted by Aniket Mokashi <an...@gmail.com>.
I think the default loader is hardcoded in the Pig code. You can open a JIRA if
you need such a feature.

Thanks,
Aniket



-- 
"...:::Aniket:::... Quetzalco@tl"

Re: Question about loader and storer

Posted by Johnny Zhang <xi...@cloudera.com>.
Jeff:
Basically, edit pig.properties to set
.....
pig.load.default.statements=/tmp/.temppigbootup
.....

and in the file /tmp/.temppigbootup, put your load statement:
data = LOAD 'top_queries_input_data.txt' AS (query:CHARARRAY, count:INT);

You can edit that statement to use another loader.
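
As a rough sketch of what that edited bootup file might look like, assuming a custom loader class is available: com.example.MyLoader below is a hypothetical name used only for illustration, not a class that ships with Pig, and the jar containing it would still need to be made available to Pig (for example with REGISTER).

```pig
-- /tmp/.temppigbootup
-- com.example.MyLoader is a hypothetical custom loader, not part of Pig.
data = LOAD 'top_queries_input_data.txt'
       USING com.example.MyLoader()
       AS (query:CHARARRAY, count:INT);
```

Scripts would then refer to the alias data without writing their own LOAD statement. This preloads one specific relation; it does not change the loader Pig uses for other LOAD statements that omit USING.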

I hope this is helpful. It is different from what you asked for, and I am also
looking into whether we can define a default loader other than PigStorage.

Johnny



Re: Question about loader and storer

Posted by Johnny Zhang <xi...@cloudera.com>.
Hi, Jeff Yuan:

On Thu, Feb 21, 2013 at 5:53 PM, Jeff Yuan <qu...@gmail.com> wrote:

> Hi,
>
> I am new to the pig community, and have a couple of questions
> regarding loader/storers.  Because I am writing code that closely
> couples with Pig, I thought I'd ask on the dev mailing list, but
> please let me know if this is more appropriate for the user list.
>
> 1) Is there a way to set the default pig loader as something other
> than PigStorage via configuration? What I mean is, by default if a
> loader is not specified, PigStorage is assumed. Can I change things so
> that if no loader is specified in the load statement, another custom
> loader is used?
>
I am not sure if there is another way, but you can set
"pig.load.default.statements=" in the pig.properties file. Then you don't
have to write the load statement in your Pig script; Pig will always run it for
you (with the loader you specified, of course).

>
> 2) Is there a way to output the entire query result to stdout? Is the
> easiest way to do so by writing a custom storer that doesn't actually
> store to a file but instead just output to stdout?
>
Can you try 'DUMP'?
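
A minimal sketch of that suggestion, reusing the file name and schema from the example elsewhere in this thread (both are placeholders for your own data):

```pig
-- Load the input, then print every tuple of the relation to the console.
data = LOAD 'top_queries_input_data.txt' AS (query:CHARARRAY, count:INT);
DUMP data;
```

DUMP triggers execution of the statements that produce the relation and writes the resulting tuples to stdout, so no custom storer is needed for this case.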

>
> If no specific answer is available, advice on where in the pig code I
> should look to discover answers for myself would also be highly
> appreciated.
>
> Thanks,
> Jeff
>

Johnny