Posted to user@pig.apache.org by Saurabh S <sa...@live.com> on 2012/05/16 00:34:28 UTC
Load Pig metadata from file?
Here is a sample LOAD statement from the Programming Pig book:
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
date:chararray, open:float, high:float, low:float, close:float,
volume:int, adj_close:float);
In my case, there are around 250 columns to load. So, I created a file, say, metadata.dat with its contents as follows:
(exchange:chararray, symbol:chararray,
date:chararray, open:float, high:float, low:float, close:float,
volume:int, adj_close:float)
My load statement now looks like this:
daily = load 'NYSE_daily' as $md;
and the execution looks like this:
pig -f script.pig -param md=$(cat metadata.dat)
However, I get the following error with this method:
ERROR 1000: Error during parsing. Lexical error at line 9, column 0. Encountered: <EOF> after : ""
Copying the contents of the file into the appropriate place works fine, but then the Pig script is cluttered with the metadata, and I would like to keep it separate from the script. Any ideas?
HCatLoader() does not seem to be available on my system.
Re: Load Pig metadata from file?
Posted by shan s <my...@gmail.com>.
Can you use macros instead? It would be much cleaner.
I was just pointed to
http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/
Re: Load Pig metadata from file?
Posted by Ruslan Al-Fakikh <ru...@jalent.ru>.
Saurabh,
We had the same requirement in our project, and what we did was
implement a custom loader that takes an XML file containing all
the schema information, like this:
data = LOAD 'path' USING com.example.CustomLoader('schema.xml');
But this is not a trivial solution, because you'll have to work with the Pig API.
Ruslan
--
Best Regards,
Ruslan Al-Fakikh
Re: Load Pig metadata from file?
Posted by Thejas Nair <th...@hortonworks.com>.
You can also use 'pig -dryrun ..' to see what the Pig query looks like
after parameter substitution.
Thanks,
Thejas
RE: Load Pig metadata from file?
Posted by Saurabh S <sa...@live.com>.
Aniket: You were spot on. This method doesn't allow any spaces in the file, because the parameter gets truncated at the first whitespace character. I found that out using the 'bash -x' method that you suggested. Thanks a lot for that!
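Consistent with that finding, one possible workaround is to strip all whitespace from the schema before passing it, so that the value is a single word. This is only a minimal sketch, assuming a POSIX shell and that nothing in the schema needs an internal space; the file written here is a shortened stand-in for metadata.dat, placed in a temporary directory.

```shell
#!/bin/sh
# Shortened stand-in for metadata.dat, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '(exchange:chararray, symbol:chararray,' \
              'volume:int, adj_close:float)' > "$workdir/metadata.dat"

# Delete every whitespace character (spaces, tabs, newlines) so the
# schema becomes one unbroken token.
md=$(tr -d '[:space:]' < "$workdir/metadata.dat")
echo "$md"

# The value could then be passed as: pig -f script.pig -param md="$md"
rm -r "$workdir"
```

Quoting "$md" is still the safer habit, even though the flattened value no longer contains anything for the shell to split on.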
Shan: I'm just beginning to use Pig and don't know a lot about macros. I'll look into them, however.
Regards,
Saurabh
Re: Load Pig metadata from file?
Posted by Aniket Mokashi <an...@gmail.com>.
I think you need to play with the quoting; it's more likely a bash problem.
One way to debug is to run bash -x pig -f script.pig -param md=$(cat
metadata.dat) and check what hadoop jar receives in the end.
Try -param md="$(cat metadata.dat)"
or -param md="'$(cat metadata.dat)'" (single quotes inside double quotes),
and so on.
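The difference between the unquoted and quoted forms can be seen without involving Pig at all, by counting the arguments a command receives. The following is a minimal sketch, assuming a POSIX shell; count_args is a hypothetical helper, and the schema file is a shortened stand-in written to a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

# Shortened stand-in for metadata.dat, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '(exchange:chararray, symbol:chararray,' \
              'volume:int, adj_close:float)' > "$workdir/metadata.dat"

# Unquoted: the shell splits the file contents on whitespace, so the
# schema arrives as several separate arguments.
unquoted=$(count_args md=$(cat "$workdir/metadata.dat"))

# Quoted: the whole file is passed as a single argument.
quoted=$(count_args md="$(cat "$workdir/metadata.dat")")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -r "$workdir"
```

In the unquoted case only the first word stays attached to md=, which would leave the parameter value cut off at the first space.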
Thanks,
Aniket
--
"...:::Aniket:::... Quetzalco@tl"