Posted to user@pig.apache.org by Saurabh S <sa...@live.com> on 2012/05/16 00:34:28 UTC

Load Pig metadata from file?

Here is a sample LOAD statement from the Programming Pig book:

daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
            date:chararray, open:float, high:float, low:float, close:float,
            volume:int, adj_close:float);

In my case, there are around 250 columns to load. So, I created a file, say, metadata.dat with its contents as follows:

 (exchange:chararray, symbol:chararray,

            date:chararray, open:float, high:float, low:float, close:float,

            volume:int, adj_close:float)

My load statement now looks like this:

daily = load 'NYSE_daily' as $md;

and the invocation looks like this:

pig -f script.pig -param md=$(cat metadata.dat)

However, I get the following error with this method:

ERROR 1000: Error during parsing. Lexical error at line 9, column 0.  Encountered: <EOF> after : ""

Copying the contents of the file into the appropriate place in the script works fine. But the Pig script is then cluttered with the metadata, and I would like to separate it from the script. Any ideas?

HCatLoader() does not seem to be available on my system.

Re: Load Pig metadata from file?

Posted by shan s <my...@gmail.com>.
Can you use macros instead? It would be much cleaner.
I was just pointed to
http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/


Re: Load Pig metadata from file?

Posted by Ruslan Al-Fakikh <ru...@jalent.ru>.
Saurabh,

We had the same requirement in our project. What we did was implement
a custom Loader that takes an XML file containing all the schema
information, like this:
data = LOAD 'path' USING com.example.CustomLoader('schema.xml');
This is not a trivial solution, though, because you'll have to deal with
the Pig API.

Ruslan

-- 
Best Regards,
Ruslan Al-Fakikh

Re: Load Pig metadata from file?

Posted by Thejas Nair <th...@hortonworks.com>.
You can also use 'pig -dryrun ..' to see what the Pig query looks like
after parameter substitution.
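
As a rough sketch of how such a dry run might be invoked with the files from this thread: the snippet below builds the argument list with the whole name=value pair quoted and only calls pig if it is installed. The schema literal is a shortened stand-in for the contents of metadata.dat, and the name of the output file (script.pig.substituted) is an assumption about where Pig writes the substituted script, so check it against your Pig version.

```shell
#!/bin/sh
# Sketch only: script.pig and metadata.dat are the file names used in this thread.
# The literal below is a shortened stand-in for the contents of metadata.dat.
schema='(exchange:chararray, symbol:chararray, volume:int)'

# Quote the whole name=value pair so the shell passes it as one argument.
set -- -dryrun -f script.pig -param "md=$schema"

dryrun_argc=$#   # number of arguments pig would receive
echo "pig would receive $dryrun_argc arguments"

# Run the dry run only where pig and the script actually exist.
if command -v pig >/dev/null 2>&1 && [ -f script.pig ]; then
    pig "$@"
    # Assumption: the substituted script is written next to the original.
    [ -f script.pig.substituted ] && cat script.pig.substituted
fi
```

Inspecting the substituted script shows whether $md was replaced by the full schema or by only a fragment of it.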

Thanks,
Thejas


RE: Load Pig metadata from file?

Posted by Saurabh S <sa...@live.com>.
Aniket: You were spot on. This method doesn't allow any whitespace in the file, because the parameter gets truncated at the first whitespace character. I found that out using the 'bash -x' method that you suggested. Thanks a lot for that!
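
The truncation can be reproduced without Pig. The sketch below is illustrative only: count_args is a hypothetical helper that reports how many arguments it received, and the temporary file is a shortened stand-in for metadata.dat.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Stand-in for metadata.dat: a schema containing spaces and a newline.
schema_file=$(mktemp)
printf '(exchange:chararray, symbol:chararray,\n  volume:int)\n' > "$schema_file"

# Unquoted: the shell splits the substitution on whitespace, so only the
# first word stays attached to md= and the rest become separate arguments.
unquoted=$(count_args md=$(cat "$schema_file"))

# Quoted: the whole schema stays attached to md= as a single argument.
quoted=$(count_args "md=$(cat "$schema_file")")

echo "unquoted: $unquoted arguments, quoted: $quoted argument"
rm -f "$schema_file"
```

With the unquoted form, everything after the first whitespace reaches the program as separate arguments, so -param sees only md=(exchange:chararray, as its value.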

Shan: I'm just beginning to use Pig and don't know a lot about macros. I'll look into them, however.

Regards,
Saurabh


Re: Load Pig metadata from file?

Posted by Aniket Mokashi <an...@gmail.com>.
I think you need to play with some quotes; it's most likely a bash problem.

One way to debug is to run bash -x pig -f script.pig -param md=$(cat
metadata.dat) and check what hadoop jar gets in the end.

Try -param md="$(cat metadata.dat)"
or -param md="'$(cat metadata.dat)'" (single quotes inside double quotes),
and so on.
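
To illustrate the tracing idea without a Pig installation, here is a small sketch that uses a hypothetical stand-in launcher. The real pig launcher does far more; the stand-in only prints the arguments it was given, much as pig eventually hands them to hadoop jar. The schema literal is a shortened placeholder for the contents of metadata.dat.

```shell
#!/bin/sh
# The temporary script is a hypothetical stand-in for the pig launcher:
# it only prints each argument it receives on its own line.
launcher=$(mktemp)
cat > "$launcher" <<'EOF'
printf 'arg: %s\n' "$@"
EOF

# Shortened placeholder for the contents of metadata.dat.
schema='(exchange:chararray, symbol:chararray)'

# sh -x writes each command to stderr before running it, so the trace shows
# exactly which words the launcher ends up passing on.
# $schema is deliberately left unquoted to reproduce the problem.
trace=$(sh -x "$launcher" -param md=$schema 2>&1)
echo "$trace"
rm -f "$launcher"
```

In the output, the schema appears as two separate arguments, which is the symptom to look for in the real trace.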

Thanks,
Aniket

-- 
"...:::Aniket:::... Quetzalco@tl"