Posted to user@pig.apache.org by Roger Unwin <un...@sdsc.edu> on 2009/04/23 23:29:47 UTC

Question about Pig BinaryStorage()

Santhosh,

I am trying to iterate through a group of binary files.  I would like
each reduce task to get one binary file.  Below is the first part of
it, which tries to read the data in.

I have the following script:

images = load 'images' using BinaryStorage() split by 'file';

dump images;

Here is my invocation:
java -cp pig.jar:/home/unwin/hadoop-0.19.1/conf org.apache.pig.Main -x local myScript5
2009-04-23 14:22:38,669 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
2009-04-23 14:22:38,673 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!!
2009-04-23 14:22:38,674 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed!
2009-04-23 14:22:38,678 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias images

Here is where the files are in hadoop:
unwin@hadoop-n:~/pig-0.2.0$ ../hadoop-0.18.3/bin/hadoop dfs -ls 'images'
Found 10 items
-rw-r--r--   2 unwin supergroup     272449 2009-04-22 11:04 /user/unwin/images/IMG_0010.JPG
-rw-r--r--   2 unwin supergroup     267580 2009-04-22 11:04 /user/unwin/images/IMG_0011.JPG
-rw-r--r--   2 unwin supergroup     378000 2009-04-22 11:04 /user/unwin/images/IMG_0012.JPG
-rw-r--r--   2 unwin supergroup     327829 2009-04-22 11:04 /user/unwin/images/IMG_0013.JPG
-rw-r--r--   2 unwin supergroup     476088 2009-04-22 11:04 /user/unwin/images/IMG_0014.JPG
-rw-r--r--   2 unwin supergroup     357258 2009-04-22 11:04 /user/unwin/images/IMG_0015.JPG
-rw-r--r--   2 unwin supergroup     401496 2009-04-22 11:04 /user/unwin/images/IMG_0016.JPG
-rw-r--r--   2 unwin supergroup     377798 2009-04-22 11:04 /user/unwin/images/IMG_0017.JPG
-rw-r--r--   2 unwin supergroup     466437 2009-04-22 11:04 /user/unwin/images/IMG_0018.JPG
-rw-r--r--   2 unwin supergroup     351952 2009-04-22 11:04 /user/unwin/images/IMG_0019.JPG

Do you see anything obvious, or a better way of iterating?

Thanks,

Roger

Re: Question about Pig BinaryStorage()

Posted by Alan Gates <ga...@yahoo-inc.com>.
Nothing that's currently exposed to the UDF developer.  Pig does know,
but it doesn't expose that information through the interface.  You could
hard-code it in your script.
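
A minimal sketch of that hard-coding approach, loading one file at a time so the filename is known up front.  The filename literal is just one of the files from the listing earlier in the thread, and the field names are illustrative:

```
A = load 'images/IMG_0010.JPG' using BinaryStorage() split by 'file';
-- the filename is hard-coded because Pig does not expose it to the UDF
B = foreach A generate 'IMG_0010.JPG' as filename, $0 as content;
```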

Alan.

On Apr 24, 2009, at 1:05 PM, Roger Unwin wrote:

> Santhosh,
>
> I forgot to ask: is there a way to tell the name of the file that the
> binary data comes from?
>
> Thanks,
>
> Roger
>
>
>
> On Apr 24, 2009, at 12:47 PM, Roger Unwin <un...@sdsc.edu> wrote:
>
>>
>> Santhosh,
>>
>> You were spot on. That got the files loading, and they streamed out  
>> in binary from the dump.
>>
>> Now I am trying to work out how to pass the file contents  
>> (bytearray?) from the load to a UDF.
>>
>> I can't find an example on the web to show me how to do this; can
>> you assist me one more time?
>>
>> Thanks,
>>
>> Roger
>>
>> REGISTER ./pigTest.jar;
>>
>> A = load 'images' using BinaryStorage() split by 'file';
>> B = edu.sdsc.pig.test.myUDF(A);
>>
>> dump B;
>>
>>
>> -------------------
>>
>>
>> unwin@hadoop-n:~/pig-0.2.0$ java -cp pig.jar:/home/unwin/hadoop-0.18.3/conf org.apache.pig.Main myScript5
>> 2009-04-24 12:46:58,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-n:54310
>> 2009-04-24 12:46:59,033 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-t:54311
>> 2009-04-24 12:46:59,270 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "." ". "" at line 5, column 8.
>> Was expecting one of:
>>   "as" ...
>>   ";" ...
>>
>> Details at logfile: /home/unwin/pig-0.2.0/pig_1240577218499.log


Re: Question about Pig BinaryStorage()

Posted by Roger Unwin <un...@sdsc.edu>.
Santhosh,

I forgot to ask: is there a way to tell the name of the file that the
binary data comes from?

Thanks,

Roger



On Apr 24, 2009, at 12:47 PM, Roger Unwin <un...@sdsc.edu> wrote:

>
> Santhosh,
>
> You were spot on. That got the files loading, and they streamed out  
> in binary from the dump.
>
> Now I am trying to work out how to pass the file contents  
> (bytearray?) from the load to a UDF.
>
> I can't find an example on the web to show me how to do this; can
> you assist me one more time?
>
> Thanks,
>
> Roger
>
> REGISTER ./pigTest.jar;
>
> A = load 'images' using BinaryStorage() split by 'file';
> B = edu.sdsc.pig.test.myUDF(A);
>
> dump B;
>
>
> -------------------
>
>
> unwin@hadoop-n:~/pig-0.2.0$ java -cp pig.jar:/home/unwin/hadoop-0.18.3/conf org.apache.pig.Main myScript5
> 2009-04-24 12:46:58,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-n:54310
> 2009-04-24 12:46:59,033 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-t:54311
> 2009-04-24 12:46:59,270 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "." ". "" at line 5, column 8.
> Was expecting one of:
>    "as" ...
>    ";" ...
>
> Details at logfile: /home/unwin/pig-0.2.0/pig_1240577218499.log

Re: Question about Pig BinaryStorage()

Posted by Alan Gates <ga...@yahoo-inc.com>.
Roger,

Santhosh is on vacation for a couple of weeks, so I'll try to help  
out, though I don't have all the context.  Do you want to pass the  
entire tuple to your UDF?  If so, then the syntax would be:

A = load 'images' using BinaryStorage() split by 'file';
B = edu.sdsc.pig.test.myUDF(*);

Then your UDF would get each tuple (record) of images one at a time.
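
Put together with the rest of the script from the thread, that suggestion might look like the sketch below.  One hedged note: Pig Latin normally requires a UDF call to appear inside a foreach, so that form is spelled out here; the jar and UDF names are taken from Roger's earlier message:

```
REGISTER ./pigTest.jar;

A = load 'images' using BinaryStorage() split by 'file';
-- pass the whole tuple to the UDF; '*' expands to every field of A
B = foreach A generate edu.sdsc.pig.test.myUDF(*);

dump B;
```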

Alan.

On Apr 24, 2009, at 12:47 PM, Roger Unwin wrote:

>
> Santhosh,
>
> You were spot on. That got the files loading, and they streamed out  
> in binary from the dump.
>
> Now I am trying to work out how to pass the file contents  
> (bytearray?) from the load to a UDF.
>
> I can't find an example on the web to show me how to do this; can
> you assist me one more time?
>
> Thanks,
>
> Roger
>
> REGISTER ./pigTest.jar;
>
> A = load 'images' using BinaryStorage() split by 'file';
> B = edu.sdsc.pig.test.myUDF(A);
>
> dump B;
>
>
> -------------------
>
>
> unwin@hadoop-n:~/pig-0.2.0$ java -cp pig.jar:/home/unwin/hadoop-0.18.3/conf org.apache.pig.Main myScript5
> 2009-04-24 12:46:58,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-n:54310
> 2009-04-24 12:46:59,033 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-t:54311
> 2009-04-24 12:46:59,270 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "." ". "" at line 5, column 8.
> Was expecting one of:
>    "as" ...
>    ";" ...
>
> Details at logfile: /home/unwin/pig-0.2.0/pig_1240577218499.log


Re: Question about Pig BinaryStorage()

Posted by Roger Unwin <un...@sdsc.edu>.
Santhosh,

You were spot on. That got the files loading, and they streamed out in  
binary from the dump.

Now I am trying to work out how to pass the file contents (bytearray?)  
from the load to a UDF.

I can't find an example on the web to show me how to do this; can you
assist me one more time?

Thanks,

Roger

REGISTER ./pigTest.jar;

A = load 'images' using BinaryStorage() split by 'file';
B = edu.sdsc.pig.test.myUDF(A);

dump B;


-------------------


unwin@hadoop-n:~/pig-0.2.0$ java -cp pig.jar:/home/unwin/hadoop-0.18.3/conf org.apache.pig.Main myScript5
2009-04-24 12:46:58,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop-n:54310
2009-04-24 12:46:59,033 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop-t:54311
2009-04-24 12:46:59,270 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "." ". "" at line 5, column 8.
Was expecting one of:
     "as" ...
     ";" ...

Details at logfile: /home/unwin/pig-0.2.0/pig_1240577218499.log

RE: Question about Pig BinaryStorage()

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
Drop the -x local.

java -cp pig.jar:/home/unwin/hadoop-0.19.1/conf org.apache.pig.Main myScript5

-----Original Message-----
From: Roger Unwin [mailto:unwin@sdsc.edu] 
Sent: Thursday, April 23, 2009 2:30 PM
To: Santhosh Srinivasan
Cc: pig-user@hadoop.apache.org
Subject: Question about Pig BinaryStorage()

Santhosh,

I am trying to iterate through a group of binary files.  I would like
each reduce task to get one binary file.  Below is the first part of
it, which tries to read the data in.

I have the following script:

images = load 'images' using BinaryStorage() split by 'file';

dump images;

Here is my invocation:
java -cp pig.jar:/home/unwin/hadoop-0.19.1/conf org.apache.pig.Main -x local myScript5
2009-04-23 14:22:38,669 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
2009-04-23 14:22:38,673 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!!
2009-04-23 14:22:38,674 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed!
2009-04-23 14:22:38,678 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias images

Here is where the files are in hadoop:
unwin@hadoop-n:~/pig-0.2.0$ ../hadoop-0.18.3/bin/hadoop dfs -ls 'images'
Found 10 items
-rw-r--r--   2 unwin supergroup     272449 2009-04-22 11:04 /user/unwin/images/IMG_0010.JPG
-rw-r--r--   2 unwin supergroup     267580 2009-04-22 11:04 /user/unwin/images/IMG_0011.JPG
-rw-r--r--   2 unwin supergroup     378000 2009-04-22 11:04 /user/unwin/images/IMG_0012.JPG
-rw-r--r--   2 unwin supergroup     327829 2009-04-22 11:04 /user/unwin/images/IMG_0013.JPG
-rw-r--r--   2 unwin supergroup     476088 2009-04-22 11:04 /user/unwin/images/IMG_0014.JPG
-rw-r--r--   2 unwin supergroup     357258 2009-04-22 11:04 /user/unwin/images/IMG_0015.JPG
-rw-r--r--   2 unwin supergroup     401496 2009-04-22 11:04 /user/unwin/images/IMG_0016.JPG
-rw-r--r--   2 unwin supergroup     377798 2009-04-22 11:04 /user/unwin/images/IMG_0017.JPG
-rw-r--r--   2 unwin supergroup     466437 2009-04-22 11:04 /user/unwin/images/IMG_0018.JPG
-rw-r--r--   2 unwin supergroup     351952 2009-04-22 11:04 /user/unwin/images/IMG_0019.JPG

Do you see anything obvious, or a better way of iterating?

Thanks,

Roger