Posted to user@pig.apache.org by Earl Cahill <ca...@yahoo.com> on 2008/10/26 01:34:54 UTC

LOADing from a directory

Maybe I am way off, but I can't seem to load from a directory.  Using the Perl code below, I made a directory (/tmp/pig_test) with three small files, each containing tab-separated lines that look like

bob\t3
alice\t2

I then run

grunt> raw = LOAD '/tmp/pig_test' USING PigStorage() AS (name, count);
grunt> DUMP raw;

and get the exception below.  To get around this, I am currently doing something like this

bzcat /path/to/dir/*.bz2 > /tmp/log.txt

then running Pig on /tmp/log.txt.  It doesn't actually take much longer, but I find it a touch annoying.  Am I way off?  The docs here

http://wiki.apache.org/pig/PigLatin

say


	* If you pass a directory name to LOAD, it will load all files within the directory. 
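
The concatenation workaround described above can be sketched as a small shell function. This is only a sketch under stated assumptions: merge_dir and its two arguments are illustrative names (not part of Pig), the output file is assumed to live outside the source directory, and the input files are assumed to be plain text rather than .bz2.

```shell
#!/bin/sh
# merge_dir SRC_DIR OUT_FILE
# Concatenate every regular file directly inside SRC_DIR into OUT_FILE,
# so that LOAD can be pointed at one plain file instead of a directory.
# (merge_dir is a made-up helper name; OUT_FILE must not be inside SRC_DIR.)
merge_dir() {
    src_dir=$1
    out_file=$2
    : > "$out_file" || return 1            # start from an empty output file
    for f in "$src_dir"/*; do
        if [ -f "$f" ]; then               # skip subdirectories and unmatched globs
            cat "$f" >> "$out_file" || return 1
        fi
    done
}
```

For example, merge_dir /tmp/pig_test /tmp/pig_test_merged.txt (the second path is hypothetical) would produce a single file that can then be named in the LOAD statement. For compressed input like the .bz2 files above, bzcat would take the place of cat.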

Here is the exception

2008-10-25 17:26:11,241 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: raw
        at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
        at org.apache.pig.PigServer.openIterator(PigServer.java:343)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:283)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.backend.executionengine.ExecException: java.io.FileNotFoundException: /tmp/pig_test (Is a directory)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:154)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:45)
        at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:413)
        at org.apache.pig.PigServer.openIterator(PigServer.java:332)
        ... 5 more
Caused by: java.io.FileNotFoundException: /tmp/pig_test (Is a directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:217)
        at org.apache.pig.backend.local.executionengine.POLoad.open(POLoad.java:69)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:144)
        ... 8 more

2008-10-25 17:26:11,241 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: raw

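Here is the Perl to generate the files

#!/usr/bin/perl -w

use strict;

# Create the test directory, then write one tab-separated file per path in DATA.
mkdir '/tmp/pig_test' unless(-d '/tmp/pig_test');
my $fh;
while(my $line = <DATA>) {
    chomp($line);
    # A line starting with "/" names the next output file.
    if($line =~ m@^/@) {
        close($fh) if $fh;
        open($fh, '>', $line) or die "Cannot open $line: $!";
        next;
    }
    # Any other line is a comma-separated record; write it tab-separated.
    print $fh join("\t", split /,/, $line) . "\n";
}
close($fh) if $fh;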

__DATA__
/tmp/pig_test/one.txt
alice,2
bob,5
/tmp/pig_test/two.txt
alice,3
bob,3
/tmp/pig_test/three.txt
alice,7
bob,7


Thoughts?

Thanks,
Earl

http://blog.spack.net
http://holaservers.com



      

Re: LOADing from a directory

Posted by Daniel Dai <da...@gmail.com>.
Hi Earl,
I guess you are using trunk. It looks like the problem is fixed in
branches/types.

Daniel

----- Original Message ----- 
From: "Earl Cahill" <ca...@yahoo.com>
To: "Pig User" <pi...@incubator.apache.org>
Cc: "Mridul Muralidharan" <mr...@yahoo-inc.com>
Sent: Wednesday, October 29, 2008 2:00 PM
Subject: Re: LOADing from a directory


> Right you are.  I just changed the wiki to reflect the hadoop mode 
> requirement.
>
> -> If you are in hadoop mode and pass a directory name to LOAD, it will 
> load all files within the directory. (Throws an exception in local mode.)
>
> Thanks,
> Earl
>
> http://blog.spack.net
> http://holaservers.com


Re: LOADing from a directory

Posted by Earl Cahill <ca...@yahoo.com>.
Right you are.  I just changed the wiki to reflect the hadoop mode requirement.

 -> If you are in hadoop mode and pass a directory name to LOAD, it will load all files within the directory. (Throws an exception in local mode.)

Thanks,
Earl

 http://blog.spack.net
http://holaservers.com




________________________________
From: Mridul Muralidharan <mr...@yahoo-inc.com>
To: pig-user@incubator.apache.org; Earl Cahill <ca...@yahoo.com>
Sent: Monday, October 27, 2008 5:09:05 PM
Subject: Re: LOADing from a directory


IIRC this is an issue with local mode.
The same script would work in mapreduce mode though, if I am not wrong.

From what I recall, the DFS APIs handle this globbing, etc., while the local mode implementation in Pig does not.
So you end up with this exception.

Regards,
Mridul


Re: LOADing from a directory

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
IIRC this is an issue with local mode.
The same script would work in mapreduce mode though, if I am not wrong.

From what I recall, the DFS APIs handle this globbing, etc., while the
local mode implementation in Pig does not.
So you end up with this exception.
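
The difference described above can be illustrated with a rough shell analogy. This is not Pig or Hadoop code: load_local_style and load_dfs_style are made-up names, and the sketch only assumes that one code path opens the given path directly as a file while the other expands a directory into its files first.

```shell
#!/bin/sh
# Illustrative analogy only; neither function is Pig or Hadoop code,
# and both names are hypothetical.

# Open the path directly as a single file.
# For a directory, cat fails with an "Is a directory" error.
load_local_style() {
    cat "$1"
}

# Expand a directory into the regular files it contains before reading;
# a plain file is read as-is.
load_dfs_style() {
    if [ -d "$1" ]; then
        for f in "$1"/*; do
            if [ -f "$f" ]; then
                cat "$f" || return 1
            fi
        done
    else
        cat "$1"
    fi
}
```

Called on a directory such as /tmp/pig_test, the first function exits with an error much like the FileNotFoundException in the trace, while the second prints the lines of every file in it.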

Regards,
Mridul
