Posted to user@pig.apache.org by Earl Cahill <ca...@yahoo.com> on 2008/10/26 01:34:54 UTC
LOADing from a directory
Maybe I am way off, but I sure can't seem to load from a directory. Using the perl code below, I made a directory (/tmp/pig_test) with three dumb files, each containing lines that look like
bob\t3
alice\t2
I then run
grunt> raw = LOAD '/tmp/pig_test' USING PigStorage() AS (name, count);
grunt> DUMP raw;
and get the exception below. To get around this, I am currently doing something like this
bzcat /path/to/dir/*.bz2 > /tmp/log.txt
then pigging on /tmp/log.txt. It doesn't actually take that much longer, but I find it a touch annoying. Am I way off? The docs here
http://wiki.apache.org/pig/PigLatin
say
* If you pass a directory name to LOAD, it will load all files within the directory.
Here is the exception
2008-10-25 17:26:11,241 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: raw
at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
at org.apache.pig.PigServer.openIterator(PigServer.java:343)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:283)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.backend.executionengine.ExecException: java.io.FileNotFoundException: /tmp/pig_test (Is a directory)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:154)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:45)
at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:413)
at org.apache.pig.PigServer.openIterator(PigServer.java:332)
... 5 more
Caused by: java.io.FileNotFoundException: /tmp/pig_test (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.io.FileInputStream.<init>(FileInputStream.java:66)
at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:217)
at org.apache.pig.backend.local.executionengine.POLoad.open(POLoad.java:69)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:144)
... 8 more
2008-10-25 17:26:11,241 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: raw
Here is the perl to generate the files
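#!/usr/bin/perl -w
use strict;

# Create the target directory if it does not exist yet.
mkdir '/tmp/pig_test' unless -d '/tmp/pig_test';

my $fh;
while (my $line = <DATA>) {
    chomp($line);
    # A line starting with "/" names the next output file.
    if ($line =~ m@^/@) {
        open($fh, '>', $line) or die "Cannot open $line: $!";
        next;
    }
    # Any other line is a comma-separated record, written out tab-separated.
    print $fh join("\t", split /,/, $line) . "\n";
}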
__DATA__
/tmp/pig_test/one.txt
alice,2
bob,5
/tmp/pig_test/two.txt
alice,3
bob,3
/tmp/pig_test/three.txt
alice,7
bob,7
Thoughts?
Thanks,
Earl
http://blog.spack.net
http://holaservers.com
Re: LOADing from a directory
Posted by Daniel Dai <da...@gmail.com>.
Hi, Earl,
I guess you are using trunk. It looks like the problem is fixed in
branches/types.
Daniel
----- Original Message -----
From: "Earl Cahill" <ca...@yahoo.com>
To: "Pig User" <pi...@incubator.apache.org>
Cc: "Mridul Muralidharan" <mr...@yahoo-inc.com>
Sent: Wednesday, October 29, 2008 2:00 PM
Subject: Re: LOADing from a directory
> Right you are. I just changed the wiki to reflect the hadoop mode
> requirement.
>
> -> If you are in hadoop mode and pass a directory name to LOAD, it will
> load all files within the directory. (Throws an exception in local mode.)
>
> Thanks,
> Earl
>
> http://blog.spack.net
> http://holaservers.com
Re: LOADing from a directory
Posted by Earl Cahill <ca...@yahoo.com>.
Right you are. I just changed the wiki to reflect the hadoop mode requirement.
-> If you are in hadoop mode and pass a directory name to LOAD, it will load all files within the directory. (Throws an exception in local mode.)
Thanks,
Earl
http://blog.spack.net
http://holaservers.com
________________________________
From: Mridul Muralidharan <mr...@yahoo-inc.com>
To: pig-user@incubator.apache.org; Earl Cahill <ca...@yahoo.com>
Sent: Monday, October 27, 2008 5:09:05 PM
Subject: Re: LOADing from a directory
IIRC this is an issue with local mode.
The same script would work in mapreduce mode though, if I am not wrong.
From what I recall, the DFS APIs handle this globbing, etc., while the local mode impl in Pig does not.
So you end up with this exception.
Regards,
Mridul
Re: LOADing from a directory
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
IIRC this is an issue with local mode.
The same script would work in mapreduce mode though, if I am not wrong.
From what I recall, the DFS APIs handle this globbing, etc., while the
local mode impl in Pig does not.
So you end up with this exception.
Regards,
Mridul
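For local mode, the workaround described in this thread (merging the directory's files into one file before LOADing it) can be sketched in shell roughly as follows. This is only a minimal sketch: the temporary paths and the variable names src_dir and merged are assumptions made for illustration, and the sample data simply mirrors the three files from the original post.

```shell
#!/bin/sh
# Sketch of the local-mode workaround: merge a directory's files into one.
# src_dir and merged are hypothetical names; the thread uses /tmp/pig_test.
set -eu

src_dir=$(mktemp -d)
merged=$(mktemp)

# Recreate the three tab-separated sample files from the original post.
printf 'alice\t2\nbob\t5\n' > "$src_dir/one.txt"
printf 'alice\t3\nbob\t3\n' > "$src_dir/two.txt"
printf 'alice\t7\nbob\t7\n' > "$src_dir/three.txt"

# Concatenate every .txt file in the directory into a single file,
# which a local-mode LOAD can open directly.
cat "$src_dir"/*.txt > "$merged"

echo "merged file: $merged"
```

The merged file can then be given to LOAD in place of the directory, using the same PigStorage() call and (name, count) schema as in the original post. For compressed input, bzcat takes the place of cat, as in the command Earl posted.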