Posted to user@pig.apache.org by Kayla Jay <ka...@yahoo.com> on 2008/06/06 15:30:09 UTC

newbie question

If I have multiple files in a directory, how do I load them all into Pig? I want to run Pig over an input directory, not an individual file.

%ls Data
myfile1.txt
myfile2.txt
myfile3.txt
myfile4.txt
myfile5.txt

thanks.

Also, when I run the sample Pig Latin commands, I keep getting errors saying "Unable to open iterator".

For example,

A = LOAD 'myfile.txt' USING PigStorage('\t') AS (f1,f2,f3);
dump A

This gives me the correct output:
<1, 2, 3>
<4, 2, 1>
<8, 3, 4>
<4, 3, 3>
<7, 2, 5>
<8, 4, 3>

But when I run the next sample,
Y = FILTER A BY f1 == '8';
dump Y

I get a bunch of parser errors, followed by "Unable to open iterator Y".

This happens for most of the rest of the samples.

What's going on?


      

Re: newbie question

Posted by Alan Gates <ga...@yahoo-inc.com>.
If you want to read every file in the directory, you can give the
directory name as the load path, and every file in it should be read,
at least in map reduce mode.  I'm not sure whether this works in local mode.
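
A minimal sketch of that suggestion, reusing the 'Data' directory from the
question. The tab delimiter and the field names f1, f2, f3 are carried over
from the single-file example and are assumptions about what the files contain.
Whether this behaves the same way in local mode is not confirmed here.

```pig
-- Sketch: point LOAD at the directory instead of a single file.
-- 'Data' is the directory from the listing in the question; the delimiter
-- and field names are assumed to match the single-file example.
A = LOAD 'Data' USING PigStorage('\t') AS (f1, f2, f3);
DUMP A;
```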

Alan.


Re: newbie question

Posted by pi song <pi...@gmail.com>.
Regarding multiple file input, please have a look at Hadoop's globbing
support:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
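
As a rough sketch, and assuming Pig hands the load path to Hadoop's glob
matching as the pointer above suggests, a pattern can select some or all of
the files in the 'Data' directory from the question. The aliases B and C, the
delimiter, and the field names are illustrative only.

```pig
-- Sketch: select files with glob patterns rather than naming each one.
-- '*' matches any run of characters; '[1-3]' matches one character in the range.
B = LOAD 'Data/myfile*.txt' USING PigStorage('\t') AS (f1, f2, f3);
C = LOAD 'Data/myfile[1-3].txt' USING PigStorage('\t') AS (f1, f2, f3);
DUMP B;
```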

On Sat, Jun 7, 2008 at 2:33 AM, Prashanth Pappu <pr...@conviva.com>
wrote:

> > Y = FILTER A BY f1 == '8';
> > dump Y
>
>
> You are using the '==' operator with a string '8'. Just try
> Y = FILTER A BY f1==8;
>
> This is related to the concerns I've been raising. In the above example
> (with f1 == '8'), the result is an empty table. We need to ensure that,
> both semantically and implementation-wise, Pig handles empty tables/bags in
> a manner consistent with non-empty tables.
>

Re: newbie question

Posted by Prashanth Pappu <pr...@conviva.com>.
> Y = FILTER A BY f1 == '8';
> dump Y


You are using the '==' operator with a string '8'. Just try
Y = FILTER A BY f1==8;

This is related to the concerns I've been raising. In the above example
(with f1 == '8'), the result is an empty table. We need to ensure that,
both semantically and implementation-wise, Pig handles empty tables/bags in
a manner consistent with non-empty tables.