Posted to user@pig.apache.org by Tanton Gibbs <ta...@gmail.com> on 2008/05/21 23:48:53 UTC

Reading file from HDFS

How do I get pig to process a file that is already loaded on the
hadoop file system?

Right now, from GRUNT, I can do an ls, but it shows the local file
system. I've also tried

A = load 'myfile' using PigStorage()
A = load 'file:/myfile' using PigStorage()
A =  load 'file://myfile' using PigStorage()
A = load 'file://user/tgibbs/myfile' using PigStorage()
A = load 'hdfs:/myfile' using PigStorage()

All of the above fail in various ways.
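The five locations above fail for different reasons, and URI syntax alone explains some of them. The Python sketch below is only an illustration, not Pig's or Hadoop's actual resolution code. The NameNode address hdfs://namenode:9000 and the helper name resolve are hypothetical. It also assumes, as HDFS clients typically do, that a scheme-less relative path is taken relative to the user's working directory (/user/<name> by default).

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical settings for illustration only: the NameNode host/port
# and working directory are assumptions, not values from this thread.
DEFAULT_FS = "hdfs://namenode:9000"
WORKING_DIR = "/user/tgibbs"


def resolve(location, default_fs=DEFAULT_FS, working_dir=WORKING_DIR):
    """Return (scheme, authority, path) for a load location.

    Scheme-less locations fall back to the default file system, and
    relative paths are joined onto the working directory. Locations that
    carry a scheme are split exactly as written, following URI syntax.
    """
    parts = urlsplit(location)
    if parts.scheme:
        # Fully qualified: whatever follows "//" is the authority (host).
        return parts.scheme, parts.netloc, parts.path
    path = parts.path
    if not path.startswith("/"):
        path = posixpath.join(working_dir, path)
    fs = urlsplit(default_fs)
    return fs.scheme, fs.netloc, path


if __name__ == "__main__":
    for loc in ["myfile", "file:/myfile", "file://myfile",
                "file://user/tgibbs/myfile", "hdfs:/myfile"]:
        print(f"{loc!r:30} -> {resolve(loc)}")
```

Run as a script, it shows that 'file://myfile' names a host called myfile with an empty path, and 'file://user/tgibbs/myfile' names a host called user. Both 'file:/myfile' and 'hdfs:/myfile' carry no authority at all, so the cluster that 'hdfs:/myfile' refers to presumably has to come from configuration. Only the scheme-less 'myfile' lands in /user/tgibbs, and only when the default file system points at the cluster rather than file:///.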

Also, when pig loads it displays
1    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: file:///

I'm using hadoop v. 0.16.3 and the latest pig from svn.

Anybody have any ideas?

Thanks!
Tanton

Re: Reading file from HDFS

Posted by pi song <pi...@gmail.com>.
We changed the folder and configuration structure last month.

Now you can run Pig just by running ./bin/pig from {pig-home}.

You can also set up all the configuration, including the location of HDFS,
in ./conf/pig.properties.

Pi


On 5/22/08, Tanton Gibbs <ta...@gmail.com> wrote:
>
> On a whim, I just tried running java -cp pig:$HADOOPSITECONFIG
> org.apache.pig.Main
>
> That worked correctly and found my hadoop cluster.
>
> thanks for the tip!
>
> Tanton

Re: Reading file from HDFS

Posted by Tanton Gibbs <ta...@gmail.com>.
On a whim, I just tried running

    java -cp pig:$HADOOPSITECONFIG org.apache.pig.Main

That worked correctly and found my hadoop cluster.

thanks for the tip!

Tanton
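Here is a plausible explanation of why adding $HADOOPSITECONFIG to the classpath helped. It is offered under stated assumptions, not as a description of Pig's launcher. Hadoop's Configuration class loads hadoop-site.xml as a classpath resource, so an environment variable that points at the configuration directory has no effect unless that directory is also on the classpath. When no site file is found, fs.default.name keeps its built-in default, which the "Connecting to hadoop file system at: file:///" log line suggests is file:///. The Python below only models that lookup. It assumes HADOOPSITECONFIG names the directory containing hadoop-site.xml and checks directories only (a real class loader also searches jars). The helper default_fs is hypothetical and is not part of Pig or Hadoop.

```python
import os
import xml.etree.ElementTree as ET

# Assumed built-in default for fs.default.name, consistent with the
# "Connecting to hadoop file system at: file:///" log line.
BUILTIN_DEFAULT_FS = "file:///"


def read_site_properties(path):
    """Parse a hadoop-site.xml style file into a {name: value} dict."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def default_fs(classpath):
    """Hypothetical helper: mimic a class-loader lookup of hadoop-site.xml.

    Classpath entries are searched in order (separated by os.pathsep, which
    is ':' on Unix). Only directories are checked; jars are ignored in this
    simplification. The first hadoop-site.xml found wins, and if it does not
    set fs.default.name, the built-in default applies.
    """
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        candidate = os.path.join(entry, "hadoop-site.xml")
        if os.path.isfile(candidate):
            props = read_site_properties(candidate)
            return props.get("fs.default.name", BUILTIN_DEFAULT_FS)
    # No site file on the classpath: the client talks to the local disk.
    return BUILTIN_DEFAULT_FS


if __name__ == "__main__":
    # Roughly the classpath from "java -cp pig:$HADOOPSITECONFIG ..."
    cp = os.pathsep.join(["pig", os.environ.get("HADOOPSITECONFIG", "")])
    print(default_fs(cp))
```

With a classpath that includes a directory holding a hadoop-site.xml, default_fs returns that file's fs.default.name value, for example an hdfs:// URI. Without such a directory, it falls back to file:///, which matches what Pig reported when started from the bin script.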

On Wed, May 21, 2008 at 5:06 PM, Tanton Gibbs <ta...@gmail.com> wrote:
> I ran the pig script in the bin directory.
>
> I looked for pig.pl (mentioned in the wiki) but couldn't find it.
>
> I set HADOOPSITECONFIG and HADOOP_HOME, but apparently that isn't enough :)

Re: Reading file from HDFS

Posted by Tanton Gibbs <ta...@gmail.com>.
I ran the pig script in the bin directory.

I looked for pig.pl (mentioned in the wiki) but couldn't find it.

I set HADOOPSITECONFIG and HADOOP_HOME, but apparently that isn't enough :)

On Wed, May 21, 2008 at 4:55 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> This means that pig is not connected to your hadoop cluster. What
> command did you use to start pig?
>
> Olga

RE: Reading file from HDFS

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
This means that pig is not connected to your hadoop cluster. What
command did you use to start pig?

Olga 

> -----Original Message-----
> From: Tanton Gibbs [mailto:tanton.gibbs@gmail.com] 
> Sent: Wednesday, May 21, 2008 2:49 PM
> To: pig-user@incubator.apache.org
> Subject: Reading file from HDFS
> 
> How do I get pig to process a file that is already loaded on 
> the hadoop file system.
> 
> Right now, from GRUNT, I can do an ls, but it shows the local 
> file system.  I've also, tried
> 
> A = load 'myfile' using PigStorage()
> A = load 'file:/myfile' using PigStorage()
> A =  load 'file://myfile' using PigStorage()
> A = load 'file://user/tgibbs/myfile' using PigStorage()
> A = load 'hdfs:/myfile' using PigStorage()
> 
> All of the above fail in various ways.
> 
> Also, when pig loads it displays
> 1    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: file:///
> 
> I'm using hadoop v. 0.16.3 and the latest pig from svn.
> 
> Anybody have any ideas?
> 
> Thanks!
> Tanton
>