Posted to user@pig.apache.org by Nipun Saggar <ni...@gmail.com> on 2009/08/10 21:04:51 UTC

Load statement

Hi guys,

I have recently started using Pig and I have a question regarding the LOAD
statement. Does the LOAD statement load data from the local file system or
from HDFS? I am asking because I was trying to run the sample program
(idmapreduce.java) given at
http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in 'mapreduce'
mode. I was under the impression that in mapreduce mode data is looked up
from HDFS, but I was getting a java.io.IOException (passwd not found)
until I gave the correct path on the local file system.

Does LOAD always read from the local file system or is there a way to load
data from HDFS?

Thanks,
Nipun
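To make the question concrete: in mapreduce mode a schemeless LOAD path resolves against the cluster's default filesystem, and an explicit URI scheme can force either filesystem. A sketch in Pig Latin (the paths and the HDFS address are illustrative, not taken from a real setup):

```pig
-- In mapreduce mode, a schemeless path resolves against the default
-- filesystem (HDFS, when fs.default.name points at a NameNode):
A = LOAD '/user/nipuns/passwd' USING PigStorage(':');

-- An explicit scheme overrides the default either way:
B = LOAD 'hdfs://localhost:9000/user/nipuns/passwd' USING PigStorage(':');
C = LOAD 'file:/etc/passwd' USING PigStorage(':');
```

If a mapreduce-mode session reports "connecting to hadoop file system at: file:///", the default filesystem is the local one, and schemeless paths will be read locally regardless of what is in HDFS.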

Re: Load statement

Posted by Turner Kunkel <th...@gmail.com>.
Is JAVA_HOME pointing at the wrong location?
I grabbed Java from Sun and it installed in /usr/lib/jvm.
That's the only setup I have experience with, so I wouldn't know otherwise.
Does your Hadoop setup work otherwise? (e.g. copying something into HDFS
without error)

-Turner
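For reference, the environment setup described in the quoted thread below, collected into one copy-pasteable sketch (every path here is a placeholder; substitute your own install locations):

```shell
# Placeholder paths -- replace each with your actual install location.
export JAVA_HOME=/usr/lib/jvm/java-6-sun     # your Java install directory
export PIG_CLASSPATH=/usr/local/hadoop/conf  # Hadoop 'conf' dir, so Pig can see the cluster config
export PIGDIR=/usr/local/pig-0.3.0           # where your Pig files are
export HADOOP_HOME=/usr/local/hadoop         # where your Hadoop files are
echo "PIG_CLASSPATH=$PIG_CLASSPATH"
```

The key one for the file:/// symptom is PIG_CLASSPATH: it must point at the directory holding Hadoop's site configuration, or Pig falls back to the local filesystem defaults.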

On Mon, Aug 10, 2009 at 3:36 PM, Nipun Saggar <ni...@gmail.com>wrote:

> Hi Dmitriy,
>
> Even setting PIG_HADOOP_VERSION didn't help. I have applied PIG-660.patch.
>
> Thanks,
> -Nipun
> On Tue, Aug 11, 2009 at 2:01 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
> >wrote:
>
> > Try this:
> >
> > export PIG_HADOOP_VERSION=20
> >
> > Which of the posted patches did you use?
> >
> > -Dmitriy
> >
> > On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<ni...@gmail.com>
> > wrote:
> > > Hi Turner,
> > >
> > > Pig is still connecting to file system at file:///
> > >
> > > Here is how the environment variables you mentioned look like:
> > >
> > > JAVA_HOME =
> > /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
> > > PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
> > > PIGDIR = /Users/nipuns/pig/pig-0.3.0
> > > HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
> > >
> > > Please note that I have applied the patch given at
> > > https://issues.apache.org/jira/browse/PIG-660 to use hadoop 0.20 with
> > pig
> > > 0.3.0
> > >
> > > -Nipun
> > >
> > > On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <th...@gmail.com>
> > wrote:
> > >
> > >> You have to get it to connect to your Hadoop setup.
> > >>
> > >> Go to your Pig files directory and type these commands:
> > >>
> > >> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with your
> > Java
> > >> install directory
> > >> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with
> your
> > >> 'conf' Hadoop folder location
> > >> export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory
> where
> > >> your Pig files are
> > >> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where your
> > >> Hadoop files are
> > >>
> > >> Then run Pig and you should get it connecting to the HDFS instead of
> > >> reporting "file system at: file:///".
> > >>
> > >> Hope this helps.
> > >>
> > >> -Turner
> > >>
> > >> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <nipun.saggar@gmail.com
> > >> >wrote:
> > >>
> > >> > This is the command I had executed:
> > >> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
> > >> >
> > >> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting
> to
> > >> > hadoop file system at: file:///
> > >> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> > >> > processName=JobTracker, sessionId=
> > >> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
> > with
> > >> > processName=JobTracker, sessionId= - already initialized
> > >> > 09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser
> for
> > >> > parsing the arguments. Applications should implement Tool for the
> > same.
> > >> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> > >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce
> > job
> > >> > failed
> > >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> > >> > java.io.IOException: /user/nipuns/passwd does not exist
> > >> >    at
> > >> >
> > >> >
> > >>
> >
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> > >> >    at
> > >> >
> > >> >
> > >>
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> > >> >    at
> > >> >
> > >> >
> > >>
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> > >> >    at
> > >> >
> > >> >
> > >>
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> > >> >    at
> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> > >> >    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> > >> >    at
> > >> >
> > >> >
> > >>
> >
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> > >> >    at
> > >> >
> > org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> > >> >    at java.lang.Thread.run(Thread.java:637)
> > >> >
> > >> > HDFS contains the following files:
> > >> > $hadoop fs -ls
> > >> > Found 3 items
> > >> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> > >> > /user/nipuns/excite.log.bz2
> > >> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> > >> > /user/nipuns/passwd
> > >> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> > >> > /user/nipuns/test.txt
> > >> >
> > >> > The same program runs without any problems if I modify the file path
> > to
> > >> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD
> > statement
> > >> > is
> > >> > reading from local file system instead of HDFS.
> > >> > -Nipun
> > >> >
> > >> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <
> > dvryaboy@cloudera.com
> > >> > >wrote:
> > >> >
> > >> > > Nipun,
> > >> > > Are you sure you were actually running in mapreduce mode?
> > >> > > Did it say something like 'connecting to filesystem at
> > >> > > hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
> > >> > >
> > >> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<
> thkunkel@gmail.com>
> > >> > wrote:
> > >> > > > I was under the impression that it always loads from HDFS under
> > Map
> > >> > > Reduce
> > >> > > > mode.
> > >> > > >
> > >> > > > -Turner
> > >> > > >
> > >> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
> > >> nipun.saggar@gmail.com
> > >> > > >wrote:
> > >> > > >
> > >> > > >> Hi guys,
> > >> > > >>
> > >> > > >> I have recently started using pig and I have a doubt regarding
> > the
> > >> > LOAD
> > >> > > >> statement. Does the LOAD statement load data from the local
> file
> > >> > system
> > >> > > or
> > >> > > >> from HDFS? I am asking this question since I was trying to run
> > the
> > >> > > sample
> > >> > > >> program (idmapreduce.java) given at
> > >> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
> > >> > 'mapreduce'
> > >> > > >> mode. I was under the impression that in mapreduce mode data is
> > >> looked
> > >> > > up
> > >> > > >> from HDFS but was getting java.io.IOException passwd not found
> > >> > exception
> > >> > > >> until I gave the correct path on the local file system.
> > >> > > >>
> > >> > > >> Does LOAD always read from the local file system or is there a
> > way
> > >> to
> > >> > > load
> > >> > > >> data from HDFS?
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >> Nipun
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> -Turner Kunkel
> > >>
> > >
> >
>



-- 

-Turner Kunkel

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
Dmitriy,

The property settings seem to be working fine. I tried the experiment you had
suggested. On setting exectype=local in the pig.properties file, Pig started in
local mode. Similarly, on setting exectype=mapreduce, Pig started in mapreduce
mode. But again, in mapreduce mode it is connecting to the hadoop file system
at file:///

-Nipun
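A mapreduce-mode session that still reports file:/// usually means Pig never saw Hadoop's fs.default.name setting. Assuming Hadoop 0.20 with a NameNode at localhost:9000 (both hypothetical values), the conf directory on PIG_CLASSPATH would need a core-site.xml along these lines:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

If that file is absent from whatever conf directory Pig actually loads, the default filesystem silently stays file:/// even though the exectype is mapreduce.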
On Mon, Aug 17, 2009 at 9:24 PM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:

> Nipun,
> The good news is, you are now actually using the version of Pig you
> think you are using.
> The bad news you know -- it's not connecting.
> The next thing to try is to make sure that you are including the
> correct property files.
> Try a simple test -- change the default exectype , and start pig
> without a -x parameter. Make sure that when you flip the exectype, Pig
> is respecting the setting (you can easily do the same kind of
> experiment using some other setting, like the debug level).
>
> The environment variable for the properties location is PIG_CONF_DIR .
> You can also supply it through the --config command-line parameter.
>
>
> -Dmitriy
>
> On Thu, Aug 13, 2009 at 11:02 AM, Nipun Saggar<ni...@gmail.com>
> wrote:
> > Yes, I started with hadoop 0.18, moved to hadoop-0.19 and finally to
> > hadoop-0.20 . But I believe only hadoop-0.20 services are currently
> running.
> >
> >
> > On Tue, Aug 11, 2009 at 9:43 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com
> >wrote:
> >
> >> The change in the error code is interesting. Do you have other
> >> versions of pig and/or hadoop installed on your system?
> >>
> >> On Mon, Aug 10, 2009 at 7:18 PM, Nipun Saggar<ni...@gmail.com>
> >> wrote:
> >> > Even after setting PIG_CLASSPATH and applying patch pig-909, pig is
> still
> >> > trying to connect to hadoop file system at file:///
> >> > But the exception being thrown has been changed from
> >> > 09/08/11 00:19:07 ERROR mapReduceLayer.MapReduceLauncher:
> >> > java.io.IOException: /user/nipuns/passwd does not exist
> >> > to
> >> > 09/08/11 07:42:31 ERROR mapReduceLayer.Launcher: java.lang.Exception:
> >> > org.apache.pig.backend.executionengine.ExecException: ERROR 2100:
> >> > file:/user/nipuns/passwd does not exist.
> >> >
> >> > Thanks,
> >> > Nipun
> >> > On Tue, Aug 11, 2009 at 2:49 AM, Dmitriy Ryaboy <
> dvryaboy@cloudera.com
> >> >wrote:
> >> >
> >> >> There's about 8 patches in that JIRA, and my shims ones are decidedly
> >> >> different from the others -- so it matters whether you applied a shim
> >> >> or a rewrite. Both should work, but just in case, it's useful to know
> >> >> which you are using.
> >> >>
> >> >> It sounds like Pig isn't finding your hadoop config.
> >> >>
> >> >> Try this also:
> >> >> export PIG_CLASSPATH=${PIGDIR}/pig.jar
> >> >>
> >> >> And perhaps apply PIG-909 (it ensures that bin/pig respects
> HADOOP_HOME)
> >> >>
> >> >> -D
> >> >>
> >> >> On Mon, Aug 10, 2009 at 1:36 PM, Nipun Saggar<nipun.saggar@gmail.com
> >
> >> >> wrote:
> >> >> > Hi Dmitriy,
> >> >> >
> >> >> > Even setting PIG_HADOOP_VERSION didn't help. I have applied
> >> >> PIG-660.patch.
> >> >> >
> >> >> > Thanks,
> >> >> > -Nipun
> >> >> > On Tue, Aug 11, 2009 at 2:01 AM, Dmitriy Ryaboy <
> >> dvryaboy@cloudera.com
> >> >> >wrote:
> >> >> >
> >> >> >> Try this:
> >> >> >>
> >> >> >> export PIG_HADOOP_VERSION=20
> >> >> >>
> >> >> >> Which of the posted patches did you use?
> >> >> >>
> >> >> >> -Dmitriy
> >> >> >>
> >> >> >> On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<
> nipun.saggar@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >> > Hi Turner,
> >> >> >> >
> >> >> >> > Pig is still connecting to file system at file:///
> >> >> >> >
> >> >> >> > Here is how the environment variables you mentioned look like:
> >> >> >> >
> >> >> >> > JAVA_HOME =
> >> >> >> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
> >> >> >> > PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
> >> >> >> > PIGDIR = /Users/nipuns/pig/pig-0.3.0
> >> >> >> > HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
> >> >> >> >
> >> >> >> > Please note that I have applied the patch given at
> >> >> >> > https://issues.apache.org/jira/browse/PIG-660 to use hadoop
> 0.20
> >> with
> >> >> >> pig
> >> >> >> > 0.30
> >> >> >> >
> >> >> >> > -Nipun
> >> >> >> >
> >> >> >> > On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <
> thkunkel@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >> >
> >> >> >> >> You have to get it to connect to your Hadoop setup.
> >> >> >> >>
> >> >> >> >> Go to your Pig files directory and type these commands:
> >> >> >> >>
> >> >> >> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with
> >> your
> >> >> >> Java
> >> >> >> >> install directory
> >> >> >> >> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this
> with
> >> >> your
> >> >> >> >> 'conf' Hadoop folder location
> >> >> >> >> export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with
> directory
> >> >> where
> >> >> >> >> your Pig files are
> >> >> >> >> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with
> where
> >> >> your
> >> >> >> >> Hadoop files are
> >> >> >> >>
> >> >> >> >> Then run Pig and you should get it connecting to the HDFS
> instead
> >> of
> >> >> >> >> reporting "file system at: file:///".
> >> >> >> >>
> >> >> >> >> Hope this helps.
> >> >> >> >>
> >> >> >> >> -Turner
> >> >> >> >>
> >> >> >> >> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <
> >> >> nipun.saggar@gmail.com
> >> >> >> >> >wrote:
> >> >> >> >>
> >> >> >> >> > This is the command I had executed:
> >> >> >> >> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
> >> >> >> >> >
> >> >> >> >> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine:
> >> Connecting
> >> >> to
> >> >> >> >> > hadoop file system at: file:///
> >> >> >> >> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM
> Metrics
> >> >> with
> >> >> >> >> > processName=JobTracker, sessionId=
> >> >> >> >> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM
> >> >> Metrics
> >> >> >> with
> >> >> >> >> > processName=JobTracker, sessionId= - already initialized
> >> >> >> >> > 09/08/11 01:02:18 WARN mapred.JobClient: Use
> >> GenericOptionsParser
> >> >> for
> >> >> >> >> > parsing the arguments. Applications should implement Tool for
> >> the
> >> >> >> same.
> >> >> >> >> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0%
> >> >> complete
> >> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map
> >> >> reduce
> >> >> >> job
> >> >> >> >> > failed
> >> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> >> >> >> >> > java.io.IOException: /user/nipuns/passwd does not exist
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >> >> >> >> >    at
> >> >> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >> >> >> >> >    at
> >> org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >> >> >> >> >    at
> >> >> >> >> >
> >> >> >>
> >> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >> >> >> >> >    at java.lang.Thread.run(Thread.java:637)
> >> >> >> >> >
> >> >> >> >> > HDFS contains the following files:
> >> >> >> >> > $hadoop fs -ls
> >> >> >> >> > Found 3 items
> >> >> >> >> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> >> >> >> >> > /user/nipuns/excite.log.bz2
> >> >> >> >> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> >> >> >> >> > /user/nipuns/passwd
> >> >> >> >> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> >> >> >> >> > /user/nipuns/test.txt
> >> >> >> >> >
> >> >> >> >> > The same program runs without any problems if I modify the
> file
> >> >> path
> >> >> >> to
> >> >> >> >> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that
> LOAD
> >> >> >> statement
> >> >> >> >> > is
> >> >> >> >> > reading from local file system instead of HDFS.
> >> >> >> >> > -Nipun
> >> >> >> >> >
> >> >> >> >> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <
> >> >> >> dvryaboy@cloudera.com
> >> >> >> >> > >wrote:
> >> >> >> >> >
> >> >> >> >> > > Nipun,
> >> >> >> >> > > Are you sure you were actually running in mapreduce mode?
> >> >> >> >> > > Did it say something like 'connecting to filesystem at
> >> >> >> >> > > hdfs://localhost:xxx' or "connecting to filesystem at
> >> file:///" ?
> >> >> >> >> > >
> >> >> >> >> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<
> >> >> thkunkel@gmail.com>
> >> >> >> >> > wrote:
> >> >> >> >> > > > I was under the impression that it always loads from HDFS
> >> under
> >> >> >> Map
> >> >> >> >> > > Reduce
> >> >> >> >> > > > mode.
> >> >> >> >> > > >
> >> >> >> >> > > > -Turner
> >> >> >> >> > > >
> >> >> >> >> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
> >> >> >> >> nipun.saggar@gmail.com
> >> >> >> >> > > >wrote:
> >> >> >> >> > > >
> >> >> >> >> > > >> Hi guys,
> >> >> >> >> > > >>
> >> >> >> >> > > >> I have recently started using pig and I have a doubt
> >> regarding
> >> >> >> the
> >> >> >> >> > LOAD
> >> >> >> >> > > >> statement. Does the LOAD statement load data from the
> local
> >> >> file
> >> >> >> >> > system
> >> >> >> >> > > or
> >> >> >> >> > > >> from HDFS? I am asking this question since I was trying
> to
> >> run
> >> >> >> the
> >> >> >> >> > > sample
> >> >> >> >> > > >> program (idmapreduce.java) given at
> >> >> >> >> > > >>
> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
> >> >> >> >> > 'mapreduce'
> >> >> >> >> > > >> mode. I was under the impression that in mapreduce mode
> >> data
> >> >> is
> >> >> >> >> looked
> >> >> >> >> > > up
> >> >> >> >> > > >> from HDFS but was getting java.io.IOException passwd not
> >> found
> >> >> >> >> > exception
> >> >> >> >> > > >> until I gave the correct path on the local file system.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Does LOAD always read from the local file system or is
> >> there a
> >> >> >> way
> >> >> >> >> to
> >> >> >> >> > > load
> >> >> >> >> > > >> data from HDFS?
> >> >> >> >> > > >>
> >> >> >> >> > > >> Thanks,
> >> >> >> >> > > >> Nipun
> >> >> >> >> > > >>
> >> >> >> >> > > >
> >> >> >> >> > >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>
> >> >> >> >> -Turner Kunkel
> >> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Load statement

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Nipun,
The good news is, you are now actually using the version of Pig you
think you are using.
The bad news you know -- it's not connecting.
The next thing to try is to make sure that you are including the
correct property files.
Try a simple test -- change the default exectype , and start pig
without a -x parameter. Make sure that when you flip the exectype, Pig
is respecting the setting (you can easily do the same kind of
experiment using some other setting, like the debug level).

The environment variable for the properties location is PIG_CONF_DIR .
You can also supply it through the --config command-line parameter.


-Dmitriy
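The experiment Dmitriy describes can be sketched like this (the /tmp/pigconf directory and its contents are hypothetical; only the exectype key and the PIG_CONF_DIR / --config mechanisms come from the thread):

```shell
# Create a throwaway conf dir and set the default exectype there.
mkdir -p /tmp/pigconf
cat > /tmp/pigconf/pig.properties <<'EOF'
exectype=local
EOF

# Point Pig at it, then start pig WITHOUT an -x flag; it should come up
# in local mode. Flip exectype to mapreduce and restart to confirm Pig
# is actually respecting this file.
export PIG_CONF_DIR=/tmp/pigconf
# equivalently: pig --config /tmp/pigconf

grep '^exectype' /tmp/pigconf/pig.properties
```

If flipping the value changes nothing, Pig is reading some other properties file, which would also explain why the Hadoop settings never take effect.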

On Thu, Aug 13, 2009 at 11:02 AM, Nipun Saggar<ni...@gmail.com> wrote:
> Yes, I started with hadoop 0.18, moved to hadoop-0.19 and finally to
> hadoop-0.20 . But I believe only hadoop-0.20 services are currently running.
>
>
> On Tue, Aug 11, 2009 at 9:43 PM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:
>
>> The change in the error code is interesting. Do you have other
>> versions of pig and/or hadoop installed on your system?
>>
>> On Mon, Aug 10, 2009 at 7:18 PM, Nipun Saggar<ni...@gmail.com>
>> wrote:
>> > Even after setting PIG_CLASSPATH and applying patch pig-909, pig is still
>> > trying to connect to hadoop file system at file:///
>> > But the exception being thrown has been changed from
>> > 09/08/11 00:19:07 ERROR mapReduceLayer.MapReduceLauncher:
>> > java.io.IOException: /user/nipuns/passwd does not exist
>> > to
>> > 09/08/11 07:42:31 ERROR mapReduceLayer.Launcher: java.lang.Exception:
>> > org.apache.pig.backend.executionengine.ExecException: ERROR 2100:
>> > file:/user/nipuns/passwd does not exist.
>> >
>> > Thanks,
>> > Nipun
>> > On Tue, Aug 11, 2009 at 2:49 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
>> >wrote:
>> >
>> >> There's about 8 patches in that JIRA, and my shims ones are decidedly
>> >> different from the others -- so it matters whether you applied a shim
>> >> or a rewrite. Both should work, but just in case, it's useful to know
>> >> which you are using.
>> >>
>> >> It sounds like Pig isn't finding your hadoop config.
>> >>
>> >> Try this also:
>> >> export PIG_CLASSPATH=${PIGDIR}/pig.jar
>> >>
>> >> And perhaps apply PIG-909 (it ensures that bin/pig respects HADOOP_HOME)
>> >>
>> >> -D
>> >>
>> >> On Mon, Aug 10, 2009 at 1:36 PM, Nipun Saggar<ni...@gmail.com>
>> >> wrote:
>> >> > Hi Dmitriy,
>> >> >
>> >> > Even setting PIG_HADOOP_VERSION didn't help. I have applied
>> >> PIG-660.patch.
>> >> >
>> >> > Thanks,
>> >> > -Nipun
>> >> > On Tue, Aug 11, 2009 at 2:01 AM, Dmitriy Ryaboy <
>> dvryaboy@cloudera.com
>> >> >wrote:
>> >> >
>> >> >> Try this:
>> >> >>
>> >> >> export PIG_HADOOP_VERSION=20
>> >> >>
>> >> >> Which of the posted patches did you use?
>> >> >>
>> >> >> -Dmitriy
>> >> >>
>> >> >> On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<nipun.saggar@gmail.com
>> >
>> >> >> wrote:
>> >> >> > Hi Turner,
>> >> >> >
>> >> >> > Pig is still connecting to file system at file:///
>> >> >> >
>> >> >> > Here is how the environment variables you mentioned look like:
>> >> >> >
>> >> >> > JAVA_HOME =
>> >> >> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
>> >> >> > PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
>> >> >> > PIGDIR = /Users/nipuns/pig/pig-0.3.0
>> >> >> > HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
>> >> >> >
>> >> >> > Please note that I have applied the patch given at
>> >> >> > https://issues.apache.org/jira/browse/PIG-660 to use hadoop 0.20
>> with
>> >> >> pig
>> >> >> > 0.3.0
>> >> >> >
>> >> >> > -Nipun
>> >> >> >
>> >> >> > On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <thkunkel@gmail.com
>> >
>> >> >> wrote:
>> >> >> >
>> >> >> >> You have to get it to connect to your Hadoop setup.
>> >> >> >>
>> >> >> >> Go to your Pig files directory and type these commands:
>> >> >> >>
>> >> >> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with
>> your
>> >> >> Java
>> >> >> >> install directory
>> >> >> >> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with
>> >> your
>> >> >> >> 'conf' Hadoop folder location
>> >> >> >> export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory
>> >> where
>> >> >> >> your Pig files are
>> >> >> >> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where
>> >> your
>> >> >> >> Hadoop files are
>> >> >> >>
>> >> >> >> Then run Pig and you should get it connecting to the HDFS instead
>> of
>> >> >> >> reporting "file system at: file:///".
>> >> >> >>
>> >> >> >> Hope this helps.
>> >> >> >>
>> >> >> >> -Turner
>> >> >> >>
>> >> >> >> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <
>> >> nipun.saggar@gmail.com
>> >> >> >> >wrote:
>> >> >> >>
>> >> >> >> > This is the command I had executed:
>> >> >> >> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
>> >> >> >> >
>> >> >> >> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine:
>> Connecting
>> >> to
>> >> >> >> > hadoop file system at: file:///
>> >> >> >> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics
>> >> with
>> >> >> >> > processName=JobTracker, sessionId=
>> >> >> >> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM
>> >> Metrics
>> >> >> with
>> >> >> >> > processName=JobTracker, sessionId= - already initialized
>> >> >> >> > 09/08/11 01:02:18 WARN mapred.JobClient: Use
>> GenericOptionsParser
>> >> for
>> >> >> >> > parsing the arguments. Applications should implement Tool for
>> the
>> >> >> same.
>> >> >> >> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0%
>> >> complete
>> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map
>> >> reduce
>> >> >> job
>> >> >> >> > failed
>> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
>> >> >> >> > java.io.IOException: /user/nipuns/passwd does not exist
>> >> >> >> >    at
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
>> >> >> >> >    at
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>> >> >> >> >    at
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>> >> >> >> >    at
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>> >> >> >> >    at
>> >> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>> >> >> >> >    at
>> org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>> >> >> >> >    at
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>> >> >> >> >    at
>> >> >> >> >
>> >> >>
>> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>> >> >> >> >    at java.lang.Thread.run(Thread.java:637)
>> >> >> >> >
>> >> >> >> > HDFS contains the following files:
>> >> >> >> > $hadoop fs -ls
>> >> >> >> > Found 3 items
>> >> >> >> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
>> >> >> >> > /user/nipuns/excite.log.bz2
>> >> >> >> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
>> >> >> >> > /user/nipuns/passwd
>> >> >> >> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
>> >> >> >> > /user/nipuns/test.txt
>> >> >> >> >
>> >> >> >> > The same program runs without any problems if I modify the file
>> >> path
>> >> >> to
>> >> >> >> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD
>> >> >> statement
>> >> >> >> > is
>> >> >> >> > reading from local file system instead of HDFS.
>> >> >> >> > -Nipun
>> >> >> >> >
>> >> >> >> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <
>> >> >> dvryaboy@cloudera.com
>> >> >> >> > >wrote:
>> >> >> >> >
>> >> >> >> > > Nipun,
>> >> >> >> > > Are you sure you were actually running in mapreduce mode?
>> >> >> >> > > Did it say something like 'connecting to filesystem at
>> >> >> >> > > hdfs://localhost:xxx' or "connecting to filesystem at
>> file:///" ?
>> >> >> >> > >
>> >> >> >> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<
>> >> thkunkel@gmail.com>
>> >> >> >> > wrote:
>> >> >> >> > > > I was under the impression that it always loads from HDFS
>> under
>> >> >> Map
>> >> >> >> > > Reduce
>> >> >> >> > > > mode.
>> >> >> >> > > >
>> >> >> >> > > > -Turner
>> >> >> >> > > >
>> >> >> >> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
>> >> >> >> nipun.saggar@gmail.com
>> >> >> >> > > >wrote:
>> >> >> >> > > >
>> >> >> >> > > >> Hi guys,
>> >> >> >> > > >>
>> >> >> >> > > >> I have recently started using pig and I have a doubt
>> regarding
>> >> >> the
>> >> >> >> > LOAD
>> >> >> >> > > >> statement. Does the LOAD statement load data from the local
>> >> file
>> >> >> >> > system
>> >> >> >> > > or
>> >> >> >> > > >> from HDFS? I am asking this question since I was trying to
>> run
>> >> >> the
>> >> >> >> > > sample
>> >> >> >> > > >> program (idmapreduce.java) given at
>> >> >> >> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
>> >> >> >> > 'mapreduce'
>> >> >> >> > > >> mode. I was under the impression that in mapreduce mode
>> data
>> >> is
>> >> >> >> looked
>> >> >> >> > > up
>> >> >> >> > > >> from HDFS but was getting java.io.IOException passwd not
>> found
>> >> >> >> > exception
>> >> >> >> > > >> until I gave the correct path on the local file system.
>> >> >> >> > > >>
>> >> >> >> > > >> Does LOAD always read from the local file system or is
>> there a
>> >> >> way
>> >> >> >> to
>> >> >> >> > > load
>> >> >> >> > > >> data from HDFS?
>> >> >> >> > > >>
>> >> >> >> > > >> Thanks,
>> >> >> >> > > >> Nipun
>> >> >> >> > > >>
>> >> >> >> > > >
>> >> >> >> > >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >>
>> >> >> >> -Turner Kunkel
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
Yes, I started with hadoop 0.18, moved to hadoop-0.19 and finally to
hadoop-0.20. But I believe only the hadoop-0.20 services are currently running.
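With three Hadoop versions having been unpacked, it is worth confirming that Pig can see only one of them and that only one set of daemons is up. A hypothetical check (the $HOME/Hadoop layout is taken from the thread; the jar name pattern is illustrative):

```shell
# Count the hadoop core jars visible under the install root -- more than
# one version unpacked on the classpath is a recipe for confusion.
HADOOP_ROOT=${HADOOP_ROOT:-$HOME/Hadoop}
jar_count=$(find "$HADOOP_ROOT" -maxdepth 2 -name 'hadoop-*core*.jar' 2>/dev/null | wc -l)
echo "hadoop core jars found: $jar_count"

# jps (ships with the JDK) lists running Java daemons; you want exactly one
# NameNode/DataNode/JobTracker/TaskTracker set, all from the 0.20 install.
jps 2>/dev/null || true
```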


On Tue, Aug 11, 2009 at 9:43 PM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:

> The change in the error code is interesting. Do you have other
> versions of pig and/or hadoop installed on your system?
>
> On Mon, Aug 10, 2009 at 7:18 PM, Nipun Saggar<ni...@gmail.com>
> wrote:
> > Even after setting PIG_CLASSPATH and applying patch pig-909, pig is still
> > trying to connect to hadoop file system at file:///
> > But the exception being thrown has been changed from
> > 09/08/11 00:19:07 ERROR mapReduceLayer.MapReduceLauncher:
> > java.io.IOException: /user/nipuns/passwd does not exist
> > to
> > 09/08/11 07:42:31 ERROR mapReduceLayer.Launcher: java.lang.Exception:
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2100:
> > file:/user/nipuns/passwd does not exist.
> >
> > Thanks,
> > Nipun
> > On Tue, Aug 11, 2009 at 2:49 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
> >wrote:
> >
> >> There's about 8 patches in that JIRA, and my shims ones are decidedly
> >> different from the others -- so it matters whether you applied a shim
> >> or a rewrite. Both should work, but just in case, it's useful to know
> >> which you are using.
> >>
> >> It sounds like Pig isn't finding your hadoop config.
> >>
> >> Try this also:
> >> export PIG_CLASSPATH=${PIGDIR}/pig.jar
> >>
> >> And perhaps apply PIG-909 (it ensures that bin/pig respects HADOOP_HOME)
> >>
> >> -D
> >>
> >> On Mon, Aug 10, 2009 at 1:36 PM, Nipun Saggar<ni...@gmail.com>
> >> wrote:
>> >> > Hi Dmitriy,
> >> >
>> >> > Even setting PIG_HADOOP_VERSION didn't help. I have applied
> >> PIG-660.patch.
> >> >
> >> > Thanks,
> >> > -Nipun
> >> > On Tue, Aug 11, 2009 at 2:01 AM, Dmitriy Ryaboy <
> dvryaboy@cloudera.com
> >> >wrote:
> >> >
> >> >> Try this:
> >> >>
> >> >> export PIG_HADOOP_VERSION=20
> >> >>
> >> >> Which of the posted patches did you use?
> >> >>
> >> >> -Dmitriy
> >> >>
> >> >> On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<nipun.saggar@gmail.com
> >
> >> >> wrote:
> >> >> > Hi Turner,
> >> >> >
> >> >> > Pig is still connecting to file system at file:///
> >> >> >
> >> >> > Here is how the environment variables you mentioned look like:
> >> >> >
> >> >> > JAVA_HOME =
> >> >> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
> >> >> > PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
> >> >> > PIGDIR = /Users/nipuns/pig/pig-0.3.0
> >> >> > HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
> >> >> >
> >> >> > Please note that I have applied the patch given at
> >> >> > https://issues.apache.org/jira/browse/PIG-660 to use hadoop 0.20
> with
> >> >> pig
> >> >> > 0.3.0
> >> >> >
> >> >> > -Nipun
> >> >> >
> >> >> > On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <thkunkel@gmail.com
> >
> >> >> wrote:
> >> >> >
> >> >> >> You have to get it to connect to your Hadoop setup.
> >> >> >>
> >> >> >> Go to your Pig files directory and type these commands:
> >> >> >>
> >> >> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with
> your
> >> >> Java
> >> >> >> install directory
> >> >> >> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with
> >> your
> >> >> >> 'conf' Hadoop folder location
> >> >> >> expor PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory
> >> where
> >> >> >> your Pig files are
> >> >> >> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where
> >> your
> >> >> >> Hadoop files are
> >> >> >>
> >> >> >> Then run Pig and you should get it connecting to the HDFS instead
> of
> >> >> >> reporting "file system at: file:///".
> >> >> >>
> >> >> >> Hope this helps.
> >> >> >>
> >> >> >> -Turner
> >> >> >>
> >> >> >> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <
> >> nipun.saggar@gmail.com
> >> >> >> >wrote:
> >> >> >>
> >> >> >> > This is the command I had executed:
> >> >> >> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
> >> >> >> >
> >> >> >> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine:
> Connecting
> >> to
> >> >> >> > hadoop file system at: file:///
> >> >> >> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics
> >> with
> >> >> >> > processName=JobTracker, sessionId=
> >> >> >> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM
> >> Metrics
> >> >> with
> >> >> >> > processName=JobTracker, sessionId= - already initialized
> >> >> >> > 09/08/11 01:02:18 WARN mapred.JobClient: Use
> GenericOptionsParser
> >> for
> >> >> >> > parsing the arguments. Applications should implement Tool for
> the
> >> >> same.
> >> >> >> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0%
> >> complete
> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map
> >> reduce
> >> >> job
> >> >> >> > failed
> >> >> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> >> >> >> > java.io.IOException: /user/nipuns/passwd does not exist
> >> >> >> >    at
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >> >> >> >    at
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >> >> >> >    at
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >> >> >> >    at
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >> >> >> >    at
> >> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >> >> >> >    at
> org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >> >> >> >    at
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >> >> >> >    at
> >> >> >> >
> >> >>
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >> >> >> >    at java.lang.Thread.run(Thread.java:637)
> >> >> >> >
> >> >> >> > HDFS contains the following files:
> >> >> >> > $hadoop fs -ls
> >> >> >> > Found 3 items
> >> >> >> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> >> >> >> > /user/nipuns/excite.log.bz2
> >> >> >> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> >> >> >> > /user/nipuns/passwd
> >> >> >> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> >> >> >> > /user/nipuns/test.txt
> >> >> >> >
> >> >> >> > The same program runs without any problems if I modify the file
> >> path
> >> >> to
> >> >> >> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD
> >> >> statement
> >> >> >> > is
> >> >> >> > reading from local file system instead of HDFS.
> >> >> >> > -Nipun
> >> >> >> >
> >> >> >> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <
> >> >> dvryaboy@cloudera.com
> >> >> >> > >wrote:
> >> >> >> >
> >> >> >> > > Nipin,
> >> >> >> > > Are you sure you were actually running in mapreduce mode?
> >> >> >> > > Did it say something like 'connecting to filesystem at
> >> >> >> > > hdfs://localhost:xxx' or "connecting to filesystem at
> file:///" ?
> >> >> >> > >
> >> >> >> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<
> >> thkunkel@gmail.com>
> >> >> >> > wrote:
> >> >> >> > > > I was under the impression that it always loads from HDFS
> under
> >> >> Map
> >> >> >> > > Reduce
> >> >> >> > > > mode.
> >> >> >> > > >
> >> >> >> > > > -Turner
> >> >> >> > > >
> >> >> >> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
> >> >> >> nipun.saggar@gmail.com
> >> >> >> > > >wrote:
> >> >> >> > > >
> >> >> >> > > >> Hi guys,
> >> >> >> > > >>
> >> >> >> > > >> I have recently started using pig and I have a doubt
> regarding
> >> >> the
> >> >> >> > LOAD
> >> >> >> > > >> statement. Does the LOAD statement load data from the local
> >> file
> >> >> >> > system
> >> >> >> > > or
> >> >> >> > > >> from HDFS? I am asking this question since I was trying to
> run
> >> >> the
> >> >> >> > > sample
> >> >> >> > > >> program (idmapreduce.java) given at
> >> >> >> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.htmlin
> >> >> >> > 'mapreduce'
> >> >> >> > > >> mode. I was under the impression that in mapreduce mode
> data
> >> is
> >> >> >> looked
> >> >> >> > > up
> >> >> >> > > >> from HDFS but was getting java.io.IOException passwd not
> found
> >> >> >> > exception
> >> >> >> > > >> until I gave the correct path on the local file system.
> >> >> >> > > >>
> >> >> >> > > >> Does LOAD always read from the local file system or is
> there a
> >> >> way
> >> >> >> to
> >> >> >> > > load
> >> >> >> > > >> data from HDFS?
> >> >> >> > > >>
> >> >> >> > > >> Thanks,
> >> >> >> > > >> Nipun
> >> >> >> > > >>
> >> >> >> > > >
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> -Turner Kunkel
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Load statement

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
The change in the error code is interesting. Do you have other
versions of pig and/or hadoop installed on your system?
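Checking for shadowed installs is quick to script. A rough sketch (the search roots are assumptions, not canonical locations -- adjust for your machine):

```shell
# Sketch: look for multiple pig/hadoop install directories that could
# shadow each other on the classpath. Search roots are examples only.
found=""
for name in pig hadoop; do
  for root in "$HOME" /usr/local /opt; do
    if [ -d "$root" ]; then
      hits=$(find "$root" -maxdepth 2 -type d -name "${name}*" 2>/dev/null)
      if [ -n "$hits" ]; then
        found="$found $hits"
      fi
    fi
  done
done
echo "candidate installs:${found:- none found}"
```

If this turns up more than one copy of either, it is worth checking which one the scripts on your PATH actually resolve to.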

On Mon, Aug 10, 2009 at 7:18 PM, Nipun Saggar<ni...@gmail.com> wrote:
> Even after setting PIG_CLASSPATH and applying patch pig-909, pig is still
> trying to connect to hadoop file system at file:///
> But the exception being thrown has been changed from
> 09/08/11 00:19:07 ERROR mapReduceLayer.MapReduceLauncher:
> java.io.IOException: /user/nipuns/passwd does not exist
> to
> 09/08/11 07:42:31 ERROR mapReduceLayer.Launcher: java.lang.Exception:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2100:
> file:/user/nipuns/passwd does not exist.
>
> Thanks,
> Nipun

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
Even after setting PIG_CLASSPATH and applying patch pig-909, pig is still
trying to connect to hadoop file system at file:///
But the exception being thrown has changed from
09/08/11 00:19:07 ERROR mapReduceLayer.MapReduceLauncher:
java.io.IOException: /user/nipuns/passwd does not exist
to
09/08/11 07:42:31 ERROR mapReduceLayer.Launcher: java.lang.Exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 2100:
file:/user/nipuns/passwd does not exist.

Thanks,
Nipun
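The `file:` prefix in that ERROR 2100 message confirms the path is still being resolved against the local filesystem. As a stopgap (not a fix for the underlying classpath problem), a fully qualified URI in LOAD should bypass the default-filesystem lookup. A sketch -- the namenode address `hdfs://localhost:9000` is an assumption; substitute whatever your fs.default.name says:

```shell
# Stopgap sketch: write a Pig script whose LOAD names HDFS explicitly.
# hdfs://localhost:9000 is an assumed namenode address -- use your own.
cat > /tmp/load_from_hdfs.pig <<'EOF'
A = LOAD 'hdfs://localhost:9000/user/nipuns/passwd' USING PigStorage(':');
B = FOREACH A GENERATE $0;
DUMP B;
EOF
# Then, with JAVA_HOME/PIG_CLASSPATH/HADOOP_HOME exported as discussed
# earlier in the thread:
#   pig -x mapreduce /tmp/load_from_hdfs.pig
echo "wrote /tmp/load_from_hdfs.pig"
```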
On Tue, Aug 11, 2009 at 2:49 AM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:

> There's about 8 patches in that JIRA, and my shims ones are decidedly
> different from the others -- so it matters whether you applied a shim
> or a rewrite. Both should work, but just in case, it's useful to know
> which you are using.
>
> It sounds like Pig isn't finding your hadoop config.
>
> Try this also:
> export PIG_CLASSPATH=${PIGDIR}/pig.jar
>
> And perhaps apply PIG-909 (it ensures that bin/pig respects HADOOP_HOME)
>
> -D
>

Re: Load statement

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
There are about 8 patches in that JIRA, and my shim-based ones are decidedly
different from the others -- so it matters whether you applied a shim
or a rewrite. Both should work, but just in case, it's useful to know
which one you are using.

It sounds like Pig isn't finding your hadoop config.

Try this also:
export PIG_CLASSPATH=${PIGDIR}/pig.jar

And perhaps apply PIG-909 (it ensures that bin/pig respects HADOOP_HOME)

-D
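The usual culprit when Pig reports `file:///` is that the Hadoop conf directory never makes it onto Pig's classpath, so fs.default.name is never read. A self-contained sketch of the check -- it fabricates a sample core-site.xml in a temp dir purely for illustration; point `conf_dir` at your real $HADOOP_HOME/conf instead:

```shell
# Sketch: verify that the core-site.xml Pig would read actually names an
# hdfs:// default filesystem. The temp dir and sample file are stand-ins
# for a real $HADOOP_HOME/conf.
conf_dir=$(mktemp -d)
cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
if grep -q 'hdfs://' "$conf_dir/core-site.xml"; then
  fs_kind=hdfs
else
  fs_kind=local   # Pig would log "file system at: file:///"
fi
echo "default filesystem looks like: $fs_kind"
```

If the grep against your real conf dir comes up empty (or the file is missing), Pig falling back to the local filesystem is the expected behavior.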

On Mon, Aug 10, 2009 at 1:36 PM, Nipun Saggar<ni...@gmail.com> wrote:
> Hi Dmitriy,
>
> Even setting PIG_HADOOP_VERSION didn't help. I have applied PIG-660.patch.
>
> Thanks,
> -Nipun

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
Hi Dmitriy,

Even setting PIG_HADOOP_VERSION didn't help. I have applied PIG-660.patch.

Thanks,
-Nipun
On Tue, Aug 11, 2009 at 2:01 AM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:

> Try this:
>
> export PIG_HADOOP_VERSION=20
>
> Which of the posted patches did you use?
>
> -Dmitriy
>
> On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<ni...@gmail.com>
> wrote:
> > Hi Turner,
> >
> > Pig is still connecting to file system at file:///
> >
> > Here is how the environment variables you mentioned look like:
> >
> > JAVA_HOME =
> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
> > PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
> > PIGDIR = /Users/nipuns/pig/pig-0.3.0
> > HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
> >
> > Please note that I have applied the patch given at
> > https://issues.apache.org/jira/browse/PIG-660 to use hadoop 0.20 with
> pig
> > 0.30
> >
> > -Nipun
> >
> > On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <th...@gmail.com>
> wrote:
> >
> >> You have to get it to connect to your Hadoop setup.
> >>
> >> Go to your Pig files directory and type these commands:
> >>
> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with your
> Java
> >> install directory
> >> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with your
> >> 'conf' Hadoop folder location
> >> expor PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory where
> >> your Pig files are
> >> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where your
> >> Hadoop files are
> >>
> >> Then run Pig and you should get it connecting to the HDFS instead of
> >> reporting "file system at: file:///".
> >>
> >> Hope this helps.
> >>
> >> -Turner
> >>
> >> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <nipun.saggar@gmail.com
> >> >wrote:
> >>
> >> > This is the command I had executed:
> >> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
> >> >
> >> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting to
> >> > hadoop file system at: file:///
> >> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> >> > processName=JobTracker, sessionId=
> >> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
> with
> >> > processName=JobTracker, sessionId= - already initialized
> >> > 09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser for
> >> > parsing the arguments. Applications should implement Tool for the
> same.
> >> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce
> job
> >> > failed
> >> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> >> > java.io.IOException: /user/nipuns/passwd does not exist
> >> >    at
> >> >
> >> >
> >>
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >> >    at
> >> >
> >> >
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >> >    at
> >> >
> >> >
> >>
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >> >    at
> >> >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >> >    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >> >    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >> >    at
> >> >
> >> >
> >>
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >> >    at
> >> >
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >> >    at java.lang.Thread.run(Thread.java:637)
> >> >
> >> > HDFS contains the following files:
> >> > $hadoop fs -ls
> >> > Found 3 items
> >> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> >> > /user/nipuns/excite.log.bz2
> >> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> >> > /user/nipuns/passwd
> >> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> >> > /user/nipuns/test.txt
> >> >
> >> > The same program runs without any problems if I modify the file path
> to
> >> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD
> statement
> >> > is
> >> > reading from local file system instead of HDFS.
> >> > -Nipun
> >> >
> >> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <
> dvryaboy@cloudera.com
> >> > >wrote:
> >> >
> >> > > Nipun,
> >> > > Are you sure you were actually running in mapreduce mode?
> >> > > Did it say something like 'connecting to filesystem at
> >> > > hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
> >> > >
> >> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com>
> >> > wrote:
> >> > > > I was under the impression that it always loads from HDFS under
> Map
> >> > > Reduce
> >> > > > mode.
> >> > > >
> >> > > > -Turner
> >> > > >
> >> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
> >> nipun.saggar@gmail.com
> >> > > >wrote:
> >> > > >
> >> > > >> Hi guys,
> >> > > >>
> >> > > >> I have recently started using pig and I have a doubt regarding
> the
> >> > LOAD
> >> > > >> statement. Does the LOAD statement load data from the local file
> >> > system
> >> > > or
> >> > > >> from HDFS? I am asking this question since I was trying to run
> the
> >> > > sample
> >> > > >> program (idmapreduce.java) given at
> >> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
> >> > 'mapreduce'
> >> > > >> mode. I was under the impression that in mapreduce mode data is
> >> looked
> >> > > up
> >> > > >> from HDFS but was getting java.io.IOException passwd not found
> >> > exception
> >> > > >> until I gave the correct path on the local file system.
> >> > > >>
> >> > > >> Does LOAD always read from the local file system or is there a
> way
> >> to
> >> > > load
> >> > > >> data from HDFS?
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Nipun
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> -Turner Kunkel
> >>
> >
>

Re: Load statement

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Try this:

export PIG_HADOOP_VERSION=20

Which of the posted patches did you use?

-Dmitriy

On Mon, Aug 10, 2009 at 1:20 PM, Nipun Saggar<ni...@gmail.com> wrote:
> Hi Turner,
>
> Pig is still connecting to file system at file:///
>
> Here is what the environment variables you mentioned look like:
>
> JAVA_HOME = /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
> PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
> PIGDIR = /Users/nipuns/pig/pig-0.3.0
> HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0
>
> Please note that I have applied the patch given at
> https://issues.apache.org/jira/browse/PIG-660 to use Hadoop 0.20 with Pig
> 0.3.0
>
> -Nipun
>
> On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <th...@gmail.com> wrote:
>
>> You have to get it to connect to your Hadoop setup.
>>
>> Go to your Pig files directory and type these commands:
>>
>> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with your Java
>> install directory
>> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with your
>> 'conf' Hadoop folder location
>> export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory where
>> your Pig files are
>> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where your
>> Hadoop files are
>>
>> Then run Pig and you should get it connecting to the HDFS instead of
>> reporting "file system at: file:///".
>>
>> Hope this helps.
>>
>> -Turner
>>
>> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <nipun.saggar@gmail.com
>> >wrote:
>>
>> > This is the command I had executed:
>> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
>> >
>> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting to
>> > hadoop file system at: file:///
>> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> > processName=JobTracker, sessionId=
>> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
>> > processName=JobTracker, sessionId= - already initialized
>> > 09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser for
>> > parsing the arguments. Applications should implement Tool for the same.
>> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
>> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
>> > failed
>> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
>> > java.io.IOException: /user/nipuns/passwd does not exist
>> >    at
>> >
>> >
>> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
>> >    at
>> >
>> >
>> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>> >    at
>> >
>> >
>> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>> >    at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>> >    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>> >    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>> >    at
>> >
>> >
>> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>> >    at
>> > org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>> >    at java.lang.Thread.run(Thread.java:637)
>> >
>> > HDFS contains the following files:
>> > $hadoop fs -ls
>> > Found 3 items
>> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
>> > /user/nipuns/excite.log.bz2
>> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
>> > /user/nipuns/passwd
>> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
>> > /user/nipuns/test.txt
>> >
>> > The same program runs without any problems if I modify the file path to
>> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD statement
>> > is
>> > reading from local file system instead of HDFS.
>> > -Nipun
>> >
>> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
>> > >wrote:
>> >
>> > > Nipun,
>> > > Are you sure you were actually running in mapreduce mode?
>> > > Did it say something like 'connecting to filesystem at
>> > > hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
>> > >
>> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com>
>> > wrote:
>> > > > I was under the impression that it always loads from HDFS under Map
>> > > Reduce
>> > > > mode.
>> > > >
>> > > > -Turner
>> > > >
>> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
>> nipun.saggar@gmail.com
>> > > >wrote:
>> > > >
>> > > >> Hi guys,
>> > > >>
>> > > >> I have recently started using pig and I have a doubt regarding the
>> > LOAD
>> > > >> statement. Does the LOAD statement load data from the local file
>> > system
>> > > or
>> > > >> from HDFS? I am asking this question since I was trying to run the
>> > > sample
>> > > >> program (idmapreduce.java) given at
>> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
>> > 'mapreduce'
>> > > >> mode. I was under the impression that in mapreduce mode data is
>> looked
>> > > up
>> > > >> from HDFS but was getting java.io.IOException passwd not found
>> > exception
>> > > >> until I gave the correct path on the local file system.
>> > > >>
>> > > >> Does LOAD always read from the local file system or is there a way
>> to
>> > > load
>> > > >> data from HDFS?
>> > > >>
>> > > >> Thanks,
>> > > >> Nipun
>> > > >>
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>>
>> -Turner Kunkel
>>
>

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
Hi Turner,

Pig is still connecting to file system at file:///

Here is what the environment variables you mentioned look like:

JAVA_HOME = /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/
PIG_CLASSPATH = /Users/nipuns/Hadoop/hadoop-0.20.0/conf/
PIGDIR = /Users/nipuns/pig/pig-0.3.0
HADOOP_HOME=/Users/nipuns/Hadoop/hadoop-0.20.0

Please note that I have applied the patch given at
https://issues.apache.org/jira/browse/PIG-660 to use Hadoop 0.20 with Pig
0.3.0

-Nipun

On Tue, Aug 11, 2009 at 1:22 AM, Turner Kunkel <th...@gmail.com> wrote:

> You have to get it to connect to your Hadoop setup.
>
> Go to your Pig files directory and type these commands:
>
> export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with your Java
> install directory
> export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with your
> 'conf' Hadoop folder location
> export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory where
> your Pig files are
> export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where your
> Hadoop files are
>
> Then run Pig and you should get it connecting to the HDFS instead of
> reporting "file system at: file:///".
>
> Hope this helps.
>
> -Turner
>
> On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <nipun.saggar@gmail.com
> >wrote:
>
> > This is the command I had executed:
> > $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
> >
> > 09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting to
> > hadoop file system at: file:///
> > 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> > processName=JobTracker, sessionId=
> > 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
> > processName=JobTracker, sessionId= - already initialized
> > 09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> > 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
> > failed
> > 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> > java.io.IOException: /user/nipuns/passwd does not exist
> >    at
> >
> >
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
> >    at
> >
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> >    at
> >
> >
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> >    at
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> >    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> >    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> >    at
> >
> >
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >    at
> > org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> >    at java.lang.Thread.run(Thread.java:637)
> >
> > HDFS contains the following files:
> > $hadoop fs -ls
> > Found 3 items
> > -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> > /user/nipuns/excite.log.bz2
> > -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> > /user/nipuns/passwd
> > -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> > /user/nipuns/test.txt
> >
> > The same program runs without any problems if I modify the file path to
> > '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD statement
> > is
> > reading from local file system instead of HDFS.
> > -Nipun
> >
> > On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
> > >wrote:
> >
> > > Nipun,
> > > Are you sure you were actually running in mapreduce mode?
> > > Did it say something like 'connecting to filesystem at
> > > hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
> > >
> > > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com>
> > wrote:
> > > > I was under the impression that it always loads from HDFS under Map
> > > Reduce
> > > > mode.
> > > >
> > > > -Turner
> > > >
> > > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <
> nipun.saggar@gmail.com
> > > >wrote:
> > > >
> > > >> Hi guys,
> > > >>
> > > >> I have recently started using pig and I have a doubt regarding the
> > LOAD
> > > >> statement. Does the LOAD statement load data from the local file
> > system
> > > or
> > > >> from HDFS? I am asking this question since I was trying to run the
> > > sample
> > > >> program (idmapreduce.java) given at
> > > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
> > 'mapreduce'
> > > >> mode. I was under the impression that in mapreduce mode data is
> looked
> > > up
> > > >> from HDFS but was getting java.io.IOException passwd not found
> > exception
> > > >> until I gave the correct path on the local file system.
> > > >>
> > > >> Does LOAD always read from the local file system or is there a way
> to
> > > load
> > > >> data from HDFS?
> > > >>
> > > >> Thanks,
> > > >> Nipun
> > > >>
> > > >
> > >
> >
>
>
>
> --
>
> -Turner Kunkel
>

Re: Load statement

Posted by Turner Kunkel <th...@gmail.com>.
You have to get it to connect to your Hadoop setup.

Go to your Pig files directory and type these commands:

export JAVA_HOME=/usr/lib/jvm/java-6-sun  <-- replace this with your Java
install directory
export PIG_CLASSPATH=/usr/local/hadoop/conf  <-- replace this with your
'conf' Hadoop folder location
export PIGDIR=/usr/local/pig-0.3.0  <-- replace this with directory where
your Pig files are
export HADOOP_HOME=/usr/local/hadoop  <-- replace this with where your
Hadoop files are

Then run Pig; it should connect to HDFS instead of
reporting "file system at: file:///".
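
If Pig still reports file:///, the usual cause is that the conf directory on
PIG_CLASSPATH has no default filesystem configured. Here is a rough,
self-contained sketch of that check; the paths, port, and property value are
made-up examples rather than your real settings (the property is
fs.default.name in Hadoop 0.20, fs.defaultFS in later releases):

```shell
# Stand-in conf dir so the sketch runs anywhere; in practice this is the
# $HADOOP_HOME/conf directory you put on PIG_CLASSPATH.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# Pull out the default filesystem URI. If this prints nothing (or a
# file:/// value), Pig falls back to the local filesystem and logs
# "Connecting to hadoop file system at: file:///".
DEFAULT_FS=$(grep -A1 '<name>fs.default.name</name>' "$CONF_DIR/core-site.xml" \
  | grep -o 'hdfs://[^<]*')
echo "$DEFAULT_FS"
```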

Hope this helps.

-Turner

On Mon, Aug 10, 2009 at 2:40 PM, Nipun Saggar <ni...@gmail.com>wrote:

> This is the command I had executed:
> $java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce
>
> 09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting to
> hadoop file system at: file:///
> 09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
> processName=JobTracker, sessionId= - already initialized
> 09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
> 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
> failed
> 09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
> java.io.IOException: /user/nipuns/passwd does not exist
>    at
>
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
>    at
>
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>    at
>
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>    at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>    at
>
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>    at
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>    at java.lang.Thread.run(Thread.java:637)
>
> HDFS contains the following files:
> $hadoop fs -ls
> Found 3 items
> -rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
> /user/nipuns/excite.log.bz2
> -rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
> /user/nipuns/passwd
> -rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
> /user/nipuns/test.txt
>
> The same program runs without any problems if I modify the file path to
> '/etc/passwd' in idmapreduce.java. Hence, I concluded that LOAD statement
> is
> reading from local file system instead of HDFS.
> -Nipun
>
> On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <dvryaboy@cloudera.com
> >wrote:
>
> > Nipun,
> > Are you sure you were actually running in mapreduce mode?
> > Did it say something like 'connecting to filesystem at
> > hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
> >
> > On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com>
> wrote:
> > > I was under the impression that it always loads from HDFS under Map
> > Reduce
> > > mode.
> > >
> > > -Turner
> > >
> > > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <nipun.saggar@gmail.com
> > >wrote:
> > >
> > >> Hi guys,
> > >>
> > >> I have recently started using pig and I have a doubt regarding the
> LOAD
> > >> statement. Does the LOAD statement load data from the local file
> system
> > or
> > >> from HDFS? I am asking this question since I was trying to run the
> > sample
> > >> program (idmapreduce.java) given at
> > >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in
> 'mapreduce'
> > >> mode. I was under the impression that in mapreduce mode data is looked
> > up
> > >> from HDFS but was getting java.io.IOException passwd not found
> exception
> > >> until I gave the correct path on the local file system.
> > >>
> > >> Does LOAD always read from the local file system or is there a way to
> > load
> > >> data from HDFS?
> > >>
> > >> Thanks,
> > >> Nipun
> > >>
> > >
> >
>



-- 

-Turner Kunkel

Re: Load statement

Posted by Nipun Saggar <ni...@gmail.com>.
This is the command I had executed:
$java -cp ../../pig.jar:.:$HADOOPDIR idmapreduce

09/08/11 01:02:16 INFO executionengine.HExecutionEngine: Connecting to
hadoop file system at: file:///
09/08/11 01:02:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
09/08/11 01:02:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
09/08/11 01:02:18 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/08/11 01:02:23 INFO mapReduceLayer.MapReduceLauncher: 0% complete
09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
failed
09/08/11 01:02:23 ERROR mapReduceLayer.MapReduceLauncher:
java.io.IOException: /user/nipuns/passwd does not exist
    at
org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
    at
org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at
org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
    at
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:637)

HDFS contains the following files:
$hadoop fs -ls
Found 3 items
-rw-r--r--   1 nipuns supergroup   10408717 2009-08-04 20:13
/user/nipuns/excite.log.bz2
-rw-r--r--   1 nipuns supergroup       2888 2009-08-10 23:16
/user/nipuns/passwd
-rw-r--r--   1 nipuns supergroup         14 2009-08-04 19:11
/user/nipuns/test.txt

The same program runs without any problems if I modify the file path to
'/etc/passwd' in idmapreduce.java. Hence, I concluded that the LOAD statement
is reading from the local file system instead of HDFS.
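
One way to take the configuration out of the equation is to give LOAD a fully
qualified URI, so the path cannot silently resolve against file:///. A sketch
follows; the namenode host/port is an assumption (use the fs.default.name
value from your own core-site.xml), and since actually running the script
needs a live cluster, the snippet only generates it:

```shell
# Write a minimal Pig script whose LOAD names the filesystem explicitly.
# hdfs://localhost:9000 is a placeholder for your namenode address.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
A = LOAD 'hdfs://localhost:9000/user/nipuns/passwd' USING PigStorage(':');
B = FOREACH A GENERATE $0;
DUMP B;
EOF
cat "$SCRIPT"
# To run it against the cluster (needs pig.jar and the Hadoop conf on the
# classpath, as in the java command earlier in this thread):
#   java -cp pig.jar:$PIG_CLASSPATH org.apache.pig.Main "$SCRIPT"
```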
-Nipun

On Tue, Aug 11, 2009 at 12:49 AM, Dmitriy Ryaboy <dv...@cloudera.com>wrote:

> Nipin,
> Are you sure you were actually running in mapreduce mode?
> Did it say something like 'connecting to filesystem at
> hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?
>
> On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com> wrote:
> > I was under the impression that it always loads from HDFS under Map
> Reduce
> > mode.
> >
> > -Turner
> >
> > On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <nipun.saggar@gmail.com
> >wrote:
> >
> >> Hi guys,
> >>
> >> I have recently started using pig and I have a doubt regarding the LOAD
> >> statement. Does the LOAD statement load data from the local file system
> or
> >> from HDFS? I am asking this question since I was trying to run the
> sample
> >> program (idmapreduce.java) given at
> >> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in 'mapreduce'
> >> mode. I was under the impression that in mapreduce mode data is looked
> up
> >> from HDFS but was getting java.io.IOException passwd not found exception
> >> until I gave the correct path on the local file system.
> >>
> >> Does LOAD always read from the local file system or is there a way to
> load
> >> data from HDFS?
> >>
> >> Thanks,
> >> Nipun
> >>
> >
>

Re: Load statement

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Nipun,
Are you sure you were actually running in mapreduce mode?
Did it say something like 'connecting to filesystem at
hdfs://localhost:xxx' or "connecting to filesystem at file:///" ?

On Mon, Aug 10, 2009 at 12:09 PM, Turner Kunkel<th...@gmail.com> wrote:
> I was under the impression that it always loads from HDFS under Map Reduce
> mode.
>
> -Turner
>
> On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <ni...@gmail.com>wrote:
>
>> Hi guys,
>>
>> I have recently started using pig and I have a doubt regarding the LOAD
>> statement. Does the LOAD statement load data from the local file system or
>> from HDFS? I am asking this question since I was trying to run the sample
>> program (idmapreduce.java) given at
>> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in 'mapreduce'
>> mode. I was under the impression that in mapreduce mode data is looked up
>> from HDFS but was getting java.io.IOException passwd not found exception
>> until I gave the correct path on the local file system.
>>
>> Does LOAD always read from the local file system or is there a way to load
>> data from HDFS?
>>
>> Thanks,
>> Nipun
>>
>

Re: Load statement

Posted by Turner Kunkel <th...@gmail.com>.
I was under the impression that it always loads from HDFS under Map Reduce
mode.

-Turner

On Mon, Aug 10, 2009 at 2:04 PM, Nipun Saggar <ni...@gmail.com>wrote:

> Hi guys,
>
> I have recently started using pig and I have a doubt regarding the LOAD
> statement. Does the LOAD statement load data from the local file system or
> from HDFS? I am asking this question since I was trying to run the sample
> program (idmapreduce.java) given at
> http://hadoop.apache.org/pig/docs/r0.3.0/getstarted.html in 'mapreduce'
> mode. I was under the impression that in mapreduce mode data is looked up
> from HDFS but was getting java.io.IOException passwd not found exception
> until I gave the correct path on the local file system.
>
> Does LOAD always read from the local file system or is there a way to load
> data from HDFS?
>
> Thanks,
> Nipun
>