Posted to user@pig.apache.org by rahul <rm...@apple.com> on 2010/08/27 02:32:18 UTC

Pig and Hadoop Integration Error

Hi,

I am trying to integrate Pig with Hadoop for processing jobs.

I am able to run Pig in local mode, and Hadoop with the streaming API, perfectly.

But when I try to run Pig with Hadoop I get the following error:

Pig Stack Trace
---------------
ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out

org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
	at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
	at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
	at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
	at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
	at org.apache.pig.PigServer.validate(PigServer.java:930)
	at org.apache.pig.PigServer.compileLp(PigServer.java:910)
	at org.apache.pig.PigServer.compileLp(PigServer.java:871)
	at org.apache.pig.PigServer.compileLp(PigServer.java:852)
	at org.apache.pig.PigServer.execute(PigServer.java:816)
	at org.apache.pig.PigServer.access$100(PigServer.java:105)
	at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
	at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
	at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
	at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
	at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
	at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
	at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
	at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
	... 16 more
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
	at org.apache.hadoop.ipc.Client.call(Client.java:743)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
	at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
	at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
	at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
	at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
	at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
	at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
	... 24 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================

Did anyone get the same error? I think it is related to the connection between Pig and Hadoop.

Can someone tell me how to connect Pig and Hadoop?

Thanks.
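
For reference, the usual way to point Pig at a running cluster is to put the Hadoop conf directory on Pig's classpath and start it in mapreduce mode. A minimal sketch, assuming a Pig 0.7-era install whose bin/pig honors PIG_CLASSPATH; the paths are illustrative:

export HADOOP_CONF_DIR=/path/to/hadoop/conf   # directory holding the *-site.xml files
export PIG_CLASSPATH=$HADOOP_CONF_DIR         # bin/pig appends this to its classpath
pig -x mapreduce script1-hadoop.pig           # or: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig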

Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
But according to the error log:
"Could not validate the output specification for:
file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out"

it is still trying to access the local file system rather than HDFS.
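
A quick way to confirm which filesystem Pig resolved is to watch its startup log in mapreduce mode; it should report an hdfs:// filesystem and the jobtracker address, not file:///. A sketch, with illustrative host/ports:

pig -x mapreduce
# expect lines like these near the top of the startup log:
#   Connecting to hadoop file system at: hdfs://localhost:9000
#   Connecting to map-reduce job tracker at: localhost:9001
# a file:/// filesystem here means the Hadoop conf was not picked up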



On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
> Hi Jeff,
>
> I have put the hadoop conf on the classpath by setting the $HADOOP_CONF_DIR variable.
>
> But I have both Pig and Hadoop running on the same machine, so localhost should not make a difference.
>
> So I have used all the default config settings for core-site.xml, hdfs-site.xml, and mapred-site.xml, as per the Hadoop tutorial.
>
> Please let me know if my understanding is correct?
>
> I am attaching the conf files as well:
> hdfs-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
> <property>
>  <name>fs.default.name</name>
>  <value>hdfs://localhost:9000</value>
>  <description>The name of the default file system.  A URI whose
>  scheme and authority determine the FileSystem implementation.  The
>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>  the FileSystem implementation class.  The uri's authority is used to
>  determine the host, port, etc. for a filesystem.</description>
> </property>
>
> <property>
>  <name>dfs.replication</name>
>  <value>1</value>
>  <description>Default block replication.
>  The actual number of replications can be specified when the file is created.
>  The default is used if replication is not specified in create time.
>  </description>
> </property>
>
> </configuration>
>
> core-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
> <property>
>  <name>hadoop.tmp.dir</name>
>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>  <description>A base for other temporary directories.</description>
> </property>
> </configuration>
>
> mapred-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
> <property>
>  <name>mapred.job.tracker</name>
>  <value>localhost:9001</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at. If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
> </property>
>
> <property>
> <name>mapred.tasktracker.tasks.maximum</name>
> <value>8</value>
> <description>The maximum number of tasks that will be run simultaneously by
> a task tracker
> </description>
> </property>
> </configuration>
>
> Please let me know if there is an issue in my configurations? Any input is valuable to me.
>
> Thanks,
> Rahul
>
> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>
>> Did you put the hadoop conf on the classpath? It seems you are still using the
>> local file system but connecting to Hadoop's JobTracker.
>> Make sure you set the correct configuration in core-site.xml,
>> hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
>>
>>
>>
>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rm...@apple.com> wrote:
>>> [original error report snipped; see the first message in this thread]
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang
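
One detail worth checking in the configs quoted above: fs.default.name is set in hdfs-site.xml, but in stock Hadoop it normally lives in core-site.xml, and a plain MapReduce client does not necessarily load hdfs-site.xml at all, which would leave the default file:/// filesystem in effect. A quick sanity check with the hadoop CLI, assuming the same conf directory is on its path:

hadoop fs -ls /
# if this lists the local root directory instead of HDFS contents,
# the client is falling back to file:/// and fs.default.name is not
# being read; try moving that property into core-site.xml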

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Hi Santhosh, I tried with the absolute path as well, but the error remains the same.

I think the absolute path should not be an issue, as both Pig and Hadoop are at the same location.

Please let me know if there is some gap in my understanding?

Thanks,
Rahul

On Aug 26, 2010, at 6:25 PM, Santhosh Srinivasan wrote:

> Can you try replacing localhost with the fully qualified name of your host?
> 
> Santhosh
> 
> 
> [quoted message snipped; see Santhosh's message below and the configuration posted earlier in this thread]


RE: Pig and Hadoop Integration Error

Posted by Santhosh Srinivasan <sm...@YAHOO-INC.COM>.
Can you try replacing localhost with the fully qualified name of your host?

Santhosh
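
A sketch of that substitution; the hostname shown is illustrative, and hostname -f prints the fully qualified name on most systems:

hostname -f
# suppose it prints myhost.example.com; then use, in mapred-site.xml:
#   <value>myhost.example.com:9001</value>
# and for fs.default.name:
#   <value>hdfs://myhost.example.com:9000</value>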
 

[quoted message from rahul snipped; see the configuration posted earlier in this thread]


Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Sure, Zhang.

Thanks for the help.

-Rahul

On Aug 26, 2010, at 8:17 PM, Jeff Zhang wrote:

> It's weird. I suspect maybe there's another configuration file on your
> classpath which overrides your real conf files.
> Could you download a new Pig release, follow the instructions at
> http://hadoop.apache.org/pig/docs/r0.7.0/setup.html, and try in a new
> environment?
> 
> 
> 
> On Thu, Aug 26, 2010 at 7:49 PM, rahul <rm...@apple.com> wrote:
>> Hi,
>>
>> I tried the grunt shell as well, but that also does not connect to Hadoop. It throws a warning and runs the job in standalone mode. So I tried it using pig.jar.
>>
>> Do you have any further suggestions on that?
>> 
>> Rahul
>> 
>> On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:
>> 
>>> Connecting to 9001 is right; this is the jobtracker's IPC port, while 50030
>>> is its HTTP server port.
>>> And have you ever tried to run the grunt shell?
>>> 
>>> On Thu, Aug 26, 2010 at 7:12 PM, rahul <rm...@apple.com> wrote:
>>>> Hi Jeff,
>>>> 
>>>> I can connect to the jobtracker web UI using the following URL: http://localhost:50030/jobtracker.jsp
>>>>
>>>> And I can also see the jobs which I ran directly using the streaming API on Hadoop.
>>>>
>>>> I also see it tries to connect to localhost/127.0.0.1:9001, which I have specified in the hadoop conf file,
>>>> and I have also tried changing this location to localhost:50030, but the error remains the same.
>>>>
>>>> Can you suggest something further?
>>>> 
>>>> Thanks,
>>>> Rahul
>>>> 
>>>> On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:
>>>> 
>>>>> Can you look at the jobtracker log or access the jobtracker web UI?
>>>>> It seems you cannot connect to the jobtracker, according to your log:
>>>>>
>>>>> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>>>> failed on local exception: java.io.EOFException"
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rm...@apple.com> wrote:
>>>>>> Yes they are running.
>>>>>> 
>>>>>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>>>>>> 
>>>>>>> Execute the command jps in a shell to see whether the namenode and jobtracker are
>>>>>>> running correctly.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
>>>>>>>> Hi Jeff,
>>>>>>>> 
>>>>>>>> I transferred the hadoop conf files to the pig/conf location, but I still get the same error.
>>>>>>>>
>>>>>>>> Is the issue with the configuration files or with the HDFS file system?
>>>>>>>>
>>>>>>>> Can I test the connection to hdfs (localhost/127.0.0.1:9001) in some way?
>>>>>>>>
>>>>>>>> Steps I did:
>>>>>>>>
>>>>>>>> 1. I initially formatted my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system as HDFS.
>>>>>>>> 2. Then I configured the hadoop conf files and started the ./start-all script.
>>>>>>>> 3. Started Pig with a custom pig script which should read HDFS, as I passed HADOOP_CONF_DIR as a parameter.
>>>>>>>> The command was: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>>>>>>>>
>>>>>>>> Please let me know if these steps miss something?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>>>>>> 
>>>>>>>>> Try to put the hadoop xml configuration files into the pig/conf folder
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>>>>>>>>> [quoted configuration and original error report snipped; see earlier in this thread]
> -- 
> Best Regards
> 
> Jeff Zhang
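
A sketch of the pig/conf suggestion quoted above, copying the cluster's conf files into Pig's conf directory so they land on Pig's classpath; $PIGDIR and $HADOOP_CONF_DIR as in the commands already used in this thread:

cp $HADOOP_CONF_DIR/core-site.xml \
   $HADOOP_CONF_DIR/hdfs-site.xml \
   $HADOOP_CONF_DIR/mapred-site.xml $PIGDIR/conf/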


Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
It's weird. I suspect maybe there's another configuration file on your
classpath which overrides your real conf files.
Could you download a new Pig release, follow the instructions at
http://hadoop.apache.org/pig/docs/r0.7.0/setup.html, and try in a new
environment?
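
A fresh setup also rules out a Pig/Hadoop version mismatch: Pig releases of this era bundle the Hadoop client classes they were built against, and talking to a different Hadoop version over IPC is a classic cause of exactly this kind of EOFException. A minimal re-setup sketch; the version number and paths are illustrative:

tar xzf pig-0.7.0.tar.gz                    # unpack a fresh Pig release
cd pig-0.7.0
export PIG_CLASSPATH=/path/to/hadoop/conf   # only the cluster's conf dir on the classpath
bin/pig -x mapreduce                        # grunt should report an hdfs:// filesystem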



On Thu, Aug 26, 2010 at 7:49 PM, rahul <rm...@apple.com> wrote:
> [quoted thread snipped; see the messages above]



-- 
Best Regards

Jeff Zhang

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Hi,

I tried the grunt shell as well, but that also does not connect to Hadoop. It throws a warning and runs the job in standalone mode. So I tried it using pig.jar.

Do you have any further suggestions on that?

Rahul
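
Two quick checks along the lines of Jeff's earlier jps suggestion; a sketch, and nc options vary slightly between systems:

jps
# NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker
# should all appear in the output

nc -z localhost 9000 && echo "namenode IPC port open"
nc -z localhost 9001 && echo "jobtracker IPC port open"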

On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:

> [quoted thread snipped; see the messages above]
>>>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>>>        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>>>        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>>>        ... 16 more
>>>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
>>>>>>>>>>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>>>        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>>>        ... 24 more
>>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>>>> ================================================================================
>>>>>>>>>> 
>>>>>>>>>> Did anyone got the same error. I think it related to connection between pig and hadoop.
>>>>>>>>>> 
>>>>>>>>>> Can someone tell me how to connect Pig and hadoop.
>>>>>>>>>> 
>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>> 
>>>>>>>>> Jeff Zhang
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
Connecting to 9001 is right; that is the jobtracker's IPC port, while 50030
is its HTTP server port.
Have you ever tried running the grunt shell?
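As a quick sanity check (assuming a default single-node setup), you can also verify that something is actually listening on the IPC port:

netstat -an | grep 9001     # the jobtracker should show up bound to 9001
telnet localhost 9001       # a successful connect rules out a bind/firewall problem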

On Thu, Aug 26, 2010 at 7:12 PM, rahul <rm...@apple.com> wrote:
> Hi Jeff,
>
> I can connect to the jobtracker web UI using the following URL : http://localhost:50030/jobtracker.jsp
>
> And also I can see jobs which I ran directly using the streaming api on hadoop.
>
> I also see it tries to connect to localhost/127.0.0.1:9001 which I have specified in the hadoop conf file
> and I have also tried changing this location to localhost:50030 but still the error remains the same.
>
> Can you suggest something further ?
>
> Thanks,
> Rahul
>
> On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:
>
>> Can you look at the jobtracker log or access jobtracker web ui ?
>> It seems you can  not connect to jobtracker according your log
>>
>> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>> failed on local exception: java.io.EOFException"
>>
>>
>>
>> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rm...@apple.com> wrote:
>>> Yes they are running.
>>>
>>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>>>
>>>> Execute command jps in shell to see whether namenode and jobtracker is
>>>> running correctly.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> I transferred the hadoop conf files to the pig/conf location but still i get the same error.
>>>>>
>>>>> Does the issue is with the configuration files or with the hdfs files system ?
>>>>>
>>>>> Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?
>>>>>
>>>>> Steps I did :
>>>>>
>>>>> 1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
>>>>> 2. Then I configured the hadoop conf files and started ./start-all script.
>>>>> 3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
>>>>> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>>>>>
>>>>> Please let me know if these step miss something ?
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>>>
>>>>>> Try to put the hadoop xml configuration file to pig/conf folder
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.
>>>>>>>
>>>>>>> But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.
>>>>>>>
>>>>>>> So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.
>>>>>>>
>>>>>>> Please let me know if my understanding is correct ?
>>>>>>>
>>>>>>> I am attaching the conf files as well :
>>>>>>> hdfs-site.xml:
>>>>>>>
>>>>>>> <?xml version="1.0"?>
>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>
>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>
>>>>>>> <configuration>
>>>>>>> <property>
>>>>>>>  <name>fs.default.name</name>
>>>>>>>  <value>hdfs://localhost:9000</value>
>>>>>>>  <description>The name of the default file system.  A URI whose
>>>>>>>  scheme and authority determine the FileSystem implementation.  The
>>>>>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>>>  the FileSystem implementation class.  The uri's authority is used to
>>>>>>>  determine the host, port, etc. for a filesystem.</description>
>>>>>>> </property>
>>>>>>>
>>>>>>> <property>
>>>>>>>  <name>dfs.replication</name>
>>>>>>>  <value>1</value>
>>>>>>>  <description>Default block replication.
>>>>>>>  The actual number of replications can be specified when the file is created.
>>>>>>>  The default is used if replication is not specified in create time.
>>>>>>>  </description>
>>>>>>> </property>
>>>>>>>
>>>>>>> </configuration>
>>>>>>>
>>>>>>> core-site.xml
>>>>>>> <?xml version="1.0"?>
>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>
>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>
>>>>>>> <configuration>
>>>>>>> <property>
>>>>>>>  <name>hadoop.tmp.dir</name>
>>>>>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>>>  <description>A base for other temporary directories.</description>
>>>>>>> </property>
>>>>>>> </configuration>
>>>>>>>
>>>>>>> mapred-site.xml
>>>>>>> <?xml version="1.0"?>
>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>
>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>
>>>>>>> <configuration>
>>>>>>> <property>
>>>>>>>  <name>mapred.job.tracker</name>
>>>>>>>  <value>localhost:9001</value>
>>>>>>>  <description>The host and port that the MapReduce job tracker runs
>>>>>>>  at. If "local", then jobs are run in-process as a single map
>>>>>>>  and reduce task.
>>>>>>>  </description>
>>>>>>> </property>
>>>>>>>
>>>>>>> <property>
>>>>>>> <name>mapred.tasktracker.tasks.maximum</name>
>>>>>>> <value>8</value>
>>>>>>> <description>The maximum number of tasks that will be run simultaneously by a
>>>>>>> a task tracker
>>>>>>> </description>
>>>>>>> </property>
>>>>>>> </configuration>
>>>>>>>
>>>>>>> Please let me know if there is a issue in my configurations ? Any input is valuable for me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rahul
>>>>>>>
>>>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>>>
>>>>>>>> Do you put the hadoop conf on classpath ? It seems you are still using
>>>>>>>> local file system but conncect Hadoop's JobTracker.
>>>>>>>> Make sure you set the correct configuration in core-site.xml
>>>>>>>> hdfs-site.xml, mapred-site.xml, and put them on classpath.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rm...@apple.com> wrote:
>>>>>>>>> Hi ,
>>>>>>>>>
>>>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>>>
>>>>>>>>> I am able to run Pig in local mode and Hadoop with streaming api perfectly.
>>>>>>>>>
>>>>>>>>> But when I try to run Pig with Hadoop I get follwong Error:
>>>>>>>>>
>>>>>>>>> Pig Stack Trace
>>>>>>>>> ---------------
>>>>>>>>> ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>
>>>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
>>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>>        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>>        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>>        ... 16 more
>>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
>>>>>>>>>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>>        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>>        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>>        ... 24 more
>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>>> ================================================================================
>>>>>>>>>
>>>>>>>>> Did anyone got the same error. I think it related to connection between pig and hadoop.
>>>>>>>>>
>>>>>>>>> Can someone tell me how to connect Pig and hadoop.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Hi Jeff,

I can connect to the jobtracker web UI using the following URL: http://localhost:50030/jobtracker.jsp

I can also see the jobs I ran directly on Hadoop using the streaming API.

I also see that it tries to connect to localhost/127.0.0.1:9001, which is what I specified in the hadoop conf file.
I have also tried changing that value to localhost:50030, but the error remains the same.
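Is there a simple client-side check I could run with the same conf on the classpath? I was thinking of something like this (guessing at the right commands):

bin/hadoop fs -ls /       # should talk to the namenode (fs.default.name)
bin/hadoop job -list      # should talk to the jobtracker (mapred.job.tracker)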

Can you suggest anything further?

Thanks,
Rahul

On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:

> Can you look at the jobtracker log or access jobtracker web ui ?
> It seems you can  not connect to jobtracker according your log
> 
> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
> failed on local exception: java.io.EOFException"
> 
> 
> 
> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rm...@apple.com> wrote:
>> Yes they are running.
>> 
>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>> 
>>> Execute command jps in shell to see whether namenode and jobtracker is
>>> running correctly.
>>> 
>>> 
>>> 
>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
>>>> Hi Jeff,
>>>> 
>>>> I transferred the hadoop conf files to the pig/conf location but still i get the same error.
>>>> 
>>>> Does the issue is with the configuration files or with the hdfs files system ?
>>>> 
>>>> Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?
>>>> 
>>>> Steps I did :
>>>> 
>>>> 1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
>>>> 2. Then I configured the hadoop conf files and started ./start-all script.
>>>> 3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
>>>> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>>>> 
>>>> Please let me know if these step miss something ?
>>>> 
>>>> Thanks,
>>>> Rahul
>>>> 
>>>> 
>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>> 
>>>>> Try to put the hadoop xml configuration file to pig/conf folder
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>>>>> Hi Jeff,
>>>>>> 
>>>>>> I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.
>>>>>> 
>>>>>> But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.
>>>>>> 
>>>>>> So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.
>>>>>> 
>>>>>> Please let me know if my understanding is correct ?
>>>>>> 
>>>>>> I am attaching the conf files as well :
>>>>>> hdfs-site.xml:
>>>>>> 
>>>>>> <?xml version="1.0"?>
>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>> 
>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>> 
>>>>>> <configuration>
>>>>>> <property>
>>>>>>  <name>fs.default.name</name>
>>>>>>  <value>hdfs://localhost:9000</value>
>>>>>>  <description>The name of the default file system.  A URI whose
>>>>>>  scheme and authority determine the FileSystem implementation.  The
>>>>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>>  the FileSystem implementation class.  The uri's authority is used to
>>>>>>  determine the host, port, etc. for a filesystem.</description>
>>>>>> </property>
>>>>>> 
>>>>>> <property>
>>>>>>  <name>dfs.replication</name>
>>>>>>  <value>1</value>
>>>>>>  <description>Default block replication.
>>>>>>  The actual number of replications can be specified when the file is created.
>>>>>>  The default is used if replication is not specified in create time.
>>>>>>  </description>
>>>>>> </property>
>>>>>> 
>>>>>> </configuration>
>>>>>> 
>>>>>> core-site.xml
>>>>>> <?xml version="1.0"?>
>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>> 
>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>> 
>>>>>> <configuration>
>>>>>> <property>
>>>>>>  <name>hadoop.tmp.dir</name>
>>>>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>>  <description>A base for other temporary directories.</description>
>>>>>> </property>
>>>>>> </configuration>
>>>>>> 
>>>>>> mapred-site.xml
>>>>>> <?xml version="1.0"?>
>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>> 
>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>> 
>>>>>> <configuration>
>>>>>> <property>
>>>>>>  <name>mapred.job.tracker</name>
>>>>>>  <value>localhost:9001</value>
>>>>>>  <description>The host and port that the MapReduce job tracker runs
>>>>>>  at. If "local", then jobs are run in-process as a single map
>>>>>>  and reduce task.
>>>>>>  </description>
>>>>>> </property>
>>>>>> 
>>>>>> <property>
>>>>>> <name>mapred.tasktracker.tasks.maximum</name>
>>>>>> <value>8</value>
>>>>>> <description>The maximum number of tasks that will be run simultaneously by a
>>>>>> a task tracker
>>>>>> </description>
>>>>>> </property>
>>>>>> </configuration>
>>>>>> 
>>>>>> Please let me know if there is a issue in my configurations ? Any input is valuable for me.
>>>>>> 
>>>>>> Thanks,
>>>>>> Rahul
>>>>>> 
>>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>> 
>>>>>>> Do you put the hadoop conf on classpath ? It seems you are still using
>>>>>>> local file system but conncect Hadoop's JobTracker.
>>>>>>> Make sure you set the correct configuration in core-site.xml
>>>>>>> hdfs-site.xml, mapred-site.xml, and put them on classpath.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rm...@apple.com> wrote:
>>>>>>>> Hi ,
>>>>>>>> 
>>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>> 
>>>>>>>> I am able to run Pig in local mode and Hadoop with streaming api perfectly.
>>>>>>>> 
>>>>>>>> But when I try to run Pig with Hadoop I get follwong Error:
>>>>>>>> 
>>>>>>>> Pig Stack Trace
>>>>>>>> ---------------
>>>>>>>> ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>> 
>>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>        ... 16 more
>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
>>>>>>>>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>        ... 24 more
>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>> ================================================================================
>>>>>>>> 
>>>>>>>> Did anyone got the same error. I think it related to connection between pig and hadoop.
>>>>>>>> 
>>>>>>>> Can someone tell me how to connect Pig and hadoop.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
Can you look at the jobtracker log or access the jobtracker web UI?
According to your log, it seems you cannot connect to the jobtracker:

"Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
failed on local exception: java.io.EOFException"
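The exact path depends on your install, but with a default tarball setup the jobtracker log usually lives under the logs directory, named after the user and host, so something like:

tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log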



On Fri, Aug 27, 2010 at 10:00 AM, rahul <rm...@apple.com> wrote:
> Yes they are running.
>
> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>
>> Execute command jps in shell to see whether namenode and jobtracker is
>> running correctly.
>>
>>
>>
>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
>>> Hi Jeff,
>>>
>>> I transferred the hadoop conf files to the pig/conf location but still i get the same error.
>>>
>>> Does the issue is with the configuration files or with the hdfs files system ?
>>>
>>> Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?
>>>
>>> Steps I did :
>>>
>>> 1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
>>> 2. Then I configured the hadoop conf files and started ./start-all script.
>>> 3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
>>> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>>>
>>> Please let me know if these step miss something ?
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>
>>>> Try to put the hadoop xml configuration file to pig/conf folder
>>>>
>>>>
>>>>
>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.
>>>>>
>>>>> But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.
>>>>>
>>>>> So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.
>>>>>
>>>>> Please let me know if my understanding is correct ?
>>>>>
>>>>> I am attaching the conf files as well :
>>>>> hdfs-site.xml:
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>
>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>
>>>>> <configuration>
>>>>> <property>
>>>>>  <name>fs.default.name</name>
>>>>>  <value>hdfs://localhost:9000</value>
>>>>>  <description>The name of the default file system.  A URI whose
>>>>>  scheme and authority determine the FileSystem implementation.  The
>>>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>  the FileSystem implementation class.  The uri's authority is used to
>>>>>  determine the host, port, etc. for a filesystem.</description>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>  <name>dfs.replication</name>
>>>>>  <value>1</value>
>>>>>  <description>Default block replication.
>>>>>  The actual number of replications can be specified when the file is created.
>>>>>  The default is used if replication is not specified in create time.
>>>>>  </description>
>>>>> </property>
>>>>>
>>>>> </configuration>
>>>>>
>>>>> core-site.xml
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>
>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>
>>>>> <configuration>
>>>>> <property>
>>>>>  <name>hadoop.tmp.dir</name>
>>>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>  <description>A base for other temporary directories.</description>
>>>>> </property>
>>>>> </configuration>
>>>>>
>>>>> mapred-site.xml
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>
>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>
>>>>> <configuration>
>>>>> <property>
>>>>>  <name>mapred.job.tracker</name>
>>>>>  <value>localhost:9001</value>
>>>>>  <description>The host and port that the MapReduce job tracker runs
>>>>>  at. If "local", then jobs are run in-process as a single map
>>>>>  and reduce task.
>>>>>  </description>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>> <name>mapred.tasktracker.tasks.maximum</name>
>>>>> <value>8</value>
>>>>> <description>The maximum number of tasks that will be run simultaneously by a
>>>>> a task tracker
>>>>> </description>
>>>>> </property>
>>>>> </configuration>
>>>>>
>>>>> Please let me know if there is a issue in my configurations ? Any input is valuable for me.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>
>>>>>> Do you put the hadoop conf on classpath ? It seems you are still using
>>>>>> local file system but conncect Hadoop's JobTracker.
>>>>>> Make sure you set the correct configuration in core-site.xml
>>>>>> hdfs-site.xml, mapred-site.xml, and put them on classpath.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rm...@apple.com> wrote:
>>>>>>> Hi ,
>>>>>>>
>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>
>>>>>>> I am able to run Pig in local mode and Hadoop with streaming api perfectly.
>>>>>>>
>>>>>>> But when I try to run Pig with Hadoop I get follwong Error:
>>>>>>>
>>>>>>> Pig Stack Trace
>>>>>>> ---------------
>>>>>>> ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>
>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>        ... 16 more
>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
>>>>>>>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>        ... 24 more
>>>>>>> Caused by: java.io.EOFException
>>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>> ================================================================================
>>>>>>>
>>>>>>> Did anyone got the same error. I think it related to connection between pig and hadoop.
>>>>>>>
>>>>>>> Can someone tell me how to connect Pig and hadoop.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Yes, they are running.

On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:

> Execute command jps in shell to see whether namenode and jobtracker is
> running correctly.
> 
> 
> 
> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
>> Hi Jeff,
>> 
>> I transferred the hadoop conf files to the pig/conf location but still i get the same error.
>> 
>> Does the issue is with the configuration files or with the hdfs files system ?
>> 
>> Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?
>> 
>> Steps I did :
>> 
>> 1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
>> 2. Then I configured the hadoop conf files and started ./start-all script.
>> 3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
>> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>> 
>> Please let me know if these step miss something ?
>> 
>> Thanks,
>> Rahul
>> 
>> 
>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>> 
>>> Try to put the hadoop xml configuration file to pig/conf folder
>>> 
>>> 
>>> 
>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>>> Hi Jeff,
>>>> 
>>>> I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.
>>>> 
>>>> But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.
>>>> 
>>>> So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.
>>>> 
>>>> Please let me know if my understanding is correct ?
>>>> 
>>>> I am attaching the conf files as well :
>>>> hdfs-site.xml:
>>>> 
>>>> <?xml version="1.0"?>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>> 
>>>> <!-- Put site-specific property overrides in this file. -->
>>>> 
>>>> <configuration>
>>>> <property>
>>>>  <name>fs.default.name</name>
>>>>  <value>hdfs://localhost:9000</value>
>>>>  <description>The name of the default file system.  A URI whose
>>>>  scheme and authority determine the FileSystem implementation.  The
>>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>  the FileSystem implementation class.  The uri's authority is used to
>>>>  determine the host, port, etc. for a filesystem.</description>
>>>> </property>
>>>> 
>>>> <property>
>>>>  <name>dfs.replication</name>
>>>>  <value>1</value>
>>>>  <description>Default block replication.
>>>>  The actual number of replications can be specified when the file is created.
>>>>  The default is used if replication is not specified in create time.
>>>>  </description>
>>>> </property>
>>>> 
>>>> </configuration>
>>>> 
>>>> core-site.xml
>>>> <?xml version="1.0"?>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>> 
>>>> <!-- Put site-specific property overrides in this file. -->
>>>> 
>>>> <configuration>
>>>> <property>
>>>>  <name>hadoop.tmp.dir</name>
>>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>  <description>A base for other temporary directories.</description>
>>>> </property>
>>>> </configuration>
>>>> 
>>>> mapred-site.xml
>>>> <?xml version="1.0"?>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>> 
>>>> <!-- Put site-specific property overrides in this file. -->
>>>> 
>>>> <configuration>
>>>> <property>
>>>>  <name>mapred.job.tracker</name>
>>>>  <value>localhost:9001</value>
>>>>  <description>The host and port that the MapReduce job tracker runs
>>>>  at. If "local", then jobs are run in-process as a single map
>>>>  and reduce task.
>>>>  </description>
>>>> </property>
>>>> 
>>>> <property>
>>>> <name>mapred.tasktracker.tasks.maximum</name>
>>>> <value>8</value>
>>>> <description>The maximum number of tasks that will be run simultaneously by a
>>>> a task tracker
>>>> </description>
>>>> </property>
>>>> </configuration>
>>>> 
>>>> Please let me know if there is a issue in my configurations ? Any input is valuable for me.
>>>> 
>>>> Thanks,
>>>> Rahul
>>>> 
>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>> 
>>>>> Do you put the hadoop conf on classpath ? It seems you are still using
>>>>> local file system but conncect Hadoop's JobTracker.
>>>>> Make sure you set the correct configuration in core-site.xml
>>>>> hdfs-site.xml, mapred-site.xml, and put them on classpath.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rm...@apple.com> wrote:
>>>>>> Hi ,
>>>>>> 
>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>> 
>>>>>> I am able to run Pig in local mode and Hadoop with streaming api perfectly.
>>>>>> 
>>>>>> But when I try to run Pig with Hadoop I get follwong Error:
>>>>>> 
>>>>>> Pig Stack Trace
>>>>>> ---------------
>>>>>> ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>> 
>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>        at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>        at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>        at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>        ... 16 more
>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
>>>>>>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>        at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>        at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>        ... 24 more
>>>>>> Caused by: java.io.EOFException
>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>> ================================================================================
>>>>>> 
>>>>>> Did anyone got the same error. I think it related to connection between pig and hadoop.
>>>>>> 
>>>>>> Can someone tell me how to connect Pig and hadoop.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
Execute the jps command in a shell to see whether the namenode and the jobtracker are
running correctly.
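For a healthy single-node setup the output should include the following daemons (the pids below are made up; the names are what matter):

$ jps
12001 NameNode
12002 DataNode
12003 SecondaryNameNode
12004 JobTracker
12005 TaskTracker
12006 Jps

If JobTracker is missing from that list, its log should say why it exited.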



On Fri, Aug 27, 2010 at 9:49 AM, rahul <rm...@apple.com> wrote:
> Hi Jeff,
>
> I transferred the hadoop conf files to the pig/conf location but still i get the same error.
>
> Does the issue is with the configuration files or with the hdfs files system ?
>
> Can test the connection to hdfs(localhost/127.0.0.1:9001) in some way ?
>
> Steps I did :
>
> 1. I have formatted initially my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
> 2. Then I configured the hadoop conf files and started ./start-all script.
> 3. Started Pig with a custom pig script which should read hdfs as I passed the HADOOP_CONF_DIR as parameter.
> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>
> Please let me know if these step miss something ?
>
> Thanks,
> Rahul
>
>
> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>
>> Try to put the hadoop xml configuration file to pig/conf folder
>>
>>
>>
>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rm...@apple.com> wrote:
>>> Hi Jeff,
>>>
>>> I have set the hadoop conf in class path by setting $HADOOP_CONF_DIR variable.
>>>
>>> But I have both Pig and hadoop running at the same machine, so localhost should not make a difference.
>>>
>>> So I have used all the default config setting for the core-site.xml, hdfs-site.xml, mapred-site.xml, as per the hadoop tutorial.
>>>
>>> Please let me know if my understanding is correct ?
>>>
>>> I am attaching the conf files as well :
>>> hdfs-site.xml:
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>> <property>
>>>  <name>fs.default.name</name>
>>>  <value>hdfs://localhost:9000</value>
>>>  <description>The name of the default file system.  A URI whose
>>>  scheme and authority determine the FileSystem implementation.  The
>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>  the FileSystem implementation class.  The uri's authority is used to
>>>  determine the host, port, etc. for a filesystem.</description>
>>> </property>
>>>
>>> <property>
>>>  <name>dfs.replication</name>
>>>  <value>1</value>
>>>  <description>Default block replication.
>>>  The actual number of replications can be specified when the file is created.
>>>  The default is used if replication is not specified in create time.
>>>  </description>
>>> </property>
>>>
>>> </configuration>
>>>
>>> core-site.xml
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>> <property>
>>>  <name>hadoop.tmp.dir</name>
>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>  <description>A base for other temporary directories.</description>
>>> </property>
>>> </configuration>
>>>
>>> mapred-site.xml
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>> <property>
>>>  <name>mapred.job.tracker</name>
>>>  <value>localhost:9001</value>
>>>  <description>The host and port that the MapReduce job tracker runs
>>>  at. If "local", then jobs are run in-process as a single map
>>>  and reduce task.
>>>  </description>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.tasktracker.tasks.maximum</name>
>>> <value>8</value>
>>> <description>The maximum number of tasks that will be run simultaneously by a
>>> a task tracker
>>> </description>
>>> </property>
>>> </configuration>
>>>
>>> Please let me know if there is a issue in my configurations ? Any input is valuable for me.
>>>
>>> Thanks,
>>> Rahul



-- 
Best Regards

Jeff Zhang

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Hi Jeff,

I transferred the Hadoop conf files to the pig/conf location, but I still get the same error.

Is the issue with the configuration files or with the HDFS file system?

Is there some way I can test the connection to localhost/127.0.0.1:9001 (the JobTracker port from my config)?
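
For example, I suppose something like the following (run from the Hadoop bin directory, using the 9000/9001 ports from my configs) would show whether the daemons are reachable:

./hadoop fs -ls /        # round trip to the NameNode at hdfs://localhost:9000
./hadoop job -list       # round trip to the JobTracker at localhost:9001
telnet localhost 9001    # raw check that something is listening on the JobTracker port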

Steps I did:

1. I initially formatted the file system using the ./hadoop namenode -format command. I believe this initializes a fresh HDFS namespace on the local disk.
2. Then I configured the Hadoop conf files and ran the ./start-all.sh script.
3. Started Pig with a custom Pig script, which should read from HDFS since I passed HADOOP_CONF_DIR as a parameter.
The command was: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig (an equivalent bin/pig invocation is sketched below).
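
For reference, an equivalent way to launch through the bin/pig wrapper (assuming the wrapper honors PIG_CLASSPATH, as recent Pig releases do) would be:

export PIG_CLASSPATH=$HADOOP_CONF_DIR    # put the Hadoop conf dir on Pig's classpath
$PIGDIR/bin/pig script1-hadoop.pig       # mapreduce mode is the default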

Please let me know if these steps miss something.

Thanks,
Rahul


Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
Try putting the Hadoop XML configuration files into the pig/conf folder.
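
For instance, something like this (where $HADOOP_HOME and $PIG_HOME are placeholders for wherever the two installs actually live):

cp $HADOOP_HOME/conf/core-site.xml \
   $HADOOP_HOME/conf/hdfs-site.xml \
   $HADOOP_HOME/conf/mapred-site.xml  $PIG_HOME/conf/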



-- 
Best Regards

Jeff Zhang

Re: Pig and Hadoop Integration Error

Posted by rahul <rm...@apple.com>.
Hi Jeff,

I have put the Hadoop conf on the classpath by setting the $HADOOP_CONF_DIR variable.

But I have both Pig and Hadoop running on the same machine, so localhost should not make a difference.

I have used the default config settings for core-site.xml, hdfs-site.xml, and mapred-site.xml, as per the Hadoop tutorial.

Please let me know if my understanding is correct.

I am attaching the conf files as well:
hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

</configuration>

core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
</configuration>

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>8</value>
  <description>The maximum number of tasks that will be run simultaneously by a task tracker.
  </description>
</property>
</configuration>
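
A quick way to check whether these files are actually picked up is to start the Grunt shell and watch the startup log; it should report the HDFS and JobTracker addresses rather than a file:/// URI. A hypothetical session (log lines abbreviated):

$PIGDIR/bin/pig
... Connecting to hadoop file system at: hdfs://localhost:9000
... Connecting to map-reduce job tracker at: localhost:9001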

Please let me know if there is an issue in my configurations; any input is valuable to me.

Thanks,
Rahul



Re: Pig and Hadoop Integration Error

Posted by Jeff Zhang <zj...@gmail.com>.
Do you have the Hadoop conf on the classpath? It seems you are still using the
local file system but connecting to Hadoop's JobTracker.
Make sure you set the correct configuration in core-site.xml,
hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
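
For example, one way to do that (assuming HADOOP_HOME points at the Hadoop install, with script.pig as a placeholder for the script you run):

export HADOOP_CONF_DIR=$HADOOP_HOME/conf        # the directory holding the three site files
java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script.pig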



-- 
Best Regards

Jeff Zhang