Posted to user@hive.apache.org by MIS <mi...@gmail.com> on 2011/05/23 11:56:06 UTC

An issue with Hive on hadoop cluster

I'm running into an issue when trying to run Hive over the Hadoop cluster.

The Hadoop cluster itself is working fine on its own.
I'm using Hadoop 0.20.2 and Hive 0.7.0.

The problem is that Hive is not picking up the fs.default.name property
that I am setting in core-site.xml, or the mapred.job.tracker property in
mapred-site.xml.
It always assumes that the namenode can be reached at localhost (refer to
the stack trace below).
So I have specified these properties in the hive-site.xml file as well. I
tried marking them as final in hive-site.xml, but didn't get the intended
result.
Further, I set the above properties through the command line as well.
Again, no success.
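
For reference, the relevant hive-site.xml entries (with the final flag from
the attempt mentioned above) look roughly like this; the hostname and the
jobtracker port below are placeholders for my actual values:

<property>
  <name>fs.default.name</name>
  <value>hdfs://<myHostName>:54310</value>
  <final>true</final>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value><myHostName>:54311</value>
  <final>true</final>
</property>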

I looked at the Hive code for the 0.7.0 branch to debug the issue, to see
whether it is picking up the fs.default.name property from hive-site.xml,
which it does through a clone of the JobConf. So no issue there.

Further, if I mark any of the properties as final in hive-site.xml, Hive
gives me a WARNING log as below:

*WARN  conf.Configuration (Configuration.java:loadResource(1154)) -
file:/usr/local/hive-0.7.0/conf/hive-site.xml:a attempt to override final
parameter: hive.metastore.warehouse.dir;  Ignoring.*

From the above message I can assume that it has already read the
property (I don't know from where, or it may be reading the property
multiple times), even though I have explicitly specified the Hive conf
folder in hive-env.sh.

Below is the stack trace I'm getting in the log file:
*2011-05-23 15:11:00,793 ERROR CliDriver (SessionState.java:printError(343)) - Failed with exception java.io.IOException:java.net.ConnectException: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
java.io.IOException: java.net.ConnectException: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:341)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:133)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1114)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy4.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextPath(FetchOperator.java:241)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:259)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:320)
    ... 10 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    ... 25 more
*
Has anybody encountered similar issues earlier? Any thoughts towards
resolving the above issue would be helpful.

Thanks.

Re: An issue with Hive on hadoop cluster

Posted by MIS <mi...@gmail.com>.
Thanks for the suggestions.
I had already tried specifying just the IPs instead of hostnames, but I have
now also modified the /etc/hosts file as suggested, appropriately. Still no
success.

If I use IPs instead of hostnames, I get the error below in the Hive CLI.

*2011-05-24 14:42:53,485 ERROR ql.Driver (SessionState.java:printError(343)) - FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check filesystem config (null)
    at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:196)
    at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:247)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:900)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6594)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.0.18:9000/tmp/hive-hadoop/hive_2011-05-24_14-42-53_287_7078843136333133329, expected: hdfs://<myHostName>:9000
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
    at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:116)
    at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:146)
    at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:190)
    ... 14 more*

Thanks.



Re: An issue with Hive on hadoop cluster

Posted by Eric Djatsa <dj...@gmail.com>.
Hi, I had similar problems when I was setting up my Hadoop cluster. The
datanodes were trying to access localhost instead of my namenode.
To fix this issue I modified the /etc/hosts file on all my nodes (namenode +
datanodes) so that the first line contains the binding <IP
address-->hostname>.

For example on my namenode I have :
/etc/hosts :
192.168.0.1  mynamenode.mydomain.com    mynamenode
127.0.0.1       localhost

On my datanodes I have :
192.168.0.X  mydatanodeX.mydomain.com    mydatanodeX
127.0.0.1       localhost

If this doesn't work in the first place, try specifying the namenode in
core-site.xml on all datanodes with its IP address rather than its
hostname.
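
That is, something along these lines in core-site.xml on the datanodes,
using my example IP from above; substitute your namenode's actual IP and
whatever port your namenode listens on:

<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.0.1:54310</value>
</property>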

Hope this helps !
Regards
Eric




Re: An issue with Hive on hadoop cluster

Posted by MIS <mi...@gmail.com>.
I have the configuration consistent across both the client and server sides.
I have checked the Hadoop logs on both the nodes. On both the nodes, in the
tasktracker logs, every task attempt is directed towards
hdfs://localhost:54310/user/hive/warehouse and not towards
hdfs://<myHostName>:54310/user/hive/warehouse.

Further, I have given the absolute path for the property
hive.metastore.warehouse.dir as
hdfs://<myHostName>:54310:/user/hive/warehouse in the file hive-site.xml.

Also, if I change the port number for fs.default.name across all the
locations, the change is visible, but the hostname still comes up as
localhost.

As mentioned earlier, if I give the server running the namenode a localhost
alias in the /etc/hosts file on all the nodes, everything works fine. But
obviously I can't go ahead with this.

Thanks.


Re: An issue with Hive on hadoop cluster

Posted by Ning Zhang <nz...@fb.com>.
AFAIK, fs.default.name should be set in both the client and server side
.xml files, and they should be consistent (the URI scheme, the hostname and
the port number). The server side config (also called fs.default.name) is
read by the namenode, and the client side config is read by any HDFS clients
(Hive is one of them).

For example, the setting we have is:

server side core-site-custom.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

client side core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

From the stack trace it seems Hive is trying to connect to port 54310; you
should check whether that is correct against your server side HDFS config.





Re: An issue with Hive on hadoop cluster

Posted by MIS <mi...@gmail.com>.
I have already tried your suggestion. I have mentioned the same in my mail.
I have also given the required permissions for the directory
(hive.metastore.warehouse.dir).

If you look closely at the stack trace, the port number that I have
specified in the config files for the namenode and jobtracker is reflected,
but not the hostname. I have also gone through the code base to verify the
issue. But nothing fishy there.
The stand-alone Hadoop cluster is working fine, but when I try to run a
simple query, a select to fetch a few rows, Hive throws up the exception.

I was able to get this to work with a few hacks though, like adding
localhost as an alias in the /etc/hosts file for the server running the
namenode. But I can't go ahead with this solution, as it'll break other
things.
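
Concretely, the hack was an /etc/hosts entry on all the nodes roughly like
the one below, where the IP stands in for the namenode's actual address:

<namenode IP>   <myHostName>   localhost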

Thanks.



Re: An issue with Hive on hadoop cluster

Posted by jinhang du <du...@gmail.com>.
Set the following properties in hive-site.xml:
fs.default.name = hdfs://<your namenode host>:<port>
mapred.job.tracker = <your jobtracker host>:<port>
hive.metastore.warehouse.dir = <hdfs path>
Make sure you have the authority to write into this directory
(hive.metastore.warehouse.dir).
Try it.
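
In hive-site.xml that would look something like the following; the values
are placeholders, so use your own namenode host, jobtracker host, ports and
warehouse path:

<property>
  <name>fs.default.name</name>
  <value>hdfs://<your-namenode-host>:<port></value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value><your-jobtracker-host>:<port></value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>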



-- 
dujinhang