Posted to hdfs-user@hadoop.apache.org by Daniel Savard <da...@gmail.com> on 2013/12/02 05:34:27 UTC

Hadoop 2.2.0 from source configuration

I am trying to configure hadoop 2.2.0 from source code and I find the
instructions really crappy and incomplete. It is as if they were written to
prevent anyone from doing the job himself, so that he has to contract
someone else to do it or buy a packaged version.

I have been struggling with this stuff for about three days, with partial
success. The documentation is less than clear, and most of the material out
there applies to earlier versions and hasn't been updated for version
2.2.0.

I was able to set up HDFS, however I am still unable to use it. I am doing
a single-node installation, and the instruction page doesn't explain
anything beyond telling you to do this and that, without documenting what
each step does, what choices are available, or what guidelines you should
follow. There are even environment variables you are told to set, but
nothing is said about what they mean or to which values they should be set.
It seems to assume prior knowledge of everything about hadoop.

Does anyone know a site with proper documentation about hadoop, or is it
hopeless and this whole thing just a piece of toxicware?

I am already looking for alternatives to hadoop, which for sure will be a
nightmare to manage and reinstall every time a new version or release
becomes available.

TIA
-----------------
Daniel Savard

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
FYI,

I recreated from scratch a new filesystem to hold the HDFS and increased
its size until the put operation succeeded. It took a filesystem of at
least 650MB to be able to copy a 100K file. I incremented the space in
chunks of 10MB each time to narrow down that value.
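
My guess (not verified against the source) is that the floor is tied to the
default block size: the namenode seems to want at least a full block's
worth of free space on the datanode before it will pick it. For a small
sandbox node, the hdfs-site.xml properties below look like the relevant
knobs; the values are only illustrative, I have not re-tested with them:

  <!-- hdfs-site.xml (single-node sandbox, illustrative values only) -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>          <!-- only one datanode, so no replication -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>16777216</value>   <!-- 16 MB; the 2.2.0 default is 134217728 -->
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>          <!-- bytes reserved per volume for non-DFS use; 0 is the default -->
  </property>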

Here is the output of hdfs dfsadmin -report:

Configured Capacity: 684486656 (652.78 MB)
Present Capacity: 682922849 (651.29 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used: 136033 (132.84 KB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 684486656 (652.78 MB)
DFS Used: 136033 (132.84 KB)
Non DFS Used: 1563807 (1.49 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used%: 0.02%
DFS Remaining%: 99.75%
Last contact: Tue Dec 03 22:01:05 EST 2013
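
For anyone hitting the same thing, a quick way to compare HDFS's view of
the volume with the OS's view (the data directory is whatever
dfs.datanode.data.dir points to on your setup):

  hdfs dfsadmin -report | egrep 'Capacity|DFS Remaining|Non DFS Used'
  df -h /path/to/dfs/data    # substitute your dfs.datanode.data.dir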


-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam and others,
>
> I solved my problem by increasing the filesystem holding the data by 3GB.
> I didn't try to increase it in smaller steps, so I don't know exactly at
> which point I had enough space for HDFS to work properly. Is there
> anywhere in the documentation a list of guidelines and requirements for
> the filesystem(s)? And I suppose it is possible to use much less space,
> provided some parameter(s) is/are properly configured to use less space
> (namenode?). Are there any worksheets to plan the disk space capacity for
> a given configuration (standalone single node or complete cluster)?
>
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> here is the link:
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Then, since it didn't work, I tried a number of things, but my
>> configuration files are really skinny and there isn't much stuff in them.
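>>
>> For reference, a skinny single-node configuration along the lines of that
>> page looks roughly like this (the values come from the doc, adjust as
>> needed):
>>
>>   <!-- core-site.xml -->
>>   <property>
>>     <name>fs.defaultFS</name>
>>     <value>hdfs://localhost:9000</value>
>>   </property>
>>
>>   <!-- hdfs-site.xml -->
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>1</value>
>>   </property>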
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Could you please send me a link to the documentation that you followed
>>> to set up your single-node cluster?
>>> I will go through it and do it step by step, so hopefully at the end
>>> your issue will be solved and the documentation will be improved.
>>>
>>> If you have any non-standard settings in core-site.xml, hdfs-site.xml
>>> and hadoop-env.sh (that were not suggested by the documentation that you
>>> followed), then please share them.
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> Adam,
>>>>
>>>> that's not the issue, I did substitute the name in the first report.
>>>> The actual hostname is feynman.cids.ca.
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> I see that in the previous hdfs report you had hosta.subdom1.tld1, but
>>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>>> file, and what is the output of the hostname command?
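>>>>>
>>>>> Something along these lines is what I would expect to see; the address
>>>>> on the second hosts line is only a placeholder for the machine's real
>>>>> IP, not the loopback:
>>>>>
>>>>>   $ hostname
>>>>>   feynman.cids.ca
>>>>>   $ cat /etc/hosts
>>>>>   127.0.0.1      localhost
>>>>>   192.168.1.10   feynman.cids.ca feynman   # placeholder IP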
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> I did that more than once; I just retried it from the beginning. I
>>>>>> zapped the directories, recreated them with hdfs namenode -format,
>>>>>> restarted HDFS, and I am still getting the very same error.
>>>>>>
>>>>>> I have posted the report previously. Is there anything in this report
>>>>>> that indicates I don't have enough free space somewhere? That's the
>>>>>> only thing I can see that may cause this problem, after everything I
>>>>>> have read on the subject. I am new to Hadoop and I just want to set up
>>>>>> a standalone node to experiment with for a while before going ahead
>>>>>> with a complete cluster.
>>>>>>
>>>>>> I repost the report for convenience:
>>>>>>
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> DFS Used%: 0.00%
>>>>>> Under replicated blocks: 0
>>>>>> Blocks with corrupt replicas: 0
>>>>>> Missing blocks: 0
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>>
>>>>>> Live datanodes:
>>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>>> Hostname: feynman.cids.ca
>>>>>> Decommission Status : Normal
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used%: 0.00%
>>>>>> DFS Remaining%: 18.18%
>>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> It looks like you can only communicate with the NameNode to do
>>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>>
>>>>>>> Did you format the NameNode correctly?
>>>>>>> A quite similar issue is described here:
>>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>>> overkill but it works."
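>>>>>>>
>>>>>>> In 2.2.0 terms that sequence would be roughly the following (the two
>>>>>>> directories are whatever dfs.namenode.name.dir and
>>>>>>> dfs.datanode.data.dir point to in your hdfs-site.xml):
>>>>>>>
>>>>>>>   $HADOOP_HOME/sbin/stop-dfs.sh
>>>>>>>   rm -rf /path/to/dfs/name/* /path/to/dfs/data/*
>>>>>>>   hdfs namenode -format
>>>>>>>   $HADOOP_HOME/sbin/start-dfs.sh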
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Arun,
>>>>>>>>
>>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>>> appears when I try to put a non-empty file onto HDFS, as posted above.
>>>>>>>> Besides that, absolutely nothing in the logs is telling me something
>>>>>>>> is wrong with the configuration so far.
>>>>>>>>
>>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>>> server to make sure it responds properly to requests? When trying to
>>>>>>>> put my file, I see nothing in the datanode log; the message appears
>>>>>>>> in the namenode log. Is this the expected behavior, or should I see
>>>>>>>> at least some kind of request message in the datanode logfile?
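>>>>>>>>
>>>>>>>> For the record, the basic built-in checks I know of are these (the
>>>>>>>> log location assumes the default $HADOOP_HOME/logs):
>>>>>>>>
>>>>>>>>   jps                           # are NameNode and DataNode up?
>>>>>>>>   hdfs dfsadmin -report         # does the NN see any live datanodes?
>>>>>>>>   hdfs dfsadmin -safemode get   # is the NN still in safe mode?
>>>>>>>>   hdfs fsck /                   # overall filesystem health
>>>>>>>>   tail -f $HADOOP_HOME/logs/hadoop-*-datanode-*.log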
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>>  Apologies if you had a bad experience. If you can point the
>>>>>>>>> problems out to us, we'd be more than happy to fix them - alternately,
>>>>>>>>> we'd *love* it if you could help us improve the docs too.
>>>>>>>>>
>>>>>>>>>  Now, for the problem at hand:
>>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one
>>>>>>>>> place to look. Basically NN cannot find any datanodes. Anything in your NN
>>>>>>>>> logs to indicate trouble?
>>>>>>>>>
>>>>>>>>>  Also, pls feel free to open JIRAs with issues you find and we'll
>>>>>>>>> help.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> André,
>>>>>>>>>
>>>>>>>>> good for you that the sparse instructions on the reference page were
>>>>>>>>> enough to set up your cluster. However, read them again and see how
>>>>>>>>> many assumptions they make about what you are supposed to already
>>>>>>>>> know, things that supposedly go without saying.
>>>>>>>>>
>>>>>>>>> I did try the single-node setup; it is worse than the cluster setup
>>>>>>>>> as far as the instructions go. As I understand them, you are supposed
>>>>>>>>> to already have a nearly working system. It is assumed that HDFS is
>>>>>>>>> already set up and working properly. Try to find the instructions to
>>>>>>>>> set up HDFS for version 2.2.0 and you will end up with a lot of
>>>>>>>>> inappropriate instructions for previous versions (some properties
>>>>>>>>> were renamed).
>>>>>>>>>
>>>>>>>>> It may seem harsh to call this toxic, but it is. The first place a
>>>>>>>>> newcomer will go is the single-node setup. This will be his starting
>>>>>>>>> point, and he will be left with a bunch of unstated assumptions and
>>>>>>>>> no clue.
>>>>>>>>>
>>>>>>>>> To go back to my very problem at this point:
>>>>>>>>>
>>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>>> excluded in this operation.
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>>
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>>
>>>>>>>>> I can copy an empty file, but as soon as the file is non-empty I get
>>>>>>>>> this message. Searching on the message has been of no help so far.
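>>>>>>>>>
>>>>>>>>> A minimal reproduction, in case it helps (any non-empty local file
>>>>>>>>> triggers it):
>>>>>>>>>
>>>>>>>>>   hdfs dfs -touchz /empty    # works: metadata only, no block written
>>>>>>>>>   echo hello > test
>>>>>>>>>   hdfs dfs -put test /test   # fails with the DataStreamer exception above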
>>>>>>>>>
>>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>>> there that could help in any way either.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>> Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>>>>> and let your frustrations out. Then write the email. Using words
>>>>>>>>>> like "crappy", "toxicware", "nightmare" is not going to help you
>>>>>>>>>> get useful responses.
>>>>>>>>>>
>>>>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>>>>> constructive. You haven't mentioned which documentation you are
>>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>>
>>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>>
>>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>>
>>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>>
>>>>>>>>>> - André
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>>> found the
>>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>>> written to
>>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>>> else to do it
>>>>>>>>>> > or buy a packaged version.
>>>>>>>>>> >
>>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>>> partial success.
>>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>>> there apply
>>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>>> 2.2.0.
>>>>>>>>>> >
>>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it.
>>>>>>>>>> I am doing a
>>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>>> explain anything
>>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>>> each thing
>>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>>> should
>>>>>>>>>> > follow. There is even environment variables you are told to
>>>>>>>>>> set, but nothing
>>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>>> set. It seems
>>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>>> >
>>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>>> it's hopeless
>>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>>> >
>>>>>>>>>> > I am already looking for alternate solutions to hadoop which
>>>>>>>>>> for sure will
>>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>>> release will
>>>>>>>>>> > become available.
>>>>>>>>>> >
>>>>>>>>>> > TIA
>>>>>>>>>> > -----------------
>>>>>>>>>> > Daniel Savard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> André Kelpe
>>>>>>>>>> andre@concurrentinc.com
>>>>>>>>>> http://concurrentinc.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
FYI,

I did recreate from scratch a new filesystem to hold the HDFS and increased
the size until the put operation succeeded. It took me a minimum of 650MB
filesystem to be able to copy a 100K file. I incremented the space by
chunks of 10MB each time to get the best value.

Here is the output of the dfsadmin -report

Configured Capacity: 684486656 (652.78 MB)
Present Capacity: 682922849 (651.29 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used: 136033 (132.84 KB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 684486656 (652.78 MB)
DFS Used: 136033 (132.84 KB)
Non DFS Used: 1563807 (1.49 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used%: 0.02%
DFS Remaining%: 99.75%
Last contact: Tue Dec 03 22:01:05 EST 2013


-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam and others,
>
> I solved my problem by increasing by 3GB the filesystem holding the data.
> I didn't try to increase it by smaller steps, so I don't know exactly at
> which point I had enough space for HDFS to work properly. Is there anywhere
> in the documentation a place we can have a list of guidelines, requirements
> for the filesystem(s). And I suppose it is possible to use much less space
> provided some parameter(s) is/are properly configured to use less space
> (namenode?). Any worksheets to plan the disk space capacity for any
> configuration (standalone single node or complete cluster)?
>
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> here is the link:
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Then, since it didn't work I tried a number of things, but my
>> configuration files are really skinny and there isn't much stuff in it.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Could you please send me a link to the documentation that you followed
>>> to setup your single-node cluster?
>>> I will go through it and do it step by step, so hopefully at the end
>>> your issue will be solved and the documentation will be improved.
>>>
>>> If you have any non-standard settings in core-site.xml, hdfs-site.xml
>>> and hadoop-env.sh (that were not suggested by the documentation that you
>>> followed), then please share them.
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> Adam,
>>>>
>>>> that's not the issue, I did substitute the name in the first report.
>>>> The actual hostname is feynman.cids.ca.
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>>> file, and output of $hostname command?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> I did that more than once, I just retry it from the beginning. I
>>>>>> zapped the directories and recreated them with hdfs namenode -format and
>>>>>> restarted HDFS and I am still getting the very same error.
>>>>>>
>>>>>> I have posted previously the report. Is there anything in this report
>>>>>> that indicates I am not having enough free space somewhere? That's the only
>>>>>> thing I can see may cause this problem after everything I read on the
>>>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>>>> starting to experiment a while with it before going ahead with a complete
>>>>>> cluster.
>>>>>>
>>>>>> I repost the report for convenience:
>>>>>>
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> DFS Used%: 0.00%
>>>>>> Under replicated blocks: 0
>>>>>> Blocks with corrupt replicas: 0
>>>>>> Missing blocks: 0
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>>
>>>>>> Live datanodes:
>>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>>> Hostname: feynman.cids.ca
>>>>>> Decommission Status : Normal
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used%: 0.00%
>>>>>> DFS Remaining%: 18.18%
>>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> It looks that you can only communicate with NameNode to do
>>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>>
>>>>>>> Did you format the NameNode correctly?
>>>>>>> A quite similar issue is described here:
>>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>>> overkill but it works."
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Arun,
>>>>>>>>
>>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>>>> wrong with the configuration so far.
>>>>>>>>
>>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>>> server to make sure it responds properly to requests? When trying to put my
>>>>>>>> file, in the datanode log I see nothing, the message appears in the
>>>>>>>> namenode log. Is this the expected behavior or should I see at least some
>>>>>>>> kind of request message in the datanode logfile?
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>>  Apologies if you had a bad experience. If you can point them out
>>>>>>>>> to us, we'd be more than happy to fix it - alternately, we'd *love* it if
>>>>>>>>> you could help us improve docs too.
>>>>>>>>>
>>>>>>>>>  Now, for the problem at hand:
>>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one
>>>>>>>>> place to look. Basically NN cannot find any datanodes. Anything in your NN
>>>>>>>>> logs to indicate trouble?
>>>>>>>>>
>>>>>>>>>  Also, pls feel free to open liras with issues you find and we'll
>>>>>>>>> help.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> André,
>>>>>>>>>
>>>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>>>> and should come without saying more about it.
>>>>>>>>>
>>>>>>>>> I did try the single node setup, it is worst than the cluster
>>>>>>>>> setup regarding the instructions. You are supposed to already have a near
>>>>>>>>> working system as far as I understand the instructions. It is assumed the
>>>>>>>>> HDFS is already setup and working properly. Try to find the instructions to
>>>>>>>>> setup HDFS for version 2.2.0 and you will end up with a lot of
>>>>>>>>> inappropriate instructions about previous version (some properties were
>>>>>>>>> renamed).
>>>>>>>>>
>>>>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>>>
>>>>>>>>> To go back to my very problem at this point:
>>>>>>>>>
>>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>>> excluded in this operation.
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>>
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>>
>>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>>>
>>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>>> there that could help in any way neither.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>> Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> first of all, before posting to a mailing list, take a deep
>>>>>>>>>> breath and
>>>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you
>>>>>>>>>> getting
>>>>>>>>>> useful responses.
>>>>>>>>>>
>>>>>>>>>> While I agree that the docs can be confusing, we should try to
>>>>>>>>>> stay
>>>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>>
>>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>>
>>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>>
>>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>>
>>>>>>>>>> - André
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>>> found the
>>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>>> written to
>>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>>> else to do it
>>>>>>>>>> > or buy a packaged version.
>>>>>>>>>> >
>>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>>> partial success.
>>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>>> there apply
>>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>>> 2.2.0.
>>>>>>>>>> >
>>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it.
>>>>>>>>>> I am doing a
>>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>>> explain anything
>>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>>> each thing
>>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>>> should
>>>>>>>>>> > follow. There is even environment variables you are told to
>>>>>>>>>> set, but nothing
>>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>>> set. It seems
>>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>>> >
>>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>>> it's hopeless
>>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>>> >
>>>>>>>>>> > I am already looking for alternate solutions to hadoop which
>>>>>>>>>> for sure will
>>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>>> release will
>>>>>>>>>> > become available.
>>>>>>>>>> >
>>>>>>>>>> > TIA
>>>>>>>>>> > -----------------
>>>>>>>>>> > Daniel Savard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> André Kelpe
>>>>>>>>>> andre@concurrentinc.com
>>>>>>>>>> http://concurrentinc.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
FYI,

I did recreate from scratch a new filesystem to hold the HDFS and increased
the size until the put operation succeeded. It took me a minimum of 650MB
filesystem to be able to copy a 100K file. I incremented the space by
chunks of 10MB each time to get the best value.

Here is the output of the dfsadmin -report

Configured Capacity: 684486656 (652.78 MB)
Present Capacity: 682922849 (651.29 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used: 136033 (132.84 KB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 684486656 (652.78 MB)
DFS Used: 136033 (132.84 KB)
Non DFS Used: 1563807 (1.49 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used%: 0.02%
DFS Remaining%: 99.75%
Last contact: Tue Dec 03 22:01:05 EST 2013


-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam and others,
>
> I solved my problem by increasing by 3GB the filesystem holding the data.
> I didn't try to increase it by smaller steps, so I don't know exactly at
> which point I had enough space for HDFS to work properly. Is there anywhere
> in the documentation a place we can have a list of guidelines, requirements
> for the filesystem(s). And I suppose it is possible to use much less space
> provided some parameter(s) is/are properly configured to use less space
> (namenode?). Any worksheets to plan the disk space capacity for any
> configuration (standalone single node or complete cluster)?
>
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> here is the link:
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Then, since it didn't work I tried a number of things, but my
>> configuration files are really skinny and there isn't much stuff in it.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Could you please send me a link to the documentation that you followed
>>> to setup your single-node cluster?
>>> I will go through it and do it step by step, so hopefully at the end
>>> your issue will be solved and the documentation will be improved.
>>>
>>> If you have any non-standard settings in core-site.xml, hdfs-site.xml
>>> and hadoop-env.sh (that were not suggested by the documentation that you
>>> followed), then please share them.
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> Adam,
>>>>
>>>> that's not the issue, I did substitute the name in the first report.
>>>> The actual hostname is feynman.cids.ca.
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>>> file, and output of $hostname command?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> I did that more than once, I just retry it from the beginning. I
>>>>>> zapped the directories and recreated them with hdfs namenode -format and
>>>>>> restarted HDFS and I am still getting the very same error.
>>>>>>
>>>>>> I have posted previously the report. Is there anything in this report
>>>>>> that indicates I am not having enough free space somewhere? That's the only
>>>>>> thing I can see may cause this problem after everything I read on the
>>>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>>>> starting to experiment a while with it before going ahead with a complete
>>>>>> cluster.
>>>>>>
>>>>>> I repost the report for convenience:
>>>>>>
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> DFS Used%: 0.00%
>>>>>> Under replicated blocks: 0
>>>>>> Blocks with corrupt replicas: 0
>>>>>> Missing blocks: 0
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>>
>>>>>> Live datanodes:
>>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>>> Hostname: feynman.cids.ca
>>>>>> Decommission Status : Normal
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used%: 0.00%
>>>>>> DFS Remaining%: 18.18%
>>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> It looks that you can only communicate with NameNode to do
>>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>>
>>>>>>> Did you format the NameNode correctly?
>>>>>>> A quite similar issue is described here:
>>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>>> overkill but it works."
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Arun,
>>>>>>>>
>>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>>>> wrong with the configuration so far.
>>>>>>>>
>>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>>> server to make sure it responds properly to requests? When trying to put my
>>>>>>>> file, in the datanode log I see nothing, the message appears in the
>>>>>>>> namenode log. Is this the expected behavior or should I see at least some
>>>>>>>> kind of request message in the datanode logfile?
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>>  Apologies if you had a bad experience. If you can point them out
>>>>>>>>> to us, we'd be more than happy to fix it - alternately, we'd *love* it if
>>>>>>>>> you could help us improve docs too.
>>>>>>>>>
>>>>>>>>>  Now, for the problem at hand:
>>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one
>>>>>>>>> place to look. Basically NN cannot find any datanodes. Anything in your NN
>>>>>>>>> logs to indicate trouble?
>>>>>>>>>
>>>>>>>>>  Also, pls feel free to open liras with issues you find and we'll
>>>>>>>>> help.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> André,
>>>>>>>>>
>>>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>>>> and should come without saying more about it.
>>>>>>>>>
>>>>>>>>> I did try the single node setup, it is worst than the cluster
>>>>>>>>> setup regarding the instructions. You are supposed to already have a near
>>>>>>>>> working system as far as I understand the instructions. It is assumed the
>>>>>>>>> HDFS is already setup and working properly. Try to find the instructions to
>>>>>>>>> setup HDFS for version 2.2.0 and you will end up with a lot of
>>>>>>>>> inappropriate instructions about previous version (some properties were
>>>>>>>>> renamed).
>>>>>>>>>
>>>>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>>>
>>>>>>>>> To go back to my very problem at this point:
>>>>>>>>>
>>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>>> excluded in this operation.
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>>
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>>
>>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>>>
>>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>>> there that could help in any way neither.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>> Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> first of all, before posting to a mailing list, take a deep
>>>>>>>>>> breath and
>>>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you
>>>>>>>>>> getting
>>>>>>>>>> useful responses.
>>>>>>>>>>
>>>>>>>>>> While I agree that the docs can be confusing, we should try to
>>>>>>>>>> stay
>>>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>>
>>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>>
>>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>>
>>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>>
>>>>>>>>>> - André
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>>> found the
>>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>>> written to
>>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>>> else to do it
>>>>>>>>>> > or buy a packaged version.
>>>>>>>>>> >
>>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>>> partial success.
>>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>>> there apply
>>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>>> 2.2.0.
>>>>>>>>>> >
>>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it.
>>>>>>>>>> I am doing a
>>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>>> explain anything
>>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>>> each thing
>>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>>> should
>>>>>>>>>> > follow. There is even environment variables you are told to
>>>>>>>>>> set, but nothing
>>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>>> set. It seems
>>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>>> >
>>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>>> it's hopeless
>>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>>> >
>>>>>>>>>> > I am already looking for alternate solutions to hadoop which
>>>>>>>>>> for sure will
>>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>>> release will
>>>>>>>>>> > become available.
>>>>>>>>>> >
>>>>>>>>>> > TIA
>>>>>>>>>> > -----------------
>>>>>>>>>> > Daniel Savard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> André Kelpe
>>>>>>>>>> andre@concurrentinc.com
>>>>>>>>>> http://concurrentinc.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
FYI,

I did recreate from scratch a new filesystem to hold the HDFS and increased
the size until the put operation succeeded. It took me a minimum of 650MB
filesystem to be able to copy a 100K file. I incremented the space by
chunks of 10MB each time to get the best value.

Here is the output of the dfsadmin -report

Configured Capacity: 684486656 (652.78 MB)
Present Capacity: 682922849 (651.29 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used: 136033 (132.84 KB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 684486656 (652.78 MB)
DFS Used: 136033 (132.84 KB)
Non DFS Used: 1563807 (1.49 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used%: 0.02%
DFS Remaining%: 99.75%
Last contact: Tue Dec 03 22:01:05 EST 2013


-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam and others,
>
> I solved my problem by increasing by 3GB the filesystem holding the data.
> I didn't try to increase it by smaller steps, so I don't know exactly at
> which point I had enough space for HDFS to work properly. Is there anywhere
> in the documentation a place we can have a list of guidelines, requirements
> for the filesystem(s). And I suppose it is possible to use much less space
> provided some parameter(s) is/are properly configured to use less space
> (namenode?). Any worksheets to plan the disk space capacity for any
> configuration (standalone single node or complete cluster)?
>
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> here is the link:
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Then, since it didn't work I tried a number of things, but my
>> configuration files are really skinny and there isn't much stuff in it.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Could you please send me a link to the documentation that you followed
>>> to setup your single-node cluster?
>>> I will go through it and do it step by step, so hopefully at the end
>>> your issue will be solved and the documentation will be improved.
>>>
>>> If you have any non-standard settings in core-site.xml, hdfs-site.xml
>>> and hadoop-env.sh (that were not suggested by the documentation that you
>>> followed), then please share them.
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> Adam,
>>>>
>>>> that's not the issue, I did substitute the name in the first report.
>>>> The actual hostname is feynman.cids.ca.
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>>> file, and output of $hostname command?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> I did that more than once, I just retry it from the beginning. I
>>>>>> zapped the directories and recreated them with hdfs namenode -format and
>>>>>> restarted HDFS and I am still getting the very same error.
>>>>>>
>>>>>> I have posted previously the report. Is there anything in this report
>>>>>> that indicates I am not having enough free space somewhere? That's the only
>>>>>> thing I can see may cause this problem after everything I read on the
>>>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>>>> starting to experiment a while with it before going ahead with a complete
>>>>>> cluster.
>>>>>>
>>>>>> I repost the report for convenience:
>>>>>>
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> DFS Used%: 0.00%
>>>>>> Under replicated blocks: 0
>>>>>> Blocks with corrupt replicas: 0
>>>>>> Missing blocks: 0
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>>
>>>>>> Live datanodes:
>>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>>> Hostname: feynman.cids.ca
>>>>>> Decommission Status : Normal
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>>
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used%: 0.00%
>>>>>> DFS Remaining%: 18.18%
>>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> It looks that you can only communicate with NameNode to do
>>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>>
>>>>>>> Did you format the NameNode correctly?
>>>>>>> A quite similar issue is described here:
>>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>>> overkill but it works."
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Arun,
>>>>>>>>
>>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>>>> wrong with the configuration so far.
>>>>>>>>
>>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>>> server to make sure it responds properly to requests? When trying to put my
>>>>>>>> file, in the datanode log I see nothing, the message appears in the
>>>>>>>> namenode log. Is this the expected behavior or should I see at least some
>>>>>>>> kind of request message in the datanode logfile?
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>>  Apologies if you had a bad experience. If you can point them out
>>>>>>>>> to us, we'd be more than happy to fix it - alternately, we'd *love* it if
>>>>>>>>> you could help us improve docs too.
>>>>>>>>>
>>>>>>>>>  Now, for the problem at hand:
>>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one
>>>>>>>>> place to look. Basically NN cannot find any datanodes. Anything in your NN
>>>>>>>>> logs to indicate trouble?
>>>>>>>>>
>>>>>>>>>  Also, pls feel free to open jiras with issues you find and we'll
>>>>>>>>> help.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> André,
>>>>>>>>>
>>>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>>>> and should come without saying more about it.
>>>>>>>>>
>>>>>>>>> I did try the single node setup, it is worst than the cluster
>>>>>>>>> setup regarding the instructions. You are supposed to already have a near
>>>>>>>>> working system as far as I understand the instructions. It is assumed the
>>>>>>>>> HDFS is already setup and working properly. Try to find the instructions to
>>>>>>>>> setup HDFS for version 2.2.0 and you will end up with a lot of
>>>>>>>>> inappropriate instructions about previous version (some properties were
>>>>>>>>> renamed).
>>>>>>>>>
>>>>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>>>
>>>>>>>>> To go back to my very problem at this point:
>>>>>>>>>
>>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>>> excluded in this operation.
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>>
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>     at
>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>>     at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>>
>>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>>>> am getting this message. Searching on the message is of no help so far.
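That split between empty and non-empty files fits the error: a zero-byte file is pure namenode metadata, while any real content forces the client to ask for a block on a datanode, which is exactly the step failing here. A minimal repro sketch (the file names are made up):

# metadata only: succeeds even when no datanode can accept blocks
hdfs dfs -touchz /zero-byte-test

# needs a block allocated on a datanode: triggers the exception above
echo hello > /tmp/one-block-test.txt
hdfs dfs -put /tmp/one-block-test.txt /one-block-test.txt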
>>>>>>>>>
>>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>>> there that could help in any way neither.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>> Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> first of all, before posting to a mailing list, take a deep
>>>>>>>>>> breath and
>>>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you
>>>>>>>>>> getting
>>>>>>>>>> useful responses.
>>>>>>>>>>
>>>>>>>>>> While I agree that the docs can be confusing, we should try to
>>>>>>>>>> stay
>>>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>>
>>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>>
>>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>>
>>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>>
>>>>>>>>>> - André
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>>> found the
>>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>>> written to
>>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>>> else to do it
>>>>>>>>>> > or buy a packaged version.
>>>>>>>>>> >
>>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>>> partial success.
>>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>>> there apply
>>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>>> 2.2.0.
>>>>>>>>>> >
>>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it.
>>>>>>>>>> I am doing a
>>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>>> explain anything
>>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>>> each thing
>>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>>> should
>>>>>>>>>> > follow. There is even environment variables you are told to
>>>>>>>>>> set, but nothing
>>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>>> set. It seems
>>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>>> >
>>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>>> it's hopeless
>>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>>> >
>>>>>>>>>> > I am already looking for alternate solutions to hadoop which
>>>>>>>>>> for sure will
>>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>>> release will
>>>>>>>>>> > become available.
>>>>>>>>>> >
>>>>>>>>>> > TIA
>>>>>>>>>> > -----------------
>>>>>>>>>> > Daniel Savard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> André Kelpe
>>>>>>>>>> andre@concurrentinc.com
>>>>>>>>>> http://concurrentinc.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Adam and others,

I solved my problem by increasing by 3GB the filesystem holding the data. I
didn't try to increase it by smaller steps, so I don't know exactly at
which point I had enough space for HDFS to work properly. Is there anywhere
in the documentation a place we can have a list of guidelines, requirements
for the filesystem(s). And I suppose it is possible to use much less space
provided some parameter(s) is/are properly configured to use less space
(namenode?). Any worksheets to plan the disk space capacity for any
configuration (standalone single node or complete cluster)?
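In case it helps whoever hits this next, here is what I would check first on a small test partition. The sketch assumes the data directory is still the default /tmp/hadoop-${USER}/dfs/data, and my reading of the block placement policy (that a datanode needs free room for several full blocks, with dfs.blocksize defaulting to 128 MB in 2.2.0, before the namenode will pick it) is an assumption rather than something spelled out in the docs:

# effective block size (134217728 bytes = 128 MB unless overridden)
hdfs getconf -confKey dfs.blocksize

# free space the datanode partition really offers
df -h /tmp/hadoop-${USER}/dfs/data

Lowering dfs.blocksize in hdfs-site.xml looks like the simplest way to make HDFS usable on a very small filesystem, though that only makes sense for experiments, not real workloads.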



-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> here is the link:
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>
> Then, since it didn't work I tried a number of things, but my
> configuration files are really skinny and there isn't much stuff in it.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Could you please send me a link to the documentation that you followed to
>> setup your single-node cluster?
>> I will go through it and do it step by step, so hopefully at the end your
>> issue will be solved and the documentation will be improved.
>>
>> If you have any non-standard settings in core-site.xml, hdfs-site.xml and
>> hadoop-env.sh (that were not suggested by the documentation that you
>> followed), then please share them.
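For reference, the pseudo-distributed section of the SingleCluster page gets by with just two properties (plus a valid JAVA_HOME in hadoop-env.sh). A sketch of that minimal setup, with $HADOOP_PREFIX standing in for the install directory and the heredoc form used only for illustration:

# core-site.xml: where clients find the namenode
cat > $HADOOP_PREFIX/etc/hadoop/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: a single datanode, so one replica is enough
cat > $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF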
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> Adam,
>>>
>>> that's not the issue, I did substitute the name in the first report. The
>>> actual hostname is feynman.cids.ca.
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>> file, and output of $hostname command?
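In case it is useful, the two things being asked for, plus the shape of a sane single-node /etc/hosts (the 192.168.1.10 address below is made up):

hostname
cat /etc/hosts

# a typical working layout looks something like:
#   127.0.0.1     localhost
#   192.168.1.10  feynman.cids.ca  feynman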
>>>>
>>>>
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> I did that more than once, I just retry it from the beginning. I
>>>>> zapped the directories and recreated them with hdfs namenode -format and
>>>>> restarted HDFS and I am still getting the very same error.
>>>>>
>>>>> I have posted previously the report. Is there anything in this report
>>>>> that indicates I am not having enough free space somewhere? That's the only
>>>>> thing I can see may cause this problem after everything I read on the
>>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>>> starting to experiment a while with it before going ahead with a complete
>>>>> cluster.
>>>>>
>>>>> I repost the report for convenience:
>>>>>
>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>
>>>>> DFS Used: 4096 (4 KB)
>>>>> DFS Used%: 0.00%
>>>>> Under replicated blocks: 0
>>>>> Blocks with corrupt replicas: 0
>>>>> Missing blocks: 0
>>>>>
>>>>> -------------------------------------------------
>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>
>>>>> Live datanodes:
>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>> Hostname: feynman.cids.ca
>>>>> Decommission Status : Normal
>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>
>>>>> DFS Used: 4096 (4 KB)
>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>> DFS Used%: 0.00%
>>>>> DFS Remaining%: 18.18%
>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>> It looks that you can only communicate with NameNode to do
>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>
>>>>>> Did you format the NameNode correctly?
>>>>>> A quite similar issue is described here:
>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>> overkill but it works."
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>
>>>>>>> Thanks Arun,
>>>>>>>
>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>>> wrong with the configuration so far.
>>>>>>>
>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>> server to make sure it responds properly to requests? When trying to put my
>>>>>>> file, in the datanode log I see nothing, the message appears in the
>>>>>>> namenode log. Is this the expected behavior or should I see at least some
>>>>>>> kind of request message in the datanode logfile?
>>>>>>>
>>>>>>>
>>>>>>> -----------------
>>>>>>> Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>
>>>>>>>> Daniel,
>>>>>>>>
>>>>>>>>  Apologies if you had a bad experience. If you can point them out
>>>>>>>> to us, we'd be more than happy to fix it - alternately, we'd *love* it if
>>>>>>>> you could help us improve docs too.
>>>>>>>>
>>>>>>>>  Now, for the problem at hand:
>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>>>> to indicate trouble?
>>>>>>>>
>>>>>>>>  Also, pls feel free to open jiras with issues you find and we'll
>>>>>>>> help.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Arun
>>>>>>>>
>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> André,
>>>>>>>>
>>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>>> and should come without saying more about it.
>>>>>>>>
>>>>>>>> I did try the single node setup, it is worst than the cluster setup
>>>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>>>> instructions about previous version (some properties were renamed).
>>>>>>>>
>>>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>>
>>>>>>>> To go back to my very problem at this point:
>>>>>>>>
>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>> excluded in this operation.
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>
>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at
>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>     at
>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>
>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>>
>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>> there that could help in any way neither.
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>>>> and
>>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you
>>>>>>>>> getting
>>>>>>>>> useful responses.
>>>>>>>>>
>>>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>
>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>
>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>
>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>
>>>>>>>>> - André
>>>>>>>>>
>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>> found the
>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>> written to
>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>> else to do it
>>>>>>>>> > or buy a packaged version.
>>>>>>>>> >
>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>> partial success.
>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>> there apply
>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>> 2.2.0.
>>>>>>>>> >
>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>>>> am doing a
>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>> explain anything
>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>> each thing
>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>> should
>>>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>>>> but nothing
>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>> set. It seems
>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>> >
>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>> it's hopeless
>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>> >
>>>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>>>> sure will
>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>> release will
>>>>>>>>> > become available.
>>>>>>>>> >
>>>>>>>>> > TIA
>>>>>>>>> > -----------------
>>>>>>>>> > Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> André Kelpe
>>>>>>>>> andre@concurrentinc.com
>>>>>>>>> http://concurrentinc.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> Arun C. Murthy
>>>>>>>> Hortonworks Inc.
>>>>>>>> http://hortonworks.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Adam and others,

I solved my problem by increasing by 3GB the filesystem holding the data. I
didn't try to increase it by smaller steps, so I don't know exactly at
which point I had enough space for HDFS to work properly. Is there anywhere
in the documentation a place we can have a list of guidelines, requirements
for the filesystem(s). And I suppose it is possible to use much less space
provided some parameter(s) is/are properly configured to use less space
(namenode?). Any worksheets to plan the disk space capacity for any
configuration (standalone single node or complete cluster)?



-----------------
Daniel Savard


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> here is the link:
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>
> Then, since it didn't work I tried a number of things, but my
> configuration files are really skinny and there isn't much stuff in it.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Could you please send me a link to the documentation that you followed to
>> setup your single-node cluster?
>> I will go through it and do it step by step, so hopefully at the end your
>> issue will be solved and the documentation will be improved.
>>
>> If you have any non-standard settings in core-site.xml, hdfs-site.xml and
>> hadoop-env.sh (that were not suggested by the documentation that you
>> followed), then please share them.
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> Adam,
>>>
>>> that's not the issue, I did substitute the name in the first report. The
>>> actual hostname is feynman.cids.ca.
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>>> file, and output of $hostname command?
>>>>
>>>>
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> I did that more than once, I just retry it from the beginning. I
>>>>> zapped the directories and recreated them with hdfs namenode -format and
>>>>> restarted HDFS and I am still getting the very same error.
>>>>>
>>>>> I have posted previously the report. Is there anything in this report
>>>>> that indicates I am not having enough free space somewhere? That's the only
>>>>> thing I can see may cause this problem after everything I read on the
>>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>>> starting to experiment a while with it before going ahead with a complete
>>>>> cluster.
>>>>>
>>>>> I repost the report for convenience:
>>>>>
>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>
>>>>> DFS Used: 4096 (4 KB)
>>>>> DFS Used%: 0.00%
>>>>> Under replicated blocks: 0
>>>>> Blocks with corrupt replicas: 0
>>>>> Missing blocks: 0
>>>>>
>>>>> -------------------------------------------------
>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>
>>>>> Live datanodes:
>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>> Hostname: feynman.cids.ca
>>>>> Decommission Status : Normal
>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>
>>>>> DFS Used: 4096 (4 KB)
>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>> DFS Used%: 0.00%
>>>>> DFS Remaining%: 18.18%
>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>> It looks that you can only communicate with NameNode to do
>>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>>
>>>>>> Did you format the NameNode correctly?
>>>>>> A quite similar issue is described here:
>>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The
>>>>>> last reply says: "The most common is that you have reformatted the
>>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>>> overkill but it works."
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>>
>>>>>>> Thanks Arun,
>>>>>>>
>>>>>>> I already read and did everything recommended at the referred URL.
>>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>>> wrong with the configuration so far.
>>>>>>>
>>>>>>> Is there some sort of diagnostic tool that can query/ping each
>>>>>>> server to make sure it responds properly to requests? When trying to put my
>>>>>>> file, in the datanode log I see nothing, the message appears in the
>>>>>>> namenode log. Is this the expected behavior or should I see at least some
>>>>>>> kind of request message in the datanode logfile?
>>>>>>>
>>>>>>>
>>>>>>> -----------------
>>>>>>> Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>>
>>>>>>>> Daniel,
>>>>>>>>
>>>>>>>>  Apologies if you had a bad experience. If you can point them out
>>>>>>>> to us, we'd be more than happy to fix it - alternately, we'd *love* it if
>>>>>>>> you could help us improve docs too.
>>>>>>>>
>>>>>>>>  Now, for the problem at hand:
>>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>>>> to indicate trouble?
>>>>>>>>
>>>>>>>>  Also, pls feel free to open JIRAs with issues you find and we'll
>>>>>>>> help.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Arun
>>>>>>>>
>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> André,
>>>>>>>>
>>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>>> and should come without saying more about it.
>>>>>>>>
>>>>>>>> I did try the single node setup, it is worse than the cluster setup
>>>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>>>> instructions about previous version (some properties were renamed).
>>>>>>>>
>>>>>>>> It may appear harsh to people to say this is toxic, but it is. The
>>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>>
>>>>>>>> To go back to my very problem at this point:
>>>>>>>>
>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>>> excluded in this operation.
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>
>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at
>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>     at
>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>     at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>
>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>>
>>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>>> there that could help in any way neither.
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>>>> and
>>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you
>>>>>>>>> getting
>>>>>>>>> useful responses.
>>>>>>>>>
>>>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>
>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>
>>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>
>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>
>>>>>>>>> - André
>>>>>>>>>
>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>>> found the
>>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>>> written to
>>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>>> else to do it
>>>>>>>>> > or buy a packaged version.
>>>>>>>>> >
>>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>>> partial success.
>>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>>> there apply
>>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>>> 2.2.0.
>>>>>>>>> >
>>>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>>>> am doing a
>>>>>>>>> > single node installation and the instruction page doesn't
>>>>>>>>> explain anything
>>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>>> each thing
>>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>>> should
>>>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>>>> but nothing
>>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>>> set. It seems
>>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>>> >
>>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>>> it's hopeless
>>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>>> >
>>>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>>>> sure will
>>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>>> release will
>>>>>>>>> > become available.
>>>>>>>>> >
>>>>>>>>> > TIA
>>>>>>>>> > -----------------
>>>>>>>>> > Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> André Kelpe
>>>>>>>>> andre@concurrentinc.com
>>>>>>>>> http://concurrentinc.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> Arun C. Murthy
>>>>>>>> Hortonworks Inc.
>>>>>>>> http://hortonworks.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Adam,

here is the link:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

Then, since it didn't work I tried a number of things, but my configuration
files are really skinny and there isn't much stuff in them.
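
For comparison, a bare-bones pseudo-distributed setup usually needs little
more than the following; the port and the directory paths here are
placeholders to adapt, not values taken from that page:

  <!-- core-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///path/to/dfs/name</value>   <!-- placeholder path -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///path/to/dfs/data</value>   <!-- placeholder path -->
  </property>

If the two *.dir properties are left out they default to locations under
hadoop.tmp.dir, which sits in /tmp by default and may not survive a cleanup
or reboot, so pointing them at a real directory is usually safer.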

-----------------
Daniel Savard


2013/12/3 Adam Kawa <ka...@gmail.com>

> Could you please send me a link to the documentation that you followed to
> setup your single-node cluster?
> I will go through it and do it step by step, so hopefully at the end your
> issue will be solved and the documentation will be improved.
>
> If you have any non-standard settings in core-site.xml, hdfs-site.xml and
> hadoop-env.sh (that were not suggested by the documentation that you
> followed), then please share them.
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> that's not the issue, I did substitute the name in the first report. The
>> actual hostname is feynman.cids.ca.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Daniel,
>>>
>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>> file, and output of $hostname command?
>>>
>>>
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> I did that more than once, I just retry it from the beginning. I zapped
>>>> the directories and recreated them with hdfs namenode -format and restarted
>>>> HDFS and I am still getting the very same error.
>>>>
>>>> I have posted previously the report. Is there anything in this report
>>>> that indicates I am not having enough free space somewhere? That's the only
>>>> thing I can see may cause this problem after everything I read on the
>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>> starting to experiment a while with it before going ahead with a complete
>>>> cluster.
>>>>
>>>> I repost the report for convenience:
>>>>
>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>> Present Capacity: 534421504 (509.66 MB)
>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>
>>>> DFS Used: 4096 (4 KB)
>>>> DFS Used%: 0.00%
>>>> Under replicated blocks: 0
>>>> Blocks with corrupt replicas: 0
>>>> Missing blocks: 0
>>>>
>>>> -------------------------------------------------
>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>
>>>> Live datanodes:
>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>> Hostname: feynman.cids.ca
>>>> Decommission Status : Normal
>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>
>>>> DFS Used: 4096 (4 KB)
>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>> DFS Remaining: 534417408 (509.66 MB)
>>>> DFS Used%: 0.00%
>>>> DFS Remaining%: 18.18%
>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> It looks that you can only communicate with NameNode to do
>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>
>>>>> Did you format the NameNode correctly?
>>>>> A quite similar issue is described here:
>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>>> reply says: "The most common is that you have reformatted the
>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>> overkill but it works."
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> Thanks Arun,
>>>>>>
>>>>>> I already read and did everything recommended at the referred URL.
>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>> wrong with the configuration so far.
>>>>>>
>>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>>> in the datanode log I see nothing, the message appears in the namenode log.
>>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>>> message in the datanode logfile?
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>>  Apologies if you had a bad experience. If you can point them out to
>>>>>>> us, we'd be more than happy to fix it - alternately, we'd *love* it if you
>>>>>>> could help us improve docs too.
>>>>>>>
>>>>>>>  Now, for the problem at hand:
>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>>> to indicate trouble?
>>>>>>>
>>>>>>>  Also, pls feel free to open JIRAs with issues you find and we'll
>>>>>>> help.
>>>>>>>
>>>>>>> thanks,
>>>>>>> Arun
>>>>>>>
>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> André,
>>>>>>>
>>>>>>> good for you that greedy instructions on the reference page were
>>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>>> and should come without saying more about it.
>>>>>>>
>>>>>>> I did try the single node setup, it is worse than the cluster setup
>>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>>> instructions about previous version (some properties were renamed).
>>>>>>>
>>>>>>> It may appear harsh to people to say this is toxic, but it is. The
>>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>>
>>>>>>> To go back to my very problem at this point:
>>>>>>>
>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>> excluded in this operation.
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>     at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>     at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>
>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>     at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at
>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>     at
>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>
>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>
>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>> there that could help in any way neither.
>>>>>>>
>>>>>>>
>>>>>>> -----------------
>>>>>>> Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>>> and
>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>>>>> useful responses.
>>>>>>>>
>>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>
>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>
>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>
>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>
>>>>>>>> - André
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>> found the
>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>> written to
>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>> else to do it
>>>>>>>> > or buy a packaged version.
>>>>>>>> >
>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>> partial success.
>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>> there apply
>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>> 2.2.0.
>>>>>>>> >
>>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>>> am doing a
>>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>>> anything
>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>> each thing
>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>> should
>>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>>> but nothing
>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>> set. It seems
>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>> >
>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>> it's hopeless
>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>> >
>>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>>> sure will
>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>> release will
>>>>>>>> > become available.
>>>>>>>> >
>>>>>>>> > TIA
>>>>>>>> > -----------------
>>>>>>>> > Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> André Kelpe
>>>>>>>> andre@concurrentinc.com
>>>>>>>> http://concurrentinc.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Arun C. Murthy
>>>>>>> Hortonworks Inc.
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>> you have received this communication in error, please contact the sender
>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Adam,

here is the link:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

Then, since it didn't work I tried a number of things, but my configuration
files are really skinny and there isn't much stuff in it.

-----------------
Daniel Savard


2013/12/3 Adam Kawa <ka...@gmail.com>

> Could you please send me a link to the documentation that you followed to
> setup your single-node cluster?
> I will go through it and do it step by step, so hopefully at the end your
> issue will be solved and the documentation will be improved.
>
> If you have any non-standard settings in core-site.xml, hdfs-site.xml and
> hadoop-env.sh (that were not suggested by the documentation that you
> followed), then please share them.
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Adam,
>>
>> that's not the issue, I did substitute the name in the first report. The
>> actual hostname is feynman.cids.ca.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Daniel,
>>>
>>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but
>>> now you have feynman.cids.ca. What is the content of your /etc/hosts
>>> file, and output of $hostname command?
>>>
>>>
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> I did that more than once, I just retry it from the beginning. I zapped
>>>> the directories and recreated them with hdfs namenode -format and restarted
>>>> HDFS and I am still getting the very same error.
>>>>
>>>> I have posted previously the report. Is there anything in this report
>>>> that indicates I am not having enough free space somewhere? That's the only
>>>> thing I can see may cause this problem after everything I read on the
>>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>>> starting to experiment a while with it before going ahead with a complete
>>>> cluster.
>>>>
>>>> I repost the report for convenience:
>>>>
>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>> Present Capacity: 534421504 (509.66 MB)
>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>
>>>> DFS Used: 4096 (4 KB)
>>>> DFS Used%: 0.00%
>>>> Under replicated blocks: 0
>>>> Blocks with corrupt replicas: 0
>>>> Missing blocks: 0
>>>>
>>>> -------------------------------------------------
>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>
>>>> Live datanodes:
>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>> Hostname: feynman.cids.ca
>>>> Decommission Status : Normal
>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>
>>>> DFS Used: 4096 (4 KB)
>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>> DFS Remaining: 534417408 (509.66 MB)
>>>> DFS Used%: 0.00%
>>>> DFS Remaining%: 18.18%
>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>> It looks that you can only communicate with NameNode to do
>>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>>
>>>>> Did you format the NameNode correctly?
>>>>> A quite similar issue is described here:
>>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>>> reply says: "The most common is that you have reformatted the
>>>>> namenode leaving it in an inconsistent state. The most common solution is
>>>>> to stop dfs, remove the contents of the dfs directories on all the
>>>>> machines, run “hadoop namenode -format” on the controller, then restart
>>>>> dfs. That consistently fixes the problem for me. This may be serious
>>>>> overkill but it works."
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>>
>>>>>> Thanks Arun,
>>>>>>
>>>>>> I already read and did everything recommended at the referred URL.
>>>>>> There isn't any error message in the logfiles. The only error message
>>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>>> wrong with the configuration so far.
>>>>>>
>>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>>> in the datanode log I see nothing; the message appears in the namenode log.
>>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>>> message in the datanode logfile?
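
Short of the web UIs (the NameNode one on port 50070 lists the live
datanodes), a handful of stock commands give roughly that picture. The port
and log locations below are the 2.x defaults and will differ if they were
overridden:

    jps                                       # are NameNode, DataNode and SecondaryNameNode running?
    hdfs dfsadmin -report                     # does the NameNode see any live datanodes?
    hdfs fsck / -files -blocks                # block-level view as the NameNode sees it
    nc -zv feynman.cids.ca 50010              # is the datanode's data transfer port reachable?
    tail -f $HADOOP_PREFIX/logs/hadoop-*-datanode-*.log    # watch the datanode while retrying the put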
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>>  Apologies if you had a bad experience. If you can point the problems out to
>>>>>>> us, we'd be more than happy to fix them - alternatively, we'd *love* it if you
>>>>>>> could help us improve docs too.
>>>>>>>
>>>>>>>  Now, for the problem at hand:
>>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>>> to indicate trouble?
>>>>>>>
>>>>>>>  Also, please feel free to open JIRAs with issues you find and we'll
>>>>>>> help.
>>>>>>>
>>>>>>> thanks,
>>>>>>> Arun
>>>>>>>
>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> André,
>>>>>>>
>>>>>>> good for you that the terse instructions on the reference page were
>>>>>>> enough to set up your cluster. However, read them again and see how many
>>>>>>> assumptions they make about what you are supposed to already know,
>>>>>>> without any of it ever being spelled out.
>>>>>>>
>>>>>>> I did try the single node setup; it is worse than the cluster setup
>>>>>>> regarding the instructions. You are supposed to already have a nearly working
>>>>>>> system as far as I understand the instructions. It is assumed that HDFS is
>>>>>>> already set up and working properly. Try to find the instructions to set up
>>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>>> instructions about previous versions (some properties were renamed).
>>>>>>>
>>>>>>> It may seem harsh to say this is toxic, but it is. The
>>>>>>> first place a newcomer will go is the single node setup. This will be his
>>>>>>> starting point and he will be left with a bunch of a priori assumptions and no clue.
>>>>>>>
>>>>>>> To go back to my very problem at this point:
>>>>>>>
>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>>> excluded in this operation.
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>     at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>     at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>
>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>     at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at
>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>     at
>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>     at
>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>
>>>>>>> I can copy an empty file, but as soon as its content is non-zero I
>>>>>>> am getting this message. Searching on the message is of no help so far.
>>>>>>>
>>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>>> there that could help in any way either.
>>>>>>>
>>>>>>>
>>>>>>> -----------------
>>>>>>> Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>>> and
>>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you get
>>>>>>>> useful responses.
>>>>>>>>
>>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>>> constructive. You haven't mentioned which documentation you are
>>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>
>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>
>>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>
>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>
>>>>>>>> - André
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I
>>>>>>>> found the
>>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>>> written to
>>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>>> else to do it
>>>>>>>> > or buy a packaged version.
>>>>>>>> >
>>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>>> partial success.
>>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>>> there apply
>>>>>>>> > to earlier version and they haven't been updated for version
>>>>>>>> 2.2.0.
>>>>>>>> >
>>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>>> am doing a
>>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>>> anything
>>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>>> each thing
>>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>>> should
>>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>>> but nothing
>>>>>>>> > is said about what they mean and to which value they should be
>>>>>>>> set. It seems
>>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>>> >
>>>>>>>> > Anyone knows a site with proper documentation about hadoop or
>>>>>>>> it's hopeless
>>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>>> >
>>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>>> sure will
>>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>>> release will
>>>>>>>> > become available.
>>>>>>>> >
>>>>>>>> > TIA
>>>>>>>> > -----------------
>>>>>>>> > Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> André Kelpe
>>>>>>>> andre@concurrentinc.com
>>>>>>>> http://concurrentinc.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Arun C. Murthy
>>>>>>> Hortonworks Inc.
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>> you have received this communication in error, please contact the sender
>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Could you please send me a link to the documentation that you followed to
set up your single-node cluster?
I will go through it and do it step by step, so hopefully at the end your
issue will be solved and the documentation will be improved.

If you have any non-standard settings in core-site.xml, hdfs-site.xml and
hadoop-env.sh (that were not suggested by the documentation that you
followed), then please share them.
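
A convenient way to capture what is actually in effect, rather than what
the files are believed to contain, is to read the values back through the
client configuration; this assumes the command runs with the same
HADOOP_CONF_DIR the daemons were started with:

    hdfs getconf -confKey fs.defaultFS
    hdfs getconf -confKey dfs.replication
    hdfs getconf -confKey dfs.namenode.name.dir
    hdfs getconf -confKey dfs.datanode.data.dir
    echo "$JAVA_HOME"                         # hadoop-env.sh must point this at a working JDK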


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> that's not the issue, I did substitute the name in the first report. The
> actual hostname is feynman.cids.ca.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Daniel,
>>
>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but now
>> you have feynman.cids.ca. What is the content of your /etc/hosts file,
>> and output of $hostname command?
>>
>>
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> I did that more than once; I just retried it from the beginning. I zapped
>>> the directories and recreated them with hdfs namenode -format and restarted
>>> HDFS and I am still getting the very same error.
>>>
>>> I have already posted the report. Is there anything in this report
>>> that indicates I do not have enough free space somewhere? That's the only
>>> thing I can see that may cause this problem after everything I have read on
>>> the subject. I am new to Hadoop and I just want to set up a standalone node
>>> to experiment with for a while before going ahead with a complete
>>> cluster.
>>>
>>> I repost the report for convenience:
>>>
>>> Configured Capacity: 2939899904 (2.74 GB)
>>> Present Capacity: 534421504 (509.66 MB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> DFS Used%: 0.00%
>>> Under replicated blocks: 0
>>> Blocks with corrupt replicas: 0
>>> Missing blocks: 0
>>>
>>> -------------------------------------------------
>>> Datanodes available: 1 (1 total, 0 dead)
>>>
>>> Live datanodes:
>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>> Hostname: feynman.cids.ca
>>> Decommission Status : Normal
>>> Configured Capacity: 2939899904 (2.74 GB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> Non DFS Used: 2405478400 (2.24 GB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>> DFS Used%: 0.00%
>>> DFS Remaining%: 18.18%
>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> It looks like you can only communicate with the NameNode to do
>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>
>>>> Did you format the NameNode correctly?
>>>> A quite similar issue is described here:
>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>> reply says: "The most common is that you have reformatted the namenode
>>>> leaving it in an inconsistent state. The most common solution is to stop
>>>> dfs, remove the contents of the dfs directories on all the machines, run
>>>> “hadoop namenode -format” on the controller, then restart dfs. That
>>>> consistently fixes the problem for me. This may be serious overkill but it
>>>> works."
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> Thanks Arun,
>>>>>
>>>>> I already read and did everything recommended at the referred URL.
>>>>> There isn't any error message in the logfiles. The only error message
>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>> wrong with the configuration so far.
>>>>>
>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>> in the datanode log I see nothing; the message appears in the namenode log.
>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>> message in the datanode logfile?
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>>  Apologies if you had a bad experience. If you can point the problems out to
>>>>>> us, we'd be more than happy to fix them - alternatively, we'd *love* it if you
>>>>>> could help us improve docs too.
>>>>>>
>>>>>>  Now, for the problem at hand:
>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>> to indicate trouble?
>>>>>>
>>>>>>  Also, please feel free to open JIRAs with issues you find and we'll
>>>>>> help.
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> André,
>>>>>>
>>>>>> good for you that the terse instructions on the reference page were
>>>>>> enough to set up your cluster. However, read them again and see how many
>>>>>> assumptions they make about what you are supposed to already know,
>>>>>> without any of it ever being spelled out.
>>>>>>
>>>>>> I did try the single node setup; it is worse than the cluster setup
>>>>>> regarding the instructions. You are supposed to already have a nearly working
>>>>>> system as far as I understand the instructions. It is assumed that HDFS is
>>>>>> already set up and working properly. Try to find the instructions to set up
>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>> instructions about previous versions (some properties were renamed).
>>>>>>
>>>>>> It may seem harsh to say this is toxic, but it is. The
>>>>>> first place a newcomer will go is the single node setup. This will be his
>>>>>> starting point and he will be left with a bunch of a priori assumptions and no clue.
>>>>>>
>>>>>> To go back to my very problem at this point:
>>>>>>
>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>> excluded in this operation.
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>     at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>     at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>
>>>>>> I can copy an empty file, but as soon as its content is non-zero I am
>>>>>> getting this message. Searching on the message is of no help so far.
>>>>>>
>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>> there that could help in any way either.
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>> and
>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you get
>>>>>>> useful responses.
>>>>>>>
>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>> constructive. You haven't mentioned which documentation you are
>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>
>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>
>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>
>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>
>>>>>>> - André
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>>>> the
>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>> written to
>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>> else to do it
>>>>>>> > or buy a packaged version.
>>>>>>> >
>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>> partial success.
>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>> there apply
>>>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>>>> >
>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>> am doing a
>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>> anything
>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>> each thing
>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>> should
>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>> but nothing
>>>>>>> > is said about what they mean and to which value they should be
>>>>>>> set. It seems
>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>> >
>>>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>>>> hopeless
>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>> >
>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>> sure will
>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>> release will
>>>>>>> > become available.
>>>>>>> >
>>>>>>> > TIA
>>>>>>> > -----------------
>>>>>>> > Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> André Kelpe
>>>>>>> andre@concurrentinc.com
>>>>>>> http://concurrentinc.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>> CONFIDENTIALITY NOTICE
>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>> entity to which it is addressed and may contain information that is
>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>> you have received this communication in error, please contact the sender
>>>>>> immediately and delete it from your system. Thank You.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Could you please send me a link to the documentation that you followed to
setup your single-node cluster?
I will go through it and do it step by step, so hopefully at the end your
issue will be solved and the documentation will be improved.

If you have any non-standard settings in core-site.xml, hdfs-site.xml and
hadoop-env.sh (that were not suggested by the documentation that you
followed), then please share them.


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> that's not the issue, I did substitute the name in the first report. The
> actual hostname is feynman.cids.ca.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Daniel,
>>
>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but now
>> you have feynman.cids.ca. What is the content of your /etc/hosts file,
>> and output of $hostname command?
>>
>>
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> I did that more than once, I just retry it from the beginning. I zapped
>>> the directories and recreated them with hdfs namenode -format and restarted
>>> HDFS and I am still getting the very same error.
>>>
>>> I have posted previously the report. Is there anything in this report
>>> that indicates I am not having enough free space somewhere? That's the only
>>> thing I can see may cause this problem after everything I read on the
>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>> starting to experiment a while with it before going ahead with a complete
>>> cluster.
>>>
>>> I repost the report for convenience:
>>>
>>> Configured Capacity: 2939899904 (2.74 GB)
>>> Present Capacity: 534421504 (509.66 MB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> DFS Used%: 0.00%
>>> Under replicated blocks: 0
>>> Blocks with corrupt replicas: 0
>>> Missing blocks: 0
>>>
>>> -------------------------------------------------
>>> Datanodes available: 1 (1 total, 0 dead)
>>>
>>> Live datanodes:
>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>> Hostname: feynman.cids.ca
>>> Decommission Status : Normal
>>> Configured Capacity: 2939899904 (2.74 GB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> Non DFS Used: 2405478400 (2.24 GB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>> DFS Used%: 0.00%
>>> DFS Remaining%: 18.18%
>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> It looks that you can only communicate with NameNode to do
>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>
>>>> Did you format the NameNode correctly?
>>>> A quite similar issue is described here:
>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>> reply says: "The most common is that you have reformatted the namenode
>>>> leaving it in an inconsistent state. The most common solution is to stop
>>>> dfs, remove the contents of the dfs directories on all the machines, run
>>>> “hadoop namenode -format” on the controller, then restart dfs. That
>>>> consistently fixes the problem for me. This may be serious overkill but it
>>>> works."
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> Thanks Arun,
>>>>>
>>>>> I already read and did everything recommended at the referred URL.
>>>>> There isn't any error message in the logfiles. The only error message
>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>> wrong with the configuration so far.
>>>>>
>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>> in the datanode log I see nothing, the message appears in the namenode log.
>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>> message in the datanode logfile?
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>>  Apologies if you had a bad experience. If you can point them out to
>>>>>> us, we'd be more than happy to fix it - alternately, we'd *love* it if you
>>>>>> could help us improve docs too.
>>>>>>
>>>>>>  Now, for the problem at hand:
>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>> to indicate trouble?
>>>>>>
>>>>>>  Also, pls feel free to open liras with issues you find and we'll
>>>>>> help.
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> André,
>>>>>>
>>>>>> good for you that greedy instructions on the reference page were
>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>> and should come without saying more about it.
>>>>>>
>>>>>> I did try the single node setup, it is worst than the cluster setup
>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>> instructions about previous version (some properties were renamed).
>>>>>>
>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>
>>>>>> To go back to my very problem at this point:
>>>>>>
>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>> excluded in this operation.
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>     at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>     at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>
>>>>>> I can copy an empty file, but as soon as its content is non-zero I am
>>>>>> getting this message. Searching on the message is of no help so far.
>>>>>>
>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>> there that could help in any way neither.
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>> and
>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>>>> useful responses.
>>>>>>>
>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>
>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>
>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>
>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>
>>>>>>> - André
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>>>> the
>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>> written to
>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>> else to do it
>>>>>>> > or buy a packaged version.
>>>>>>> >
>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>> partial success.
>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>> there apply
>>>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>>>> >
>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>> am doing a
>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>> anything
>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>> each thing
>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>> should
>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>> but nothing
>>>>>>> > is said about what they mean and to which value they should be
>>>>>>> set. It seems
>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>> >
>>>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>>>> hopeless
>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>> >
>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>> sure will
>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>> release will
>>>>>>> > become available.
>>>>>>> >
>>>>>>> > TIA
>>>>>>> > -----------------
>>>>>>> > Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> André Kelpe
>>>>>>> andre@concurrentinc.com
>>>>>>> http://concurrentinc.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>> CONFIDENTIALITY NOTICE
>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>> entity to which it is addressed and may contain information that is
>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>> you have received this communication in error, please contact the sender
>>>>>> immediately and delete it from your system. Thank You.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Could you please send me a link to the documentation that you followed to
setup your single-node cluster?
I will go through it and do it step by step, so hopefully at the end your
issue will be solved and the documentation will be improved.

If you have any non-standard settings in core-site.xml, hdfs-site.xml and
hadoop-env.sh (that were not suggested by the documentation that you
followed), then please share them.


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> that's not the issue, I did substitute the name in the first report. The
> actual hostname is feynman.cids.ca.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Daniel,
>>
>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but now
>> you have feynman.cids.ca. What is the content of your /etc/hosts file,
>> and output of $hostname command?
>>
>>
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> I did that more than once, I just retry it from the beginning. I zapped
>>> the directories and recreated them with hdfs namenode -format and restarted
>>> HDFS and I am still getting the very same error.
>>>
>>> I have posted previously the report. Is there anything in this report
>>> that indicates I am not having enough free space somewhere? That's the only
>>> thing I can see may cause this problem after everything I read on the
>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>> starting to experiment a while with it before going ahead with a complete
>>> cluster.
>>>
>>> I repost the report for convenience:
>>>
>>> Configured Capacity: 2939899904 (2.74 GB)
>>> Present Capacity: 534421504 (509.66 MB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> DFS Used%: 0.00%
>>> Under replicated blocks: 0
>>> Blocks with corrupt replicas: 0
>>> Missing blocks: 0
>>>
>>> -------------------------------------------------
>>> Datanodes available: 1 (1 total, 0 dead)
>>>
>>> Live datanodes:
>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>> Hostname: feynman.cids.ca
>>> Decommission Status : Normal
>>> Configured Capacity: 2939899904 (2.74 GB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> Non DFS Used: 2405478400 (2.24 GB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>> DFS Used%: 0.00%
>>> DFS Remaining%: 18.18%
>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> It looks that you can only communicate with NameNode to do
>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>
>>>> Did you format the NameNode correctly?
>>>> A quite similar issue is described here:
>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>> reply says: "The most common is that you have reformatted the namenode
>>>> leaving it in an inconsistent state. The most common solution is to stop
>>>> dfs, remove the contents of the dfs directories on all the machines, run
>>>> “hadoop namenode -format” on the controller, then restart dfs. That
>>>> consistently fixes the problem for me. This may be serious overkill but it
>>>> works."
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> Thanks Arun,
>>>>>
>>>>> I already read and did everything recommended at the referred URL.
>>>>> There isn't any error message in the logfiles. The only error message
>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>> wrong with the configuration so far.
>>>>>
>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>> in the datanode log I see nothing, the message appears in the namenode log.
>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>> message in the datanode logfile?
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>>  Apologies if you had a bad experience. If you can point them out to
>>>>>> us, we'd be more than happy to fix it - alternately, we'd *love* it if you
>>>>>> could help us improve docs too.
>>>>>>
>>>>>>  Now, for the problem at hand:
>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>> to indicate trouble?
>>>>>>
>>>>>>  Also, pls feel free to open liras with issues you find and we'll
>>>>>> help.
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> André,
>>>>>>
>>>>>> good for you that greedy instructions on the reference page were
>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>> and should come without saying more about it.
>>>>>>
>>>>>> I did try the single node setup, it is worst than the cluster setup
>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>> instructions about previous version (some properties were renamed).
>>>>>>
>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>
>>>>>> To go back to my very problem at this point:
>>>>>>
>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>> excluded in this operation.
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>     at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>     at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>
>>>>>> I can copy an empty file, but as soon as its content is non-zero I am
>>>>>> getting this message. Searching on the message is of no help so far.
>>>>>>
>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>> there that could help in any way neither.
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>> and
>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>>>> useful responses.
>>>>>>>
>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>
>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>
>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>
>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>
>>>>>>> - André
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>>>> the
>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>> written to
>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>> else to do it
>>>>>>> > or buy a packaged version.
>>>>>>> >
>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>> partial success.
>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>> there apply
>>>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>>>> >
>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>> am doing a
>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>> anything
>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>> each thing
>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>> should
>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>> but nothing
>>>>>>> > is said about what they mean and to which value they should be
>>>>>>> set. It seems
>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>> >
>>>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>>>> hopeless
>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>> >
>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>> sure will
>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>> release will
>>>>>>> > become available.
>>>>>>> >
>>>>>>> > TIA
>>>>>>> > -----------------
>>>>>>> > Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> André Kelpe
>>>>>>> andre@concurrentinc.com
>>>>>>> http://concurrentinc.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>> CONFIDENTIALITY NOTICE
>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>> entity to which it is addressed and may contain information that is
>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>> you have received this communication in error, please contact the sender
>>>>>> immediately and delete it from your system. Thank You.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Could you please send me a link to the documentation that you followed to
setup your single-node cluster?
I will go through it and do it step by step, so hopefully at the end your
issue will be solved and the documentation will be improved.

If you have any non-standard settings in core-site.xml, hdfs-site.xml and
hadoop-env.sh (that were not suggested by the documentation that you
followed), then please share them.


2013/12/3 Daniel Savard <da...@gmail.com>

> Adam,
>
> that's not the issue, I did substitute the name in the first report. The
> actual hostname is feynman.cids.ca.
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Daniel,
>>
>> I see that in previous hdfs report, you had: hosta.subdom1.tld1, but now
>> you have feynman.cids.ca. What is the content of your /etc/hosts file,
>> and output of $hostname command?
>>
>>
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> I did that more than once, I just retry it from the beginning. I zapped
>>> the directories and recreated them with hdfs namenode -format and restarted
>>> HDFS and I am still getting the very same error.
>>>
>>> I have posted previously the report. Is there anything in this report
>>> that indicates I am not having enough free space somewhere? That's the only
>>> thing I can see may cause this problem after everything I read on the
>>> subject. I am new to Hadoop and I just want to setup a standalone node for
>>> starting to experiment a while with it before going ahead with a complete
>>> cluster.
>>>
>>> I repost the report for convenience:
>>>
>>> Configured Capacity: 2939899904 (2.74 GB)
>>> Present Capacity: 534421504 (509.66 MB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> DFS Used%: 0.00%
>>> Under replicated blocks: 0
>>> Blocks with corrupt replicas: 0
>>> Missing blocks: 0
>>>
>>> -------------------------------------------------
>>> Datanodes available: 1 (1 total, 0 dead)
>>>
>>> Live datanodes:
>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>> Hostname: feynman.cids.ca
>>> Decommission Status : Normal
>>> Configured Capacity: 2939899904 (2.74 GB)
>>>
>>> DFS Used: 4096 (4 KB)
>>> Non DFS Used: 2405478400 (2.24 GB)
>>> DFS Remaining: 534417408 (509.66 MB)
>>> DFS Used%: 0.00%
>>> DFS Remaining%: 18.18%
>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>>
>>>> Daniel,
>>>>
>>>> It looks that you can only communicate with NameNode to do
>>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>>
>>>> Did you format the NameNode correctly?
>>>> A quite similar issue is described here:
>>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>>> reply says: "The most common is that you have reformatted the namenode
>>>> leaving it in an inconsistent state. The most common solution is to stop
>>>> dfs, remove the contents of the dfs directories on all the machines, run
>>>> “hadoop namenode -format” on the controller, then restart dfs. That
>>>> consistently fixes the problem for me. This may be serious overkill but it
>>>> works."
>>>>
>>>>
>>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>>
>>>>> Thanks Arun,
>>>>>
>>>>> I already read and did everything recommended at the referred URL.
>>>>> There isn't any error message in the logfiles. The only error message
>>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>>> wrong with the configuration so far.
>>>>>
>>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>>> to make sure it responds properly to requests? When trying to put my file,
>>>>> in the datanode log I see nothing, the message appears in the namenode log.
>>>>> Is this the expected behavior or should I see at least some kind of request
>>>>> message in the datanode logfile?
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>>  Apologies if you had a bad experience. If you can point them out to
>>>>>> us, we'd be more than happy to fix it - alternately, we'd *love* it if you
>>>>>> could help us improve docs too.
>>>>>>
>>>>>>  Now, for the problem at hand:
>>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place
>>>>>> to look. Basically NN cannot find any datanodes. Anything in your NN logs
>>>>>> to indicate trouble?
>>>>>>
>>>>>>  Also, pls feel free to open liras with issues you find and we'll
>>>>>> help.
>>>>>>
>>>>>> thanks,
>>>>>> Arun
>>>>>>
>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> André,
>>>>>>
>>>>>> good for you that greedy instructions on the reference page were
>>>>>> enough to setup your cluster. However, read them again and see how many
>>>>>> assumptions are made into them about what you are supposed to already know
>>>>>> and should come without saying more about it.
>>>>>>
>>>>>> I did try the single node setup, it is worst than the cluster setup
>>>>>> regarding the instructions. You are supposed to already have a near working
>>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>>> already setup and working properly. Try to find the instructions to setup
>>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>>> instructions about previous version (some properties were renamed).
>>>>>>
>>>>>> It may appear hard at people to say this is toxic, but it is. The
>>>>>> first place a newcomer will go is setup a single node. This will be his
>>>>>> starting point and he will be left with a bunch of a priori and no clue.
>>>>>>
>>>>>> To go back to my very problem at this point:
>>>>>>
>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>>> excluded in this operation.
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>     at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>     at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>     at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>     at
>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>
>>>>>> I can copy an empty file, but as soon as its content is non-zero I am
>>>>>> getting this message. Searching on the message is of no help so far.
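For context on why the empty file goes through: creating a zero-byte file only touches NameNode metadata, whereas any non-empty put has to stream an actual block to a DataNode, which is exactly the step that fails above. A quick way to see the split, assuming a default single-node 2.2.0 install and the stock HDFS CLI (the file names below are just examples):

    hdfs dfs -touchz /zero-byte-ok        # metadata only, succeeds
    echo x > /tmp/tiny.txt
    hdfs dfs -put /tmp/tiny.txt /         # needs a block on a DataNode, hits the error above
    hdfs dfsadmin -report                 # confirm the DataNode is live and how much DFS Remaining it reports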
>>>>>>
>>>>>> And I skimmed through the cluster instructions and found nothing
>>>>>> there that could help in any way either.
>>>>>>
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> first of all, before posting to a mailing list, take a deep breath
>>>>>>> and
>>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>>>> useful responses.
>>>>>>>
>>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>>
>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>
>>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>
>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>
>>>>>>> - André
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>>> daniel.savard@gmail.com> wrote:
>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>>>> the
>>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>>> written to
>>>>>>> > avoid someone can do the job himself and must contract someone
>>>>>>> else to do it
>>>>>>> > or buy a packaged version.
>>>>>>> >
>>>>>>> > It is about three days I am struggling with this stuff with
>>>>>>> partial success.
>>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>>> there apply
>>>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>>>> >
>>>>>>> > I was able to setup HDFS, however I am still unable to use it. I
>>>>>>> am doing a
>>>>>>> > single node installation and the instruction page doesn't explain
>>>>>>> anything
>>>>>>> > beside telling you to do this and that without documenting what
>>>>>>> each thing
>>>>>>> > is doing and what choices are available and what guidelines you
>>>>>>> should
>>>>>>> > follow. There is even environment variables you are told to set,
>>>>>>> but nothing
>>>>>>> > is said about what they mean and to which value they should be
>>>>>>> set. It seems
>>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>>> >
>>>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>>>> hopeless
>>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>>> >
>>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>>> sure will
>>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>>> release will
>>>>>>> > become available.
>>>>>>> >
>>>>>>> > TIA
>>>>>>> > -----------------
>>>>>>> > Daniel Savard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> André Kelpe
>>>>>>> andre@concurrentinc.com
>>>>>>> http://concurrentinc.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Adam,

that's not the issue, I did substitute the name in the first report. The
actual hostname is feynman.cids.ca.

-----------------
Daniel Savard


2013/12/3 Adam Kawa <ka...@gmail.com>

> Daniel,
>
> I see that in the previous hdfs report, you had: hosta.subdom1.tld1, but now
> you have feynman.cids.ca. What is the content of your /etc/hosts file,
> and output of $hostname command?
>
>
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> I did that more than once, I just retried it from the beginning. I zapped
>> the directories and recreated them with hdfs namenode -format and restarted
>> HDFS and I am still getting the very same error.
>>
>> I have posted the report previously. Is there anything in this report
>> that indicates I do not have enough free space somewhere? That's the only
>> thing I can see that may cause this problem after everything I read on the
>> subject. I am new to Hadoop and I just want to set up a standalone node for
>> starting to experiment a while with it before going ahead with a complete
>> cluster.
>>
>> I repost the report for convenience:
>>
>> Configured Capacity: 2939899904 (2.74 GB)
>> Present Capacity: 534421504 (509.66 MB)
>> DFS Remaining: 534417408 (509.66 MB)
>>
>> DFS Used: 4096 (4 KB)
>> DFS Used%: 0.00%
>> Under replicated blocks: 0
>> Blocks with corrupt replicas: 0
>> Missing blocks: 0
>>
>> -------------------------------------------------
>> Datanodes available: 1 (1 total, 0 dead)
>>
>> Live datanodes:
>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>> Hostname: feynman.cids.ca
>> Decommission Status : Normal
>> Configured Capacity: 2939899904 (2.74 GB)
>>
>> DFS Used: 4096 (4 KB)
>> Non DFS Used: 2405478400 (2.24 GB)
>> DFS Remaining: 534417408 (509.66 MB)
>> DFS Used%: 0.00%
>> DFS Remaining%: 18.18%
>> Last contact: Tue Dec 03 13:37:02 EST 2013
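With numbers like these, the usual suspect is plain disk space: the filesystem behind dfs.datanode.data.dir carries about 2.2 GB of non-DFS data and leaves only ~510 MB for HDFS. If I recall correctly, the default block placement in 2.x wants room for several full blocks on a DataNode before it schedules a write there, and dfs.blocksize defaults to 128 MB, so ~510 MB remaining can be rejected even for a tiny file. A rough way to check, where the data directory path is whatever hdfs-site.xml points at (the path below is only an example):

    hdfs getconf -confKey dfs.blocksize
    df -h /srv/hadoop/dfs/data            # example path for dfs.datanode.data.dir
    hdfs dfsadmin -report | grep -E 'DFS Remaining|Non DFS Used'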
>>
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <ka...@gmail.com>
>>
>>> Daniel,
>>>
>>> It looks like you can only communicate with the NameNode to do
>>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>>
>>> Did you format the NameNode correctly?
>>> A quite similar issue is described here:
>>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>>> reply says: "The most common is that you have reformatted the namenode
>>> leaving it in an inconsistent state. The most common solution is to stop
>>> dfs, remove the contents of the dfs directories on all the machines, run
>>> “hadoop namenode -format” on the controller, then restart dfs. That
>>> consistently fixes the problem for me. This may be serious overkill but it
>>> works."
>>>
>>>
>>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>>
>>>> Thanks Arun,
>>>>
>>>> I already read and did everything recommended at the referred URL.
>>>> There isn't any error message in the logfiles. The only error message
>>>> appears when I try to put a non-zero file on the HDFS as posted above.
>>>> Beside that, absolutely nothing in the logs is telling me something is
>>>> wrong with the configuration so far.
>>>>
>>>> Is there some sort of diagnostic tool that can query/ping each server
>>>> to make sure it responds properly to requests? When trying to put my file,
>>>> in the datanode log I see nothing, the message appears in the namenode log.
>>>> Is this the expected behavior or should I see at least some kind of request
>>>> message in the datanode logfile?
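There is no single ping-style tool, but a few quick checks cover most of it on a single node, assuming a standard 2.2.0 layout (log file names vary with the user and host names):

    jps                                   # should list NameNode, DataNode and SecondaryNameNode JVMs
    hdfs dfsadmin -report                 # DataNodes as the NameNode sees them (live/dead, remaining space)
    tail -f $HADOOP_HOME/logs/hadoop-*-datanode-*.log
    # the NameNode web UI, default http://localhost:50070, also lists live DataNodes

As for the silence in the DataNode log: block target selection happens inside the NameNode, so when it fails there (as in the stack trace above) the write never reaches a DataNode and only the NameNode and client sides log anything, which matches what is described here.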
>>>>
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>>
>>>>> Daniel,
>>>>>
>>>>>  Apologies if you had a bad experience. If you can point the problems out
>>>>> to us, we'd be more than happy to fix them - alternatively, we'd *love* it if you
>>>>> could help us improve docs too.
>>>>>
>>>>>  Now, for the problem at hand:
>>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
>>>>> look. Basically NN cannot find any datanodes. Anything in your NN logs to
>>>>> indicate trouble?
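A quick way to check that last point, assuming the default log location for a 2.2.0 tarball install (exact log messages differ between versions):

    grep -i registerDatanode $HADOOP_HOME/logs/hadoop-*-namenode-*.log    # did the DataNode register at all?
    grep -iE 'exception|error' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -20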
>>>>>
>>>>>  Also, pls feel free to open JIRAs with issues you find and we'll help.
>>>>>
>>>>> thanks,
>>>>> Arun
>>>>>
>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> André,
>>>>>
>>>>> good for you that greedy instructions on the reference page were
>>>>> enough to set up your cluster. However, read them again and see how many
>>>>> assumptions are made in them about what you are supposed to already know
>>>>> and should come without saying more about it.
>>>>>
>>>>> I did try the single node setup, it is worse than the cluster setup
>>>>> regarding the instructions. You are supposed to already have a near working
>>>>> system as far as I understand the instructions. It is assumed the HDFS is
>>>>> already setup and working properly. Try to find the instructions to setup
>>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>>> instructions about previous versions (some properties were renamed).
>>>>>
>>>>> It may appear harsh to people to say this is toxic, but it is. The
>>>>> first place a newcomer will go is to set up a single node. This will be his
>>>>> starting point and he will be left with a bunch of a priori assumptions and no clue.
>>>>>
>>>>> To go back to my very problem at this point:
>>>>>
>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>>> excluded in this operation.
>>>>>     at
>>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>     at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>
>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>     at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>     at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>     at
>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>
>>>>> I can copy an empty file, but as soon as its content is non-zero I am
>>>>> getting this message. Searching on the message is of no help so far.
>>>>>
>>>>> And I skimmed through the cluster instructions and found nothing there
>>>>> that could help in any way either.
>>>>>
>>>>>
>>>>> -----------------
>>>>> Daniel Savard
>>>>>
>>>>>
>>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> first of all, before posting to a mailing list, take a deep breath and
>>>>>> let your frustrations out. Then write the email. Using words like
>>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>>> useful responses.
>>>>>>
>>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>>> constructive. You haven't  mentioned which documentation you are
>>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>>
>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>
>>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>
>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>
>>>>>> - André
>>>>>>
>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <
>>>>>> daniel.savard@gmail.com> wrote:
>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>>> the
>>>>>> > instructions really crappy and incomplete. It is like they were
>>>>>> written to
>>>>>> > avoid someone can do the job himself and must contract someone else
>>>>>> to do it
>>>>>> > or buy a packaged version.
>>>>>> >
>>>>>> > It is about three days I am struggling with this stuff with partial
>>>>>> success.
>>>>>> > The documentation is less than clear and most of the stuff out
>>>>>> there apply
>>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>>> >
>>>>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>>>>> doing a
>>>>>> > single node installation and the instruction page doesn't explain
>>>>>> anything
>>>>>> > beside telling you to do this and that without documenting what
>>>>>> each thing
>>>>>> > is doing and what choices are available and what guidelines you
>>>>>> should
>>>>>> > follow. There is even environment variables you are told to set,
>>>>>> but nothing
>>>>>> > is said about what they mean and to which value they should be set.
>>>>>> It seems
>>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>>> >
>>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>>> hopeless
>>>>>> > and this whole thing is just a piece of toxicware?
>>>>>> >
>>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>>> sure will
>>>>>> > be a nightmare to manage and install each time a new version,
>>>>>> release will
>>>>>> > become available.
>>>>>> >
>>>>>> > TIA
>>>>>> > -----------------
>>>>>> > Daniel Savard
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> André Kelpe
>>>>>> andre@concurrentinc.com
>>>>>> http://concurrentinc.com
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Daniel,

I see that in the previous hdfs report, you had: hosta.subdom1.tld1, but now
you have feynman.cids.ca. What is the content of your /etc/hosts file, and
output of $hostname command?
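A quick way to gather that on a typical single-node Linux box (feynman.cids.ca is simply the hostname from the report above):

    hostname -f
    cat /etc/hosts
    getent hosts feynman.cids.ca

One thing worth watching for: the report shows the DataNode registered as 127.0.0.1:50010, and on some distributions an /etc/hosts line mapping the hostname to 127.0.1.1 or 127.0.0.1 makes the daemons bind and register on loopback. For a single-node setup that is often fine, as long as fs.defaultFS in core-site.xml resolves to the same address the client uses.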




2013/12/3 Daniel Savard <da...@gmail.com>

> I did that more than once, I just retried it from the beginning. I zapped
> the directories and recreated them with hdfs namenode -format and restarted
> HDFS and I am still getting the very same error.
>
> I have posted the report previously. Is there anything in this report that
> indicates I do not have enough free space somewhere? That's the only
> thing I can see that may cause this problem after everything I read on the
> subject. I am new to Hadoop and I just want to set up a standalone node for
> starting to experiment a while with it before going ahead with a complete
> cluster.
>
> I repost the report for convenience:
>
> Configured Capacity: 2939899904 (2.74 GB)
> Present Capacity: 534421504 (509.66 MB)
> DFS Remaining: 534417408 (509.66 MB)
>
> DFS Used: 4096 (4 KB)
> DFS Used%: 0.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
>
> Live datanodes:
> Name: 127.0.0.1:50010 (feynman.cids.ca)
> Hostname: feynman.cids.ca
> Decommission Status : Normal
> Configured Capacity: 2939899904 (2.74 GB)
>
> DFS Used: 4096 (4 KB)
> Non DFS Used: 2405478400 (2.24 GB)
> DFS Remaining: 534417408 (509.66 MB)
> DFS Used%: 0.00%
> DFS Remaining%: 18.18%
> Last contact: Tue Dec 03 13:37:02 EST 2013
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Adam Kawa <ka...@gmail.com>
>
>> Daniel,
>>
>> It looks like you can only communicate with the NameNode to do
>> "metadata-only" operations (e.g. listing, creating a dir, empty file)...
>>
>> Did you format the NameNode correctly?
>> A quite similar issue is described here:
>> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
>> reply says: "The most common is that you have reformatted the namenode
>> leaving it in an inconsistent state. The most common solution is to stop
>> dfs, remove the contents of the dfs directories on all the machines, run
>> “hadoop namenode -format” on the controller, then restart dfs. That
>> consistently fixes the problem for me. This may be serious overkill but it
>> works."
>>
>>
>> 2013/12/3 Daniel Savard <da...@gmail.com>
>>
>>> Thanks Arun,
>>>
>>> I already read and did everything recommended at the referred URL. There
>>> isn't any error message in the logfiles. The only error message appears
>>> when I try to put a non-zero file on the HDFS as posted above. Besides that,
>>> absolutely nothing in the logs is telling me something is wrong with the
>>> configuration so far.
>>>
>>> Is there some sort of diagnostic tool that can query/ping each server to
>>> make sure it responds properly to requests? When I try to put my file, I see
>>> nothing in the datanode log; the message appears in the namenode log. Is
>>> this the expected behavior, or should I see at least some kind of request
>>> message in the datanode logfile?
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>>
>>>> Daniel,
>>>>
>>>>  Apologies if you had a bad experience. If you can point the problems out
>>>> to us, we'd be more than happy to fix them - alternately, we'd *love* it if
>>>> you could help us improve the docs too.
>>>>
>>>>  Now, for the problem at hand:
>>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
>>>> look. Basically NN cannot find any datanodes. Anything in your NN logs to
>>>> indicate trouble?
>>>>
>>>>  Also, please feel free to open JIRAs with issues you find and we'll help.
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>>> wrote:
>>>>
>>>> André,
>>>>
>>>> good for you that the sparse instructions on the reference page were enough
>>>> to set up your cluster. However, read them again and see how many
>>>> assumptions are made in them about what you are supposed to already know
>>>> and what is supposed to go without saying.
>>>>
>>>> I did try the single node setup; it is worse than the cluster setup
>>>> regarding the instructions. You are supposed to already have a nearly working
>>>> system, as far as I understand the instructions. It is assumed that HDFS is
>>>> already set up and working properly. Try to find the instructions to set up
>>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>>> instructions about previous versions (some properties were renamed).
>>>>
>>>> It may seem harsh to say this is toxic, but it is. The first
>>>> place a newcomer will go is the single node setup. This will be his starting
>>>> point and he will be left with a bunch of a priori assumptions and no clue.
>>>>
>>>> To go back to my very problem at this point:
>>>>
>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>>> excluded in this operation.
>>>>     at
>>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>     at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>     at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>     at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>     at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>     at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>     at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>     at
>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>     at
>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>     at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>     at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>     at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>     at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>
>>>> I can copy an empty file, but as soon as the file is non-empty I get
>>>> this message. Searching on the message has been of no help so far.
>>>>
>>>> And I skimmed through the cluster instructions and found nothing there
>>>> that could help in any way either.
>>>>
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> first of all, before posting to a mailing list, take a deep breath and
>>>>> let your frustrations out. Then write the email. Using words like
>>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>>> useful responses.
>>>>>
>>>>> While I agree that the docs can be confusing, we should try to stay
>>>>> constructive. You haven't mentioned which documentation you are
>>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>>
>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>
>>>>> If you are looking for an easy way to spin up a small cluster with
>>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>
>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>
>>>>> - André
>>>>>
>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>>>>> wrote:
>>>>> > I am trying to configure hadoop 2.2.0 from source code and I found
>>>>> the
>>>>> > instructions really crappy and incomplete. It is like they were
>>>>> written to
>>>>> > avoid someone can do the job himself and must contract someone else
>>>>> to do it
>>>>> > or buy a packaged version.
>>>>> >
>>>>> > It is about three days I am struggling with this stuff with partial
>>>>> success.
>>>>> > The documentation is less than clear and most of the stuff out there
>>>>> apply
>>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>>> >
>>>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>>>> doing a
>>>>> > single node installation and the instruction page doesn't explain
>>>>> anything
>>>>> > beside telling you to do this and that without documenting what each
>>>>> thing
>>>>> > is doing and what choices are available and what guidelines you
>>>>> should
>>>>> > follow. There is even environment variables you are told to set, but
>>>>> nothing
>>>>> > is said about what they mean and to which value they should be set.
>>>>> It seems
>>>>> > it assumes prior knowledge of everything about hadoop.
>>>>> >
>>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>>> hopeless
>>>>> > and this whole thing is just a piece of toxicware?
>>>>> >
>>>>> > I am already looking for alternate solutions to hadoop which for
>>>>> sure will
>>>>> > be a nightmare to manage and install each time a new version,
>>>>> release will
>>>>> > become available.
>>>>> >
>>>>> > TIA
>>>>> > -----------------
>>>>> > Daniel Savard
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> André Kelpe
>>>>> andre@concurrentinc.com
>>>>> http://concurrentinc.com
>>>>>
>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
I did that more than once; I just retried it from the beginning. I zapped the
directories, recreated them with hdfs namenode -format, restarted HDFS, and I
am still getting the very same error.
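
Just so we are talking about the same thing, the "zap and recreate" I did was
roughly the sequence below; the /srv/hadoop/dfs paths are only examples, the
real ones are whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to
in my hdfs-site.xml, and $HADOOP_HOME is assumed to point at the install dir:

    $HADOOP_HOME/sbin/stop-dfs.sh
    rm -rf /srv/hadoop/dfs/name/* /srv/hadoop/dfs/data/*   # wipe name and data dirs
    hdfs namenode -format
    $HADOOP_HOME/sbin/start-dfs.sh
    hdfs dfsadmin -report          # the datanode shows up as live again afterwards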

I have already posted the report. Is there anything in this report that
indicates I do not have enough free space somewhere? That's the only
thing I can see that may cause this problem after everything I read on the
subject. I am new to Hadoop and I just want to set up a standalone node to
experiment with for a while before going ahead with a complete
cluster.

I repost the report for convenience:

Configured Capacity: 2939899904 (2.74 GB)
Present Capacity: 534421504 (509.66 MB)
DFS Remaining: 534417408 (509.66 MB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 2939899904 (2.74 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2405478400 (2.24 GB)
DFS Remaining: 534417408 (509.66 MB)
DFS Used%: 0.00%
DFS Remaining%: 18.18%
Last contact: Tue Dec 03 13:37:02 EST 2013
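
(In case it helps, this is how I am cross-checking the numbers above against the
OS on the datanode side; /srv/hadoop/dfs/data is only an example path, the real
one is whatever dfs.datanode.data.dir points to:

    df -h /srv/hadoop/dfs/data                        # raw free space on that filesystem
    hdfs getconf -confKey dfs.datanode.du.reserved    # bytes the datanode will not use
    hdfs dfsadmin -report | grep -E 'DFS Remaining|Non DFS Used'

From what I read, the datanode only offers what is left after non-DFS usage and
the reserved amount, so the usable space can be quite a bit smaller than what df
reports.)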


-----------------
Daniel Savard


2013/12/3 Adam Kawa <ka...@gmail.com>

> Daniel,
>
> It looks like you can only communicate with the NameNode to do "metadata-only"
> operations (e.g. listing, creating a dir, creating an empty file)...
>
> Did you format the NameNode correctly?
> A quite similar issue is described here:
> http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last
> reply says: "The most common is that you have reformatted the namenode
> leaving it in an inconsistent state. The most common solution is to stop
> dfs, remove the contents of the dfs directories on all the machines, run
> “hadoop namenode -format” on the controller, then restart dfs. That
> consistently fixes the problem for me. This may be serious overkill but it
> works."
>
>
> 2013/12/3 Daniel Savard <da...@gmail.com>
>
>> Thanks Arun,
>>
>> I have already read and done everything recommended at the referenced URL. There
>> isn't any error message in the logfiles. The only error message appears
>> when I try to put a non-empty file on HDFS, as posted above. Besides that,
>> absolutely nothing in the logs tells me something is wrong with the
>> configuration so far.
>>
>> Is there some sort of diagnostic tool that can query/ping each server to
>> make sure it responds properly to requests? When I try to put my file, I see
>> nothing in the datanode log; the message appears in the namenode log. Is
>> this the expected behavior, or should I see at least some kind of request
>> message in the datanode logfile?
>>
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>>
>>> Daniel,
>>>
>>>  Apologies if you had a bad experience. If you can point the problems out to us,
>>> we'd be more than happy to fix them - alternately, we'd *love* it if you
>>> could help us improve the docs too.
>>>
>>>  Now, for the problem at hand:
>>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
>>> look. Basically NN cannot find any datanodes. Anything in your NN logs to
>>> indicate trouble?
>>>
>>>  Also, please feel free to open JIRAs with issues you find and we'll help.
>>>
>>> thanks,
>>> Arun
>>>
>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>>> wrote:
>>>
>>> André,
>>>
>>> good for you that the sparse instructions on the reference page were enough
>>> to set up your cluster. However, read them again and see how many
>>> assumptions are made in them about what you are supposed to already know
>>> and what is supposed to go without saying.
>>>
>>> I did try the single node setup; it is worse than the cluster setup
>>> regarding the instructions. You are supposed to already have a nearly working
>>> system, as far as I understand the instructions. It is assumed that HDFS is
>>> already set up and working properly. Try to find the instructions to set up
>>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>>> instructions about previous versions (some properties were renamed).
>>>
>>> It may seem harsh to say this is toxic, but it is. The first
>>> place a newcomer will go is the single node setup. This will be his starting
>>> point and he will be left with a bunch of a priori assumptions and no clue.
>>>
>>> To go back to my very problem at this point:
>>>
>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /test._COPYING_ could only be replicated to 0 nodes instead of
>>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>>> excluded in this operation.
>>>     at
>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>     at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>     at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>     at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>     at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>     at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>     at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>
>>> I can copy an empty file, but as soon as the file is non-empty I get
>>> this message. Searching on the message has been of no help so far.
>>>
>>> And I skimmed through the cluster instructions and found nothing there
>>> that could help in any way either.
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>
>>>> Hi Daniel,
>>>>
>>>> first of all, before posting to a mailing list, take a deep breath and
>>>> let your frustrations out. Then write the email. Using words like
>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>> useful responses.
>>>>
>>>> While I agree that the docs can be confusing, we should try to stay
>>>> constructive. You haven't mentioned which documentation you are
>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>
>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>
>>>> If you are looking for an easy way to spin up a small cluster with
>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>
>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>
>>>> - André
>>>>
>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>>>> wrote:
>>>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>>>> > instructions really crappy and incomplete. It is like they were
>>>> written to
>>>> > avoid someone can do the job himself and must contract someone else
>>>> to do it
>>>> > or buy a packaged version.
>>>> >
>>>> > It is about three days I am struggling with this stuff with partial
>>>> success.
>>>> > The documentation is less than clear and most of the stuff out there
>>>> apply
>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>> >
>>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>>> doing a
>>>> > single node installation and the instruction page doesn't explain
>>>> anything
>>>> > beside telling you to do this and that without documenting what each
>>>> thing
>>>> > is doing and what choices are available and what guidelines you should
>>>> > follow. There is even environment variables you are told to set, but
>>>> nothing
>>>> > is said about what they mean and to which value they should be set.
>>>> It seems
>>>> > it assumes prior knowledge of everything about hadoop.
>>>> >
>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>> hopeless
>>>> > and this whole thing is just a piece of toxicware?
>>>> >
>>>> > I am already looking for alternate solutions to hadoop which for sure
>>>> will
>>>> > be a nightmare to manage and install each time a new version, release
>>>> will
>>>> > become available.
>>>> >
>>>> > TIA
>>>> > -----------------
>>>> > Daniel Savard
>>>>
>>>>
>>>>
>>>> --
>>>> André Kelpe
>>>> andre@concurrentinc.com
>>>> http://concurrentinc.com
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>

>>>     at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>     at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>     at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>     at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>     at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>
>>> I can copy an empty file, but as soon as its content is non-zero I am
>>> getting this message. Searching on the message is of no help so far.
>>>
>>> And I skimmed through the cluster instructions and found nothing there
>>> that could help in any way neither.
>>>
>>>
>>> -----------------
>>> Daniel Savard
>>>
>>>
>>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>>
>>>> Hi Daniel,
>>>>
>>>> first of all, before posting to a mailing list, take a deep breath and
>>>> let your frustrations out. Then write the email. Using words like
>>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>>> useful responses.
>>>>
>>>> While I agree that the docs can be confusing, we should try to stay
>>>> constructive. You haven't  mentioned which documentation you are
>>>> using. I found the cluster tutorial sufficient to get me started:
>>>>
>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>
>>>> If you are looking for an easy way to spin up a small cluster with
>>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>
>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>
>>>> - André
>>>>
>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>>>> wrote:
>>>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>>>> > instructions really crappy and incomplete. It is like they were
>>>> written to
>>>> > avoid someone can do the job himself and must contract someone else
>>>> to do it
>>>> > or buy a packaged version.
>>>> >
>>>> > It is about three days I am struggling with this stuff with partial
>>>> success.
>>>> > The documentation is less than clear and most of the stuff out there
>>>> apply
>>>> > to earlier version and they haven't been updated for version 2.2.0.
>>>> >
>>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>>> doing a
>>>> > single node installation and the instruction page doesn't explain
>>>> anything
>>>> > beside telling you to do this and that without documenting what each
>>>> thing
>>>> > is doing and what choices are available and what guidelines you should
>>>> > follow. There is even environment variables you are told to set, but
>>>> nothing
>>>> > is said about what they mean and to which value they should be set.
>>>> It seems
>>>> > it assumes prior knowledge of everything about hadoop.
>>>> >
>>>> > Anyone knows a site with proper documentation about hadoop or it's
>>>> hopeless
>>>> > and this whole thing is just a piece of toxicware?
>>>> >
>>>> > I am already looking for alternate solutions to hadoop which for sure
>>>> will
>>>> > be a nightmare to manage and install each time a new version, release
>>>> will
>>>> > become available.
>>>> >
>>>> > TIA
>>>> > -----------------
>>>> > Daniel Savard
>>>>
>>>>
>>>>
>>>> --
>>>> André Kelpe
>>>> andre@concurrentinc.com
>>>> http://concurrentinc.com
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Daniel,

It looks like you can only communicate with the NameNode to do "metadata-only"
operations (e.g. listing, creating a directory, creating an empty file)...

Did you format the NameNode correctly?
A quite similar issue is described here:
http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last reply
says: "The most common is that you have reformatted the namenode leaving it
in an inconsistent state. The most common solution is to stop dfs, remove
the contents of the dfs directories on all the machines, run “hadoop
namenode -format” on the controller, then restart dfs. That consistently
fixes the problem for me. This may be serious overkill but it works."
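
One quick way to check whether a reformat left things in that inconsistent
state -- just a sketch, assuming the default layout where the dfs directories
live under hadoop.tmp.dir (adjust the paths to your dfs.namenode.name.dir and
dfs.datanode.data.dir settings) -- is to compare the clusterID recorded on
both sides; they must match, or the DataNode will not register with the
NameNode:

# clusterID written by the last "hdfs namenode -format"
cat /tmp/hadoop-${USER}/dfs/name/current/VERSION
# clusterID the DataNode is carrying
cat /tmp/hadoop-${USER}/dfs/data/current/VERSION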


2013/12/3 Daniel Savard <da...@gmail.com>

> Thanks Arun,
>
> I already read and did everything recommended at the referred URL. There
> isn't any error message in the logfiles. The only error message appears
> when I try to put a non-zero file on the HDFS as posted above. Beside that,
> absolutely nothing in the logs is telling me something is wrong with the
> configuration so far.
>
> Is there some sort of diagnostic tool that can query/ping each server to
> make sure it responds properly to requests? When trying to put my file, in
> the datanode log I see nothing, the message appears in the namenode log. Is
> this the expected behavior or should I see at least some kind of request
> message in the datanode logfile?
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>
>> Daniel,
>>
>>  Apologies if you had a bad experience. If you can point them out to us,
>> we'd be more than happy to fix it - alternately, we'd *love* it if you
>> could help us improve docs too.
>>
>>  Now, for the problem at hand:
>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
>> look. Basically NN cannot find any datanodes. Anything in your NN logs to
>> indicate trouble?
>>
>>  Also, pls feel free to open liras with issues you find and we'll help.
>>
>> thanks,
>> Arun
>>
>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>>
>> André,
>>
>> good for you that greedy instructions on the reference page were enough
>> to setup your cluster. However, read them again and see how many
>> assumptions are made into them about what you are supposed to already know
>> and should come without saying more about it.
>>
>> I did try the single node setup, it is worst than the cluster setup
>> regarding the instructions. You are supposed to already have a near working
>> system as far as I understand the instructions. It is assumed the HDFS is
>> already setup and working properly. Try to find the instructions to setup
>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>> instructions about previous version (some properties were renamed).
>>
>> It may appear hard at people to say this is toxic, but it is. The first
>> place a newcomer will go is setup a single node. This will be his starting
>> point and he will be left with a bunch of a priori and no clue.
>>
>> To go back to my very problem at this point:
>>
>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /test._COPYING_ could only be replicated to 0 nodes instead of
>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>> excluded in this operation.
>>     at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>     at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>     at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>     at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>     at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>     at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>     at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>     at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>     at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>
>> I can copy an empty file, but as soon as its content is non-zero I am
>> getting this message. Searching on the message is of no help so far.
>>
>> And I skimmed through the cluster instructions and found nothing there
>> that could help in any way neither.
>>
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>
>>> Hi Daniel,
>>>
>>> first of all, before posting to a mailing list, take a deep breath and
>>> let your frustrations out. Then write the email. Using words like
>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>> useful responses.
>>>
>>> While I agree that the docs can be confusing, we should try to stay
>>> constructive. You haven't  mentioned which documentation you are
>>> using. I found the cluster tutorial sufficient to get me started:
>>>
>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>
>>> If you are looking for an easy way to spin up a small cluster with
>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>
>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>
>>> - André
>>>
>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>>> wrote:
>>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>>> > instructions really crappy and incomplete. It is like they were
>>> written to
>>> > avoid someone can do the job himself and must contract someone else to
>>> do it
>>> > or buy a packaged version.
>>> >
>>> > It is about three days I am struggling with this stuff with partial
>>> success.
>>> > The documentation is less than clear and most of the stuff out there
>>> apply
>>> > to earlier version and they haven't been updated for version 2.2.0.
>>> >
>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>> doing a
>>> > single node installation and the instruction page doesn't explain
>>> anything
>>> > beside telling you to do this and that without documenting what each
>>> thing
>>> > is doing and what choices are available and what guidelines you should
>>> > follow. There is even environment variables you are told to set, but
>>> nothing
>>> > is said about what they mean and to which value they should be set. It
>>> seems
>>> > it assumes prior knowledge of everything about hadoop.
>>> >
>>> > Anyone knows a site with proper documentation about hadoop or it's
>>> hopeless
>>> > and this whole thing is just a piece of toxicware?
>>> >
>>> > I am already looking for alternate solutions to hadoop which for sure
>>> will
>>> > be a nightmare to manage and install each time a new version, release
>>> will
>>> > become available.
>>> >
>>> > TIA
>>> > -----------------
>>> > Daniel Savard
>>>
>>>
>>>
>>> --
>>> André Kelpe
>>> andre@concurrentinc.com
>>> http://concurrentinc.com
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Adam Kawa <ka...@gmail.com>.
Daniel,

It looks like you can only communicate with the NameNode for "metadata-only"
operations (e.g. listing, creating a directory, or creating an empty file)...
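
That split is easy to reproduce from the shell. A rough sketch (the names
/sandbox and somefile.txt are just placeholders):

  hdfs dfs -mkdir /sandbox             # metadata-only, handled by the NameNode alone
  hdfs dfs -touchz /sandbox/empty      # zero-length file, still metadata-only
  hdfs dfs -put somefile.txt /sandbox  # needs a DataNode to store a block -> fails here

If the first two succeed and only the put fails, the NameNode is healthy but
no usable DataNode is available to it.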

Did you format the NameNode correctly?
A quite similar issue is described here:
http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last reply
says: "The most common is that you have reformatted the namenode leaving it
in an inconsistent state. The most common solution is to stop dfs, remove
the contents of the dfs directories on all the machines, run “hadoop
namenode -format” on the controller, then restart dfs. That consistently
fixes the problem for me. This may be serious overkill but it works."
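
For a single-node 2.2.0 install, that recipe translates roughly into the
commands below. This is only a sketch: it assumes the stock bin/ and sbin/
scripts are on the PATH, and the /path/to/... directories are placeholders
for whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to in your
hdfs-site.xml. Keep in mind that formatting wipes all HDFS metadata.

  # stop the HDFS daemons (namenode, secondarynamenode, datanode)
  stop-dfs.sh

  # clear the namenode and datanode storage directories
  # (placeholders -- substitute your dfs.namenode.name.dir / dfs.datanode.data.dir)
  rm -rf /path/to/dfs/name/* /path/to/dfs/data/*

  # re-format the namenode ("hdfs namenode -format" is the 2.x form;
  # "hadoop namenode -format" still works but prints a deprecation warning)
  hdfs namenode -format

  # restart HDFS and check that the datanode registered with the namenode
  start-dfs.sh
  hdfs dfsadmin -report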


2013/12/3 Daniel Savard <da...@gmail.com>

> Thanks Arun,
>
> I already read and did everything recommended at the referred URL. There
> isn't any error message in the logfiles. The only error message appears
> when I try to put a non-zero file on the HDFS as posted above. Beside that,
> absolutely nothing in the logs is telling me something is wrong with the
> configuration so far.
>
> Is there some sort of diagnostic tool that can query/ping each server to
> make sure it responds properly to requests? When trying to put my file, in
> the datanode log I see nothing, the message appears in the namenode log. Is
> this the expected behavior or should I see at least some kind of request
> message in the datanode logfile?
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Arun C Murthy <ac...@hortonworks.com>
>
>> Daniel,
>>
>>  Apologies if you had a bad experience. If you can point them out to us,
>> we'd be more than happy to fix it - alternately, we'd *love* it if you
>> could help us improve docs too.
>>
>>  Now, for the problem at hand:
>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
>> look. Basically NN cannot find any datanodes. Anything in your NN logs to
>> indicate trouble?
>>
>>  Also, pls feel free to open JIRAs with issues you find and we'll help.
>>
>> thanks,
>> Arun
>>
>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>>
>> André,
>>
>> good for you that greedy instructions on the reference page were enough
>> to setup your cluster. However, read them again and see how many
>> assumptions are made into them about what you are supposed to already know
>> and should come without saying more about it.
>>
>> I did try the single node setup, it is worst than the cluster setup
>> regarding the instructions. You are supposed to already have a near working
>> system as far as I understand the instructions. It is assumed the HDFS is
>> already setup and working properly. Try to find the instructions to setup
>> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
>> instructions about previous version (some properties were renamed).
>>
>> It may appear hard at people to say this is toxic, but it is. The first
>> place a newcomer will go is setup a single node. This will be his starting
>> point and he will be left with a bunch of a priori and no clue.
>>
>> To go back to my very problem at this point:
>>
>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /test._COPYING_ could only be replicated to 0 nodes instead of
>> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
>> excluded in this operation.
>>     at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>     at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>     at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>     at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>     at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>     at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>     at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>     at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>     at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>     at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>
>> I can copy an empty file, but as soon as its content is non-zero I am
>> getting this message. Searching on the message is of no help so far.
>>
>> And I skimmed through the cluster instructions and found nothing there
>> that could help in any way neither.
>>
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>>
>>> Hi Daniel,
>>>
>>> first of all, before posting to a mailing list, take a deep breath and
>>> let your frustrations out. Then write the email. Using words like
>>> "crappy", "toxicware", "nightmare" are not going to help you getting
>>> useful responses.
>>>
>>> While I agree that the docs can be confusing, we should try to stay
>>> constructive. You haven't  mentioned which documentation you are
>>> using. I found the cluster tutorial sufficient to get me started:
>>>
>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>
>>> If you are looking for an easy way to spin up a small cluster with
>>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>
>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>
>>> - André
>>>
>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>>> wrote:
>>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>>> > instructions really crappy and incomplete. It is like they were
>>> written to
>>> > avoid someone can do the job himself and must contract someone else to
>>> do it
>>> > or buy a packaged version.
>>> >
>>> > It is about three days I am struggling with this stuff with partial
>>> success.
>>> > The documentation is less than clear and most of the stuff out there
>>> apply
>>> > to earlier version and they haven't been updated for version 2.2.0.
>>> >
>>> > I was able to setup HDFS, however I am still unable to use it. I am
>>> doing a
>>> > single node installation and the instruction page doesn't explain
>>> anything
>>> > beside telling you to do this and that without documenting what each
>>> thing
>>> > is doing and what choices are available and what guidelines you should
>>> > follow. There is even environment variables you are told to set, but
>>> nothing
>>> > is said about what they mean and to which value they should be set. It
>>> seems
>>> > it assumes prior knowledge of everything about hadoop.
>>> >
>>> > Anyone knows a site with proper documentation about hadoop or it's
>>> hopeless
>>> > and this whole thing is just a piece of toxicware?
>>> >
>>> > I am already looking for alternate solutions to hadoop which for sure
>>> will
>>> > be a nightmare to manage and install each time a new version, release
>>> will
>>> > become available.
>>> >
>>> > TIA
>>> > -----------------
>>> > Daniel Savard
>>>
>>>
>>>
>>> --
>>> André Kelpe
>>> andre@concurrentinc.com
>>> http://concurrentinc.com
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Thanks Arun,

I have already read and done everything recommended at the referred URL. There
isn't any error message in the logfiles. The only error appears when I try to
put a non-empty file onto HDFS, as posted above. Besides that, absolutely
nothing in the logs tells me something is wrong with the configuration so far.

Is there some sort of diagnostic tool that can query/ping each server to
make sure it responds properly to requests? When I try to put my file, I see
nothing in the datanode log; the message appears only in the namenode log. Is
this the expected behavior, or should I see at least some kind of request
message in the datanode logfile?
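
(The closest I have found so far is the handful of checks below -- a rough
sketch, assuming the standard 2.2.0 bin/ scripts are on the PATH and the
default web UI ports -- but is there anything more direct?

  # confirm the NameNode, DataNode and SecondaryNameNode JVMs are running
  jps

  # walk the namespace and report blocks, replication and the datanode count
  hdfs fsck /

  # each daemon also has a web UI: the namenode on port 50070 and the
  # datanode on port 50075 by default
)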


-----------------
Daniel Savard


2013/12/2 Arun C Murthy <ac...@hortonworks.com>

> Daniel,
>
>  Apologies if you had a bad experience. If you can point them out to us,
> we'd be more than happy to fix it - alternately, we'd *love* it if you
> could help us improve docs too.
>
>  Now, for the problem at hand:
> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to
> look. Basically NN cannot find any datanodes. Anything in your NN logs to
> indicate trouble?
>
>  Also, pls feel free to open JIRAs with issues you find and we'll help.
>
> thanks,
> Arun
>
> On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com> wrote:
>
> André,
>
> good for you that greedy instructions on the reference page were enough to
> setup your cluster. However, read them again and see how many assumptions
> are made into them about what you are supposed to already know and should
> come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near working
> system as far as I understand the instructions. It is assumed the HDFS is
> already setup and working properly. Try to find the instructions to setup
> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
> instructions about previous version (some properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his starting
> point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>     at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>
>> Hi Daniel,
>>
>> first of all, before posting to a mailing list, take a deep breath and
>> let your frustrations out. Then write the email. Using words like
>> "crappy", "toxicware", "nightmare" are not going to help you getting
>> useful responses.
>>
>> While I agree that the docs can be confusing, we should try to stay
>> constructive. You haven't  mentioned which documentation you are
>> using. I found the cluster tutorial sufficient to get me started:
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> If you are looking for an easy way to spin up a small cluster with
>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>
>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>
>> - André
>>
>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>> > instructions really crappy and incomplete. It is like they were written
>> to
>> > avoid someone can do the job himself and must contract someone else to
>> do it
>> > or buy a packaged version.
>> >
>> > It is about three days I am struggling with this stuff with partial
>> success.
>> > The documentation is less than clear and most of the stuff out there
>> apply
>> > to earlier version and they haven't been updated for version 2.2.0.
>> >
>> > I was able to setup HDFS, however I am still unable to use it. I am
>> doing a
>> > single node installation and the instruction page doesn't explain
>> anything
>> > beside telling you to do this and that without documenting what each
>> thing
>> > is doing and what choices are available and what guidelines you should
>> > follow. There is even environment variables you are told to set, but
>> nothing
>> > is said about what they mean and to which value they should be set. It
>> seems
>> > it assumes prior knowledge of everything about hadoop.
>> >
>> > Anyone knows a site with proper documentation about hadoop or it's
>> hopeless
>> > and this whole thing is just a piece of toxicware?
>> >
>> > I am already looking for alternate solutions to hadoop which for sure
>> will
>> > be a nightmare to manage and install each time a new version, release
>> will
>> > become available.
>> >
>> > TIA
>> > -----------------
>> > Daniel Savard
>>
>>
>>
>> --
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>>
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.

Re: Hadoop 2.2.0 from source configuration

Posted by Arun C Murthy <ac...@hortonworks.com>.
Daniel,

 Apologies if you had a bad experience. If you can point the problems out to us, we'd be more than happy to fix them - alternatively, we'd *love* it if you could help us improve the docs too.

 Now, for the problem at hand: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to look. Basically, the NN cannot find any datanodes. Is there anything in your NN logs to indicate trouble?
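
 A quick way to see whether the NN has any datanodes registered at all (a sketch, assuming the 2.2.0 bin/ scripts are on the PATH and the default ports):

  # live/dead datanode count as seen by the namenode
  hdfs dfsadmin -report

  # the namenode web UI (http://<namenode-host>:50070) shows the same "Live Nodes" figure

  # the datanode's own log is worth checking for registration or heartbeat errors;
  # by default it lives under $HADOOP_HOME/logs
  grep -iE "error|exception" $HADOOP_HOME/logs/hadoop-*-datanode-*.log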

 Also, pls feel free to open JIRAs with issues you find and we'll help.

thanks,
Arun

On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com> wrote:

> André,
> 
> good for you that greedy instructions on the reference page were enough to setup your cluster. However, read them again and see how many assumptions are made into them about what you are supposed to already know and should come without saying more about it.
> 
> I did try the single node setup, it is worst than the cluster setup regarding the instructions. You are supposed to already have a near working system as far as I understand the instructions. It is assumed the HDFS is already setup and working properly. Try to find the instructions to setup HDFS for version 2.2.0 and you will end up with a lot of inappropriate instructions about previous version (some properties were renamed).
> 
> It may appear hard at people to say this is toxic, but it is. The first place a newcomer will go is setup a single node. This will be his starting point and he will be left with a bunch of a priori and no clue.
> 
> To go back to my very problem at this point: 
> 
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 
> I can copy an empty file, but as soon as its content is non-zero I am getting this message. Searching on the message is of no help so far.
> 
> And I skimmed through the cluster instructions and found nothing there that could help in any way neither.
> 
> 
> -----------------
> Daniel Savard
> 
> 
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
> Hi Daniel,
> 
> first of all, before posting to a mailing list, take a deep breath and
> let your frustrations out. Then write the email. Using words like
> "crappy", "toxicware", "nightmare" are not going to help you getting
> useful responses.
> 
> While I agree that the docs can be confusing, we should try to stay
> constructive. You haven't  mentioned which documentation you are
> using. I found the cluster tutorial sufficient to get me started:
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
> 
> If you are looking for an easy way to spin up a small cluster with
> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
> 
> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
> 
> - André
> 
> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> > I am trying to configure hadoop 2.2.0 from source code and I found the
> > instructions really crappy and incomplete. It is like they were written to
> > avoid someone can do the job himself and must contract someone else to do it
> > or buy a packaged version.
> >
> > It is about three days I am struggling with this stuff with partial success.
> > The documentation is less than clear and most of the stuff out there apply
> > to earlier version and they haven't been updated for version 2.2.0.
> >
> > I was able to setup HDFS, however I am still unable to use it. I am doing a
> > single node installation and the instruction page doesn't explain anything
> > beside telling you to do this and that without documenting what each thing
> > is doing and what choices are available and what guidelines you should
> > follow. There is even environment variables you are told to set, but nothing
> > is said about what they mean and to which value they should be set. It seems
> > it assumes prior knowledge of everything about hadoop.
> >
> > Anyone knows a site with proper documentation about hadoop or it's hopeless
> > and this whole thing is just a piece of toxicware?
> >
> > I am already looking for alternate solutions to hadoop which for sure will
> > be a nightmare to manage and install each time a new version, release will
> > become available.
> >
> > TIA
> > -----------------
> > Daniel Savard
> 
> 
> 
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop 2.2.0 from source configuration

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.
Hi Daniel,

I agree with you that the 2.2 documents are very unfriendly.
In many documents, the only change from 1.x to 2.2 is the format.
There are still many documents to be converted (e.g. Hadoop Streaming).
Furthermore, there are a lot of dead links in the documents.

I've been trying to fix dead links, convert 1.x documents, and update 
deprecated instructions.
   https://issues.apache.org/jira/browse/HADOOP-9982
   https://issues.apache.org/jira/browse/MAPREDUCE-5636

I'll file a JIRA and try to update the Single Node Setup document.

Thanks,
Akira

(2013/12/03 1:44), Daniel Savard wrote:
> André,
>
> good for you that greedy instructions on the reference page were enough
> to setup your cluster. However, read them again and see how many
> assumptions are made into them about what you are supposed to already
> know and should come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near
> working system as far as I understand the instructions. It is assumed
> the HDFS is already setup and working properly. Try to find the
> instructions to setup HDFS for version 2.2.0 and you will end up with a
> lot of inappropriate instructions about previous version (some
> properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his
> starting point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>      at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>      at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>      at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>      at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>      at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>      at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>      at java.security.AccessController.doPrivileged(Native Method)
>      at javax.security.auth.Subject.doAs(Subject.java:415)
>      at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>      at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>      at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>      at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>      at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>      at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:606)
>      at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>      at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>      at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>      at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <akelpe@concurrentinc.com
> <ma...@concurrentinc.com>>
>
>     Hi Daniel,
>
>     first of all, before posting to a mailing list, take a deep breath and
>     let your frustrations out. Then write the email. Using words like
>     "crappy", "toxicware", "nightmare" are not going to help you getting
>     useful responses.
>
>     While I agree that the docs can be confusing, we should try to stay
>     constructive. You haven't  mentioned which documentation you are
>     using. I found the cluster tutorial sufficient to get me started:
>     http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
>     If you are looking for an easy way to spin up a small cluster with
>     hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>
>     https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>
>     - André
>
>     On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard
>     <daniel.savard@gmail.com <ma...@gmail.com>> wrote:
>      > I am trying to configure hadoop 2.2.0 from source code and I
>     found the
>      > instructions really crappy and incomplete. It is like they were
>     written to
>      > avoid someone can do the job himself and must contract someone
>     else to do it
>      > or buy a packaged version.
>      >
>      > It is about three days I am struggling with this stuff with
>     partial success.
>      > The documentation is less than clear and most of the stuff out
>     there apply
>      > to earlier version and they haven't been updated for version 2.2.0.
>      >
>      > I was able to setup HDFS, however I am still unable to use it. I
>     am doing a
>      > single node installation and the instruction page doesn't explain
>     anything
>      > beside telling you to do this and that without documenting what
>     each thing
>      > is doing and what choices are available and what guidelines you
>     should
>      > follow. There is even environment variables you are told to set,
>     but nothing
>      > is said about what they mean and to which value they should be
>     set. It seems
>      > it assumes prior knowledge of everything about hadoop.
>      >
>      > Anyone knows a site with proper documentation about hadoop or
>     it's hopeless
>      > and this whole thing is just a piece of toxicware?
>      >
>      > I am already looking for alternate solutions to hadoop which for
>     sure will
>      > be a nightmare to manage and install each time a new version,
>     release will
>      > become available.
>      >
>      > TIA
>      > -----------------
>      > Daniel Savard
>
>
>
>     --
>     André Kelpe
>     andre@concurrentinc.com <ma...@concurrentinc.com>
>     http://concurrentinc.com
>
>


Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Here is additional information about the HDFS setup:

$ hdfs dfsadmin -report
Configured Capacity: 3208335360 (2.99 GB)
Present Capacity: 534454272 (509.70 MB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (hosta.subdom1.tld1)
Hostname: hosta.subdom1.tld1
Decommission Status : Normal
Configured Capacity: 3208335360 (2.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2673881088 (2.49 GB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used%: 0.00%
DFS Remaining%: 16.66%
Last contact: Mon Dec 02 12:07:28 EST 2013


I see nothing that could explain the error. I can mkdir, put empty files,
and list content.

-----------------
Daniel Savard
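
Given the report above, the figures worth a second look are DFS Remaining
(about 510 MB) and Non DFS Used (2.49 GB of a 2.99 GB volume). With a single
datanode, the "could only be replicated to 0 nodes instead of minReplication
(=1)" error quoted below is commonly caused by the namenode judging that the
datanode has too little usable space, since it wants room for a full block
(dfs.blocksize, 128 MB by default in 2.2) on top of dfs.datanode.du.reserved.
A rough way to rule that out, assuming a default install layout (the paths
and log file names are placeholders):

    # How much space is really free on the volume holding dfs.datanode.data.dir?
    df -h /var/hadoop/dfs/data        # placeholder path -- use your data dir

    # What does the namenode itself report as remaining space?
    hdfs dfsadmin -report | grep Remaining

    # The namenode log usually explains why a datanode was rejected as a
    # target; the exact message text may differ slightly between versions.
    grep -i 'Not able to place enough replicas' \
        $HADOOP_HOME/logs/hadoop-*-namenode-*.log

    # If space is genuinely tight, either grow the volume or, as a stopgap
    # for a tiny test setup only, reduce the headroom in hdfs-site.xml:
    #   dfs.datanode.du.reserved  -> 0         (default is already 0)
    #   dfs.blocksize             -> 16777216  (16 MB)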


2013/12/2 Daniel Savard <da...@gmail.com>

> André,
>
> good for you that greedy instructions on the reference page were enough to
> setup your cluster. However, read them again and see how many assumptions
> are made into them about what you are supposed to already know and should
> come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near working
> system as far as I understand the instructions. It is assumed the HDFS is
> already setup and working properly. Try to find the instructions to setup
> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
> instructions about previous version (some properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his starting
> point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>     at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>
>> Hi Daniel,
>>
>> first of all, before posting to a mailing list, take a deep breath and
>> let your frustrations out. Then write the email. Using words like
>> "crappy", "toxicware", "nightmare" are not going to help you getting
>> useful responses.
>>
>> While I agree that the docs can be confusing, we should try to stay
>> constructive. You haven't  mentioned which documentation you are
>> using. I found the cluster tutorial sufficient to get me started:
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> If you are looking for an easy way to spin up a small cluster with
>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>
>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>
>> - André
>>
>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>> > instructions really crappy and incomplete. It is like they were written
>> to
>> > avoid someone can do the job himself and must contract someone else to
>> do it
>> > or buy a packaged version.
>> >
>> > It is about three days I am struggling with this stuff with partial
>> success.
>> > The documentation is less than clear and most of the stuff out there
>> apply
>> > to earlier version and they haven't been updated for version 2.2.0.
>> >
>> > I was able to setup HDFS, however I am still unable to use it. I am
>> doing a
>> > single node installation and the instruction page doesn't explain
>> anything
>> > beside telling you to do this and that without documenting what each
>> thing
>> > is doing and what choices are available and what guidelines you should
>> > follow. There is even environment variables you are told to set, but
>> nothing
>> > is said about what they mean and to which value they should be set. It
>> seems
>> > it assumes prior knowledge of everything about hadoop.
>> >
>> > Anyone knows a site with proper documentation about hadoop or it's
>> hopeless
>> > and this whole thing is just a piece of toxicware?
>> >
>> > I am already looking for alternate solutions to hadoop which for sure
>> will
>> > be a nightmare to manage and install each time a new version, release
>> will
>> > become available.
>> >
>> > TIA
>> > -----------------
>> > Daniel Savard
>>
>>
>>
>> --
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Here is additional information about the HDFS:

$ hdfs dfsadmin -report
Configured Capacity: 3208335360 (2.99 GB)
Present Capacity: 534454272 (509.70 MB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (hosta.subdom1.tld1)
Hostname: hosta.subdom1.tld1
Decommission Status : Normal
Configured Capacity: 3208335360 (2.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2673881088 (2.49 GB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used%: 0.00%
DFS Remaining%: 16.66%
Last contact: Mon Dec 02 12:07:28 EST 2013


I see nothing that could explain the error. I can mkdir, put empty files,
list content.

-----------------
Daniel Savard


2013/12/2 Daniel Savard <da...@gmail.com>

> André,
>
> good for you that greedy instructions on the reference page were enough to
> setup your cluster. However, read them again and see how many assumptions
> are made into them about what you are supposed to already know and should
> come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near working
> system as far as I understand the instructions. It is assumed the HDFS is
> already setup and working properly. Try to find the instructions to setup
> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
> instructions about previous version (some properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his starting
> point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>     at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>
>> Hi Daniel,
>>
>> first of all, before posting to a mailing list, take a deep breath and
>> let your frustrations out. Then write the email. Using words like
>> "crappy", "toxicware", "nightmare" are not going to help you getting
>> useful responses.
>>
>> While I agree that the docs can be confusing, we should try to stay
>> constructive. You haven't  mentioned which documentation you are
>> using. I found the cluster tutorial sufficient to get me started:
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> If you are looking for an easy way to spin up a small cluster with
>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>
>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>
>> - André
>>
>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>> > instructions really crappy and incomplete. It is like they were written
>> to
>> > avoid someone can do the job himself and must contract someone else to
>> do it
>> > or buy a packaged version.
>> >
>> > It is about three days I am struggling with this stuff with partial
>> success.
>> > The documentation is less than clear and most of the stuff out there
>> apply
>> > to earlier version and they haven't been updated for version 2.2.0.
>> >
>> > I was able to setup HDFS, however I am still unable to use it. I am
>> doing a
>> > single node installation and the instruction page doesn't explain
>> anything
>> > beside telling you to do this and that without documenting what each
>> thing
>> > is doing and what choices are available and what guidelines you should
>> > follow. There is even environment variables you are told to set, but
>> nothing
>> > is said about what they mean and to which value they should be set. It
>> seems
>> > it assumes prior knowledge of everything about hadoop.
>> >
>> > Anyone knows a site with proper documentation about hadoop or it's
>> hopeless
>> > and this whole thing is just a piece of toxicware?
>> >
>> > I am already looking for alternate solutions to hadoop which for sure
>> will
>> > be a nightmare to manage and install each time a new version, release
>> will
>> > become available.
>> >
>> > TIA
>> > -----------------
>> > Daniel Savard
>>
>>
>>
>> --
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Arun C Murthy <ac...@hortonworks.com>.
Daniel,

 Apologies if you had a bad experience. If you can point them out to us, we'd be more than happy to fix it - alternately, we'd *love* it if you could help us improve docs too.

 Now, for the problem at hand: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to look. Basically NN cannot find any datanodes. Anything in your NN logs to indicate trouble?

 Also, pls feel free to open liras with issues you find and we'll help.

thanks,
Arun

On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com> wrote:

> André,
> 
> good for you that greedy instructions on the reference page were enough to setup your cluster. However, read them again and see how many assumptions are made into them about what you are supposed to already know and should come without saying more about it.
> 
> I did try the single node setup, it is worst than the cluster setup regarding the instructions. You are supposed to already have a near working system as far as I understand the instructions. It is assumed the HDFS is already setup and working properly. Try to find the instructions to setup HDFS for version 2.2.0 and you will end up with a lot of inappropriate instructions about previous version (some properties were renamed).
> 
> It may appear hard at people to say this is toxic, but it is. The first place a newcomer will go is setup a single node. This will be his starting point and he will be left with a bunch of a priori and no clue.
> 
> To go back to my very problem at this point: 
> 
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 
> I can copy an empty file, but as soon as its content is non-zero I am getting this message. Searching on the message is of no help so far.
> 
> And I skimmed through the cluster instructions and found nothing there that could help in any way neither.
> 
> 
> -----------------
> Daniel Savard
> 
> 
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
> Hi Daniel,
> 
> first of all, before posting to a mailing list, take a deep breath and
> let your frustrations out. Then write the email. Using words like
> "crappy", "toxicware", "nightmare" are not going to help you getting
> useful responses.
> 
> While I agree that the docs can be confusing, we should try to stay
> constructive. You haven't  mentioned which documentation you are
> using. I found the cluster tutorial sufficient to get me started:
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
> 
> If you are looking for an easy way to spin up a small cluster with
> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
> 
> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
> 
> - André
> 
> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> > I am trying to configure hadoop 2.2.0 from source code and I found the
> > instructions really crappy and incomplete. It is like they were written to
> > avoid someone can do the job himself and must contract someone else to do it
> > or buy a packaged version.
> >
> > It is about three days I am struggling with this stuff with partial success.
> > The documentation is less than clear and most of the stuff out there apply
> > to earlier version and they haven't been updated for version 2.2.0.
> >
> > I was able to setup HDFS, however I am still unable to use it. I am doing a
> > single node installation and the instruction page doesn't explain anything
> > beside telling you to do this and that without documenting what each thing
> > is doing and what choices are available and what guidelines you should
> > follow. There is even environment variables you are told to set, but nothing
> > is said about what they mean and to which value they should be set. It seems
> > it assumes prior knowledge of everything about hadoop.
> >
> > Anyone knows a site with proper documentation about hadoop or it's hopeless
> > and this whole thing is just a piece of toxicware?
> >
> > I am already looking for alternate solutions to hadoop which for sure will
> > be a nightmare to manage and install each time a new version, release will
> > become available.
> >
> > TIA
> > -----------------
> > Daniel Savard
> 
> 
> 
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Here is additional information about the HDFS:

$ hdfs dfsadmin -report
Configured Capacity: 3208335360 (2.99 GB)
Present Capacity: 534454272 (509.70 MB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (hosta.subdom1.tld1)
Hostname: hosta.subdom1.tld1
Decommission Status : Normal
Configured Capacity: 3208335360 (2.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2673881088 (2.49 GB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used%: 0.00%
DFS Remaining%: 16.66%
Last contact: Mon Dec 02 12:07:28 EST 2013


I see nothing that could explain the error. I can mkdir, put empty files,
list content.

-----------------
Daniel Savard


2013/12/2 Daniel Savard <da...@gmail.com>

> André,
>
> good for you that greedy instructions on the reference page were enough to
> setup your cluster. However, read them again and see how many assumptions
> are made into them about what you are supposed to already know and should
> come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near working
> system as far as I understand the instructions. It is assumed the HDFS is
> already setup and working properly. Try to find the instructions to setup
> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
> instructions about previous version (some properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his starting
> point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>     at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>
>> Hi Daniel,
>>
>> first of all, before posting to a mailing list, take a deep breath and
>> let your frustrations out. Then write the email. Using words like
>> "crappy", "toxicware", "nightmare" are not going to help you getting
>> useful responses.
>>
>> While I agree that the docs can be confusing, we should try to stay
>> constructive. You haven't  mentioned which documentation you are
>> using. I found the cluster tutorial sufficient to get me started:
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> If you are looking for an easy way to spin up a small cluster with
>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>
>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>
>> - André
>>
>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>> > instructions really crappy and incomplete. It is like they were written
>> to
>> > avoid someone can do the job himself and must contract someone else to
>> do it
>> > or buy a packaged version.
>> >
>> > It is about three days I am struggling with this stuff with partial
>> success.
>> > The documentation is less than clear and most of the stuff out there
>> apply
>> > to earlier version and they haven't been updated for version 2.2.0.
>> >
>> > I was able to setup HDFS, however I am still unable to use it. I am
>> doing a
>> > single node installation and the instruction page doesn't explain
>> anything
>> > beside telling you to do this and that without documenting what each
>> thing
>> > is doing and what choices are available and what guidelines you should
>> > follow. There is even environment variables you are told to set, but
>> nothing
>> > is said about what they mean and to which value they should be set. It
>> seems
>> > it assumes prior knowledge of everything about hadoop.
>> >
>> > Anyone knows a site with proper documentation about hadoop or it's
>> hopeless
>> > and this whole thing is just a piece of toxicware?
>> >
>> > I am already looking for alternate solutions to hadoop which for sure
>> will
>> > be a nightmare to manage and install each time a new version, release
>> will
>> > become available.
>> >
>> > TIA
>> > -----------------
>> > Daniel Savard
>>
>>
>>
>> --
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Arun C Murthy <ac...@hortonworks.com>.
Daniel,

 Apologies if you had a bad experience. If you can point them out to us, we'd be more than happy to fix it - alternately, we'd *love* it if you could help us improve docs too.

 Now, for the problem at hand: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to look. Basically NN cannot find any datanodes. Anything in your NN logs to indicate trouble?

 Also, pls feel free to open liras with issues you find and we'll help.

thanks,
Arun

On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com> wrote:

> André,
> 
> good for you that greedy instructions on the reference page were enough to setup your cluster. However, read them again and see how many assumptions are made into them about what you are supposed to already know and should come without saying more about it.
> 
> I did try the single node setup, it is worst than the cluster setup regarding the instructions. You are supposed to already have a near working system as far as I understand the instructions. It is assumed the HDFS is already setup and working properly. Try to find the instructions to setup HDFS for version 2.2.0 and you will end up with a lot of inappropriate instructions about previous version (some properties were renamed).
> 
> It may appear hard at people to say this is toxic, but it is. The first place a newcomer will go is setup a single node. This will be his starting point and he will be left with a bunch of a priori and no clue.
> 
> To go back to my very problem at this point: 
> 
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 
> I can copy an empty file, but as soon as its content is non-zero I am getting this message. Searching on the message is of no help so far.
> 
> And I skimmed through the cluster instructions and found nothing there that could help in any way neither.
> 
> 
> -----------------
> Daniel Savard
> 
> 
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
> Hi Daniel,
> 
> first of all, before posting to a mailing list, take a deep breath and
> let your frustrations out. Then write the email. Using words like
> "crappy", "toxicware", "nightmare" are not going to help you getting
> useful responses.
> 
> While I agree that the docs can be confusing, we should try to stay
> constructive. You haven't  mentioned which documentation you are
> using. I found the cluster tutorial sufficient to get me started:
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
> 
> If you are looking for an easy way to spin up a small cluster with
> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
> 
> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
> 
> - André
> 
> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> > I am trying to configure hadoop 2.2.0 from source code and I found the
> > instructions really crappy and incomplete. It is like they were written to
> > avoid someone can do the job himself and must contract someone else to do it
> > or buy a packaged version.
> >
> > It is about three days I am struggling with this stuff with partial success.
> > The documentation is less than clear and most of the stuff out there apply
> > to earlier version and they haven't been updated for version 2.2.0.
> >
> > I was able to setup HDFS, however I am still unable to use it. I am doing a
> > single node installation and the instruction page doesn't explain anything
> > beside telling you to do this and that without documenting what each thing
> > is doing and what choices are available and what guidelines you should
> > follow. There is even environment variables you are told to set, but nothing
> > is said about what they mean and to which value they should be set. It seems
> > it assumes prior knowledge of everything about hadoop.
> >
> > Anyone knows a site with proper documentation about hadoop or it's hopeless
> > and this whole thing is just a piece of toxicware?
> >
> > I am already looking for alternate solutions to hadoop which for sure will
> > be a nightmare to manage and install each time a new version, release will
> > become available.
> >
> > TIA
> > -----------------
> > Daniel Savard
> 
> 
> 
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Hadoop 2.2.0 from source configuration

Posted by Arun C Murthy <ac...@hortonworks.com>.
Daniel,

 Apologies if you had a bad experience. If you can point them out to us, we'd be more than happy to fix it - alternately, we'd *love* it if you could help us improve docs too.

 Now, for the problem at hand: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to look. Basically NN cannot find any datanodes. Anything in your NN logs to indicate trouble?

 Also, pls feel free to open liras with issues you find and we'll help.

thanks,
Arun

On Dec 2, 2013, at 8:44 AM, Daniel Savard <da...@gmail.com> wrote:

> André,
> 
> good for you that greedy instructions on the reference page were enough to setup your cluster. However, read them again and see how many assumptions are made into them about what you are supposed to already know and should come without saying more about it.
> 
> I did try the single node setup, it is worst than the cluster setup regarding the instructions. You are supposed to already have a near working system as far as I understand the instructions. It is assumed the HDFS is already setup and working properly. Try to find the instructions to setup HDFS for version 2.2.0 and you will end up with a lot of inappropriate instructions about previous version (some properties were renamed).
> 
> It may appear hard at people to say this is toxic, but it is. The first place a newcomer will go is setup a single node. This will be his starting point and he will be left with a bunch of a priori and no clue.
> 
> To go back to my very problem at this point: 
> 
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 
> I can copy an empty file, but as soon as its content is non-zero I am getting this message. Searching on the message is of no help so far.
> 
> And I skimmed through the cluster instructions and found nothing there that could help in any way neither.
> 
> 
> -----------------
> Daniel Savard
> 
> 
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
> Hi Daniel,
> 
> first of all, before posting to a mailing list, take a deep breath and
> let your frustrations out. Then write the email. Using words like
> "crappy", "toxicware", "nightmare" are not going to help you getting
> useful responses.
> 
> While I agree that the docs can be confusing, we should try to stay
> constructive. You haven't  mentioned which documentation you are
> using. I found the cluster tutorial sufficient to get me started:
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
> 
> If you are looking for an easy way to spin up a small cluster with
> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
> 
> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
> 
> - André
> 
> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> > I am trying to configure hadoop 2.2.0 from source code and I found the
> > instructions really crappy and incomplete. It is like they were written to
> > avoid someone can do the job himself and must contract someone else to do it
> > or buy a packaged version.
> >
> > It is about three days I am struggling with this stuff with partial success.
> > The documentation is less than clear and most of the stuff out there apply
> > to earlier version and they haven't been updated for version 2.2.0.
> >
> > I was able to setup HDFS, however I am still unable to use it. I am doing a
> > single node installation and the instruction page doesn't explain anything
> > beside telling you to do this and that without documenting what each thing
> > is doing and what choices are available and what guidelines you should
> > follow. There is even environment variables you are told to set, but nothing
> > is said about what they mean and to which value they should be set. It seems
> > it assumes prior knowledge of everything about hadoop.
> >
> > Anyone knows a site with proper documentation about hadoop or it's hopeless
> > and this whole thing is just a piece of toxicware?
> >
> > I am already looking for alternate solutions to hadoop which for sure will
> > be a nightmare to manage and install each time a new version, release will
> > become available.
> >
> > TIA
> > -----------------
> > Daniel Savard
> 
> 
> 
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: Hadoop 2.2.0 from source configuration

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.
Hi Daniel,

I agree with you that the 2.2 documents are very unfriendly.
In many documents, the only change from 1.x to 2.2 is the format.
There are still many documents to be converted (e.g. Hadoop Streaming).
Furthermore, there are a lot of dead links in the documents.

I've been trying to fix dead links, convert 1.x documents, and update 
deprecated instructions.
   https://issues.apache.org/jira/browse/HADOOP-9982
   https://issues.apache.org/jira/browse/MAPREDUCE-5636

I'll file a JIRA and try to update the Single Node Setup document.
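
In the meantime, here is a minimal single-node sketch that may help anyone
stuck at the same point (property names are the 2.2 defaults; hostnames and
paths below are placeholders to adapt):

etc/hadoop/core-site.xml:

  <configuration>
    <!-- where clients find the namenode -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>

etc/hadoop/hdfs-site.xml:

  <configuration>
    <!-- single node, so keep one copy of each block -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <!-- placeholder paths; pick directories on a filesystem with room -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/path/to/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/path/to/dfs/data</value>
    </property>
  </configuration>

Then, from the installation directory:

  $ bin/hdfs namenode -format
  $ sbin/start-dfs.sh
  $ bin/hdfs dfs -mkdir -p /user/<username>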

Thanks,
Akira

(2013/12/03 1:44), Daniel Savard wrote:
> André,
>
> good for you that greedy instructions on the reference page were enough
> to setup your cluster. However, read them again and see how many
> assumptions are made into them about what you are supposed to already
> know and should come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near
> working system as far as I understand the instructions. It is assumed
> the HDFS is already setup and working properly. Try to find the
> instructions to setup HDFS for version 2.2.0 and you will end up with a
> lot of inappropriate instructions about previous version (some
> properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his
> starting point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>      at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>      at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>      at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>      at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>      at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>      at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>      at java.security.AccessController.doPrivileged(Native Method)
>      at javax.security.auth.Subject.doAs(Subject.java:415)
>      at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>      at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>      at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>      at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>      at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>      at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:606)
>      at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>      at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>      at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>      at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>      at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <akelpe@concurrentinc.com
> <ma...@concurrentinc.com>>
>
>     Hi Daniel,
>
>     first of all, before posting to a mailing list, take a deep breath and
>     let your frustrations out. Then write the email. Using words like
>     "crappy", "toxicware", "nightmare" are not going to help you getting
>     useful responses.
>
>     While I agree that the docs can be confusing, we should try to stay
>     constructive. You haven't  mentioned which documentation you are
>     using. I found the cluster tutorial sufficient to get me started:
>     http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
>     If you are looking for an easy way to spin up a small cluster with
>     hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>
>     https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>
>     - André
>
>     On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard
>     <daniel.savard@gmail.com <ma...@gmail.com>> wrote:
>      > I am trying to configure hadoop 2.2.0 from source code and I
>     found the
>      > instructions really crappy and incomplete. It is like they were
>     written to
>      > avoid someone can do the job himself and must contract someone
>     else to do it
>      > or buy a packaged version.
>      >
>      > It is about three days I am struggling with this stuff with
>     partial success.
>      > The documentation is less than clear and most of the stuff out
>     there apply
>      > to earlier version and they haven't been updated for version 2.2.0.
>      >
>      > I was able to setup HDFS, however I am still unable to use it. I
>     am doing a
>      > single node installation and the instruction page doesn't explain
>     anything
>      > beside telling you to do this and that without documenting what
>     each thing
>      > is doing and what choices are available and what guidelines you
>     should
>      > follow. There is even environment variables you are told to set,
>     but nothing
>      > is said about what they mean and to which value they should be
>     set. It seems
>      > it assumes prior knowledge of everything about hadoop.
>      >
>      > Anyone knows a site with proper documentation about hadoop or
>     it's hopeless
>      > and this whole thing is just a piece of toxicware?
>      >
>      > I am already looking for alternate solutions to hadoop which for
>     sure will
>      > be a nightmare to manage and install each time a new version,
>     release will
>      > become available.
>      >
>      > TIA
>      > -----------------
>      > Daniel Savard
>
>
>
>     --
>     André Kelpe
>     andre@concurrentinc.com <ma...@concurrentinc.com>
>     http://concurrentinc.com
>
>


Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
Here is additional information about the HDFS setup:

$ hdfs dfsadmin -report
Configured Capacity: 3208335360 (2.99 GB)
Present Capacity: 534454272 (509.70 MB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (hosta.subdom1.tld1)
Hostname: hosta.subdom1.tld1
Decommission Status : Normal
Configured Capacity: 3208335360 (2.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2673881088 (2.49 GB)
DFS Remaining: 534450176 (509.69 MB)
DFS Used%: 0.00%
DFS Remaining%: 16.66%
Last contact: Mon Dec 02 12:07:28 EST 2013


I see nothing here that could explain the error. I can mkdir, put empty
files, and list directory contents.
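
For reference, one common cause of the "replicated to 0 nodes" error is that
no datanode can offer a full block (dfs.blocksize, 128 MB by default) of
usable space above its reserved threshold (dfs.datanode.du.reserved). A quick
way to compare the relevant numbers (the data directory path below is a
placeholder for whatever is configured in hdfs-site.xml):

  $ hdfs getconf -confKey dfs.blocksize              # default is 134217728 (128 MB)
  $ hdfs getconf -confKey dfs.datanode.du.reserved   # space the datanode will not touch
  $ df -h /path/to/dfs/data                          # free space on the data directory
  $ hdfs dfsadmin -report | grep -E 'Remaining|Used'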

-----------------
Daniel Savard


2013/12/2 Daniel Savard <da...@gmail.com>

> André,
>
> good for you that greedy instructions on the reference page were enough to
> setup your cluster. However, read them again and see how many assumptions
> are made into them about what you are supposed to already know and should
> come without saying more about it.
>
> I did try the single node setup, it is worst than the cluster setup
> regarding the instructions. You are supposed to already have a near working
> system as far as I understand the instructions. It is assumed the HDFS is
> already setup and working properly. Try to find the instructions to setup
> HDFS for version 2.2.0 and you will end up with a lot of inappropriate
> instructions about previous version (some properties were renamed).
>
> It may appear hard at people to say this is toxic, but it is. The first
> place a newcomer will go is setup a single node. This will be his starting
> point and he will be left with a bunch of a priori and no clue.
>
> To go back to my very problem at this point:
>
> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test._COPYING_ could only be replicated to 0 nodes instead of
> minReplication (=1).  There are 1 datanode(s) running and no node(s) are
> excluded in this operation.
>     at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>     at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>     at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>     at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>
> I can copy an empty file, but as soon as its content is non-zero I am
> getting this message. Searching on the message is of no help so far.
>
> And I skimmed through the cluster instructions and found nothing there
> that could help in any way neither.
>
>
> -----------------
> Daniel Savard
>
>
> 2013/12/2 Andre Kelpe <ak...@concurrentinc.com>
>
>> Hi Daniel,
>>
>> first of all, before posting to a mailing list, take a deep breath and
>> let your frustrations out. Then write the email. Using words like
>> "crappy", "toxicware", "nightmare" are not going to help you getting
>> useful responses.
>>
>> While I agree that the docs can be confusing, we should try to stay
>> constructive. You haven't  mentioned which documentation you are
>> using. I found the cluster tutorial sufficient to get me started:
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> If you are looking for an easy way to spin up a small cluster with
>> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>
>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>
>> - André
>>
>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
>> wrote:
>> > I am trying to configure hadoop 2.2.0 from source code and I found the
>> > instructions really crappy and incomplete. It is like they were written
>> to
>> > avoid someone can do the job himself and must contract someone else to
>> do it
>> > or buy a packaged version.
>> >
>> > It is about three days I am struggling with this stuff with partial
>> success.
>> > The documentation is less than clear and most of the stuff out there
>> apply
>> > to earlier version and they haven't been updated for version 2.2.0.
>> >
>> > I was able to setup HDFS, however I am still unable to use it. I am
>> doing a
>> > single node installation and the instruction page doesn't explain
>> anything
>> > beside telling you to do this and that without documenting what each
>> thing
>> > is doing and what choices are available and what guidelines you should
>> > follow. There is even environment variables you are told to set, but
>> nothing
>> > is said about what they mean and to which value they should be set. It
>> seems
>> > it assumes prior knowledge of everything about hadoop.
>> >
>> > Anyone knows a site with proper documentation about hadoop or it's
>> hopeless
>> > and this whole thing is just a piece of toxicware?
>> >
>> > I am already looking for alternate solutions to hadoop which for sure
>> will
>> > be a nightmare to manage and install each time a new version, release
>> will
>> > become available.
>> >
>> > TIA
>> > -----------------
>> > Daniel Savard
>>
>>
>>
>> --
>> André Kelpe
>> andre@concurrentinc.com
>> http://concurrentinc.com
>>
>
>

Re: Hadoop 2.2.0 from source configuration

Posted by Daniel Savard <da...@gmail.com>.
André,

good for you that the terse instructions on the reference page were enough
to set up your cluster. However, read them again and see how many
assumptions they make about what you are supposed to already know, as if it
all went without saying.

I did try the single node setup; as far as the instructions go, it is worse
than the cluster setup. As far as I understand them, you are supposed to
already have a nearly working system. It is assumed that HDFS is already set
up and working properly. Try to find the instructions to set up HDFS for
version 2.2.0 and you will end up with a lot of inapplicable instructions
for previous versions (some properties were renamed).

It may seem harsh to call this toxic, but it is. The first place a newcomer
will go is the single node setup. This will be his starting point and he
will be left with a pile of unstated assumptions and no clue.

To come back to my actual problem at this point:

13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/test._COPYING_ could only be replicated to 0 nodes instead of
minReplication (=1).  There are 1 datanode(s) running and no node(s) are
excluded in this operation.
    at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
    at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
    at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)

    at org.apache.hadoop.ipc.Client.call(Client.java:1347)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
    at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
    at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
    at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)

I can copy an empty file, but as soon as its content is non-zero I am
getting this message. Searching on the message is of no help so far.

And I skimmed through the cluster instructions and found nothing there that
could help in any way either.


-----------------
Daniel Savard


2013/12/2 Andre Kelpe <ak...@concurrentinc.com>

> Hi Daniel,
>
> first of all, before posting to a mailing list, take a deep breath and
> let your frustrations out. Then write the email. Using words like
> "crappy", "toxicware", "nightmare" are not going to help you getting
> useful responses.
>
> While I agree that the docs can be confusing, we should try to stay
> constructive. You haven't  mentioned which documentation you are
> using. I found the cluster tutorial sufficient to get me started:
>
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
> If you are looking for an easy way to spin up a small cluster with
> hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>
> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>
> - André
>
> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com>
> wrote:
> > I am trying to configure hadoop 2.2.0 from source code and I found the
> > instructions really crappy and incomplete. It is like they were written
> to
> > avoid someone can do the job himself and must contract someone else to
> do it
> > or buy a packaged version.
> >
> > It is about three days I am struggling with this stuff with partial
> success.
> > The documentation is less than clear and most of the stuff out there
> apply
> > to earlier version and they haven't been updated for version 2.2.0.
> >
> > I was able to setup HDFS, however I am still unable to use it. I am
> doing a
> > single node installation and the instruction page doesn't explain
> anything
> > beside telling you to do this and that without documenting what each
> thing
> > is doing and what choices are available and what guidelines you should
> > follow. There is even environment variables you are told to set, but
> nothing
> > is said about what they mean and to which value they should be set. It
> seems
> > it assumes prior knowledge of everything about hadoop.
> >
> > Anyone knows a site with proper documentation about hadoop or it's
> hopeless
> > and this whole thing is just a piece of toxicware?
> >
> > I am already looking for alternate solutions to hadoop which for sure
> will
> > be a nightmare to manage and install each time a new version, release
> will
> > become available.
> >
> > TIA
> > -----------------
> > Daniel Savard
>
>
>
> --
> André Kelpe
> andre@concurrentinc.com
> http://concurrentinc.com
>

Re: Hadoop 2.2.0 from source configuration

Posted by Andre Kelpe <ak...@concurrentinc.com>.
Hi Daniel,

first of all, before posting to a mailing list, take a deep breath and
let your frustrations out. Then write the email. Using words like
"crappy", "toxicware" and "nightmare" is not going to help you get
useful responses.

While I agree that the docs can be confusing, we should try to stay
constructive. You haven't mentioned which documentation you are
using. I found the cluster tutorial sufficient to get me started:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

If you are looking for an easy way to spin up a small cluster with
hadoop 2.2, try the hadoop2 branch of this vagrant setup:

https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
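
Roughly, assuming Vagrant and VirtualBox are already installed, that comes
down to:

  $ git clone -b hadoop2 https://github.com/fs111/vagrant-hadoop-cluster.git
  $ cd vagrant-hadoop-cluster
  $ vagrant up    # brings up the VMs defined in the Vagrantfile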

- André

On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> I am trying to configure hadoop 2.2.0 from source code and I found the
> instructions really crappy and incomplete. It is like they were written to
> avoid someone can do the job himself and must contract someone else to do it
> or buy a packaged version.
>
> It is about three days I am struggling with this stuff with partial success.
> The documentation is less than clear and most of the stuff out there apply
> to earlier version and they haven't been updated for version 2.2.0.
>
> I was able to setup HDFS, however I am still unable to use it. I am doing a
> single node installation and the instruction page doesn't explain anything
> beside telling you to do this and that without documenting what each thing
> is doing and what choices are available and what guidelines you should
> follow. There is even environment variables you are told to set, but nothing
> is said about what they mean and to which value they should be set. It seems
> it assumes prior knowledge of everything about hadoop.
>
> Anyone knows a site with proper documentation about hadoop or it's hopeless
> and this whole thing is just a piece of toxicware?
>
> I am already looking for alternate solutions to hadoop which for sure will
> be a nightmare to manage and install each time a new version, release will
> become available.
>
> TIA
> -----------------
> Daniel Savard



-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com

Re: Hadoop 2.2.0 from source configuration

Posted by Andre Kelpe <ak...@concurrentinc.com>.
Hi Daniel,

first of all, before posting to a mailing list, take a deep breath and
let your frustrations out. Then write the email. Using words like
"crappy", "toxicware", "nightmare" are not going to help you getting
useful responses.

While I agree that the docs can be confusing, we should try to stay
constructive. You haven't  mentioned which documentation you are
using. I found the cluster tutorial sufficient to get me started:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

If you are looking for an easy way to spin up a small cluster with
hadoop 2.2, try the hadoop2 branch of this vagrant setup:

https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2

- André

On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <da...@gmail.com> wrote:
> I am trying to configure hadoop 2.2.0 from source code and I found the
> instructions really crappy and incomplete. It is like they were written to
> avoid someone can do the job himself and must contract someone else to do it
> or buy a packaged version.
>
> It is about three days I am struggling with this stuff with partial success.
> The documentation is less than clear and most of the stuff out there apply
> to earlier version and they haven't been updated for version 2.2.0.
>
> I was able to setup HDFS, however I am still unable to use it. I am doing a
> single node installation and the instruction page doesn't explain anything
> beside telling you to do this and that without documenting what each thing
> is doing and what choices are available and what guidelines you should
> follow. There is even environment variables you are told to set, but nothing
> is said about what they mean and to which value they should be set. It seems
> it assumes prior knowledge of everything about hadoop.
>
> Anyone knows a site with proper documentation about hadoop or it's hopeless
> and this whole thing is just a piece of toxicware?
>
> I am already looking for alternate solutions to hadoop which for sure will
> be a nightmare to manage and install each time a new version, release will
> become available.
>
> TIA
> -----------------
> Daniel Savard



-- 
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com