
Posted to user@hbase.apache.org by Jay Wilson <re...@circle-cross-jn.com> on 2012/07/01 23:05:20 UTC

HBASE -- Regionserver and QuorumPeer ?

Can a regionserver and quorumpeer reside on the same node?

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by yu...@gmail.com.

Yes. 



On Jul 1, 2012, at 2:05 PM, Jay Wilson <re...@circle-cross-jn.com> wrote:

> Can a regionserver and quorumpeer reside on the same node?
> 
> 
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Mohammad Tariq <do...@gmail.com>.

Not necessarily... They are two totally different processes. In a Hadoop
cluster, the HBase Master and a ZooKeeper quorum peer typically run on one
machine, and regionservers are spread across the cluster. But this
totally depends on you.

Regards,
    Mohammad Tariq
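Whether a RegionServer and a quorum peer end up on the same node can be read off two pieces of HBase configuration: the "conf/regionservers" file (one host per line) and the comma-separated "hbase.zookeeper.quorum" property. Below is a minimal POSIX sh sketch, assuming those two inputs; the function name "colocated_hosts" is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list hosts that would run both a RegionServer
# and a ZooKeeper quorum peer.
#   $1 = path to a regionservers file (one host per line, # comments allowed)
#   $2 = value of hbase.zookeeper.quorum, e.g. "zk1,zk2,zk3" (":port" ignored)

colocated_hosts() (
  rs_sorted=$(mktemp) || exit 1
  zk_sorted=$(mktemp) || exit 1

  # RegionServer hosts: drop comments and blank lines, strip whitespace, sort.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" \
    | tr -d ' \t' | sort -u > "$rs_sorted"

  # Quorum hosts: split on commas, drop any :port suffix, strip whitespace, sort.
  printf '%s\n' "$2" | tr ',' '\n' | sed 's/:.*//' \
    | tr -d ' \t' | grep -v '^$' | sort -u > "$zk_sorted"

  # Lines common to both sorted lists are the colocated hosts.
  comm -12 "$rs_sorted" "$zk_sorted"
  rm -f "$rs_sorted" "$zk_sorted"
)
```

For example, colocated_hosts "$HBASE_HOME/conf/regionservers" "devrackA-03,devrackA-04,devrackA-05" prints nothing when no RegionServer shares a node with a quorum peer, and prints each shared host otherwise.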

On Mon, Jul 2, 2012 at 2:35 AM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> Can a regionserver and quorumpeer reside on the same node?
>
>
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Lars George <la...@gmail.com>.

Hi Mike,

> Running  RS on a machine where DN isn't running? 

I am not following here. Andy said that both are on the same node. Where in this thread did someone imply something else? Just curious.

Cheers,
Lars


On Jul 2, 2012, at 7:11 AM, Michael Segel wrote:

> I'm sorry, I'm losing it. 
> 
> Running RS on a machine where DN isn't running? 
> So then the RS can't store its regions locally. Not sure if that would ever be a good idea or recommended. 
> 
> Though the initial question was about running ZK on the same node as an RS, which isn't a good idea and is a recipe for failure....
> 
> Following KISS is a much better way of life than taking Crystal Meth. It's one way to avoid those nasty 'dead hooker problems'. *
> 
> *<rant>
> 	<explanation>
> 		Just to explain KISS and what I mean by a 'dead hooker' problem...
> 
> 		KISS = Keep It Simple Stupid 
> 		This is an engineering principle used to teach engineering students that the best solutions are the ones that are straightforward, and that if you attempt to get too clever, you always get some sort of blowback in your face. It usually hurts and it's always self-inflicted.
> 
> 		'dead hooker problems' are the theoretical problems of how to get rid of the dead hooker in your hotel room after your party of Hookers, Booze and either Crystal Meth or Cocaine goes terribly wrong, and you wake up the next morning with a nasty hangover and a dead body that you have to get out of your hotel room before the cleaning ladies come knocking on your door. While I've never experienced this... I can't recall how many movies have this as a plot or subplot. 
> 
> 		Not that I'm attempting to advocate drugs or killing hookers, unless it's with a typewriter or text editor when you want to write your next failed movie script. 
> 	</explanation>
> 
> 	So here's my rant... 
> 
> 	I'm not picking on the OP, but in general there's a class of posts where the OP starts a thread by ignoring the common wisdom captured in books, blogs and Apache wikis when setting up a cluster. 
> 
> 	When things don't work, they ultimately post here and wonder why they don't work. 
> 
> 	The key to happiness is to not ignore the conventional wisdom and, when starting out with Hadoop, follow the suggested setups. Remember that the key is to first grok Hadoop before you attempt more advanced cluster configurations. That is what is meant by KISS. Accept that Hadoop is just a tool used by many to solve problems requiring a parallel framework. 
> 
> 	Dead Hooker problems may be a great plot device, but in real life, when under a time crunch, they are something one should avoid.  ;-) 
> 
> </rant>
> 
> For those of you who don't appreciate my sense of humor, try another example... (Also note... I don't know how this will translate into languages other than English, so the meaning could be lost in translation...) 
> 
> Your wife has invited a bunch of her co-workers, including her boss, over for dinner. You, being the good spouse, are responsible for some of the meal prep. Rather than go with a tried-and-true recipe, you decide to try something new. And not only do you try a new recipe, you also decide to improvise, try new ingredients, and do your own thing. Not really a good idea, and unless you are incredibly lucky, or a really good cook with a talent for creating new recipes, you are more than likely going to end up in the doghouse. 
> 
> Take it from a guy who usually lives in the doghouse for one reason or another... following the recipes and not trying something new when the pressure for success is on... much less stress in your life. :-) 
> 
> Again, with respect to Hadoop, there are a lot of moving parts where things can go wrong. I've got this drinking buddy named Murphy... you know the guy, he wrote this law... ;-)
> 
> HTH
> 
> -Mikey
> 
> 
> 
> On Jul 1, 2012, at 7:41 PM, Andrew Purtell wrote:
> 
>> A typical and recommended configuration is HBase RegionServer and HDFS
>> DataNode colocated on the nodes. The DataNode will use locally
>> attached disk to store and serve blocks.
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Michael Segel <mi...@hotmail.com>.

Well...

I wasn't sure if St.Ack was displeased by my comments on Andrew's response, or by my reference to KISS (where the second S is 'stupid'), my reference to 'dead hookers', or my reference to drugs. 
I was just covering my bases. :-)

With respect to Andrew's response, I saw something in it and wasn't sure whether I was reading too much into it. Hence I started by saying that I may be losing it, because I was probably reading something into his response that he may not have intended. 

I guess it's a problem many of us have, myself included: we are sometimes intentionally vague in our responses. 

There are times when someone asks a question and the response is that they shouldn't do X: while it's not a good idea, it's still theoretically possible to do. 
In this case, that means running an RS and ZK on the same node. Yes, it could be done with the proper configuration, where you isolate the disk I/O of ZK and the RS from each other as much as possible. However, the better solution (for a small dev cluster) is to run ZK on the same node as the JT, NN, HM and even the SN.
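One way to act on "isolate your disk I/O" when ZK and an RS share a node is to confirm that ZooKeeper's data directory does not sit on the same filesystem as the DataNode data directories. Below is a minimal POSIX sh sketch with hypothetical function names; it compares only the mount points reported by "df -P", so two partitions on the same physical disk would not be caught.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): warn when ZooKeeper's data directory shares a
# filesystem with DataNode data directories. For an HBase-managed quorum the
# ZooKeeper directory is the one set by hbase.zookeeper.property.dataDir.

# mount_point_of DIR: print the mount point holding DIR (last field of the
# second line of POSIX "df -P" output); prints nothing if df fails.
mount_point_of() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }'
}

# shared_disk ZK_DIR DN_DIR...: print every DN_DIR on the same filesystem as
# ZK_DIR. Exit status: 0 = overlap found, 1 = none, 2 = ZK_DIR not checkable.
shared_disk() (
  zk_mount=$(mount_point_of "$1")
  [ -n "$zk_mount" ] || exit 2
  shift
  status=1
  for dn_dir in "$@"; do
    if [ "$(mount_point_of "$dn_dir")" = "$zk_mount" ]; then
      echo "$dn_dir"
      status=0
    fi
  done
  exit "$status"
)
```

Any directory it prints is one whose I/O would compete with ZooKeeper's on that node.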

Another case in point is that we see things taken out of context. As an example, there was a presentation, by Facebook I think... where they run HBase on nodes where they don't run a TT. In context, this could make a lot of sense when they are using HBase to deliver real-time responses to an app outside of the cluster, and are not using it as part of an M/R job. The problem is that someone sees this, takes it out of context, and says that since FB does it, the best way to run HBase is to not run it on the same nodes where you have TTs running. (Data locality? Forget about it...) 

Note that I don't believe the FB presentation was suggesting this beyond their specific solution. 

In another thread, someone was asking for help because they were having problems with their cluster. One node was in India, two were in the US. The response was along the lines that it's not a good thing to do. While I agree with the response, I have to wonder if it shouldn't have been worded more strongly. We aren't saying it can't be done; we're saying that it's not something we'd recommend. I don't know if that's a strong enough response to really discourage an OP from actually doing it. 

Is that a better explanation?

On Jul 2, 2012, at 6:53 AM, Mohammad Tariq wrote:

> What kind of explanation is this???????????
> 
> Regards,
>    Mohammad Tariq
> 
> 
> On Mon, Jul 2, 2012 at 5:10 PM, Michael Segel <mi...@hotmail.com> wrote:
>> Sorry St. Ack,
>> 
>> Which is why I said that I was losing it...
>> 
>> The entire quote was...
>> "On Sun, Jul 1, 2012 at 2:05 PM, Jay Wilson
>> <re...@circle-cross-jn.com> wrote:
>>> Can a regionserver and quorumpeer reside on the same node?
>> 
>> It can, but you want to consider how disk is allocated in the cluster.
>> 
>> A typical and recommended configuration is HBase RegionServer and HDFS
>> DataNode colocated on the nodes. The DataNode will use locally
>> attached disk to store and serve blocks.
>> "
>> 
>> Looking at and parsing this you have two things...
>> 
>> 1) Reading 'A typical and recommended configuration...' can imply that it's possible, while not recommended, to run an HBase RS without running a DN service on the same node.
>> 
>> 2) "It can, but you want to consider how disk is allocated in the cluster."
>> Running a pseudo cluster on a single machine is one thing; running a fully distributed cluster is another.
>> 
>> 
>> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus saying not to do something. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.
>> 
>> Now I'm not sure which one it was: my KISS statement, my 'dead hooker' analogy, or my jokes about drugs.
>> 
>> KISS, I guess, goes back to when I first learned that term. It was a 200-level engineering graphics course where the instructor mentioned KISS, then stalled on the second S (KIS == Keep It Simple) and used the term 'Stupid' to refer back to the engineer who didn't keep it simple. Of course, he was the same professor who couldn't figure out an algorithm without using a GOTO statement and got huffy when I made the mistake of correcting him in class. (But that's another story.) Not sure if it should be KIS or if the second S in KISS was for something else.
>> 
>> The 'dead hooker' analogy goes back to movie plots and subplots where the hero wakes up next to the body of a dead woman in bed. While in James Bond films it's the evil-turned-good hottie who gets it, I was thinking back to the Cameron Diaz flick 'Very Bad Things', a 1998 movie where the plot line is based on a prostitute getting killed at a bachelor party. Also, for some reason, the movie Barton Fink comes to mind, or the Great Gatsby.
>> 
>> And while I don't advocate drugs, that too is a reference to movies. It's from the 'Airplane' spoofs, where Lloyd Bridges talks about how today was a bad day for giving up <insert your favorite drug>...
>> 
>> Sorry to sidetrack, but I thought I'd give a more detailed explanation...
>> 
>> 
>> On Jul 2, 2012, at 2:51 AM, Stack wrote:
>> 
>>> On Mon, Jul 2, 2012 at 7:11 AM, Michael Segel <mi...@hotmail.com> wrote:
>>>> I'm sorry I'm losing it.
>>>> 
>>> 
>>> It's plain. Do us a favor and try keeping your psychotic breakdown to
>>> yourself going forward.
>>> 
>>> St.Ack
>>> 
>> 
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Mohammad Tariq <do...@gmail.com>.

What kind of explanation is this???????????

Regards,
    Mohammad Tariq


On Mon, Jul 2, 2012 at 5:10 PM, Michael Segel <mi...@hotmail.com> wrote:
> Sorry St. Ack,
>
> Which is why I said that I was losing it...
>
> The entire quote was...
> "On Sun, Jul 1, 2012 at 2:05 PM, Jay Wilson
> <re...@circle-cross-jn.com> wrote:
>> Can a regionserver and quorumpeer reside on the same node?
>
> It can, but you want to consider how disk is allocated in the cluster.
>
> A typical and recommended configuration is HBase RegionServer and HDFS
> DataNode colocated on the nodes. The DataNode will use locally
> attached disk to store and serve blocks.
> "
>
> Looking at and parsing this you have two things...
>
> 1) Reading 'A typical and recommended configuration...' can imply that it's possible, while not recommended, to run an HBase RS without running a DN service on the same node.
>
> 2) "It can, but you want to consider how disk is allocated in the cluster."
> Running a pseudo cluster on a single machine is one thing; running a fully distributed cluster is another.
>
>
> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus saying not to do something. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.
>
> Now I'm not sure which one it was: my KISS statement, my 'dead hooker' analogy, or my jokes about drugs.
>
> KISS, I guess, goes back to when I first learned that term. It was a 200-level engineering graphics course where the instructor mentioned KISS, then stalled on the second S (KIS == Keep It Simple) and used the term 'Stupid' to refer back to the engineer who didn't keep it simple. Of course, he was the same professor who couldn't figure out an algorithm without using a GOTO statement and got huffy when I made the mistake of correcting him in class. (But that's another story.) Not sure if it should be KIS or if the second S in KISS was for something else.
>
> The 'dead hooker' analogy goes back to movie plots and subplots where the hero wakes up next to the body of a dead woman in bed. While in James Bond films it's the evil-turned-good hottie who gets it, I was thinking back to the Cameron Diaz flick 'Very Bad Things', a 1998 movie where the plot line is based on a prostitute getting killed at a bachelor party. Also, for some reason, the movie Barton Fink comes to mind, or the Great Gatsby.
>
> And while I don't advocate drugs, that too is a reference to movies. It's from the 'Airplane' spoofs, where Lloyd Bridges talks about how today was a bad day for giving up <insert your favorite drug>...
>
> Sorry to sidetrack, but I thought I'd give a more detailed explanation...
>
>
> On Jul 2, 2012, at 2:51 AM, Stack wrote:
>
>> On Mon, Jul 2, 2012 at 7:11 AM, Michael Segel <mi...@hotmail.com> wrote:
>>> I'm sorry I'm losing it.
>>>
>>
>> It's plain. Do us a favor and try keeping your psychotic breakdown to
>> yourself going forward.
>>
>> St.Ack
>>
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Andrew Purtell <ap...@apache.org>.

On Mon, Jul 2, 2012 at 4:40 AM, Michael Segel <mi...@hotmail.com> wrote:
> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus saying not to do something.

Hey Michael, your point about weasel words is not without merit.

However, I try to limit my use of strong language because I know I
have a tendency toward strong opinions. (On some days I am more
successful than others.) I find it is generally not appropriate to
take a strong tone with users; the impression it leaves is that you
are an asshole, and your community by extension. In this thread I
think you are suffering that very effect.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Suraj Varma <sv...@gmail.com>.

I think your devrackA-03 zookeeper is not quite "ok" - it doesn't seem
to be part of the quorum.

http://zookeeper-user.578899.n2.nabble.com/ZooKeeper-JMX-Monitoring-suggestion-td6681354.html

>>> [hadoop@devrackA-00 ~]$ zookeeper-check
>>> devrackA-03
>>> imok
>>> This ZooKeeper instance is not currently serving requests
>>> This ZooKeeper instance is not currently serving requests

Check its logs to see why it is not able to respond to the stat command.
>>> imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
>>> This ZooKeeper instance is not currently serving requests


--Suraj
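The distinction drawn here is visible in the four-letter-word replies: "ruok" answering "imok" only means the process is up, while "stat" shows whether a server is actually serving as part of the quorum, through its "Mode:" line. Below is a minimal POSIX sh sketch of a per-host check, assuming nc is installed and the client port is 2181 as in this thread; the function names are hypothetical, and this is not the zookeeper-check script used in the thread.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): report each ZooKeeper server's role from its
# "stat" reply. "ruok" -> "imok" only means the process is up; "stat" shows
# whether it is actually serving as part of the quorum.

# zk_mode: read a "stat" reply on stdin and print the Mode value (e.g.
# leader, follower, standalone), "not-serving" if the server refused the
# request, or "unknown" if the reply was empty or unrecognized.
zk_mode() {
  awk '
    /not currently serving requests/ { print "not-serving"; found = 1; exit }
    /^Mode:/                         { print $2;            found = 1; exit }
    END                              { if (!found) print "unknown" }
  '
}

# check_quorum HOST...: send "stat" to each host on client port 2181 (the
# port used in this thread) and print "host: mode". Needs nc.
check_quorum() {
  for host in "$@"; do
    printf '%s: ' "$host"
    echo stat | nc "$host" 2181 2>/dev/null | zk_mode
  done
}
```

Run as check_quorum devrackA-03 devrackA-04 devrackA-05, a peer that answers "stat" with "This ZooKeeper instance is not currently serving requests" would be reported as "not-serving", and its own log is the next place to look.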

On Mon, Jul 2, 2012 at 5:51 PM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> When I do "locate hbase-site.xml", "locate hdfs-site.xml", and "locate
> core-site.xml" there are 2 locations for each on the HRegionServers.
> All files are either in $HADOOP_HOME/conf or $HBASE_HOME/conf and there
> are files of the same name in "example" directories.
>
> I moved my HRegionServers back to my original nodes that also have the
> HQuorumPeers on them.  My HMaster and HRegionServers are now running
> again.  I suspect they will terminate after 30 minutes like they have
> been doing, but at least they are running.
>
> ---
> Jay Wilson
>
> On 7/2/2012 4:43 PM, Suraj Varma wrote:
>> Ok - thanks for checking connectivity.
>>
>> I presume you have already double-checked that the hbase-site.xml on your
>> region server points to the zookeeper and that the hdfs-site.xml points
>> to the namenode.
>>
>> I once got a similar error when HBase was picking up a stray
>> core-site.xml / hdfs-site.xml from the hdfs install or hbase-site.xml
>> from another hbase install (perhaps a stray local install)
>>
>> If connectivity is all right, and you are getting connection refused,
>> I think your region server is picking up the wrong configuration file.
>> So - do a "locate" on the region server configuration files to see if
>> there are others on the box.
>>
>> Just trying to eliminate basic setup issues ...
>> --Suraj
>>
>>
>> On Mon, Jul 2, 2012 at 3:55 PM, Jay Wilson
>> <re...@circle-cross-jn.com> wrote:
>>> First, thank you.
>>>
>>> I moved my HRegionservers, not my HQuorumPeers.
>>>
>>> I have checked the network and everyone can talk to everyone. I can
>>> even talk to my HQuorumPeers via "nc" from the nodes that should be
>>> running my HMaster and my HRegionservers.
>>>
>>> [hadoop@devrackA-00 ~]$ zookeeper-check
>>> devrackA-03
>>> imok
>>> This ZooKeeper instance is not currently serving requests
>>> This ZooKeeper instance is not currently serving requests
>>>
>>>
>>>
>>> devrackA-04
>>> imok
>>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>>> Clients:
>>>  /172.18.0.1:41582[0](queued=0,recved=1,sent=0)
>>>
>>> Latency min/avg/max: 0/0/0
>>> Received: 5
>>> Sent: 4
>>> Outstanding: 0
>>> Zxid: 0x0
>>> Mode: follower
>>> Node count: 4
>>>  /172.18.0.1:41583[0](queued=0,recved=1,sent=0)
>>>
>>>
>>>
>>>
>>> devrackA-05
>>> imok
>>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>>> Clients:
>>>  /172.18.0.1:35517[0](queued=0,recved=1,sent=0)
>>>
>>> Latency min/avg/max: 0/0/0
>>> Received: 5
>>> Sent: 4
>>> Outstanding: 0
>>> Zxid: 0x0
>>> Mode: follower
>>> Node count: 4
>>>  /172.18.0.1:35518[0](queued=0,recved=1,sent=0)
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~
>>>
>>>
>>> [hadoop@devrackA-06 ~]$ jps
>>> 21276 Jps
>>> 20641 DataNode
>>> [hadoop@devrackA-06 ~]$ echo ruok | nc devrackA-04 2181
>>> imok[hadoop@devrackA-06 ~]$ echo stat | nc devrackA-04 2181
>>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>>> Clients:
>>>  /172.18.0.7:37950[0](queued=0,recved=1,sent=0)
>>>
>>> Latency min/avg/max: 0/0/0
>>> Received: 8
>>> Sent: 7
>>> Outstanding: 0
>>> Zxid: 0x0
>>> Mode: follower
>>> Node count: 4
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~
>>>
>>>
>>> [hadoop@devrackB-07 ~]$ echo ruok | nc devrackA-04 2181
>>> imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
>>> This ZooKeeper instance is not currently serving requests
>>> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-05 2181
>>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>>> Clients:
>>>  /172.18.0.72:40784[0](queued=0,recved=1,sent=0)
>>>
>>> Latency min/avg/max: 0/0/0
>>> Received: 7
>>> Sent: 6
>>> Outstanding: 0
>>> Zxid: 0x0
>>> Mode: follower
>>> Node count: 4
>>> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-04 2181
>>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>>> Clients:
>>>  /172.18.0.72:60795[0](queued=0,recved=1,sent=0)
>>>
>>> Latency min/avg/max: 0/0/0
>>> Received: 10
>>> Sent: 9
>>> Outstanding: 0
>>> Zxid: 0x0
>>> Mode: follower
>>> Node count: 4
>>> [hadoop@devrackB-07 ~]$
>>>
>>> ~~~~~~~~~~~
>>>
>>> I know it says connection refused in the error, but are there files
>>> associated with an HRegionServer that I need to clean up? I did NOT move
>>> the HMaster or HQuorumPeers. I only moved the HRegionServers.
>>>
>>> Thank you for the help.
>>>
>>> ---
>>> Jay Wilson
>>>
>>>
>>>
>>>
>>>
>>> On 7/2/2012 2:43 PM, Suraj Varma wrote:
>>>> The error you are getting is:
>>>>
>>>>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>> socket connection to server devrackA-05/172.18.0.6:2181
>>>>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>>>>> 0x0 for server null, unexpected error, closing socket connection and
>>>>> attempting reconnect
>>>>> java.net.ConnectException: Connection refused
>>>>
>>>>
>>>> This means this server is not able to reach the zookeeper. Did you
>>>> change your hbase-site.xml as well with the new zookeeper quorum?
>>>> Do basic connectivity testing to ensure that your hosts / DNS are all
>>>> in place after your relocations. Check out
>>>> http://hbase.apache.org/book.html#d1952e311 and see if the DNS checker
>>>> tool might help.
>>>> --S
>>>>
>>>>
>>>>
>>>> On Mon, Jul 2, 2012 at 1:12 PM, Jay Wilson
>>>> <re...@circle-cross-jn.com> wrote:
>>>>> First, yep, I am a newbie to Hadoop/HBase. I have read both of the
>>>>> O'Reilly books (Hadoop and HBase), so my knowledge level at this point
>>>>> is pure book learning, and understanding the log messages is very vexing.
>>>>>
>>>>> Second, based on the recommendations of this mailing list, I decided to move
>>>>> my HRegionservers to nodes other than where my HQuorumpeers are.
>>>>> I updated my regionservers file on every node in the cluster. I ran
>>>>> stop-hbase.sh, stop-all.sh, and cleaned up my zookeeper files.  Then I
>>>>> ran start-all.sh, waited, and then ran start-hbase.sh.  Now my HMaster
>>>>> and HRegionservers terminate within seconds.  Before I had them at least
>>>>> running for 30 minutes.  The message is:
>>>>>
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:java.io.tmpdir=/tmp
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:java.compiler=<NA>
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:os.name=Linux
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:os.arch=amd64
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:os.version=2.6.18-194.el5
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:user.name=hadoop
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:user.home=/home/hadoop
>>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>> environment:user.dir=/home/hadoop/jscripts
>>>>> 2012-07-02 12:39:02,194 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>>> client connection,
>>>>> connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181
>>>>> sessionTimeout=180000 watcher=master:60000
>>>>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>> socket connection to server devrackA-05/172.18.0.6:2181
>>>>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>>>>> 0x0 for server null, unexpected error, closing socket connection and
>>>>> attempting reconnect
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>> I tried the same sequence again (stop-hbase.sh, stop-all.sh, and cleaned
>>>>> up zookeeper), but I get the same result (Connection refused).  Is there
>>>>> something else I need to do when I move a regionserver?
>>>>>
>>>>> My zookeeper working directory is /home/hbase/zookeeper.  Would there be
>>>>> other places that I need to clean up?
>>>>>
>>>>>
>>>>>
>>>>> Thank You
>>>>> --
>>>>> Jay
>>>>>
>>>>>
>>>>>
>>>>> On 7/2/2012 11:25 AM, Amandeep Khurana wrote:
>>>>>> As someone who has been developing/running/using the software for a longer period of time than the person who is asking the question, you can best serve the poser by making them aware of the trade-offs and why it's a good or bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.
>>>>>>
>>>>>> Having said that, it would be really nice to keep this thread from becoming more about how to answer questions than about answering the question itself.
>>>>>>
>>>>>> Bringing the thread back on track:
>>>>>>
>>>>>> Jay, you can certainly run zookeepers with the Datanodes and Region Server processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily due to I/O), which will cause ZK some grief. It is generally recommended to colocate in the following groups:
>>>>>>
>>>>>> Datanode + Region Servers on the same physical nodes
>>>>>> Zookeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
>>>>>> Namenode on an independent node
>>>>>> Secondary Namenode on an independent node
>>>>>>
>>>>>> These are the general recommendations, and different environments might warrant different decisions. For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to colocate the Namenode, Zookeeper and HBase Master on the same physical host.
>>>>>>
>>>>>> Hope that helps
>>>>>>
>>>>>> -Amandeep
>>>>>>
>>>>>>
>>>>>> On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:
>>>>>>
>>>>>>> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus saying not to do something. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
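The "basic connectivity testing" suggested above can start from the connectString= value in the log quoted above. Below is a minimal POSIX sh sketch that expands it into host/port pairs and checks each one; it assumes a Linux host with getent and a netcat that supports -z and -w, defaults a missing port to 2181 (the port in that log), and uses hypothetical function names.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): expand a ZooKeeper connect string such as
# "devrackA-03:2181,devrackA-05:2181,devrackA-04:2181" and check each entry.
# Assumes Linux (getent) and a netcat that supports -z and -w.

# split_connect_string STRING: print "host port" per entry; a missing port
# defaults to 2181 in this sketch.
split_connect_string() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | awk -F: 'NF { print $1, ($2 == "" ? 2181 : $2) }'
}

# check_endpoints STRING: report, per entry, whether the host resolves and
# whether its port accepts a TCP connection within 3 seconds.
check_endpoints() {
  split_connect_string "$1" | while read -r host port; do
    if ! getent hosts "$host" > /dev/null; then
      echo "$host: does not resolve"
    elif nc -z -w 3 "$host" "$port" < /dev/null 2> /dev/null; then
      echo "$host:$port: open"
    else
      echo "$host:$port: refused or unreachable"
    fi
  done
}
```

Separating "does not resolve" from "refused or unreachable" helps tell a DNS or hosts-file problem apart from a ZooKeeper process that is not listening on that node. The nc calls read from /dev/null so they cannot consume the loop's input.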

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Jay Wilson <re...@circle-cross-jn.com>.

When I do "locate hbase-site.xml", "locate hdfs-site.xml", and "locate
core-site.xml" there are 2 locations for each on the HRegionServers.
All files are either in $HADOOP_HOME/conf or $HBASE_HOME/conf and there
are files of the same name in "example" directories.

I moved my HRegionServers back to my original nodes that also have the
HQuorumPeers on them.  My HMaster and HRegionServers are now running
again.  I suspect they will terminate after 30 minutes like they have
been doing, but at least they are running.

---
Jay Wilson
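Since a stray configuration file was the suspected cause, one way to compare the copies that "locate" finds is to print the ZooKeeper quorum each one declares. Below is a minimal POSIX sh sketch; it assumes the usual one-element-per-line layout of hbase-site.xml, is not a general XML parser, and the function name "quorum_of" is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical script): print the hbase.zookeeper.quorum value declared
# in each hbase-site.xml passed as an argument. Assumes <name> and <value>
# appear on their own lines or together on one line; not a real XML parser.

# quorum_of FILE: print the quorum value, or nothing if the property is absent.
quorum_of() {
  awk '
    /<name>hbase\.zookeeper\.quorum<\/name>/ { want = 1 }
    want && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)      # drop everything up to <value>
      sub(/<\/value>.*/, "", v)    # drop </value> and anything after it
      gsub(/[ \t]/, "", v)         # hostnames cannot contain whitespace
      print v
      exit
    }
  ' "$1"
}

for f in "$@"; do
  printf '%s: %s\n' "$f" "$(quorum_of "$f")"
done
```

Saved as, say, quorum-of.sh (a hypothetical name), running sh quorum-of.sh $(locate hbase-site.xml) prints one "path: value" line per file; a copy with an empty or different value is worth a closer look.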

On 7/2/2012 4:43 PM, Suraj Varma wrote:
> Ok - thanks for checking connectivity.
> 
> I presume you have already double-checked that the hbase-site.xml on your
> region server points to the zookeeper and that the hdfs-site.xml points
> to the namenode.
> 
> I once got a similar error when HBase was picking up a stray
> core-site.xml / hdfs-site.xml from the hdfs install or hbase-site.xml
> from another hbase install (perhaps a stray local install)
> 
> If connectivity is all right, and you are getting connection refused,
> I think your region server is picking up the wrong configuration file.
> So - do a "locate" on the region server configuration files to see if
> there are others on the box.
> 
> Just trying to eliminate basic setup issues ...
> --Suraj
> 
> 
> On Mon, Jul 2, 2012 at 3:55 PM, Jay Wilson
> <re...@circle-cross-jn.com> wrote:
>> First, thank you.
>>
>> I moved my HRegionservers, not my HQuorumPeers.
>>
>> I have checked the network and everyone can talk to everyone. I can
>> even talk to my HQuorumPeers via "nc" from the nodes that should be
>> running my HMaster and my HRegionservers.
>>
>> [hadoop@devrackA-00 ~]$ zookeeper-check
>> devrackA-03
>> imok
>> This ZooKeeper instance is not currently serving requests
>> This ZooKeeper instance is not currently serving requests
>>
>>
>>
>> devrackA-04
>> imok
>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>> Clients:
>>  /172.18.0.1:41582[0](queued=0,recved=1,sent=0)
>>
>> Latency min/avg/max: 0/0/0
>> Received: 5
>> Sent: 4
>> Outstanding: 0
>> Zxid: 0x0
>> Mode: follower
>> Node count: 4
>>  /172.18.0.1:41583[0](queued=0,recved=1,sent=0)
>>
>>
>>
>>
>> devrackA-05
>> imok
>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>> Clients:
>>  /172.18.0.1:35517[0](queued=0,recved=1,sent=0)
>>
>> Latency min/avg/max: 0/0/0
>> Received: 5
>> Sent: 4
>> Outstanding: 0
>> Zxid: 0x0
>> Mode: follower
>> Node count: 4
>>  /172.18.0.1:35518[0](queued=0,recved=1,sent=0)
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~
>>
>>
>> [hadoop@devrackA-06 ~]$ jps
>> 21276 Jps
>> 20641 DataNode
>> [hadoop@devrackA-06 ~]$ echo ruok | nc devrackA-04 2181
>> imok[hadoop@devrackA-06 ~]$ echo stat | nc devrackA-04 2181
>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>> Clients:
>>  /172.18.0.7:37950[0](queued=0,recved=1,sent=0)
>>
>> Latency min/avg/max: 0/0/0
>> Received: 8
>> Sent: 7
>> Outstanding: 0
>> Zxid: 0x0
>> Mode: follower
>> Node count: 4
>>
>>
>> ~~~~~~~~~~~~~~~~~~~
>>
>>
>> [hadoop@devrackB-07 ~]$ echo ruok | nc devrackA-04 2181
>> imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
>> This ZooKeeper instance is not currently serving requests
>> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-05 2181
>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>> Clients:
>>  /172.18.0.72:40784[0](queued=0,recved=1,sent=0)
>>
>> Latency min/avg/max: 0/0/0
>> Received: 7
>> Sent: 6
>> Outstanding: 0
>> Zxid: 0x0
>> Mode: follower
>> Node count: 4
>> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-04 2181
>> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
>> Clients:
>>  /172.18.0.72:60795[0](queued=0,recved=1,sent=0)
>>
>> Latency min/avg/max: 0/0/0
>> Received: 10
>> Sent: 9
>> Outstanding: 0
>> Zxid: 0x0
>> Mode: follower
>> Node count: 4
>> [hadoop@devrackB-07 ~]$
>>
>> ~~~~~~~~~~~
>>
>> I know it says connection refused in the error, but are there files
>> associated with an HRegionServer that I need to clean up? I did NOT move
>> the HMaster or HQuorumPeers. I only moved the HRegionServers.
>>
>> Thank you for the help.
>>
>> ---
>> Jay Wilson
>>
>>
>>
>>
>>
>> On 7/2/2012 2:43 PM, Suraj Varma wrote:
>>> The error you are getting is:
>>>
>>>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server devrackA-05/172.18.0.6:2181
>>>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>>>> 0x0 for server null, unexpected error, closing socket connection and
>>>> attempting reconnect
>>>> java.net.ConnectException: Connection refused
>>>
>>>
>>> This means this server is not able to reach the zookeeper. Did you
>>> change your hbase-site.xml as well with the new zookeeper quorum?
>>> Do basic connectivity testing to ensure that your hosts / DNS are all
>>> in place after your relocations. Check out
>>> http://hbase.apache.org/book.html#d1952e311 and see if the DNS checker
>>> tool might help.
>>> --S
>>>
>>>
>>>
>>> On Mon, Jul 2, 2012 at 1:12 PM, Jay Wilson
>>> <re...@circle-cross-jn.com> wrote:
>>>> First, yep, I am a newbie to Hadoop/HBase. I have read both of the
>>>> O'Reilly books (Hadoop and HBase), so my knowledge level at this point
>>>> is pure book learning, and understanding the log messages is very vexing.
>>>>
>>>> Second, based on the recommendations of this mailing list, I decided to move
>>>> my HRegionservers to nodes other than where my HQuorumpeers are.
>>>> I updated my regionservers file on every node in the cluster. I ran
>>>> stop-hbase.sh, stop-all.sh, and cleaned up my zookeeper files.  Then I
>>>> ran start-all.sh, waited, and then ran start-hbase.sh.  Now my HMaster
>>>> and HRegionservers terminate within seconds.  Before I had them at least
>>>> running for 30 minutes.  The message is:
>>>>
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:java.io.tmpdir=/tmp
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:java.compiler=<NA>
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:os.name=Linux
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:os.arch=amd64
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:os.version=2.6.18-194.el5
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:user.name=hadoop
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:user.home=/home/hadoop
>>>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>>>> environment:user.dir=/home/hadoop/jscripts
>>>> 2012-07-02 12:39:02,194 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>> client connection,
>>>> connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181
>>>> sessionTimeout=180000 watcher=master:60000
>>>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server devrackA-05/172.18.0.6:2181
>>>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>>>> 0x0 for server null, unexpected error, closing socket connection and
>>>> attempting reconnect
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> I tried the same sequence again (stop-hbase.sh, stop-all.sh, and cleaned
>>>> up zookeeper), but I get the same result (Connection refused).  Is there
>>>> something else I need to do when I move a regionserver?
>>>>
>>>> My zookeeper working directory is /home/hbase/zookeeper.  Would there be
>>>> other places that I need to clean up?
>>>>
>>>>
>>>>
>>>> Thank You
>>>> --
>>>> Jay
>>>>
>>>>
>>>>
>>>> On 7/2/2012 11:25 AM, Amandeep Khurana wrote:
>>>>> As someone who has been developing/running/using the software for a longer period of time than the person who is asking the question, you can best serve the poser by making them aware of the trade offs and why it's a good/bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.
>>>>>
>>>>> Having said that, it'll be really nice to stop this thread from becoming more about how to answer questions rather than answering the question itself.
>>>>>
>>>>> Bringing the thread back to track:
>>>>>
>>>>> Jay, you can certainly run zookeepers with the Datanodes and Region Server processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily due to I/O) which will cause ZK some grief. It is generally recommended to collocate in the following groups:
>>>>>
>>>>> Datanode + Region Servers on the same physical nodes
>>>>> Zookeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
>>>>> Namenode on an independent node
>>>>> Secondary Namenode on an independent node
>>>>>
>>>>> These are the general recommendations and different environments might warrant different decisions. For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to collocate the Namenode, Zookeeper and HBase master on the same physical host.
>>>>>
>>>>> Hope that helps
>>>>>
>>>>> -Amandeep
>>>>>
>>>>>
>>>>> On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:
>>>>>
>>>>>> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just on this topic but others posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus of saying not to do something. I'm not faulting the poster because he wasn't and isn't the only one who does this... We see it all the time where someone goes down the wrong path, and is looking for a quick solution, rather than following the recommendation.
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Suraj Varma <sv...@gmail.com>.

Ok - thanks for checking connectivity.

I presume you have already double-checked that the hbase-site.xml on your
region server points to the ZooKeeper quorum and that hdfs-site.xml points
to the NameNode.

I once got a similar error when HBase was picking up a stray
core-site.xml / hdfs-site.xml from the HDFS install, or an hbase-site.xml
from another HBase install (perhaps a stray local install).

If connectivity is all right and you are still getting connection refused,
I think your region server is picking up the wrong configuration file.
So do a "locate" for the configuration files on the region server to see
if there are other copies on the box.
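As a rough stand-in for `locate` (whose index can be stale), here is a minimal Python sketch that walks a directory tree, collects every `hbase-site.xml`, `core-site.xml` and `hdfs-site.xml` it finds, and reports the names that occur more than once. The function names are hypothetical, walking `/` on a large box can be slow, and the output only lists candidates; it does not tell you which copy the region server actually loads.

```python
import os
import sys

# Site files that can silently shadow the intended configuration.
CONFIG_NAMES = ("hbase-site.xml", "core-site.xml", "hdfs-site.xml")


def find_config_copies(root):
    """Walk `root` and return {file name: sorted list of paths} for every
    site file found. Unreadable directories are skipped (os.walk's default)."""
    found = {name: [] for name in CONFIG_NAMES}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in found:
                found[name].append(os.path.join(dirpath, name))
    return {name: sorted(paths) for name, paths in found.items()}


def duplicates(found):
    """Keep only the file names that exist in more than one place."""
    return {name: paths for name, paths in found.items() if len(paths) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/"
    for name, paths in sorted(duplicates(find_config_copies(root)).items()):
        print(name)
        for path in paths:
            print("   ", path)
```

Any duplicate that sits on the process's classpath ahead of the intended conf directory is the first thing to check.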

Just trying to eliminate basic setup issues ...
--Suraj


On Mon, Jul 2, 2012 at 3:55 PM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> First, thank you.
>
> I moved my HRegionservers not my HQuorumPeers.
>
> I have checked the network and everyone can talk to everyone. I can
> even talk to my HQuorumPeers via "nc" from the nodes that should be
> running my HMaster and my HRegionservers.
>
> [hadoop@devrackA-00 ~]$ zookeeper-check
> devrackA-03
> imok
> This ZooKeeper instance is not currently serving requests
> This ZooKeeper instance is not currently serving requests
>
>
>
> devrackA-04
> imok
> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
> Clients:
>  /172.18.0.1:41582[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/0
> Received: 5
> Sent: 4
> Outstanding: 0
> Zxid: 0x0
> Mode: follower
> Node count: 4
>  /172.18.0.1:41583[0](queued=0,recved=1,sent=0)
>
>
>
>
> devrackA-05
> imok
> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
> Clients:
>  /172.18.0.1:35517[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/0
> Received: 5
> Sent: 4
> Outstanding: 0
> Zxid: 0x0
> Mode: follower
> Node count: 4
>  /172.18.0.1:35518[0](queued=0,recved=1,sent=0)
>
>
> ~~~~~~~~~~~~~~~~~~~~
>
>
> [hadoop@devrackA-06 ~]$ jps
> 21276 Jps
> 20641 DataNode
> [hadoop@devrackA-06 ~]$ echo ruok | nc devrackA-04 2181
> imok[hadoop@devrackA-06 ~]$ echo stat | nc devrackA-04 2181
> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
> Clients:
>  /172.18.0.7:37950[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/0
> Received: 8
> Sent: 7
> Outstanding: 0
> Zxid: 0x0
> Mode: follower
> Node count: 4
>
>
> ~~~~~~~~~~~~~~~~~~~
>
>
> [hadoop@devrackB-07 ~]$ echo ruok | nc devrackA-04 2181
> imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
> This ZooKeeper instance is not currently serving requests
> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-05 2181
> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
> Clients:
>  /172.18.0.72:40784[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/0
> Received: 7
> Sent: 6
> Outstanding: 0
> Zxid: 0x0
> Mode: follower
> Node count: 4
> [hadoop@devrackB-07 ~]$ echo stat | nc devrackA-04 2181
> Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
> Clients:
>  /172.18.0.72:60795[0](queued=0,recved=1,sent=0)
>
> Latency min/avg/max: 0/0/0
> Received: 10
> Sent: 9
> Outstanding: 0
> Zxid: 0x0
> Mode: follower
> Node count: 4
> [hadoop@devrackB-07 ~]$
>
> ~~~~~~~~~~~
>
> I know it says connection refused in the error, but are there files
> associated with an HRegionServer that I need to clean up? I did NOT move
> the HMaster or HQuorumPeers. I only moved the HRegionServers.
>
> Thank you for the help.
>
> ---
> Jay Wilson
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Jay Wilson <re...@circle-cross-jn.com>.

First, thank you.

I moved my HRegionservers not my HQuorumPeers.

I have checked the network and everyone can talk to everyone. I can
even talk to my HQuorumPeers via "nc" from the nodes that should be
running my HMaster and my HRegionservers.
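The `echo ruok | nc host 2181` and `echo stat | nc host 2181` checks can be scripted so every quorum member is probed the same way from each node. This is a sketch in Python with hypothetical helper names: it sends the four-letter word over a plain TCP socket and reads until the server closes the connection, which is roughly what `nc` does. A refused connection, timeout or unresolvable host comes back as an `ERROR:` string rather than an exception.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command such as 'ruok' or 'stat' and
    return the reply text, roughly like `echo cmd | nc host port`.
    Network failures (refused, timeout, unknown host) come back as a string
    starting with 'ERROR:' so one bad peer does not stop a sweep."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(cmd.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
    except OSError as exc:
        return f"ERROR: {exc}"
    return b"".join(chunks).decode("utf-8", "replace")


def parse_mode(stat_reply):
    """Return the value of the 'Mode:' line in a `stat` reply, or None if
    there is none (e.g. 'not currently serving requests')."""
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    # Quorum hosts as they appear in the connectString from the log.
    for host in ("devrackA-03", "devrackA-04", "devrackA-05"):
        print(host, four_letter_word(host, 2181, "ruok"),
              parse_mode(four_letter_word(host, 2181, "stat")))
```

Running it from the HMaster node and from each region server node gives directly comparable output, including which peers report no `Mode:` at all.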

[hadoop@devrackA-00 ~]$ zookeeper-check
devrackA-03
imok
This ZooKeeper instance is not currently serving requests
This ZooKeeper instance is not currently serving requests



devrackA-04
imok
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.1:41582[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
 /172.18.0.1:41583[0](queued=0,recved=1,sent=0)




devrackA-05
imok
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.1:35517[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
 /172.18.0.1:35518[0](queued=0,recved=1,sent=0)


~~~~~~~~~~~~~~~~~~~~


[hadoop@devrackA-06 ~]$ jps
21276 Jps
20641 DataNode
[hadoop@devrackA-06 ~]$ echo ruok | nc devrackA-04 2181
imok[hadoop@devrackA-06 ~]$ echo stat | nc devrackA-04 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.7:37950[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 8
Sent: 7
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4


~~~~~~~~~~~~~~~~~~~


[hadoop@devrackB-07 ~]$ echo ruok | nc devrackA-04 2181
imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
This ZooKeeper instance is not currently serving requests
[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-05 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.72:40784[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 7
Sent: 6
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-04 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.72:60795[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 10
Sent: 9
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
[hadoop@devrackB-07 ~]$

~~~~~~~~~~~

I know it says connection refused in the error, but are there files
associated with an HRegionServer that I need to clean up? I did NOT move
the HMaster or HQuorumPeers. I only moved the HRegionServers.

Thank you for the help.

---
Jay Wilson





On 7/2/2012 2:43 PM, Suraj Varma wrote:
> The error you are getting is:
> 
>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server devrackA-05/172.18.0.6:2181
>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>> 0x0 for server null, unexpected error, closing socket connection and
>> attempting reconnect
>> java.net.ConnectException: Connection refused
> 
> 
> This means this server is not able to reach ZooKeeper. Did you
> change your hbase-site.xml as well with the new ZooKeeper quorum?
> Do basic connectivity testing to ensure that your hosts / DNS are all
> in place after your relocations - check out
> http://hbase.apache.org/book.html#d1952e311 and see if the DNS checker
> tool might help.
> --S

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Suraj Varma <sv...@gmail.com>.

The error you are getting is:

> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server devrackA-05/172.18.0.6:2181
> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x0 for server null, unexpected error, closing socket connection and
> attempting reconnect
> java.net.ConnectException: Connection refused


This means this server is not able to reach ZooKeeper. Did you
change your hbase-site.xml as well with the new ZooKeeper quorum?
Do basic connectivity testing to ensure that your hosts / DNS are all
in place after your relocations - check out
http://hbase.apache.org/book.html#d1952e311 and see if the DNS checker
tool might help.
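One way to answer the hbase-site.xml question is to rebuild the connect string from the configuration file and compare it with the `connectString=` value in the log. A minimal Python sketch, assuming the standard `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` properties and falling back to `localhost` and `2181` when they are unset (assumed defaults); the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_QUORUM = "localhost"  # assumed default when hbase.zookeeper.quorum is unset
DEFAULT_CLIENT_PORT = "2181"  # assumed default client port


def read_properties(path):
    """Return the <name>/<value> pairs of a Hadoop-style *-site.xml as a dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def zk_connect_string(props):
    """Build the host:port list a ZooKeeper client would be handed."""
    port = props.get("hbase.zookeeper.property.clientPort", DEFAULT_CLIENT_PORT)
    quorum = props.get("hbase.zookeeper.quorum", DEFAULT_QUORUM)
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    # Entries that already carry a port are kept as written.
    return ",".join(h if ":" in h else f"{h}:{port}" for h in hosts)


if __name__ == "__main__":
    import sys
    print(zk_connect_string(read_properties(sys.argv[1])))
```

Pointed at the hbase-site.xml the region server is supposed to use, the output should list the same hosts as the log's `connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181`; if it does not, the process is probably reading a different file.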
--S



On Mon, Jul 2, 2012 at 1:12 PM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> First, yep, I am a newbie to Hadoop/HBase. I have read both of the
> O'Reilly books (Hadoop and HBase), so my knowledge level at this point
> is pure book learning, and understanding the log messages is very vexing.
>
> Second, based on the recommendations of this mailing list, I decided to move
> my HRegionservers to nodes other than where my HQuorumpeers are.
> I updated my regionservers file on every node in the cluster. I ran
> stop-hbase.sh, stop-all.sh, and cleaned up my zookeeper files. Then I
> ran start-all.sh, waited, and then ran start-hbase.sh. Now my HMaster
> and HRegionservers terminate within seconds. Before, they at least
> ran for 30 minutes. The message is:
>
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-194.el5
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=hadoop
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=/home/hadoop
> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=/home/hadoop/jscripts
> 2012-07-02 12:39:02,194 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
> connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181
> sessionTimeout=180000 watcher=master:60000
> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server devrackA-05/172.18.0.6:2181
> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x0 for server null, unexpected error, closing socket connection and
> attempting reconnect
> java.net.ConnectException: Connection refused
>
> I tried the same sequence again (stop-hbase.sh, stop-all.sh, and cleaned
> up zookeeper), but I get the same result (Connection refused).  Is there
> something else I need to do when I move a regionserver?
>
> My zookeeper working directory is /home/hbase/zookeeper.  Would there be
> other places that I need to clean up?
>
>
>
> Thank You
> --
> Jay

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Jay Wilson <re...@circle-cross-jn.com>.

First, yep, I am a newbie to Hadoop/HBase. I have read both of the
O'Reilly books (Hadoop and HBase), so my knowledge level at this point
is pure book learning, and understanding the log messages is very vexing.

Second, based on the recommendations of this mailing list, I decided to move
my HRegionservers to nodes other than where my HQuorumpeers are.
I updated my regionservers file on every node in the cluster. I ran
stop-hbase.sh, stop-all.sh, and cleaned up my zookeeper files. Then I
ran start-all.sh, waited, and then ran start-hbase.sh. Now my HMaster
and HRegionservers terminate within seconds. Before, they at least
ran for 30 minutes. The message is:

2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.el5
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=hadoop
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/home/hadoop/jscripts
2012-07-02 12:39:02,194 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181
sessionTimeout=180000 watcher=master:60000
2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server devrackA-05/172.18.0.6:2181
2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
0x0 for server null, unexpected error, closing socket connection and
attempting reconnect
java.net.ConnectException: Connection refused

I tried the same sequence again (stop-hbase.sh, stop-all.sh, and cleaned
up zookeeper), but I get the same result (Connection refused).  Is there
something else I need to do when I move a regionserver?

My zookeeper working directory is /home/hbase/zookeeper.  Would there be
other places that I need to clean up?



Thank You
--
Jay



On 7/2/2012 11:25 AM, Amandeep Khurana wrote:
> As someone who has been developing/running/using the software for a longer period of time than the person who is asking the question, you can best serve the poser by making them aware of the trade-offs and why it's a good/bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.
> 
> Having said that, it would be really nice to stop this thread from becoming more about how to answer questions than about answering the question itself.
> 
> Bringing the thread back on track:
> 
> Jay, you can certainly run ZooKeeper on the same nodes as the DataNode and RegionServer processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily due to I/O), which will cause ZK some grief. It is generally recommended to co-locate in the following groups:
> 
> Datanode + Region Servers on the same physical nodes
> Zookeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
> Namenode on an independent node
> Secondary Namenode on an independent node
> 
> These are the general recommendations, and different environments might warrant different decisions. For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to co-locate the Namenode, Zookeeper and HBase master on the same physical host.
> 
> Hope that helps
> 
> -Amandeep 
> 
> 
> On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:
> 
>> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus saying not to do something. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.
> 
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Amandeep Khurana <am...@gmail.com>.

As someone who has been developing/running/using the software for a longer period of time than the person who is asking the question, you can best serve the poser by making them aware of the trade offs and why it's a good/bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.

Having said that, it would be really nice to keep this thread from becoming more about how to answer questions than about answering the question itself.

Bringing the thread back on track:

Jay, you can certainly run ZooKeeper alongside the DataNode and RegionServer processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily with I/O), which will cause ZK some grief. It is generally recommended to colocate in the following groups:

Datanode + Region Servers on the same physical nodes
Zookeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
Namenode on an independent node
Secondary Namenode on an independent node

These are the general recommendations, and different environments might warrant different decisions. For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to colocate the Namenode, Zookeeper and HBase master on the same physical host.
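The grouping above can be written down as a small rule check over a planned layout. This is a minimal sketch, not an HBase or Hadoop tool: the role names, the `check_layout` helper, and the `dev_cluster` switch (standing in for the PoC/Dev exception) are all hypothetical.

```python
# Hypothetical layout checker for the colocation groups suggested above.
# Role names and check_layout() are illustrative; nothing here is an HBase API.

RECOMMENDED_GROUPS = [
    {"DataNode", "RegionServer"},
    {"ZooKeeper", "HMaster"},
    {"NameNode"},
    {"SecondaryNameNode"},
]


def check_layout(layout, dev_cluster=False):
    """Return warnings for hosts whose roles depart from the recommended groups.

    layout maps a hostname to the set of roles running on it.
    dev_cluster=True tolerates mixing groups (e.g. NameNode + ZooKeeper + HMaster
    on one PoC/Dev host) but still reports the two I/O-related concerns.
    """
    warnings = []
    for host, roles in sorted(layout.items()):
        if "RegionServer" in roles and "DataNode" not in roles:
            warnings.append(f"{host}: RegionServer without a local DataNode")
        if "ZooKeeper" in roles and roles & {"DataNode", "RegionServer"}:
            warnings.append(f"{host}: ZooKeeper competes with DataNode/RegionServer I/O")
        if not dev_cluster and not any(roles <= g for g in RECOMMENDED_GROUPS):
            warnings.append(f"{host}: roles {sorted(roles)} span several groups")
    return warnings


if __name__ == "__main__":
    layout = {
        "node1": {"NameNode"},
        "node2": {"SecondaryNameNode"},
        "node3": {"ZooKeeper", "HMaster"},
        "node4": {"DataNode", "RegionServer", "ZooKeeper"},
    }
    for warning in check_layout(layout):
        print(warning)
```

In this example only node4 is flagged, once for ZooKeeper sharing the host with DataNode/RegionServer I/O and once for spanning two groups.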

Hope that helps

-Amandeep

On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:

> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' and someone still pursues the idea until there's a chorus telling them not to do it. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Michael Segel <mi...@hotmail.com>.

Sorry, St.Ack,

Which is why I said that I was losing it...

The entire quote was...
"On Sun, Jul 1, 2012 at 2:05 PM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> Can a regionserver and quorumpeer reside on the same node?

It can, but you want to consider how disk is allocated in the cluster.

A typical and recommended configuration is HBase RegionServer and HDFS
DataNode colocated on the nodes. The DataNode will use locally
attached disk to store and serve blocks.
" 

Parsing this, you have two things...

1) 'A typical and recommended configuration...' can be read as implying that it's possible, though not recommended, to run an HBase RS without a DN service on the same node.

2) "It can, but you want to consider how disk is allocated in the cluster." 
Running a pseudo-distributed cluster on a single machine is one thing; running a fully distributed cluster is another.

I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just about this topic but about other posts where we say 'not a good idea' and someone still pursues the idea until there's a chorus telling them not to do it. I'm not faulting the poster, because he wasn't and isn't the only one who does this... We see it all the time: someone goes down the wrong path and looks for a quick solution rather than following the recommendation.

Now, I'm not sure whether it was my KISS statement, my 'dead hooker' analogy, or my jokes about drugs that caused the problem.

KISS, I guess, goes back to when I first learned the term. It was a 200-level engineering graphics course where the instructor said KISS, stalled on the second S (KIS == Keep It Simple), and then used 'Stupid' to refer to the engineer who didn't keep it simple. Of course, he was the same professor who couldn't figure out an algorithm without using a GOTO statement and got huffy when I made the mistake of correcting him in class. (But that's another story.) Not sure if it should be KIS or if the second S in KISS was for something else.

The 'dead hooker' analogy goes back to movie plots and subplots where the hero wakes up in bed next to the body of a dead woman. In James Bond films it's the evil-turned-good hottie who gets it, but I was thinking of the Cameron Diaz flick 'Very Bad Things', a 1998 movie whose plot is based on a prostitute getting killed at a bachelor party. For some reason Barton Fink also comes to mind, as does The Great Gatsby.

And while I don't advocate drugs, that too is a reference to movies. It's the whole 'Airplane' spoof, where Lloyd Bridges talks about how today was a bad day for giving up <insert your favorite drug>...

Sorry to sidetrack, but I thought I'd give a more detailed explanation...

On Jul 2, 2012, at 2:51 AM, Stack wrote:

> On Mon, Jul 2, 2012 at 7:11 AM, Michael Segel <mi...@hotmail.com> wrote:
>> I'm sorry I'm losing it.
>> 
> 
> It's plain. Do us a favor and try keeping your psychotic breakdown to
> yourself going forward.
> 
> St.Ack
>

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Stack <st...@duboce.net>.

On Mon, Jul 2, 2012 at 7:11 AM, Michael Segel <mi...@hotmail.com> wrote:
> I'm sorry I'm losing it.
>

It's plain. Do us a favor and try keeping your psychotic breakdown to
yourself going forward.

St.Ack

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Michael Segel <mi...@hotmail.com>.

I'm sorry I'm losing it.

Running RS on a machine where DN isn't running?
So then the RS can't store its regions locally. Not sure if that would ever be a good idea or recommended.

I thought the initial question was about running ZK on the same node as an RS, which isn't a good idea and is a recipe for failure....

Following KISS is a much better way of life than taking Crystal Meth. It's one way to avoid those nasty 'dead hooker problems'. *

*<rant>
<explanation>
Just to explain KISS and what I mean by a 'dead hooker' problem...

KISS = Keep It Simple Stupid
This is an engineering principle used to teach engineering students that the best solutions are the straightforward ones, and that if you try to get too clever, you always get some sort of blowback in your face. It usually hurts and it's always self-inflicted.

'dead hooker problems' are the theoretical problems of how to get rid of the dead hooker from your hotel room after your party of Hookers, Booze and either Crystal Meth or Cocaine goes terribly wrong and you wake up the next morning with a nasty hangover and a dead body that you have to get out of your hotel room before the cleaning ladies come knocking on the door. While I've never experienced this, I've lost count of how many movies have it as a plot or subplot.

Not that I'm attempting to advocate drugs or killing hookers, unless it's with a typewriter or text editor when you want to write your next failed movie script.
</explanation>

So here's my rant...

I'm not picking on the OP, but in general there's a class of posts where the OP starts a thread by ignoring the common wisdom captured in books, blogs and Apache wikis when setting up a cluster.

When things don't work, they ultimately post here and wonder why they don't work.

The key to happiness is not to ignore the conventional wisdom and, when starting out with Hadoop, to follow the suggested setups. Remember that the key is to first grok Hadoop before you attempt more advanced cluster configurations. That is what is meant by KISS. Accept that Hadoop is just a tool used by many to solve problems requiring a parallel framework.

Dead Hooker problems may be a great plot device, but in real life, when under a time crunch, they are something one should avoid. ;-)

</rant>

For those of you who don't appreciate my sense of humor, try another example... (Also note: I don't know how this will translate into languages other than English, so the meaning could be lost in translation...)

Your wife has invited a bunch of her co-workers, including her boss, over for dinner. You, being the good spouse, are responsible for some of the meal prep. Rather than go with a tried and true recipe, you decide to try something new. And not only do you try a new recipe, you also decide to improvise, try new ingredients and do your own thing. Not really a good idea, and unless you are incredibly lucky, or a really good cook with a talent for creating new recipes, you are more than likely going to end up in the dog house.

Take it from a guy who usually lives in the dog house for one reason or another... following the recipes and not trying something new when the pressure for success is on... much less stress in your life. :-)

Again, with respect to Hadoop, there are a lot of moving parts where things can go wrong. I've got this drinking buddy named Murphy... you know the guy, he wrote this law... ;-)

HTH

-Mikey

On Jul 1, 2012, at 7:41 PM, Andrew Purtell wrote:

> A typical and recommended configuration is HBase RegionServer and HDFS
> DataNode colocated on the nodes. The DataNode will use locally
> attached disk to store and serve blocks.

Re: HBASE -- Regionserver and QuorumPeer ?

Posted by Andrew Purtell <ap...@apache.org>.

On Sun, Jul 1, 2012 at 2:05 PM, Jay Wilson
<re...@circle-cross-jn.com> wrote:
> Can a regionserver and quorumpeer reside on the same node?

It can, but you want to consider how disk is allocated in the cluster.

A typical and recommended configuration is HBase RegionServer and HDFS
DataNode colocated on the nodes. The DataNode will use locally
attached disk to store and serve blocks.

A ZooKeeper quorum peer must record transactions to its local log
before acking writes as part of the agreement protocol. Therefore you
will want to dedicate a storage device to this, independent of other
use, to minimize latency. If you are also putting a quorum peer alongside
a RegionServer and a DataNode, then you lose one block device that the
DataNode could otherwise use for HDFS blocks.
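Whether the quorum peer's transaction log really has a device of its own can be sanity-checked on each host. This is a minimal sketch, assuming the log directory and the DataNode data directories are known local paths; the helper name and the paths in the example are hypothetical. It compares `os.stat().st_dev`, which identifies the filesystem device a path lives on. Partitions, LVM, or RAID can put different devices on the same physical disks, so an empty result is necessary rather than sufficient.

```python
import os


def dirs_sharing_device(zk_log_dir, datanode_dirs):
    """Return the DataNode directories on the same filesystem device as zk_log_dir.

    Compares os.stat().st_dev; all paths must exist. Distinct st_dev values do
    not prove distinct spindles (partitions, LVM, or RAID may share disks).
    """
    zk_dev = os.stat(zk_log_dir).st_dev
    return [d for d in datanode_dirs if os.stat(d).st_dev == zk_dev]


if __name__ == "__main__":
    # Hypothetical paths; replace with the real ZooKeeper log and DataNode dirs.
    try:
        shared = dirs_sharing_device("/zk/txnlog", ["/data/1/dfs/dn", "/data/2/dfs/dn"])
        print(shared or "ZooKeeper log is on its own device")
    except FileNotFoundError as exc:
        print(f"path not found: {exc.filename}")
```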

Otherwise, during periods of heavy filesystem I/O the latency of
ZooKeeper writes may become quite large. Often heavy filesystem I/O
and ZooKeeper write demand coincide with HBase region transitions or
node failure recovery, so you are impacted most by this when you least
would want to be.
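The effect described above can be illustrated with a deliberately simplified model: treat a write as committed once any majority of the quorum peers has forced it to their transaction logs, and ignore network time and leader-specific details. Under that assumption, commit latency is the majority-th fastest log write, so it stays low while most peers' disks are quiet and jumps when heavy HDFS/HBase I/O slows several peers at once. The latency figures below are made up for illustration.

```python
def commit_latency_ms(log_write_ms):
    """Latency until a majority of peers has logged a write (simplified model).

    log_write_ms holds one transaction-log write latency per quorum peer.
    Network round trips and leader-side processing are ignored.
    """
    if not log_write_ms:
        raise ValueError("need at least one quorum peer")
    majority = len(log_write_ms) // 2 + 1
    return sorted(log_write_ms)[majority - 1]


if __name__ == "__main__":
    quiet = [2.0, 3.0, 4.0]          # three peers with idle log devices
    busy = [2.0, 250.0, 400.0]       # two peers share disks with busy DataNodes
    print(commit_latency_ms(quiet))  # 3.0
    print(commit_latency_ms(busy))   # 250.0
```

One slow peer out of three is masked by the other two; it is the coinciding slowdown on several colocated peers, as during region transitions or recovery, that shows up in commit latency.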

IMO, it is better to run a separate ZooKeeper ensemble and point HBase at
it. It is then also available as an independent coordination service
for your applications, because HBase's use of it will mostly be
light.
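A separate ensemble along these lines can be sketched as configuration generated from a host list. The setting names are standard (`dataDir`, `dataLogDir`, and `server.N` in ZooKeeper's `zoo.cfg`; `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` in `hbase-site.xml`; `HBASE_MANAGES_ZK` in `hbase-env.sh`). The helper functions are hypothetical, and the host names, paths, and timing values are placeholders to adjust.

```python
import xml.etree.ElementTree as ET


def zoo_cfg(hosts, data_dir, data_log_dir, client_port=2181):
    """Render a zoo.cfg for a standalone ensemble (placeholder timing values).

    data_log_dir holds the transaction log and should sit on its own device,
    away from dataDir snapshots and from any DataNode disks.
    """
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"clientPort={client_port}",
        f"dataDir={data_dir}",
        f"dataLogDir={data_log_dir}",
    ]
    # server.N=host:peerPort:electionPort; each host also needs a matching myid file.
    lines += [f"server.{i}={h}:2888:3888" for i, h in enumerate(hosts, start=1)]
    return "\n".join(lines) + "\n"


def hbase_site(hosts, client_port=2181):
    """Render hbase-site.xml properties that point HBase at the external ensemble."""
    conf = ET.Element("configuration")
    for name, value in [
        ("hbase.zookeeper.quorum", ",".join(hosts)),
        ("hbase.zookeeper.property.clientPort", str(client_port)),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


# Line for hbase-env.sh so that HBase does not start its own quorum peers.
HBASE_ENV = "export HBASE_MANAGES_ZK=false\n"

if __name__ == "__main__":
    zk_hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    print(zoo_cfg(zk_hosts, "/var/lib/zookeeper", "/zk-txnlog"))
    print(hbase_site(zk_hosts))
    print(HBASE_ENV)
```

The same comma-separated host list is what application ZooKeeper clients would connect to, which is how the ensemble doubles as a coordination service outside HBase.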

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)