Posted to hdfs-user@hadoop.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/06/11 08:43:23 UTC

Is it necessary to cache metadata in client side?

Hi all,

According to the GFS paper, GFS caches metadata on the client side.
But when I check the source code of Hadoop, it seems that Hadoop won't
cache it on the client side. I just want to make sure whether I am right,
and I am wondering whether anyone is working on this. One advantage of
client-side metadata caching that I can think of is that the tasktracker
fetches job.xml from HDFS. Most of the time we run multiple tasks on one
node, so if the tasktracker cached the metadata, it could reduce the
communication with the namenode.



-- 
Best Regards

Jeff Zhang

Re: experiences with hbase-2492

Posted by Friso van Vollenhoven <fv...@xebia.com>.
I did not try tcp_tw_reuse, because I am a bit fuzzy on what exactly it does. That is, what is the difference between recycling and reusing?

The man page for tcp just mentions that tcp_tw_reuse allows reusing TIME_WAIT sockets when it is safe from the protocol viewpoint, and that it should not be set without consulting a technical expert (which I do not consider myself to be when it comes to TCP implementation internals). So I wouldn't know how it is determined that something is safe from the protocol viewpoint. Most Linux/UNIX FAQs online tell me exactly the same. I do see some recommendations to set tcp_tw_reuse for heavily loaded web servers, but the problem here is running out of client connections, not server sockets.

For recycling, I expect that the socket in TIME_WAIT is simply considered closed and a new connection is built originating from the source port of the socket that was previously in TIME_WAIT. I would also expect that if a packet from the previous connection leaked through into the new connection, it would lead to a reset, because the sequence number would be wrong (right?). We are definitely not behind a NAT router (actually, at the RIPE NCC all machines have actual public IPs, although I never understood why).

All of my assumptions might be wrong, but as I mentioned, I am no expert on the subject. I was also led by the fact that the report for one of the hbase-2492 related issues actually mentions tcp_tw_recycle as a possible solution in the comments. We are not in production yet, so for now we have room to experiment. Nonetheless, if tcp_tw_reuse is actually safer (and I understand why), then I would like to use that instead.
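
For what it's worth, this is the throwaway check I use to see what the kernel
currently has for the two settings (it only reads the /proc files; changing
them is done as root, e.g. with sysctl -w or by echoing into the same files).
The class name is made up and there is nothing Hadoop-specific about it:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TcpTwSettings {

  // Reads the first line of a /proc entry, e.g. "0" or "1".
  private static String readProc(String path) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(path));
    try {
      return in.readLine();
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("tcp_tw_reuse   = " + readProc("/proc/sys/net/ipv4/tcp_tw_reuse"));
    System.out.println("tcp_tw_recycle = " + readProc("/proc/sys/net/ipv4/tcp_tw_recycle"));
  }
}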

(And yes, can't wait for HDFS-941 some day)


Friso


On Jun 15, 2010, at 6:54 PM, Todd Lipcon wrote:

> Might be worth trying tcp_tw_reuse before turning on tw_recycle - as I
> understand it, the former is a lot safer than the latter.
> 
> Can't wait for HDFS-941 some day :)
> 
> -Todd
> 
> On Tue, Jun 15, 2010 at 9:10 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> 
>> Friso,
>> 
>> This is very interesting, and nobody answered probably because no one
>> tried tcp_tw_recycle. I personally didn't even know about that config
>> until a few minutes ago ;)
>> 
>> So from the varnish mailing list, it seems that machines behind
>> firewalls or NAT won't play well with that config, but I don't expect
>> anyone running a cluster with that kind of setup... unless they are
>> doing cross-DC or whatnot.
>> http://www.mail-archive.com/varnish-misc@projects.linpro.no/msg02912.html
>> 
>> Good stuff!
>> 
>> J-D
>> 
>> On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven
>> <fv...@xebia.com> wrote:
>>> Hi all,
>>> 
>>> Since I got no replies to my previous message (see below), I went ahead
>>> and set the tcp_tw_recycle to true. This worked like a charm. The number of
>>> sockets in TIME_WAIT went down from many thousands to just a couple (tens).
>>> Apparently, once set to true, the recycling happens quite eagerly. Most
>>> importantly, the regionservers no longer shut down (which was the goal). I
>>> am sharing the info here, just in case it might help someone sometime.
>>> 
>>> 
>>> Cheers,
>>> Friso
>>> 
>>> 
>>> 
>>> On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:
>>> 
>>>> Hi all,
>>>> We are experiencing a lot of "java.net.BindException: Cannot assign
>>>> requested address", which is a case of
>>>> https://issues.apache.org/jira/browse/hbase-2492. At some point, all
>>>> grinds to a halt and regionservers start to shut down.
>>>> 
>>>> I was wondering if anyone has found a way around this problem (other
>>>> than adding more machines to spread the load or reduce the work load). Has
>>>> anyone been able to successfully apply the patch in
>>>> https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone
>>>> have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1
>>>> (true) at the OS level?
>>>> 
>>>> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
>>>> 
>>>> Any experiences that anyone can share are greatly appreciated.
>>>> 
>>>> 
>>>> Best regards,
>>>> Friso
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: experiences with hbase-2492

Posted by Todd Lipcon <to...@cloudera.com>.
Might be worth trying tcp_tw_reuse before turning on tw_recycle - as I
understand it, the former is a lot safer than the latter.

Can't wait for HDFS-941 some day :)

-Todd

On Tue, Jun 15, 2010 at 9:10 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Friso,
>
> This is very interesting, and nobody answered probably because no one
> tried tcp_tw_recycle. I personally didn't even know about that config
> until a few minutes ago ;)
>
> So from the varnish mailing list, it seems that machines behind
> firewalls or NAT won't play well with that config, but I don't expect
> anyone running a cluster with that kind of setup... unless they are
> doing cross-DC or whatnot.
> http://www.mail-archive.com/varnish-misc@projects.linpro.no/msg02912.html
>
> Good stuff!
>
> J-D
>
> On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven
> <fv...@xebia.com> wrote:
> > Hi all,
> >
> > Since I got no replies to my previous message (see below), I went ahead
> > and set the tcp_tw_recycle to true. This worked like a charm. The number of
> > sockets in TIME_WAIT went down from many thousands to just a couple (tens).
> > Apparently, once set to true, the recycling happens quite eagerly. Most
> > importantly, the regionservers no longer shut down (which was the goal). I
> > am sharing the info here, just in case it might help someone sometime.
> >
> >
> > Cheers,
> > Friso
> >
> >
> >
> > On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:
> >
> >> Hi all,
> >> We are experiencing a lot of "java.net.BindException: Cannot assign
> >> requested address", which is a case of
> >> https://issues.apache.org/jira/browse/hbase-2492. At some point, all
> >> grinds to a halt and regionservers start to shut down.
> >>
> >> I was wondering if anyone has found a way around this problem (other
> >> than adding more machines to spread the load or reduce the work load). Has
> >> anyone been able to successfully apply the patch in
> >> https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone
> >> have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1
> >> (true) at the OS level?
> >>
> >> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
> >>
> >> Any experiences that anyone can share are greatly appreciated.
> >>
> >>
> >> Best regards,
> >> Friso
> >>
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: experiences with hbase-2492

Posted by Andrew Purtell <ap...@apache.org>.
Doh! I meant tcp_tw_reuse. Sorry, paste-o.

  - Andy

--- On Tue, 6/15/10, Andrew Purtell <ap...@apache.org> wrote:

> From: Andrew Purtell <ap...@apache.org>
> Subject: Re: experiences with hbase-2492
> To: user@hbase.apache.org
> Cc: fvanvollenhoven@xebia.com
> Date: Tuesday, June 15, 2010, 9:59 AM
> tcp_tw_recycle did not do what you needed?
> 
>    - Andy
> 
> > On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven wrote:
> > Hi all,
> >
> > Since I got no replies to my previous message (see below), I went ahead
> > and set the tcp_tw_recycle to true. This worked like a charm. The number
> > of sockets in TIME_WAIT went down from many thousands to just a couple
> > (tens). Apparently, once set to true, the recycling happens quite
> > eagerly. Most importantly, the regionservers no longer shut down (which
> > was the goal). I am sharing the info here, just in case it might help
> > someone sometime.
> >
> >
> > Cheers,
> > Friso


Re: experiences with hbase-2492

Posted by Andrew Purtell <ap...@apache.org>.
tcp_tw_recycle did not do what you needed?

   - Andy

> On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven wrote:
> Hi all,
>
> Since I got no replies to my previous message (see
> below), I went ahead and set the tcp_tw_recycle to true.
> This worked like a charm. The number of sockets in TIME_WAIT
> went down from many thousands to just a couple (tens).
> Apparently, once set to true, the recycling happens quite
> eagerly. Most importantly, the regionservers no longer shut
> down (which was the goal). I am sharing the info here, just
> in case it might help someone sometime.
>
>
> Cheers,
> Friso





Re: experiences with hbase-2492

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Friso,
   You may know this already, but please bear in mind that there is a potential risk of packets from previous connections that were still in flight reaching the new connection (that is the reason for the TIME_WAIT state in TCP), and that may lead to unexpected behaviour.

Vidhya

On 6/15/10 9:10 AM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:

Friso,

This is very interesting, and nobody answered probably because no one
tried tcp_tw_recycle. I personally didn't even know about that config
until a few minutes ago ;)

So from the varnish mailing list, it seems that machines behind
firewalls or NAT won't play well with that config, but I don't expect
anyone running a cluster with that kind of setup... unless they are
doing cross-DC or whatnot.
http://www.mail-archive.com/varnish-misc@projects.linpro.no/msg02912.html

Good stuff!

J-D

On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven
<fv...@xebia.com> wrote:
> Hi all,
>
> Since I got no replies to my previous message (see below), I went ahead and set the tcp_tw_recycle to true. This worked like a charm. The number of sockets in TIME_WAIT went down from many thousands to just a couple (tens). Apparently, once set to true, the recycling happens quite eagerly. Most importantly, the regionservers no longer shut down (which was the goal). I am sharing the info here, just in case it might help someone sometime.
>
>
> Cheers,
> Friso
>
>
>
> On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:
>
>> Hi all,
>> We are experiencing a lot of "java.net.BindException: Cannot assign requested address", which is a case of https://issues.apache.org/jira/browse/hbase-2492. At some point, all grinds to a halt and regionservers start to shut down.
>>
>> I was wondering if anyone has found a way around this problem (other than adding more machines to spread the load or reduce the work load). Has anyone been able to successfully apply the patch in https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1 (true) at the OS level?
>>
>> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
>>
>> Any experiences that anyone can share are greatly appreciated.
>>
>>
>> Best regards,
>> Friso
>>
>
>


Re: experiences with hbase-2492

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Friso,

This is very interesting, and nobody answered probably because no one
tried tcp_tw_recycle. I personally didn't even know about that config
until a few minutes ago ;)

So from the varnish mailing list, it seems that machines behind
firewalls or NAT won't play well with that config, but I don't expect
anyone running a cluster with that kind of setup... unless they are
doing cross-DC or whatnot.
http://www.mail-archive.com/varnish-misc@projects.linpro.no/msg02912.html

Good stuff!

J-D

On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven
<fv...@xebia.com> wrote:
> Hi all,
>
> Since I got no replies to my previous message (see below), I went ahead and set the tcp_tw_recycle to true. This worked like a charm. The number of sockets in TIME_WAIT went down from many thousands to just a couple (tens). Apparently, once set to true, the recycling happens quite eagerly. Most importantly, the regionservers no longer shut down (which was the goal). I am sharing the info here, just in case it might help someone sometime.
>
>
> Cheers,
> Friso
>
>
>
> On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:
>
>> Hi all,
>> We are experiencing a lot of "java.net.BindException: Cannot assign requested address", which is a case of https://issues.apache.org/jira/browse/hbase-2492. At some point, all grinds to a halt and regionservers start to shut down.
>>
>> I was wondering if anyone has found a way around this problem (other than adding more machines to spread the load or reduce the work load). Has anyone been able to successfully apply the patch in https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1 (true) at the OS level?
>>
>> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
>>
>> Any experiences that anyone can share are greatly appreciated.
>>
>>
>> Best regards,
>> Friso
>>
>
>

Re: experiences with hbase-2492

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi all,

Since I got no replies to my previous message (see below), I went ahead and set the tcp_tw_recycle to true. This worked like a charm. The number of sockets in TIME_WAIT went down from many thousands to just a couple (tens). Apparently, once set to true, the recycling happens quite eagerly. Most importantly, the regionservers no longer shut down (which was the goal). I am sharing the info here, just in case it might help someone sometime.
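
In case it helps, this is the quick-and-dirty way I keep an eye on that number
(a plain netstat -tan | grep -c TIME_WAIT gives the same count, of course). It
just reads /proc/net/tcp, where the fourth column ("st") is the TCP state in
hex and 06 means TIME_WAIT; the class name is made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TimeWaitCount {
  public static void main(String[] args) throws IOException {
    int timeWait = 0;
    BufferedReader in = new BufferedReader(new FileReader("/proc/net/tcp"));
    try {
      in.readLine(); // skip the header line
      String line;
      while ((line = in.readLine()) != null) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length > 3 && "06".equals(fields[3])) {
          timeWait++;
        }
      }
    } finally {
      in.close();
    }
    System.out.println("IPv4 sockets in TIME_WAIT: " + timeWait);
  }
}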


Cheers,
Friso



On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:

> Hi all,
> We are experiencing a lot of "java.net.BindException: Cannot assign requested address", which is a case of https://issues.apache.org/jira/browse/hbase-2492. At some point, all grinds to a halt and regionservers start to shut down.
> 
> I was wondering if anyone has found a way around this problem (other than adding more machines to spread the load or reduce the work load). Has anyone been able to successfully apply the patch in https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1 (true) at the OS level?
> 
> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
> 
> Any experiences that anyone can share are greatly appreciated.
> 
> 
> Best regards,
> Friso
> 


experiences with hbase-2492

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi all,
We are experiencing a lot of "java.net.BindException: Cannot assign requested address", which is a case of https://issues.apache.org/jira/browse/hbase-2492. At some point, all grinds to a halt and regionservers start to shut down.

I was wondering if anyone has found a way around this problem (other than adding more machines to spread the load or reduce the work load). Has anyone been able to successfully apply the patch in https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1 (true) at the OS level?

We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.

Any experiences that anyone can share are greatly appreciated.


Best regards,
Friso


Re: Is it necessary to cache metadata in client side?

Posted by Jeff Zhang <zj...@gmail.com>.
Per input stream means the cache can only be used within the scope of one
file. I think it would be better if there were a cache in DFSClient.
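
Just to illustrate what I mean, here is a rough sketch of a client-wide cache
of block locations keyed by path, built on top of the public FileSystem API
(the class and method names are made up; this is not actual DFSClient code):

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationCache {

  // Process-wide cache: file path -> block locations for the whole file.
  private static final ConcurrentHashMap<String, BlockLocation[]> CACHE =
      new ConcurrentHashMap<String, BlockLocation[]>();

  public static BlockLocation[] locate(FileSystem fs, Path path) throws IOException {
    BlockLocation[] cached = CACHE.get(path.toString());
    if (cached != null) {
      return cached; // repeated lookups skip the namenode entirely
    }
    FileStatus status = fs.getFileStatus(path);               // namenode RPC
    BlockLocation[] locations =
        fs.getFileBlockLocations(status, 0, status.getLen()); // namenode RPC
    CACHE.put(path.toString(), locations);
    return locations;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // e.g. the job.xml that every task on the node would otherwise look up
    BlockLocation[] locations = locate(fs, new Path(args[0]));
    System.out.println(args[0] + " has " + locations.length + " block(s)");
  }
}

Of course, a cache like this would hand out stale locations once a file is
rewritten or its blocks move, which is probably exactly why the real client
only caches per open stream.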



On Fri, Jun 11, 2010 at 5:02 PM, Todd Lipcon <to...@cloudera.com> wrote:
> It is cached per input stream - see DFSInputStream.locatedBlocks,
> prefetchSize, etc.
>
> -Todd
> On Thu, Jun 10, 2010 at 11:43 PM, Jeff Zhang <zj...@gmail.com> wrote:
>>
>> Hi all,
>>
>> According to the GFS paper, GFS caches metadata on the client side.
>> But when I check the source code of Hadoop, it seems that Hadoop won't
>> cache it on the client side. I just want to make sure whether I am right,
>> and I am wondering whether anyone is working on this. One advantage of
>> client-side metadata caching that I can think of is that the tasktracker
>> fetches job.xml from HDFS. Most of the time we run multiple tasks on one
>> node, so if the tasktracker cached the metadata, it could reduce the
>> communication with the namenode.
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Best Regards

Jeff Zhang

Re: Is it necessary to cache metadata in client side?

Posted by Todd Lipcon <to...@cloudera.com>.
It is cached per input stream - see DFSInputStream.locatedBlocks,
prefetchSize, etc.
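
Roughly, the effect is the following (just a sketch, assuming default prefetch
settings): the positioned reads are served from the locations the stream looked
up when it was opened, while a second stream on the same file has to ask the
namenode again.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerStreamCacheExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[4096];

    FSDataInputStream in = fs.open(new Path(args[0])); // locations fetched here
    in.read(0, buf, 0, buf.length);  // uses the stream's cached block locations
    in.read(0, buf, 0, buf.length);  // same stream, no extra namenode lookup
    in.close();

    FSDataInputStream other = fs.open(new Path(args[0])); // new stream, new lookup
    other.close();
  }
}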

-Todd

On Thu, Jun 10, 2010 at 11:43 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi all,
>
> According to the GFS paper, GFS caches metadata on the client side.
> But when I check the source code of Hadoop, it seems that Hadoop won't
> cache it on the client side. I just want to make sure whether I am right,
> and I am wondering whether anyone is working on this. One advantage of
> client-side metadata caching that I can think of is that the tasktracker
> fetches job.xml from HDFS. Most of the time we run multiple tasks on one
> node, so if the tasktracker cached the metadata, it could reduce the
> communication with the namenode.
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Todd Lipcon
Software Engineer, Cloudera