Posted to user@thrift.apache.org by Anthony Molinaro <an...@alumni.caltech.edu> on 2010/08/13 07:22:54 UTC

erlang server/client closing connections

Hi,

  I'm trying to use pg2 to cache several thrift client connections so I
can spread load across them.  This seems to work great; however, the
connections seem to go stale.  I think the server is dropping them, but
looking through the thrift code it seems like keepalive is true, so I'm
not sure why this would be the case.

I start my server with

thrift_server:start_link/3

and the client processes are started with

thrift_client:start_link/3

The process stays alive fine on the client, but goes away after about
30 seconds or so on the server (probably less; they seem to go away
quickly).  Since the client is still alive, when I do a call I get this
exception.

{{case_clause,{error,closed}},
 [{thrift_client,read_result,3},
  {thrift_client,catch_function_exceptions,2},
  {thrift_client,handle_call,3},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]}

Is there any way to keep this from happening?

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: erlang server/client closing connections

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Fri, Aug 13, 2010 at 02:56:51PM -0700, David Reiss wrote:
> > I like that it has -spec's now, does it cleanly pass the dialyzer?
> I don't remember.  Based on my incremental commit messages (which you can
> see at that gitweb link), it looks like I was running dialyzer while I was
> working on this.

Nice, then definitely commit :)

> > Also, I know with the current version there are lots of
> > warnings about unused variables, it might be worthwhile to prepend _ to
> > them, so it builds without warning
> Yeah, I think I got those, too.

Again, great!

> > The one thing that I'm still curious about is why the client doesn't receive
> > some sort of notification that the server has shut down.  It must be that
> > gen_tcp doesn't let the client know?
> Not sure.  In C, you don't get any notification until you try to read or write
> (or poll for readability or writability).

Yeah, I was hoping that by using {keepalive, true} in the client connection
I would get some notification, but actually, I think the keepalive interval
is extremely long, so maybe I would have gotten something after a few
hours.  Anyway, I think I just need to retry the call if it fails
because the connection has closed.  I can try some small number of times
and otherwise bail completely.
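
The retry idea above could be sketched roughly as follows.  This is only
an illustration: the module name retry_sketch and the function with_retry/2
are made up for this example and are not part of the thrift library.  It
assumes the caller wraps "connect if needed, then make the call" in a
zero-arity fun, and that a dead connection shows up either as an
{error, closed} return value or as an exit from the client process.

```erlang
-module(retry_sketch).
-export([with_retry/2]).

%% Hypothetical helper, not part of thrift: run Fun up to Attempts times.
%% A failed attempt is either an {error, closed} result or an exit
%% (for example, the client gen_server crashing during the call).
with_retry(Fun, Attempts)
  when is_function(Fun, 0), is_integer(Attempts), Attempts > 0 ->
    try Fun() of
        {error, closed} when Attempts > 1 ->
            %% Connection was closed under us; try again.
            with_retry(Fun, Attempts - 1);
        Result ->
            %% Success, or the last attempt's result: hand it back as-is.
            Result
    catch
        exit:_Reason when Attempts > 1 ->
            with_retry(Fun, Attempts - 1)
    end.
```

On the last attempt nothing is caught, so an exit propagates to the caller,
which matches "try some small number of times and otherwise bail
completely".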

I'll see if I can't try the patch out soon.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: erlang server/client closing connections

Posted by Bruce Lowekamp <br...@skype.net>.

Hadn't found this before---we have some similar code to improve
performance of the erlang thrift library, but I think this is a bit
more thorough.  Would be nice to see this in trunk.

Bruce



Re: erlang server/client closing connections

Posted by David Reiss <dr...@facebook.com>.

> I like that it has -spec's now, does it cleanly pass the dialyzer?
I don't remember.  Based on my incremental commit messages (which you can
see at that gitweb link), it looks like I was running dialyzer while I was
working on this.

> Also, I know with the current version there are lots of
> warnings about unused variables, it might be worthwhile to prepend _ to
> them, so it builds without warning
Yeah, I think I got those, too.

> The one thing that I'm still curious about is why the client doesn't receive
> some sort of notification that the server has shut down.  It must be that
> gen_tcp doesn't let the client know?
Not sure.  In C, you don't get any notification until you try to read or write
(or poll for readability or writability).
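
For what it's worth, the same behaviour can be seen with plain gen_tcp,
independent of thrift.  The sketch below is only an illustration (the
module name close_notice_sketch is made up).  It opens a loopback
connection, closes the server side, and checks what the client sees.  With
a passive socket ({active, false}) the client learns about the close only
when it calls recv; with an active socket the owning process is sent a
{tcp_closed, Socket} message.  Which mode the thrift client uses is an
assumption to check against the library source.

```erlang
-module(close_notice_sketch).
-export([run/0]).

%% Returns {WhatPassiveClientSaw, WhatActiveClientSaw}.
run() ->
    {passive(), active()}.

%% Passive mode: nothing is delivered until the client tries to read.
passive() ->
    {Listen, Client, Server} = pair([{active, false}]),
    ok = gen_tcp:close(Server),
    Result = gen_tcp:recv(Client, 0, 1000),
    cleanup(Listen, Client),
    Result.

%% Active mode: the socket owner gets a message when the peer closes.
active() ->
    {Listen, Client, Server} = pair([{active, true}]),
    ok = gen_tcp:close(Server),
    Result = receive
                 {tcp_closed, Client} -> tcp_closed
             after 1000 ->
                 no_message
             end,
    cleanup(Listen, Client),
    Result.

%% Set up a connected client/server socket pair on the loopback interface.
pair(ClientOpts) ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false},
                                      {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary | ClientOpts]),
    {ok, Server} = gen_tcp:accept(Listen),
    {Listen, Client, Server}.

cleanup(Listen, Client) ->
    gen_tcp:close(Client),
    gen_tcp:close(Listen).
```

Calling close_notice_sketch:run() should give {{error,closed},tcp_closed}.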

--David


Re: erlang server/client closing connections

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Fri, Aug 13, 2010 at 11:49:13AM -0700, David Reiss wrote:
> I did a major refactor of the Erlang library that I think might resolve this
> issue.  https://issues.apache.org/jira/browse/THRIFT-599  With my patch,
> thrift_buffered_transport is no longer a separate process, so there is no
> need for a gen_server call.  This patch hasn't been committed yet because
> at the time I posted it, Facebook hadn't deployed it in production anywhere.
> We have now, though, so if people want, I can check it in.

I would not be opposed.  I took a quick look through the client code and it
seems a lot more streamlined; it sort of leaves it up to the client to
spawn a process to handle a client, which seems fine.  It probably will help
with this case because it honors recv_timeout in the server, so if I set
it to something high it will wait that long.

So +1 from me for committing.  I like that it has -specs now; does it cleanly
pass the dialyzer?  Also, I know with the current version there are lots of
warnings about unused variables; it might be worthwhile to prepend _ to
them so it builds without warnings (this only happens when you use erlc
directly; the makefiles hide warnings at the moment).

The one thing that I'm still curious about is why the client doesn't receive
some sort of notification that the server has shut down.  It must be that
gen_tcp doesn't let the client know?

Again, patch looks good to me, I'll try it out soon.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: erlang server/client closing connections

Posted by David Reiss <dr...@facebook.com>.

I did a major refactor of the Erlang library that I think might resolve this
issue.  https://issues.apache.org/jira/browse/THRIFT-599  With my patch,
thrift_buffered_transport is no longer a separate process, so there is no
need for a gen_server call.  This patch hasn't been committed yet because
at the time I posted it, Facebook hadn't deployed it in production anywhere.
We have now, though, so if people want, I can check it in.

--David


Re: erlang server/client closing connections

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

Okay, another update: the problem is the recv_timeout, and it's almost possible
to get it to work the way I want it to, but it required a hack.

I switched to thrift_socket_server.  For those who were using thrift_server,
instead of creating the server like

thrift_server:start_link(Port, ServiceModule, HandlerModule).

you do the following

thrift_socket_server:start([{port, Port},
                            {service, ServiceModule},
                            {handler, HandlerModule}]).

By default the recv_timeout is set to 500ms, so the connections shut down
almost immediately.  You can set a higher recv_timeout like this:

thrift_socket_server:start([{port, Port},
                            {service, ServiceModule},
                            {handler, HandlerModule},
                            {socket_opts, [{recv_timeout, 60*60*1000}]}]).

However, then you get a timeout on gen_server:call/3 which crashes the
process.  I tracked down the timeout to this call in
thrift_buffered_transport.erl

read(Transport, Len) when is_integer(Len) ->
    gen_server:call(Transport, {read, Len}, _Timeout=10000).

So to check I just changed 10000 to 60*60*1000 and connections seem to
stay around now, at least for an hour of inactivity, which is fine for
my testing.

I think the appropriate fix would be to somehow expose that timeout value
as an option to the server, maybe something like idle_timeout or
read_timeout.  The trick is then getting it tunneled to that call.  Currently
thrift_buffered_transport doesn't accept any options.  The timeout could be
added as a third parameter to read, but that would have to happen in all the
transports, which, looking through the code, doesn't seem that bad.  The
only minor issue would be with thrift_socket_transport, which uses recv_timeout
right now as a read_timeout, so you would have 2 timeouts to choose from.
Also, you still need to get that timeout to the places where read is called.
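
To make the problem and the proposed "third parameter" concrete, here is
a rough, self-contained sketch.  It is not the thrift code: the module
read_timeout_sketch, the read/3 function, and the stand-in gen_server
(which just sleeps to imitate a transport blocked waiting for data) are all
made up for illustration, and the timeouts are shortened to milliseconds so
it runs quickly.

```erlang
-module(read_timeout_sketch).
-behaviour(gen_server).
-export([demo/0, read/2, read/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Stand-in for the hardcoded 10000 in the real read/2.
-define(DEFAULT_TIMEOUT, 50).

%% Same shape as the existing call, with a fixed timeout.
read(Transport, Len) ->
    read(Transport, Len, ?DEFAULT_TIMEOUT).

%% Hypothetical variant: the caller chooses the gen_server:call timeout.
read(Transport, Len, Timeout) when is_integer(Len) ->
    gen_server:call(Transport, {read, Len}, Timeout).

demo() ->
    %% The fake transport takes 200 ms to answer every read.
    {ok, Pid} = gen_server:start(?MODULE, 200, []),
    %% The fixed timeout is shorter than the wait, so the caller exits.
    Short = try read(Pid, 1)
            catch exit:{timeout, _} -> call_timed_out
            end,
    %% A caller-supplied timeout longer than the wait succeeds.
    Long = read(Pid, 1, 5000),
    exit(Pid, kill),
    {Short, Long}.

init(DelayMs) ->
    {ok, DelayMs}.

%% Imitate a transport that blocks while waiting on the socket.
handle_call({read, _Len}, _From, DelayMs) ->
    timer:sleep(DelayMs),
    {reply, {ok, <<>>}, DelayMs}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

read_timeout_sketch:demo() should return {call_timed_out,{ok,<<>>}}.  The
first call exits because the gen_server:call timeout is shorter than the
time the server spends blocked, which is the same shape as a 10 second call
timeout sitting on top of a one hour recv_timeout.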

Well, not sure if it's worth it or not.  If I have a chance I can hack at it,
but for the moment I have to finish some things off, so I will have to get
back to this later.

-Anthony


-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: erlang server/client closing connections

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Thu, Aug 12, 2010 at 10:41:50PM -0700, David Reiss wrote:
> Usually, this sort of thing happens because the server has a recv timeout
> set.  I see that thrift_socket_server sets a recv timeout, but I can't tell
> if thrift_server is doing so.  One possibility might be to put some debugging
> code in thrift_processor to determine if it is terminating and closing the
> connection.

So looking again, it looks like I was mistaken about keepalive being true.
It's inherited from the listen process, but there doesn't seem to be a way
to pass options in (this is for the thrift_server).  I hardcoded it and
passed the option to the client, but it doesn't seem to help.
So a receive timeout might be the problem, as I create connections at startup
but in my dev env don't really use them for a while.  So if the server decides
the client isn't going to send anything it might close down its connection.
I tried to dig down and see this happen, but I don't see the processor break
out of its loop.  I dropped in some io:formats, but they don't seem to trigger
in any branch of the case, so I'm not certain what is happening.  I think I'll
have to see if I can trace it and see what I find.

> I'm not sure if thrift_server is supposed to be deprecated in favor of
> thrift_socket_server.  Chris Piro or Todd Lipcon might know.

I got this usage of thrift_server from Todd's thrift_erl_skel, but
maybe it's out of date.  I'll take a look at thrift_socket_server
tomorrow to see what it looks like.  A quick glance and it looks very
different from thrift_server.

I may just try to rewrite my pooling mechanism so that instead of
starting processes when my server starts, it starts them the first time
a request is made.  The only problem is that, since the only way for the
client to know the server has hung up on it is to make a call, I'll
still have to retry: if I create a process, stick it into the pool to reuse,
pull it out a few seconds later, and get an exception, I then have to
re-connect and rerun the call :(

-Anthony


-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: erlang server/client closing connections

Posted by David Reiss <dr...@facebook.com>.

Usually, this sort of thing happens because the server has a recv timeout
set.  I see that thrift_socket_server sets a recv timeout, but I can't tell
if thrift_server is doing so.  One possibility might be to put some debugging
code in thrift_processor to determine if it is terminating and closing the
connection.
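
As a rough illustration of what a recv timeout does to an idle connection,
here is a plain gen_tcp sketch.  It is not taken from thrift_server or
thrift_socket_server (the module name recv_timeout_sketch is made up); it
only shows the general pattern of a server-side read that gives up and
closes the socket when the client sends nothing in time.

```erlang
-module(recv_timeout_sketch).
-export([run/1]).

%% Accept one loopback connection from a client that never sends anything,
%% then wait RecvTimeoutMs for data on the server side.
run(RecvTimeoutMs) ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false},
                                      {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port,
                                   [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen),
    Outcome = case gen_tcp:recv(Server, 0, RecvTimeoutMs) of
                  {ok, _Data} ->
                      got_data;
                  {error, timeout} ->
                      %% The idle client sent nothing in time: drop it.
                      ok = gen_tcp:close(Server),
                      closed_idle_connection;
                  {error, _Other} ->
                      ok = gen_tcp:close(Server),
                      closed_on_error
              end,
    gen_tcp:close(Client),
    gen_tcp:close(Listen),
    Outcome.
```

recv_timeout_sketch:run(100) should return closed_idle_connection, since
the client in the sketch never writes to the socket.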

I'm not sure if thrift_server is supposed to be deprecated in favor of
thrift_socket_server.  Chris Piro or Todd Lipcon might know.

--David
