Posted to user@thrift.apache.org by Jules Cisek <ju...@luminate.com> on 2014/01/06 22:55:45 UTC

very large request to HsHa server

hello,

TL;DR version: i need to transfer a huge amount of data (millions of small
objects) using a single call to a THsHaServer (or TThreadedSelectorServer)

The long version:

i built a java server around TThreadPoolServer that accepts an extremely
large amount of data (millions of small objects) via a single call every
hour or so (from a python script) and also accepts various requests for
small amounts of data tens of times per second.  the requests come from
python, java, and perl hence why i went with thrift in the first place.

this worked great using TBufferedTransport until i had to modify the java
server to support async clients.  i've rewritten my service to use the
THsHaServer and switched to TFramedTransport on the clients.

everything worked great (my test suite was happy) but it turns out in
production the hourly data update call far far exceeds the maximum size
allowed in TFramedTransport:

Frame size (150283373) larger than max length (16384000)!

yes i realize this is not what thrift was really designed for but it was
working great until the async requirement came about.

the data is a list of millions of small objects, not one large piece of
data that could be streamed, by the way.

my options at this point are (in order of difficulty):
a. run a second thrift server in my java server that uses TThreadPoolServer
on a different port to allow using TBufferedTransport for the update call
b. find a different (non-thrift) mechanism to transfer data from the python
update script to the java service
c. rewrite the update logic to chunk the data

none of these are particularly appealing due to the nature of how my
service works (the update needs to be fairly atomic, the objects are not
trivial, etc.).

i'm going to go with option a unless there is a way to transfer a huge
amount of data in a single call that does work with HsHa (or
ThreadedSelector) and was just wondering if anyone had any better ideas.

thanks,
~j

-- 
jules cisek | jules@luminate.com

Re: very large request to HsHa server

Posted by Jules Cisek <ju...@luminate.com>.
oh cool, i totally missed that!

thanks!

On Mon, Jan 6, 2014 at 2:10 PM, Ben Craig <be...@ni.com> wrote:

> The maximum length for TFramedTransport is configurable.
> TFramedTransport's ctor accepts a max length argument, and
> TFramedTransport::Factory also accepts a maxLength.
>

Re: very large request to HsHa server

Posted by Ben Craig <be...@ni.com>.
The maximum length for TFramedTransport is configurable. 
TFramedTransport's ctor accepts a max length argument, and 
TFramedTransport::Factory also accepts a maxLength.
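
To illustrate what that limit does: a framed transport prefixes every message with a 4-byte big-endian length and refuses any frame whose announced length exceeds the configured maximum. The sketch below reproduces only that check using the Java standard library. It is not Thrift's source code; the class and method names (FrameLimitSketch, readFrameSize, headerFor) are hypothetical, the default of 16384000 is taken from the error message in this thread, and the raised limit of 200,000,000 is an arbitrary value picked for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical illustration of a frame-length check; not Thrift's own code. */
final class FrameLimitSketch {
    // Default limit as reported by the error message in this thread.
    static final int DEFAULT_MAX_LENGTH = 16384000;

    /** Reads the 4-byte big-endian frame header and enforces maxLength. */
    static int readFrameSize(DataInputStream in, int maxLength) throws IOException {
        int size = in.readInt(); // DataInputStream reads big-endian ints
        if (size < 0) {
            throw new IOException("Read a negative frame size (" + size + ")!");
        }
        if (size > maxLength) {
            throw new IOException(
                "Frame size (" + size + ") larger than max length (" + maxLength + ")!");
        }
        return size;
    }

    /** Builds a stream holding only a frame header that announces `size` bytes. */
    static DataInputStream headerFor(int size) {
        byte[] header = ByteBuffer.allocate(4).putInt(size).array();
        return new DataInputStream(new ByteArrayInputStream(header));
    }

    public static void main(String[] args) throws IOException {
        int hourlyUpdate = 150283373; // frame size reported in the thread

        // With the default limit the header is rejected before any payload is read.
        try {
            readFrameSize(headerFor(hourlyUpdate), DEFAULT_MAX_LENGTH);
        } catch (IOException e) {
            System.out.println("default limit: " + e.getMessage());
        }

        // Raising the limit above the payload size lets the same header through.
        int accepted = readFrameSize(headerFor(hourlyUpdate), 200_000_000);
        System.out.println("raised limit: accepted " + accepted + " bytes");
    }
}
```

In Thrift itself the equivalent knob is the max length argument mentioned above. For a Java THsHaServer that would presumably mean handing a TFramedTransport.Factory built with a larger maxLength to the server args as its transport factory, and constructing the client's TFramedTransport with the same limit; check the exact signatures against the Thrift version in use. Keep in mind that a framed server buffers the whole frame before processing it, so a limit this high means holding roughly 150 MB in memory for the duration of the update call.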
