Posted to user@thrift.apache.org by Tenghuan He <te...@gmail.com> on 2016/04/22 13:11:02 UTC

Thrift RPC database query results

Hi there

    I have a PostgreSQL database on my server machine, which can only be
queried locally. Now I want to query the data from another machine. Thrift
RPC seems like a natural choice. However, I run into a problem when the
ResultSet is very large, say millions of rows. A ResultSet is not
serializable, and serializing it would be meaningless anyway, while
returning all the materialized rows at once consumes too much memory.
I am considering returning a fixed number of rows, such as 1000, per call.
Are there any other ideas or advice?

Thanks in advance

Tenghuan He

Re: Thrift RPC database query results

Posted by Andrew de Andrade <aa...@uber.com>.
I'm curious to hear more about the engineering decisions behind CQL and
its native protocol that make it better suited for large-scale data
transfer. Do you have anything worth reading on the subject?

Also, of the optimizations and choices made for CQL and Cassandra's native
protocol, which could be adopted by Thrift or other RPC frameworks to get
many of the same gains?


On Mon, Apr 25, 2016 at 1:37 PM, Randy Abernethy <ra...@gmail.com>
wrote:

> While I am the biggest of Thrift fans, it is worth noting that Thrift is
> great for RPC, that is to say, for creating really fast cross-language
> microservice interfaces. It is not great for large-scale data transfer.
> Cassandra was originally built with a Thrift API but moved to CQL and a
> native protocol purpose-built for returning large datasets. API
> interactions characterized by fast, modestly sized transfers work great
> with Thrift, but transferring anything bigger than a few megabytes, while
> possible, may not be practical or optimal. I think you could say this about
> RPC systems in general (protobuf/gRPC, MSRPC, RMI, etc.).

Re: Thrift RPC database query results

Posted by Wellington Moreno <jw...@gmail.com>.
I agree with Randy. A more indirect option is to use Thrift to send a
"pointer" to the actual file, and then use another protocol (HTTP, FTP,
etc.) to serve the data.

Kind Regards,
Wellington Moreno
*Software Emperor*
*RedRoma*
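
A rough sketch of the "pointer" idea, under stated assumptions: the RPC
reply carries only a small descriptor, and the client then streams the
rows from the location it names. ResultPointer, export_result and
stream_rows are hypothetical names, CSV is an arbitrary choice of file
format, and a file:// URL stands in for the HTTP or FTP endpoint so the
example runs without a server.

```python
import csv
import tempfile
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, List, Sequence


@dataclass
class ResultPointer:
    """Small descriptor that would travel over the RPC call (hypothetical)."""
    url: str
    row_count: int


def export_result(rows: Iterable[Sequence], directory: str) -> ResultPointer:
    """Server side: write rows to a file one at a time and return a pointer."""
    path = Path(directory).resolve() / "result.csv"
    count = 0
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:  # rows may be a lazy iterator over a DB cursor
            writer.writerow(row)
            count += 1
    # Assumption: file:// stands in for an HTTP/FTP URL served elsewhere.
    return ResultPointer(url=path.as_uri(), row_count=count)


def stream_rows(pointer: ResultPointer) -> Iterator[List[str]]:
    """Client side: fetch the file by URL and parse it line by line."""
    with urllib.request.urlopen(pointer.url) as resp:
        lines = (raw.decode("utf-8") for raw in resp)
        yield from csv.reader(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ptr = export_result(((i, f"row-{i}") for i in range(5)), tmp)
        for row in stream_rows(ptr):
            print(row)
```

Note that CSV drops type information, so every value comes back as a
string; a real service would need to agree on a schema separately.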


Re: Thrift RPC database query results

Posted by Randy Abernethy <ra...@gmail.com>.
While I am the biggest of Thrift fans, it is worth noting that Thrift is
great for RPC, that is to say, for creating really fast cross-language
microservice interfaces. It is not great for large-scale data transfer.
Cassandra was originally built with a Thrift API but moved to CQL and a
native protocol purpose-built for returning large datasets. API
interactions characterized by fast, modestly sized transfers work great
with Thrift, but transferring anything bigger than a few megabytes, while
possible, may not be practical or optimal. I think you could say this about
RPC systems in general (protobuf/gRPC, MSRPC, RMI, etc.).
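
One way to keep each reply "modestly sized" is to cap pages by bytes
rather than by row count. The sketch below is only an illustration:
pages_under_budget is a hypothetical helper, and the JSON-encoded length
of a row is used as a rough stand-in for its Thrift-encoded size, which
will differ in practice.

```python
import json
from typing import Iterable, Iterator, List, Sequence


def pages_under_budget(rows: Iterable[Sequence], max_bytes: int) -> Iterator[List[Sequence]]:
    """Group rows into pages whose estimated size stays within max_bytes.

    A single row larger than max_bytes is still emitted, alone in its page.
    """
    page: List[Sequence] = []
    size = 0
    for row in rows:
        # Assumption: JSON length approximates the encoded size on the wire.
        row_size = len(json.dumps(row).encode("utf-8"))
        if page and size + row_size > max_bytes:
            yield page
            page, size = [], 0
        page.append(row)
        size += row_size
    if page:
        yield page


if __name__ == "__main__":
    demo = [[i, "x" * 10] for i in range(10)]
    for p in pages_under_budget(demo, max_bytes=40):
        print(len(p), p)
```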

On Fri, Apr 22, 2016 at 5:43 AM, Edward Capriolo <ed...@gmail.com>
wrote:

> Thrift messages have to be buffered in memory. I suggest paging n rows at
> a time to keep latency predictable.

Re: Thrift RPC database query results

Posted by Edward Capriolo <ed...@gmail.com>.
Thrift messages have to be buffered in memory. I suggest paging n rows at a
time to keep latency predictable.
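
A minimal sketch of paging n rows at a time, not code from this thread.
It assumes a hypothetical table `items` with an integer primary key
`id`, and uses the standard-library sqlite3 module as a stand-in for
PostgreSQL so that it runs anywhere. fetch_page plays the role of the
method a Thrift service would expose; each call returns at most `limit`
rows plus a token saying where the next call should resume.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, str]


def fetch_page(conn: sqlite3.Connection, after_id: Optional[int],
               limit: int) -> Tuple[List[Row], Optional[int]]:
    """Return up to `limit` rows with id > after_id, and the next token.

    The token is None when this page was short, meaning no rows remain.
    """
    if after_id is None:
        cur = conn.execute(
            "SELECT id, payload FROM items ORDER BY id LIMIT ?", (limit,))
    else:
        cur = conn.execute(
            "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, limit))
    rows = cur.fetchall()
    next_token = rows[-1][0] if len(rows) == limit else None
    return rows, next_token


def make_demo_db(n: int) -> sqlite3.Connection:
    """Build an in-memory table with n rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO items (id, payload) VALUES (?, ?)",
                     [(i, f"row-{i}") for i in range(1, n + 1)])
    return conn


def read_all(conn: sqlite3.Connection, limit: int) -> List[Row]:
    """Client loop: keep requesting pages until the token is None."""
    out: List[Row] = []
    token: Optional[int] = None
    while True:
        rows, token = fetch_page(conn, token, limit)
        out.extend(rows)
        if token is None:
            return out


if __name__ == "__main__":
    db = make_demo_db(2500)
    print(len(read_all(db, limit=1000)))
```

Resuming from the last seen key keeps the server stateless between
calls; the alternative of holding a database cursor open per client
avoids re-running the query but ties up a connection for as long as the
client keeps reading.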

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.