
Posted to dev@thrift.apache.org by Gary Moore <ga...@gmail.com> on 2010/04/21 21:17:57 UTC

Anyone using Thrift for public endpoints?

Hey all,

I've been playing with Thrift in personal projects, but now my employer (at
my behest) is considering using Thrift for a public-facing endpoint so that
others can write web clients to consume our resources. I'd like to chat
with people on this list who are using Thrift in that capacity to get
thoughts, suggestions, etc. Get back to me via email or on Twitter
@gsmoore.

Thanks,
Gary

-- 
Gary Moore
http://www.gmoore.net

Re: Anyone using Thrift for public endpoints?

Posted by Mayan Moudgill <ma...@bestweb.net>.

Here are some issues with Thrift when using TBinaryProtocol over TSocket;
they may or may not arise with other protocol/transport combinations.

1. A client can send the server a message of unbounded size. Even if the
server is only expecting scalar values, a client could send an extra
argument containing an arbitrarily large string, which the server would
then have to read and skip.

2. If the server has RPCs with a container or string argument, the
client can send an arbitrarily large argument (say, an EXTREMELY long
string), which will cause the program to allocate extremely large
amounts of memory, with the attendant consequences.

3. A client can leave a server hanging by sending an incomplete message
and never sending the remainder. Depending on how the server is
programmed, this can stall the server completely or just tie up
OS/program resources.

These problems are perhaps fundamental to the philosophy of Thrift. I would
advocate adding a message byte count and a timeout period for the
socket. The byte count takes care of problems 1 and 2: if the message is
too long, abort message processing and close the socket (possibly
after sending an E2BIG-equivalent T_EXCEPTION). The timeout period
takes care of hangs: if the timeout expires, abort message
processing and close the socket.

Other problems relate to the interface presented to the user:

4. An RPC call will be executed even if not all of its arguments are
supplied. Thus, if the client calls foo(1: arg1, 2: arg2) and the
server is expecting foo(1: arg1, 2: arg2, 3: arg3), the server handler
will be invoked with some None (or not isset) arguments.

5. The incomplete-type problem can also occur deeper in the arguments. For
instance, arg2 may be a pointer to a list of structs with 3 fields, but
the client may have populated arg2 with a list of structs with only _2_ fields.

Solving this problem is an exercise in checking that the function is 
invoked with the proper number of arguments with the proper type 
properties. However, it would make this kind of programming easier if, 
for each argument, we could ask if:
- the argument was passed
- the passed type is a superset of the expected type (i.e. for
non-struct types, the type matches exactly, and a struct is allowed
to have more fields than the expected struct, but not fewer).
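A minimal sketch of the two questions above (was the argument passed, and is its type a superset of the expected one), assuming the decoded arguments are available as a dict keyed by field id. The schema encoding (Python types for scalars, a ("list", elem) tuple for lists, and a dict of field id to type for structs) and the names conforms and check_args are hypothetical, not part of Thrift's generated code.

```python
from typing import Any, Dict, List, Union

# Hypothetical schema representation: a field's expected type is either a
# Python scalar type, a ("list", elem_spec) tuple, or a dict describing a
# struct (field id -> expected type).
Spec = Union[type, tuple, Dict[int, Any]]


def conforms(value: Any, spec: Spec) -> bool:
    """True if value matches spec; structs may carry extra fields, never fewer."""
    if isinstance(spec, dict):                       # struct
        if not isinstance(value, dict):
            return False
        return all(fid in value and conforms(value[fid], fspec)
                   for fid, fspec in spec.items())
    if isinstance(spec, tuple) and spec[0] == "list":
        return isinstance(value, list) and all(conforms(v, spec[1]) for v in value)
    return type(value) is spec                       # scalars: exact match


def check_args(args: Dict[int, Any], expected: Dict[int, Spec]) -> List[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    for fid, spec in expected.items():
        if fid not in args:
            problems.append(f"argument {fid} was not passed")
        elif not conforms(args[fid], spec):
            problems.append(f"argument {fid} has the wrong shape")
    return problems
```

For example, with arg2 declared as a list of 3-field structs, a list element carrying an extra field 4 passes, while one carrying only fields 1 and 2 is reported as the wrong shape.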

There are also problems related to the actual implementation of Thrift - 
that is not an area I feel competent to speak to.

Laurens Van Houtven wrote:

> If nobody minds, could this discussion be public? I care about the answer
> too, and I can't imagine I'm the only person considering exposing a Thrift
> service. It doesn't appear that Thrift implementations are specifically
> tested for what happens when you feed them random junk (this
> was a mailing list topic a while ago), which would of course be a problem if
> you're using it as an external interface, where every request is a potential
> attack.
> 
> tia
> lvh
>

Re: Anyone using Thrift for public endpoints?

Posted by Aron Sogor <bi...@gmail.com>.

Well, that is good to know.

On Mon, Apr 26, 2010 at 11:38 AM, Mark Slee <ms...@facebook.com> wrote:

> Wrapping Thrift up in protocols like HTTP can help alleviate some of the
> common issues and is a pretty reasonable thing to do.
>
> But make no mistake, you still need to protect against the real issues.
> Even if you use HTTP, someone can still send a bogus request that *claims*
> to contain a 1GB string and trick the server into a huge allocation, even if
> the HTTP POST request is < 1K in size.
>
> Same goes for arguments about framing, etc. Even with a framed transport,
> nothing stops someone from intentionally sending a bogus frame size. You do
> need to make sure that your Thrift code is equipped to handle all these
> cases.

RE: Anyone using Thrift for public endpoints?

Posted by Mark Slee <ms...@facebook.com>.

Definitely feel free to open JIRA tickets (or just fire off emails w/ specific details) when you come across this stuff. I'm sure there are folks willing and interested in improving such things.

-----Original Message-----
From: Mayan Moudgill [mailto:mayan@bestweb.net] 
Sent: Monday, April 26, 2010 4:45 PM
To: thrift-dev@incubator.apache.org; bryan@rapleaf.com
Subject: Re: Anyone using Thrift for public endpoints?

There are places where the code is not all that great; repeated checks, 
useless allocations, etc.


Re: Anyone using Thrift for public endpoints?

Posted by Laurens Van Houtven <lv...@laurensvh.be>.

How will GC fix arbitrarily sized allocations? Stuff has to be allocated before
it can be filled with something that can eventually be collected by a GC.

lvh

Re: Anyone using Thrift for public endpoints?

Posted by Mayan Moudgill <ma...@bestweb.net>.

There are places where the code is not all that great; repeated checks, 
useless allocations, etc.

Bryan Duxbury wrote:
> On Mon, Apr 26, 2010 at 4:22 PM, Mayan Moudgill <ma...@bestweb.net> wrote:
> 
> 
>>Looking over some of the java code, I'm guessing that
>>marshalling/demarshalling efficiency was not considered to be of much
>>importance.
> 
> 
> 
> Huh?
>

Re: Anyone using Thrift for public endpoints?

Posted by Bryan Duxbury <br...@rapleaf.com>.

On Mon, Apr 26, 2010 at 4:22 PM, Mayan Moudgill <ma...@bestweb.net> wrote:

> Looking over some of the java code, I'm guessing that
> marshalling/demarshalling efficiency was not considered to be of much
> importance.


Huh?

Re: Anyone using Thrift for public endpoints?

Posted by Mayan Moudgill <ma...@bestweb.net>.

I'm guessing you mean the setReadLength() and setMaxSkipDepth() settings,
which force the lengths of strings etc. to stay below a given limit,
and cap the maximum depth that will be skipped.

Unfortunately, one can still (I think) force an arbitrary amount of 
memory to be allocated. (I think garbage collection should clean it up, 
though).

Looking over some of the java code, I'm guessing that 
marshalling/demarshalling efficiency was not considered to be of much 
importance.

Mark Slee wrote:

> Some of the libraries do have APIs to let the user specify maximum size-limits (for instance I think the Java protocol implementations support this).
> 
> Most of it is not rocket science, it's just a matter of going in and making sure the library objects all have the appropriate APIs to let the user specify what the size limits should be. This work is not complete across all the language library implementations.
> 
> In C++ there is a more complex vulnerability: a rogue client can cause a stack overflow by sending an infinitely nested struct-of-struct-of-struct..., which would cause TProtocol::skip() to keep allocating stack frames.

RE: Anyone using Thrift for public endpoints?

Posted by Mark Slee <ms...@facebook.com>.

Some of the libraries do have APIs to let the user specify maximum size-limits (for instance I think the Java protocol implementations support this).

Most of it is not rocket science, it's just a matter of going in and making sure the library objects all have the appropriate APIs to let the user specify what the size limits should be. This work is not complete across all the language library implementations.

In C++ there is a more complex vulnerability: a rogue client can cause a stack overflow by sending an infinitely nested struct-of-struct-of-struct..., which would cause TProtocol::skip() to keep allocating stack frames.
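To make the skip() problem concrete, here is a minimal Python sketch of a depth-limited skip over a binary-protocol payload held in memory. The type codes follow Thrift's TType values; MAX_SKIP_DEPTH, Reader and SkipError are hypothetical names, and this is not a transcription of the C++ TProtocol::skip() or of the Java setMaxSkipDepth() setting mentioned elsewhere in the thread. A C++ version would apply the same depth counter to avoid exhausting the stack.

```python
import struct

# Wire type codes (Thrift TType values) used by the binary protocol.
STOP, BOOL, BYTE, DOUBLE, I16, I32, I64 = 0, 2, 3, 4, 6, 8, 10
STRING, STRUCT, MAP, SET, LIST = 11, 12, 13, 14, 15

FIXED_SIZES = {BOOL: 1, BYTE: 1, DOUBLE: 8, I16: 2, I32: 4, I64: 8}
MAX_SKIP_DEPTH = 64  # hypothetical limit; tune per deployment


class SkipError(Exception):
    """Raised for malformed input or excessive nesting."""


class Reader:
    """Minimal cursor over an in-memory binary-protocol payload."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # Reject negative or oversized lengths before slicing anything.
        if n < 0 or self.pos + n > len(self.data):
            raise SkipError(f"bogus or truncated length {n}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def i8(self) -> int:
        return struct.unpack("!b", self.take(1))[0]

    def i32(self) -> int:
        return struct.unpack("!i", self.take(4))[0]

    def count(self) -> int:
        n = self.i32()
        if n < 0:
            raise SkipError(f"negative container size {n}")
        return n


def skip(r: Reader, ttype: int, depth: int = 0) -> None:
    """Skip one value of type ttype, refusing to nest deeper than MAX_SKIP_DEPTH."""
    if depth > MAX_SKIP_DEPTH:
        raise SkipError("maximum skip depth exceeded")
    if ttype in FIXED_SIZES:
        r.take(FIXED_SIZES[ttype])
    elif ttype == STRING:
        r.take(r.i32())
    elif ttype == STRUCT:
        while (ftype := r.i8()) != STOP:
            r.take(2)  # field id (i16), not needed when skipping
            skip(r, ftype, depth + 1)
    elif ttype == MAP:
        ktype, vtype, n = r.i8(), r.i8(), r.count()
        for _ in range(n):
            skip(r, ktype, depth + 1)
            skip(r, vtype, depth + 1)
    elif ttype in (SET, LIST):
        etype, n = r.i8(), r.count()
        for _ in range(n):
            skip(r, etype, depth + 1)
    else:
        raise SkipError(f"unknown type {ttype}")
```

Because every value consumes at least one byte and every declared length is checked against the bytes actually present, a forged container count runs out of input after at most len(data) elements instead of looping billions of times.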

-----Original Message-----
From: Mayan Moudgill [mailto:mayan@bestweb.net] 
Sent: Monday, April 26, 2010 1:52 PM
To: thrift-dev@incubator.apache.org; Mark Slee
Subject: Re: Anyone using Thrift for public endpoints?

I was wondering - isn't part of the problem that there is no way for a 
user to handle these issues, that they are handled in the Thrift library 
layer, so to speak?

Suppose a user wanted to add the behavior "abort if an RPC message is
going to allocate more than 1 MB of data". Can a user do this? Or will he
have to hack the library code?


Re: Anyone using Thrift for public endpoints?

Posted by Mayan Moudgill <ma...@bestweb.net>.

I was wondering - isn't part of the problem that there is no way for a 
user to handle these issues, that they are handled in the Thrift library 
layer, so to speak?

Suppose a user wanted to add the behavior "abort if an RPC message is
going to allocate more than 1 MB of data". Can a user do this? Or will he
have to hack the library code?
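One way to get the "abort above 1 MB" behaviour without touching every code path is a per-message allocation budget that the decoder charges before each string or container is created. The sketch below is hypothetical: AllocationBudget, BudgetExceeded and read_string_list are illustrative names, the 8-bytes-per-element bookkeeping cost is an arbitrary assumption, and it decodes a simple i32-count, i32-length-prefixed list of strings rather than a real Thrift message.

```python
import struct


class BudgetExceeded(Exception):
    """Raised when decoding one message would allocate more than allowed."""


class AllocationBudget:
    """Tracks the bytes a decoder has committed to allocate for one message.

    Hypothetical helper: the decoder calls charge() before creating each
    string or container, so the limit is enforced ahead of the allocation.
    """

    def __init__(self, limit: int = 1 << 20):  # 1 MiB, as in the question
        self.limit = limit
        self.used = 0

    def charge(self, nbytes: int) -> None:
        if nbytes < 0:
            raise BudgetExceeded(f"negative size {nbytes}")
        if self.used + nbytes > self.limit:
            raise BudgetExceeded(
                f"message would need {self.used + nbytes} bytes "
                f"(limit {self.limit})")
        self.used += nbytes


def read_string_list(data: bytes, budget: AllocationBudget) -> list:
    """Decode an i32 count followed by count x (i32 length, bytes)."""
    (count,) = struct.unpack_from("!i", data, 0)
    pos = 4
    budget.charge(count * 8)  # assumed per-element bookkeeping cost
    items = []
    for _ in range(count):
        (n,) = struct.unpack_from("!i", data, pos)
        pos += 4
        budget.charge(n)
        if pos + n > len(data):
            raise ValueError("truncated string")
        items.append(data[pos:pos + n])
        pos += n
    return items
```

For example, 2,000 strings of 1,000 bytes each stay under any reasonable per-string length cap, yet the budget rejects the message partway through because the running total passes 1 MiB.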

Mark Slee wrote:
> Wrapping Thrift up in protocols like HTTP can help alleviate some of the common
> issues and is a pretty reasonable thing to do.
> 
> But make no mistake, you still need to protect against the real issues. Even if
> you use HTTP, someone can still send a bogus request that *claims* to contain a
> 1GB string and trick the server into a huge allocation, even if the HTTP POST
> request is < 1K in size.
> 
> Same goes for arguments about framing, etc. Even with a framed transport,
> nothing stops someone from intentionally sending a bogus frame size.
> You do need to make sure that your Thrift code is equipped to handle all
> these cases.

RE: Anyone using Thrift for public endpoints?

Posted by Mark Slee <ms...@facebook.com>.

Wrapping Thrift up in protocols like HTTP can help alleviate some of the common issues and is a pretty reasonable thing to do.

But make no mistake, you still need to protect against the real issues. Even if you use HTTP, someone can still send a bogus request that *claims* to contain a 1GB string and trick the server into a huge allocation, even if the HTTP POST request is < 1K in size.

Same goes for arguments about framing, etc. Even with a framed transport, nothing stops someone from intentionally sending a bogus frame size. You do need to make sure that your Thrift code is equipped to handle all these cases.
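The point here is that a length declared inside the payload has to be validated on its own terms, whatever the outer transport already checked. A minimal Python sketch, assuming the whole request body is already in memory (for example after a size-capped HTTP POST) and a TBinaryProtocol-style i32 length prefix; MAX_STRING, ProtocolError and read_string are hypothetical names.

```python
import struct

MAX_STRING = 64 * 1024  # hypothetical per-field cap


class ProtocolError(Exception):
    """Raised when a declared length is negative, too large, or a lie."""


def read_string(buf: bytes, pos: int, max_len: int = MAX_STRING):
    """Read an i32-length-prefixed string from buf starting at pos.

    Returns (value, new_pos). The declared length is validated against both
    a configured cap and the bytes actually present, before anything is copied.
    """
    if pos + 4 > len(buf):
        raise ProtocolError("truncated length prefix")
    (declared,) = struct.unpack_from("!i", buf, pos)
    pos += 4
    if declared < 0:
        raise ProtocolError(f"negative length {declared}")
    if declared > max_len:
        raise ProtocolError(f"declared length {declared} exceeds cap {max_len}")
    if declared > len(buf) - pos:
        raise ProtocolError(
            f"declared length {declared} but only {len(buf) - pos} bytes remain")
    return buf[pos:pos + declared], pos + declared
```

For example, an 8-byte body whose prefix claims 1 << 30 bytes is rejected by the remaining-bytes check even if the cap is set very high. Over a raw, unbuffered stream the remaining-bytes check is not available, so the configured cap is the only guard.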

-----Original Message-----
From: Aron Sogor [mailto:bigman@gmail.com] 
Sent: Sunday, April 25, 2010 5:50 PM
To: thrift-dev@incubator.apache.org
Subject: Re: Anyone using Thrift for public endpoints?

It depends. If you're talking about the raw socket protocol, sure, you need some
flow control, and there is no such thing out of the box.

If you run over HTTP, your HTTP container can probably limit the POST
size, so you probably will not crash the server.

Aron


Re: Anyone using Thrift for public endpoints?

Posted by Aron Sogor <bi...@gmail.com>.

It depends. If you're talking about the raw socket protocol, sure, you need some
flow control, and there is no such thing out of the box.

If you run over HTTP, your HTTP container can probably limit the POST
size, so you probably will not crash the server.
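A minimal sketch of the HTTP-level cap described above, written for Python's http.server, where a BaseHTTPRequestHandler exposes the request headers as self.headers and the body stream as self.rfile. MAX_POST, BadRequest and read_post_body are hypothetical names. As Mark Slee notes elsewhere in the thread, this only bounds the body size; it does not validate lengths declared inside the Thrift payload.

```python
MAX_POST = 1 << 20  # hypothetical 1 MiB cap on request bodies


class BadRequest(Exception):
    """Raised when the request body is missing, malformed, or too large."""


def read_post_body(headers, rfile, max_post: int = MAX_POST) -> bytes:
    """Read a POST body only if its Content-Length is present and within the cap."""
    raw = headers.get("Content-Length")
    if raw is None:
        raise BadRequest("Content-Length required")
    try:
        length = int(raw)
    except ValueError:
        raise BadRequest(f"bad Content-Length {raw!r}") from None
    if length < 0 or length > max_post:
        raise BadRequest(f"Content-Length {length} exceeds {max_post}")
    body = rfile.read(length)
    if len(body) != length:
        raise BadRequest("body shorter than Content-Length")
    return body
```

In a handler, do_POST would call read_post_body(self.headers, self.rfile) and answer with a 413 or 400 status on BadRequest before handing the body to the Thrift processor.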

Aron

On Thu, Apr 22, 2010 at 1:24 AM, Mathias Herberts <
mathias.herberts@gmail.com> wrote:

> Given that Thrift still suffers from crashes due to invalid data being
> entered, I would not yet consider this a safe practice.
>
> Maybe after GSoC.
>
> But Facebook does it for a test service IIRC.
>

Re: Anyone using Thrift for public endpoints?

Posted by Mathias Herberts <ma...@gmail.com>.

Given that Thrift still suffers from crashes due to invalid data being
entered, I would not yet consider this a safe practice.

Maybe after GSoC.

But Facebook does it for a test service IIRC.

Re: Anyone using Thrift for public endpoints?

Posted by Laurens Van Houtven <lv...@laurensvh.be>.

If nobody minds, could this discussion be public? I care about the answer
too, and I can't imagine I'm the only person considering exposing a Thrift
service. It doesn't appear that Thrift implementations are specifically
tested for what happens when you feed them random junk (this
was a mailing list topic a while ago), which would of course be a problem if
you're using it as an external interface, where every request is a potential
attack.

tia
lvh

Re: Anyone using Thrift for public endpoints?

Posted by Seth Hitchings <se...@evernote.com>.

Evernote's public-facing API uses Thrift and serves over 3 million users.
The API is used by our own clients (Mac, Win, iPhone, iPad, Android, etc) as
well as a large number of third party applications and services.

As others have said, running over HTTP can help with some of the common
problems, and our server side does some checking to limit the maximum
message size, sets a maximum skip depth, verifies required parameters, etc.

There's more detail about the API at
http://www.evernote.com/about/developer/api

Seth Hitchings
Evernote

On Wed, Apr 21, 2010 at 3:17 PM, Gary Moore <ga...@gmail.com> wrote:

> Hey all,
>
> I've been playing with Thrift in personal projects, but now my employer (at
> my behest) is considering using Thrift for a public-facing endpoint so that
> others can write web clients to consume our resources. I'd like to chat
> with people on this list who are using Thrift in that capacity to get
> thoughts, suggestions, etc. Get back to me via email or on Twitter
> @gsmoore.
>
> Thanks,
> Gary
>
> --
> Gary Moore
> http://www.gmoore.net
>