Posted to general@hadoop.apache.org by Doug Cutting <cu...@apache.org> on 2009/09/11 23:41:23 UTC
HTTP transport?
I'm considering an HTTP-based transport for Avro as the preferred,
high-performance option.
HTTP has lots of advantages. In particular, it already has
- lots of authentication, authorization and encryption support;
- highly optimized servers;
- monitoring, logging, etc.
Tomcat and other servlet containers support async NIO, where a thread is
not required per connection. A servlet can process bulk data with a
single copy to and from the socket (bypassing stream buffers). Calls
can be multiplexed over a single HTTP connection using Comet events.
http://tomcat.apache.org/tomcat-6.0-doc/aio.html
Zero copy is not an option for servlets that generate arbitrary data,
but one can specify a file/start/length tuple and Tomcat will use
sendfile to write the response. That means that while HDFS datanode
file reads could not be done via RPC, they could be done via HTTP with
zero-copy. If authentication and authorization are already done in the
HTTP server, this may not be a big loss. The HDFS client might make two
HTTP requests, one to read a file's data, and another to read its
checksums. The server would then stream the entire block to the client
using sendfile, using TCP flow control as today.
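In Java, the sendfile path Doug describes is exposed as FileChannel.transferTo. A minimal sketch of the file/start/length idea follows; the temp files standing in for an HDFS block and for the response socket, and the offsets, are illustrative only:

```java
import java.io.FileOutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class SendfileSketch {
    public static void main(String[] args) throws Exception {
        Path block = Files.createTempFile("block", ".dat");
        Files.write(block, new byte[1 << 16]);            // stand-in for an HDFS block file
        Path sink = Files.createTempFile("sink", ".dat"); // stands in for the response socket
        long start = 1024, length = 4096;                 // the file/start/length tuple
        try (FileChannel src = FileChannel.open(block);
             FileChannel dst = new FileOutputStream(sink.toFile()).getChannel()) {
            long sent = 0;
            while (sent < length) {                       // transferTo may move fewer bytes than asked
                sent += src.transferTo(start + sent, length - sent, dst);
            }
            System.out.println("transferred " + sent + " bytes");
        }
    }
}
```

When the target is a socket channel, the JVM can hand this directly to the OS sendfile call, which is what lets Tomcat skip the user-space copy.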
Thoughts?
Doug
Re: HTTP transport?
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
>
> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.
>
Q. Is this to replace the client-DN data-transfer protocol, or for ALL
Hadoop RPC?
Q. Was authentication one of your main motivations?
The current plan for authentication is centered around Kerberos.
HTTP does not fit too well into that picture.
sanjay
>
> Tomcat and other servlet containers support async NIO, where a thread is
> not required per connection. A servlet can process bulk data with a
> single copy to and from the socket (bypassing stream buffers). Calls
> can be multiplexed over a single HTTP connection using Comet events.
>
> http://tomcat.apache.org/tomcat-6.0-doc/aio.html
>
> Zero copy is not an option for servlets that generate arbitrary data,
> but one can specify a file/start/length tuple and Tomcat will use
> sendfile to write the response. That means that while HDFS datanode
> file reads could not be done via RPC, they could be done via HTTP with
> zero-copy. If authentication and authorization are already done in the
> HTTP server, this may not be a big loss. The HDFS client might make two
> HTTP requests, one to read a file's data, and another to read its
> checksums. The server would then stream the entire block to the client
> using sendfile, using TCP flow control as today.
>
> Thoughts?
>
> Doug
>
Re: HTTP transport?
Posted by Patrick Hunt <ph...@apache.org>.
One additional benefit of using HTTP is that people are always working
to improve performance, and not only optimizing servers -- Google's SPDY:
http://www.readwriteweb.com/archives/spdy_google_wants_to_speed_up_the_web.php
Multiplexed requests, compressed headers, etc...
Patrick
Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
>
> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.
>
> Tomcat and other servlet containers support async NIO, where a thread is
> not required per connection. A servlet can process bulk data with a
> single copy to and from the socket (bypassing stream buffers). Calls
> can be multiplexed over a single HTTP connection using Comet events.
>
> http://tomcat.apache.org/tomcat-6.0-doc/aio.html
>
> Zero copy is not an option for servlets that generate arbitrary data,
> but one can specify a file/start/length tuple and Tomcat will use
> sendfile to write the response. That means that while HDFS datanode
> file reads could not be done via RPC, they could be done via HTTP with
> zero-copy. If authentication and authorization are already done in the
> HTTP server, this may not be a big loss. The HDFS client might make two
> HTTP requests, one to read a file's data, and another to read its
> checksums. The server would then stream the entire block to the client
> using sendfile, using TCP flow control as today.
>
> Thoughts?
>
> Doug
Re: HTTP transport?
Posted by Eric Sammer <er...@lifeless.net>.
Ryan Rawson wrote:
> That's good to know. I thought keep-alive would help... but I was also
> talking about the overhead of a header where the payload is smaller than the
> framing, e.g. 8-byte requests, excluding the bytes that identify which RPC.
> This seems like we could be hurt, since the headers are potentially 5x the
> size of our payload/request params.
>
Oh, got you. That's the classic SOAP problem. ;) I think it's possible,
but to what degree I couldn't be sure. HTTP has that kind of overhead
because of the generality. I think you'd get that with anything that
isn't specifically designed to be wire efficient. Of course, you wind up
having to do what Doug originally mentioned: rebuilding and maintaining
the original stuff that HTTP (and the supporting clients) already supports.
If over-optimization is in fact the root of all evil (which I've heard
once or twice) then maybe it makes sense to start simple and iterate if
necessary. In other words, say screw it, use HTTP, strip unnecessary
headers, but design the code such that Avro's transport is interface
based in case it needs to change. I think prototyping an Avro transport
with HTTP, optimizing, and then dropping lower level if necessary is a
better approach than going straight to the latter.
All of that said, I don't have the insight into the code base that some
of the other folks do. This is based on my experience with similar high
throughput systems, but I wouldn't say I'm 100% convinced it applies
here as the payloads in those systems were bigger than 8 bytes.
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Ryan Rawson <ry...@gmail.com>.
That's good to know. I thought keep-alive would help... but I was also
talking about the overhead of a header where the payload is smaller than the
framing, e.g. 8-byte requests, excluding the bytes that identify which RPC.
This seems like we could be hurt, since the headers are potentially 5x the
size of our payload/request params.
On Oct 5, 2009 4:54 PM, "Eric Sammer" <er...@lifeless.net> wrote:
Ryan:
Certainly keep alive will help in this case, if that's what you're
referring to. The server holds the socket for N seconds or M requests,
whichever comes first. What you're saving with KA is the connection
setup / tear down. If you have a lot of cases where the client makes a
single request and goes away, then KA hurts because the server holds the
connection for the KA timeout (N seconds). This *really* helps if you're
using TLS due to the additional connection setup overhead.
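A hand-rolled sketch of what keep-alive buys, using raw sockets rather than a real servlet container (the request paths and two-byte payloads are made up): two HTTP/1.1 requests are served over a single accepted connection, so the second call pays no connection setup.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.atomic.AtomicInteger;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger accepted = new AtomicInteger();
        ServerSocket server = new ServerSocket(0);
        Thread serverThread = new Thread(() -> {
            try (Socket s = server.accept()) {
                accepted.incrementAndGet();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), "US-ASCII"));
                OutputStream out = s.getOutputStream();
                for (int i = 0; i < 2; i++) {
                    String line;
                    while ((line = in.readLine()) != null && !line.isEmpty()) { } // drain headers
                    // HTTP/1.1 defaults to keep-alive, so the socket stays open afterwards
                    out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok".getBytes("US-ASCII"));
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        serverThread.start();
        try (Socket c = new Socket("127.0.0.1", server.getLocalPort())) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(c.getInputStream(), "US-ASCII"));
            OutputStream out = c.getOutputStream();
            for (int i = 0; i < 2; i++) {   // two RPC-style calls, no second connect
                out.write(("GET /call" + i + " HTTP/1.1\r\nHost: localhost\r\n\r\n")
                        .getBytes("US-ASCII"));
                out.flush();
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) { } // skip status + headers
                char[] body = new char[2];
                for (int n = 0; n < 2; ) {
                    int r = in.read(body, n, 2 - n);
                    if (r < 0) break;
                    n += r;
                }
                System.out.println("response " + i + ": " + new String(body));
            }
        }
        serverThread.join();
        System.out.println("connections accepted: " + accepted.get());
        server.close();
    }
}
```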
It's my opinion and experience that KA helps greatly in the case of many
exchanges between a small to medium number of clients and a server such
as RPC. The anti-example is an ad server or web beacon server, for instance.
Regards.
Ryan Rawson wrote:
> I have a question about these headers... will they impact the ability to do
> ...
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Ryan Rawson <ry...@gmail.com>.
I wanted to chime in on a few things, since Avro is a candidate for
the HBase RPC.
I am not sure that "browser compatibility" is a legitimate requirement
for this kind of thing. It is at odds with high performance in a
number of areas, and isn't the driving factor for using HTTP anyway.
Security - you can get the advantage of security standards, such as
X.509 certificates, without actually using HTTPS.
Headers - I don't really think providing a caching mechanism built
into the RPC layer is a top requirement. We'd then have to build a
GET/POST idempotent flag into the Avro IDL, and everyone would have to
get it right, etc.
Considering my top requirement is "make bulk data access and RPC
rate/sec as high as possible", I'm not sure caching fits in here, and it
can work against that.
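Avro's protocol declarations have no such flag today; purely to illustrate the point, a hypothetical "idempotent" attribute on a message (the attribute name, protocol, and message are all invented) might look like:

```json
{
  "protocol": "RegionServer",
  "namespace": "org.apache.hadoop.hbase",
  "messages": {
    "get": {
      "request": [{"name": "row", "type": "bytes"}],
      "response": "bytes",
      "idempotent": true
    }
  }
}
```

An HTTP transport could then map idempotent messages to GET (cacheable) and everything else to POST, which is exactly the per-message bookkeeping Ryan is objecting to.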
On Tue, Sep 29, 2009 at 8:06 PM, Scott Carey <sc...@richrelevance.com> wrote:
>
>
>
> On 9/29/09 2:57 PM, "stack" <st...@duboce.net> wrote:
>
>> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cu...@apache.org> wrote:
>>
>>>
>>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>>> stuff out of Avro's payload into HTTP headers. The downside of that would
>>> be that, if we still wish to support non-HTTP transports, we'd end up with
>>> duplicated logic.
>>>
>>
>>
>> There would be loads of upside I'd imagine if there was a natural mapping of
>> avro payload specifiers and metadata up into http headers in terms of
>> visibility
>>
>
> There are some very serious disadvantages to headers if overused.
>
> I highly advise keeping what goes into the URL and headers very specific to
> support well defined features for this specific transport type. Otherwise,
> put it in the data payload for all transports.
>
> A couple header disadvantages:
> * Limited character set allowed. You can't put any data in there you want,
> and you can end up with an inefficient encoding mess that is not easy to
> read.
> * Headers don't take advantage of other transport features. For example,
> Content-Encoding:gzip provides gzip compression support for the data
> payload, but you can't compress the headers in HTTP.
>
> On the other hand, custom headers can be handy ways to implement
> transport-specific handshakes or advertise capabilities (which helps build in
> cross-version compatibility).
> But browsers only work with the standard ones, so whatever 'browser
> requirement' is out there is going to be a limited subset no matter how you
> do it.
>
> This thread brings up the security features. Payload encryption does not
> seem to be a transport feature -- but it could be done via something like
> Content-Encoding (X-Avro-Content-Encrypted?). It seems to fit better IMO
> within the payload itself, or at the socket / network level via SSH or a
> secure tunnel.
>
> Authentication is a better fit for the transport layer -- but as mentioned
> elsewhere if it has to be done for all transports, could it fit in the
> payload somehow?
>
>>
>> So, are we talking about doing something like the following for a
>> request/response:
>>
>> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>> Host: www.example.com
>>
>>
>> HTTP/1.1 200 OK
>> Date: Mon, 23 May 2005 22:38:34 GMT
>> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
>> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>> Etag: "3f80f-1b6-3e1cb03b"
>> Accept-Ranges: bytes
>> Content-Length: 438
>> Connection: close
>> Content-Type: X-avro/binary
>>
>
> It's acceptable to drop a lot of the headers above. Some of them are useful
> to implement extended functionality -- the Etag can be used for caching if
> that were desired, for example. Keep-Alive connections and chunked
> responses are nice built-ins too.
>
>>
>> ... or some variation on above on each and every RPC?
>>
>> St.Ack
>>
>
>
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
On 9/29/09 2:57 PM, "stack" <st...@duboce.net> wrote:
> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cu...@apache.org> wrote:
>
>>
>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>> stuff out of Avro's payload into HTTP headers. The downside of that would
>> be that, if we still wish to support non-HTTP transports, we'd end up with
>> duplicated logic.
>>
>
>
> There would be loads of upside I'd imagine if there was a natural mapping of
> avro payload specifiers and metadata up into http headers in terms of
> visibility
>
There are some very serious disadvantages to headers if overused.
I highly advise keeping what goes into the URL and headers very specific to
support well defined features for this specific transport type. Otherwise,
put it in the data payload for all transports.
A couple header disadvantages:
* Limited character set allowed. You can't put any data in there you want,
and you can end up with an inefficient encoding mess that is not easy to
read.
* Headers don't take advantage of other transport features. For example,
Content-Encoding:gzip provides gzip compression support for the data
payload, but you can't compress the headers in HTTP.
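The Content-Encoding point, in a quick round trip (the payload contents are arbitrary; note that only the body is compressed, while header bytes always travel as-is):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ContentEncodingGzip {
    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[8192];                 // repetitive payload, compresses well
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload);                           // what "Content-Encoding: gzip" would carry
        }
        byte[] wire = buf.toByteArray();
        System.out.println("compressed: " + (wire.length < payload.length));
        // the receiver reverses it
        ByteArrayOutputStream back = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(wire))) {
            in.transferTo(back);
        }
        System.out.println("round-trip ok: " + Arrays.equals(payload, back.toByteArray()));
    }
}
```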
On the other hand, custom headers can be handy ways to implement
transport-specific handshakes or advertise capabilities (which helps build in
cross-version compatibility).
But browsers only work with the standard ones, so whatever 'browser
requirement' is out there is going to be a limited subset no matter how you
do it.
This thread brings up the security features. Payload encryption does not
seem to be a transport feature -- but it could be done via something like
Content-Encoding (X-Avro-Content-Encrypted?). It seems to fit better IMO
within the payload itself, or at the socket / network level via SSH or a
secure tunnel.
Authentication is a better fit for the transport layer -- but as mentioned
elsewhere if it has to be done for all transports, could it fit in the
payload somehow?
>
> So, are we talking about doing something like the following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>
It's acceptable to drop a lot of the headers above. Some of them are useful
to implement extended functionality -- the Etag can be used for caching if
that were desired, for example. Keep-Alive connections and chunked
responses are nice built-ins too.
>
> ... or some variation on above on each and every RPC?
>
> St.Ack
>
Re: HTTP transport?
Posted by Devaraj Das <dd...@yahoo-inc.com>.
Out of curiosity, do we have such numbers for the current Hadoop RPC?
On 9/29/09 4:17 PM, "Doug Cutting" <cu...@apache.org> wrote:
stack wrote:
> So, are we talking about doing something like the following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>
>
> ... or some variation on above on each and every RPC?
More or less. Except we can probably arrange to omit most of those
response headers except Content-Length. Are any others strictly required?
I today implemented a simple HTTP-based transport for Avro:
https://issues.apache.org/jira/browse/AVRO-129
In some simple benchmarks I am able to make over 5000 sequential
RPCs/second, each with ~100 bytes of response payload. Increasing
response payloads to 100kB slows this to around 2500 RPCs/second, giving
throughput of 250MB/second, or 2Gbit/s. This is with both client and
server running on my laptop. The client is java.net.URLConnection and
the server is Jetty with its default configuration.
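Doug's benchmark used java.net.URLConnection against Jetty; a self-contained sketch of the same measurement shape follows, with the JDK's built-in com.sun.net.httpserver standing in for Jetty (URL path, payload size, and call count are illustrative, and the numbers you see will of course depend on the machine):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class HttpRpcBench {
    public static void main(String[] args) throws Exception {
        byte[] response = new byte[100];             // ~100-byte payload, as in the benchmark
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/avro", ex -> {
            ex.sendResponseHeaders(200, response.length);
            ex.getResponseBody().write(response);
            ex.close();
        });
        server.start();
        URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/avro");
        int calls = 1000;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes();                   // keep-alive lets the socket be reused
            }
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d calls in %.2fs (%.0f calls/sec)%n", calls, secs, calls / secs);
        server.stop(0);
    }
}
```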
Doug
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
On 10/5/09 1:47 PM, "Ryan Rawson" <ry...@gmail.com> wrote:
> I have a question about these headers... will they impact the ability to do
> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
> rpcs/second. Would this help or hinder?
>
As long as the HTTP response and request fit in one network packet
(pessimistic - 1KB or so) there is not much overhead.
50k rpcs/sec with gigabit ethernet saturated (~100MB/sec) is ~2KB per
request.
So, on faster networks an extra 100 to 200 bytes or so won't matter.
On the WAN, it will have more of an effect if the bandwidth is low, and
also if the RPC is very 'chatty' and not 'chunky' enough. However, on most
WAN links, network latency is going to kill you far, far more than an extra
200 bytes.
For example, imagine a 20ms latency link. The max RPC throughput to a
single client is then 50/sec (one per 20ms). With a 1KB payload per request,
that's 50KB per sec max data transfer. HTTP pipelining could help here --
but isn't as well supported as one would like.
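The arithmetic in the example, spelled out (one request in flight at a time, numbers taken from the example):

```java
public class WanRpcMath {
    public static void main(String[] args) {
        int rttMillis = 20;                          // 20ms round-trip latency
        int maxRpcPerSec = 1000 / rttMillis;         // one call in flight at a time
        int payloadBytes = 1024;                     // 1KB per request
        int maxBytesPerSec = maxRpcPerSec * payloadBytes;
        System.out.println("max RPCs/sec: " + maxRpcPerSec);        // 50
        System.out.println("max KB/sec: " + maxBytesPerSec / 1024); // 50
    }
}
```

Pipelining raises the number of calls in flight, which multiplies both figures; header overhead only nibbles at the payload term.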
If WAN level RPC is a goal, the main challenges there will be latency
related first, and packet size related second.
On a fast local network (gigabit) I suspect throughput problems of other
sorts to be the issue before bandwidth from slightly larger packets.
Furthermore, it's not like a TCP packet is 0 bytes on its own. HTTP adds
some overhead, but it can be kept relatively trim.
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
On 10/9/09 10:49 AM, "Doug Cutting" <cu...@apache.org> wrote:
>
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
>
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it. With HttpClient one can limit the number of pooled
> connections per host:
>
Also, since HTTP supports in-order pipelining out of the box, it's only
the out-of-order stuff that would require additional connections.
>
> Doug
>
Requirements may end up ruling out HTTP, but I doubt that performance (in
the insecure case) will be the cause since there are so many high
performance client and server implementations available.
Consider something lower level than the Servlet API for the server side --
it is baggage-laden and does not allow access to all data in unconverted
form, or any asynchronous I/O.
In this respect, jetty has lower level, light-weight API access points.
http://docs.codehaus.org/display/JETTY/Architecture
If HTTP is not used, I suggest a strong look at Apache MINA for constructing
high performance NIO clients and servers in Java: http://mina.apache.org/
Re: HTTP transport?
Posted by Kan Zhang <ka...@yahoo-inc.com>.
On 10/14/09 9:37 AM, "Doug Cutting" <cu...@apache.org> wrote:
> Kan Zhang wrote:
>> One problem I see with using HTTP is that it's expensive to provide data
>> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
>> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
>> for subsequent communication over the authenticated channel. However, when
>> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
>> DIGEST, respectively), they don't provide data encryption (correct me if I'm
>> wrong). For data encryption over HTTP, one has to use SSL, which is
>> expensive.
>
> Java supports using Kerberos-based encryption for TLS (nee SSL):
>
> http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
>
This addresses part of my concern (the Kerberos part). I wasn't aware Java
already supports it. Thanks for pointing it out.
Kan
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Kan Zhang wrote:
> Thanks for pointing this out. I did a little testing on it. It seems that
> when you use Kerberos cipher suites with SSL, the Kerberos service name for
> a TLS server has to be literally "host." For example, a TLS server running
> on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
> host/mach1.imc.org@IMC.ORG as its Kerberos principal name. I couldn't find a
> way to specify a different service name. Can someone confirm this? This can
> be a limitation since we typically run DN and TT on the same set of nodes.
This is unfortunate. It looks to be part of the specification.
BTW, I found an approach to Kerberos over HTTP bypassing SPNEGO:
http://beamdocs.fnal.gov/DocDB/0019/001987/001/KMJ3_1-guide.pdf
Starting on page 13, he suggests having an applet that the browser loads
to create a ticket. The ticket is created by the user's browser talking
directly to Kerberos. Then the ticket can be used in subsequent
requests to identify the user. An application using HTTP could
similarly contact Kerberos directly to create tickets that are sent with
requests. No multi-step HTTP handshake is thus required.
Doug
Re: HTTP transport?
Posted by Kan Zhang <ka...@yahoo-inc.com>.
On 10/14/09 9:37 AM, "Doug Cutting" <cu...@apache.org> wrote:
> Kan Zhang wrote:
>> One problem I see with using HTTP is that it's expensive to provide data
>> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
>> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
>> for subsequent communication over the authenticated channel. However, when
>> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
>> DIGEST, respectively), they don't provide data encryption (correct me if I'm
>> wrong). For data encryption over HTTP, one has to use SSL, which is
>> expensive.
>
> Java supports using Kerberos-based encryption for TLS (nee SSL):
>
> http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
>
> http://tools.ietf.org/html/rfc2712
>
Thanks for pointing this out. I did a little testing on it. It seems that
when you use Kerberos cipher suites with SSL, the Kerberos service name for
a TLS server has to be literally "host." For example, a TLS server running
on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
host/mach1.imc.org@IMC.ORG as its Kerberos principal name. I couldn't find a
way to specify a different service name. Can someone confirm this? This can
be a limitation since we typically run DN and TT on the same set of nodes.
Kan
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Kan Zhang wrote:
> One problem I see with using HTTP is that it's expensive to provide data
> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
> for subsequent communication over the authenticated channel. However, when
> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
> DIGEST, respectively), they don't provide data encryption (correct me if I'm
> wrong). For data encryption over HTTP, one has to use SSL, which is
> expensive.
Java supports using Kerberos-based encryption for TLS (nee SSL):
http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
http://tools.ietf.org/html/rfc2712
There's also a standard way to use tickets over TLS:
http://tools.ietf.org/html/rfc4507
Doug
Re: HTTP transport?
Posted by Kan Zhang <ka...@yahoo-inc.com>.
On 10/9/09 12:56 PM, "Doug Cutting" <cu...@apache.org> wrote:
> Sanjay Radia wrote:
>> Will the RPC over HTTP be transparent, so that we can replace it with a
>> different layer if needed?
>
> Yes.
>
>> My worry was the separation of data and checksums; someone had mentioned
>> that one could do this over 2 RPCs - that is not transparent.
>
> That was suggested as a possibility if we did not want to use RPC for
> data, but rather raw HTTP, e.g., with a separate URL per block. The
> zerocopy support built into most HTTP servers only supports entire
> responses from a single file, so if we wanted to take advantage of these
> zerocopy implementations we'd not use RPC for block access, but could
> use HTTP and hence share security, etc. Using raw HTTP for block access
> might also perform better, since it can use TCP flow control, rather
> than RPC call/response. In my microbenchmarks, RPC call/response was
> fast enough to easily saturate disks and networks, so that might be
> moot, although RPC call/response for file data may use more CPU than
> we'd like. With our own transport implementation we could get RPC
> call/response to use zerocopy for file data.
>
One problem I see with using HTTP is that it's expensive to provide data
encryption. We're currently adding 2 authentication mechanisms (Kerberos and
DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
for subsequent communication over the authenticated channel. However, when
similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
DIGEST, respectively), they don't provide data encryption (correct me if I'm
wrong). For data encryption over HTTP, one has to use SSL, which is
expensive.
Kan
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> Will the RPC over HTTP be transparent, so that we can replace it with a
> different layer if needed?
Yes.
> My worry was the separation of data and checksums; someone had mentioned
> that one could do this over 2 RPCs - that is not transparent.
That was suggested as a possibility if we did not want to use RPC for
data, but rather raw HTTP, e.g., with a separate URL per block. The
zerocopy support built into most HTTP servers only supports entire
responses from a single file, so if we wanted to take advantage of these
zerocopy implementations we'd not use RPC for block access, but could
use HTTP and hence share security, etc. Using raw HTTP for block access
might also perform better, since it can use TCP flow control, rather
than RPC call/response. In my microbenchmarks, RPC call/response was
fast enough to easily saturate disks and networks, so that might be
moot, although RPC call/response for file data may use more CPU than
we'd like. With our own transport implementation we could get RPC
call/response to use zerocopy for file data.
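As a sketch of what raw-HTTP block access might look like on the wire (the URL layout, query parameters, and block name here are purely hypothetical):

```
GET /block/blk_1234?offset=0&length=67108864 HTTP/1.1
Host: datanode1.example.com

HTTP/1.1 200 OK
Content-Length: 67108864
Content-Type: application/octet-stream
```

The server would answer with sendfile on the block file, flow-controlled by TCP, and a second, similar request would fetch the checksums.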
> I assume that we are
> going to create a branch that moves the data transfer protocols to RPC and
> test the performance, and if it is good then we commit and move to RPC?
Yes. We obviously cannot change the file data transfer protocol without
benchmarking. Ideally file data transfer can share as much as possible
with other protocols. The most optimistic approach would be to use
HTTP-based RPC call/response, so we ought to benchmark that. This was
the purpose of my recently-reported microbenchmarks.
We also need to determine whether both TCP flow-control and zerocopy are
critical to data file performance. If both are indeed critical, and
HTTP proves sufficient for everything else, then we should consider
using non-RPC HTTP for file data transfer, since it supports both
zerocopy and TCP-based flow control, and the implementation of security,
etc. could be shared. But, on the other hand, if HTTP is deemed
inappropriate for security and we develop our own RPC transport that
permits zerocopy, and TCP flow-control over entire blocks is not
required, then we might use RPC for file data. What I'm hoping we can
avoid is, as today, using different transports for different protocols,
re-implementing security, connection pooling, async request processing,
etc. for each, requiring separate configuration and ports for each, etc.
But even that might be required. We don't know yet.
I think starting with HTTP as a hypothesis permits us to make progress
without a lot of up-front investment.
Doug
Re: HTTP transport?
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On 10/9/09 10:49 AM, "Doug Cutting" <cu...@apache.org> wrote:
> Owen O'Malley wrote:
>> SPNEGO is the
>> standard method of using Kerberos with HTTP and we are planning to use
>> that for the web UI's.
>
> Java 6 also supports using SPNEGO for RPC over HTTP out of the box:
>
> http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
>
>> I also have serious doubts about performance, but that is hard to answer
>> until we have code to test.
>
> The good news is that, since the HTTP stuff is already implemented, we
> can test its performance easily. Performance of insecure access over
> HTTP looks good so far. It's an open question how much HTTP-based
> security will slow things versus non-HTTP-based security.
>
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
>
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it. With HttpClient one can limit the number of pooled
> connections per host:
>
> http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
>
> Connections are not free of course, but Jetty has been benchmarked at
> 20,000 concurrent connections:
>
> http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/
>
>> In short, I think that an HTTP transport is great for playing with, but
>> I don't think you can assume it will work as the primary transport.
>
> I agree, we cannot assume it. But it's easy to try it and see how it
> fares. Any investment in getting it working is perhaps not wasted,
> since, besides providing a performance baseline, it also may be useful
> to provide HTTP-based access to services even if a higher-performance
> option is implemented.
Will the RPC over HTTP be transparent, so that we can replace it with a
different layer if needed?
My worry was the separation of data and checksums; someone had mentioned
that one could do this over 2 RPCs - that is not transparent.
Also, the other issue is porting from data transfer socket streams to RPC -
that port will not be transparent. We cannot afford to lose performance
over that change. Further, moving from streaming sockets to RPC is a very
significant code change to the dfs-client and data nodes. I assume that we
are going to create a branch that moves the data transfer protocols to RPC,
test the performance, and if it is good then commit and move to RPC?
I am worried about this part - I am surprised that you two are not. Am I
missing something here?
sanjay
>
> Doug
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Owen O'Malley wrote:
> SPNEGO is the
> standard method of using Kerberos with HTTP and we are planning to use
> that for the web UI's.
Java 6 also supports using SPNEGO for RPC over HTTP out of the box:
http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
> I also have serious doubts about performance, but that is hard to answer
> until we have code to test.
The good news is that, since the HTTP stuff is already implemented, we
can test its performance easily. Performance of insecure access over
HTTP looks good so far. It's an open question how much HTTP-based
security will slow things versus non-HTTP-based security.
> It is an interesting question how much we
> depend on being able to answer queries out of order. There are some
> parts of the code where overlapping requests from the same client
> matter. In particular, the terasort scheduler uses threads to access the
> namenode. That would stop providing any pipelining, which I believe
> would be significant.
No, we wouldn't stop any pipelining, we'd just use more connections to
implement it. With HttpClient one can limit the number of pooled
connections per host:
http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
Connections are not free of course, but Jetty has been benchmarked at
20,000 concurrent connections:
http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/
> In short, I think that an HTTP transport is great for playing with, but
> I don't think you can assume it will work as the primary transport.
I agree, we cannot assume it. But it's easy to try it and see how it
fares. Any investment in getting it working is perhaps not wasted,
since, besides providing a performance baseline, it also may be useful
to provide HTTP-based access to services even if a higher-performance
option is implemented.
Doug
Re: HTTP transport?
Posted by Owen O'Malley <om...@apache.org>.
I still don't see how to make this play well with security. Security
needs to go under the transport layer so that it is easy to add
encryption on the wire. If you go with HTTP, the only way that is
portable at all is to use HTTP over SSL. SSL is for when there aren't
shared keys and Kerberos provides those shared keys. SPNEGO is the
standard method of using Kerberos with HTTP and we are planning to use
that for the web UI's. But SPNEGO is very much the least painful of
the alternatives and I'd rather not force our RPC services into that
corner.
I also have serious doubts about performance, but that is hard to
answer until we have code to test. It is an interesting question how
much we depend on being able to answer queries out of order. There are
some parts of the code where overlapping requests from the same client
matter. In particular, the terasort scheduler uses threads to access
the namenode. That would stop providing any pipelining, which I
believe would be significant.
In short, I think that an HTTP transport is great for playing with,
but I don't think you can assume it will work as the primary transport.
-- Owen
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
>> With respect to Avro/Hadoop, I suspect requests from clients to be time
>> clustered.
>
> That was my thought as well. The thing that gets me is that in the case
> of Hadoop (and the related subprojects) the clients utilizing this
> particular HTTP connection are probably going to be pretty small (maybe
> low thousands?). This is even better for keep alive as there's a solid
> chance you're going to have a high reuse rate. Of course, I'm assuming
> we're talking about things like name node to data node, hbase client to
> region servers, and those types of communications. Even if you just used
> Jetty (or any other thin HTTP 1.1 container that supports KA), one
> should easily be able to see good performance.
>
Absolutely.
Additionally, I realized one more thing -- when a client knows they aren't
likely to send another request soon, they can send a request with
Connection: close.
Well behaved clients can help maximize the benefit.
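A well-behaved one-shot client along these lines might look like the following sketch, using only JDK classes. The tiny built-in HttpServer stands in for a real Avro endpoint, and the class, path, and payload are illustrative:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Sketch: when no follow-up request is expected, the client sends
// "Connection: close" so the server can release the socket instead of
// holding it open for the keep-alive timeout.
public class OneShotClient {
    // Start a trivial server that answers every request with "ok".
    public static HttpServer startServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/avro", exchange -> {
            byte[] body = "ok".getBytes("UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // One request, then ask the server to drop the connection.
    public static String call(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Connection", "close"); // opt out of keep-alive
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), "UTF-8");
        }
    }
}
```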
> Regards.
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
>
Re: HTTP transport?
Posted by Eric Sammer <er...@lifeless.net>.
Scott Carey wrote:
> Even in the beacon case, if the browser is likely to send another request
> shortly, it cuts the effective network latency in half.
Which is generally not the case in the beacon / ad server use case. That
was the only point I was making. That's besides the point, though. I
think we both agree that KA for something like Avro transport is
probably good.
> Establishing a TCP
> connection is at minimum one full round trip -- before the request. If
> latency is important KeepAlive is useful as long as a second request is
> expected in a short enough time.
>
> As long as the server is not process or thread per connection, one can scale
> up connection count rather high (20k) if necessary.
>
> With respect to Avro/Hadoop, I suspect requests from clients to be time
> clustered.
That was my thought as well. The thing that gets me is that in the case
of Hadoop (and the related subprojects) the clients utilizing this
particular HTTP connection are probably going to be pretty small (maybe
low thousands?). This is even better for keep alive as there's a solid
chance you're going to have a high reuse rate. Of course, I'm assuming
we're talking about things like name node to data node, hbase client to
region servers, and those types of communications. Even if you just used
Jetty (or any other thin HTTP 1.1 container that supports KA), one
should easily be able to see good performance.
Regards.
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
On 10/5/09 1:53 PM, "Eric Sammer" <er...@lifeless.net> wrote:
> Ryan:
>
> Certainly keep alive will help in this case, if that's what you're
> referring to. The server holds the socket for N seconds or M requests,
> which ever comes first. What you're saving with KA is the connection
> setup / tear down. If you have a lot of cases where the client makes a
> single request and goes away, then KA hurts because the server holds the
> connection for the KA timeout (N seconds). This *really* helps if you're
> using TLS due to the additional connection setup overhead.
>
> It's my opinion and experience that KA helps greatly in the case of many
> exchanges between a small to medium number of clients and a server such
> as RPC. The anti-example is an ad server or web beacon server, for instance.
>
Even in the beacon case, if the browser is likely to send another request
shortly, it cuts the effective network latency in half. Establishing a TCP
connection is at minimum one full round trip -- before the request. If
latency is important KeepAlive is useful as long as a second request is
expected in a short enough time.
As long as the server is not process or thread per connection, one can scale
up connection count rather high (20k) if necessary.
With respect to Avro/Hadoop, I suspect requests from clients to be time
clustered.
> Regards.
>
> Ryan Rawson wrote:
>> I have a question about these headers... will they impact the ability to do
>> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
>> rpcs/second. Would this help or hinder?
>>
>> On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:
>>
>> Doug Cutting wrote: > More or less. Except we can probably arrange to omit
>> most of those > response...
>> Content-Type and Server are probably unavoidable. Some of the others are
>> extremely helpful during development / debugging / etc. It depends on
>> how "open" you are about HTTP being the transport (i.e. do you let
>> developers augment these headers to support additional features, etc.).
>> This may not make sense in the context of something specialized like
>> Avro transport.
>>
>>> I today implemented a simple HTTP-based transport for Avro: > >
>> https://issues.apache.org/jira...
>> Just out of curiosity, were you using HTTP keep alive? During testing
>> on a project a few years ago, I found a huge difference if Keep Alive is
>> supported. In retrospect, that should have been obvious. I'd imagine the
>> usage pattern here would be a large number of repeated calls between the
>> same client / server within a short period of time; perfect for KA.
>>
>> Regards.
>> --
>> Eric Sammer
>> eric@lifless.net
>> http://esammer.blogspot.com
>>
>
>
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
>
Re: HTTP transport?
Posted by Eric Sammer <er...@lifeless.net>.
Ryan:
Certainly keep alive will help in this case, if that's what you're
referring to. The server holds the socket for N seconds or M requests,
whichever comes first. What you're saving with KA is the connection
setup / tear down. If you have a lot of cases where the client makes a
single request and goes away, then KA hurts because the server holds the
connection for the KA timeout (N seconds). This *really* helps if you're
using TLS due to the additional connection setup overhead.
It's my opinion and experience that KA helps greatly in the case of many
exchanges between a small to medium number of clients and a server such
as RPC. The anti-example is an ad server or web beacon server, for instance.
Regards.
Ryan Rawson wrote:
> I have a question about these headers... will they impact the ability to do
> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
> rpcs/second. Would this help or hinder?
>
> On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:
>
> Doug Cutting wrote: > More or less. Except we can probably arrange to omit
> most of those > response...
> Content-Type and Server are probably unavoidable. Some of the others are
> extremely helpful during development / debugging / etc. It depends on
> how "open" you are about HTTP being the transport (i.e. do you let
> developers augment these headers to support additional features, etc.).
> This may not make sense in the context of something specialized like
> Avro transport.
>
>> I today implemented a simple HTTP-based transport for Avro: > >
> https://issues.apache.org/jira...
> Just out of curiosity, were you using HTTP keep alive? During testing
> on a project a few years ago, I found a huge difference if Keep Alive is
> supported. In retrospect, that should have been obvious. I'd imagine the
> usage pattern here would be a large number of repeated calls between the
> same client / server within a short period of time; perfect for KA.
>
> Regards.
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
>
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Ryan Rawson <ry...@gmail.com>.
I have a question about these headers... will they impact the ability to do
many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
rpcs/second. Would this help or hinder?
On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:
Doug Cutting wrote: > More or less. Except we can probably arrange to omit
most of those > response...
Content-Type and Server are probably unavoidable. Some of the others are
extremely helpful during development / debugging / etc. It depends on
how "open" you are about HTTP being the transport (i.e. do you let
developers augment these headers to support additional features, etc.).
This may not make sense in the context of something specialized like
Avro transport.
> I today implemented a simple HTTP-based transport for Avro: > >
https://issues.apache.org/jira...
Just out of curiosity, were you using HTTP keep alive? During testing
on a project a few years ago, I found a huge difference if Keep Alive is
supported. In retrospect, that should have been obvious. I'd imagine the
usage pattern here would be a large number of repeated calls between the
same client / server within a short period of time; perfect for KA.
Regards.
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Eric Sammer <er...@lifeless.net>.
Doug Cutting wrote:
> More or less. Except we can probably arrange to omit most of those
> response headers except Content-Length. Are any others strictly required?
Content-Type and Server are probably unavoidable. Some of the others are
extremely helpful during development / debugging / etc. It depends on
how "open" you are about HTTP being the transport (i.e. do you let
developers augment these headers to support additional features, etc.).
This may not make sense in the context of something specialized like
Avro transport.
> I today implemented a simple HTTP-based transport for Avro:
>
> https://issues.apache.org/jira/browse/AVRO-129
>
> In some simple benchmarks I am able to make over 5000 sequential
> RPCs/second, each with ~100 bytes of response payload.
Just out of curiosity, were you using HTTP keep alive? During testing
on a project a few years ago, I found a huge difference if Keep Alive is
supported. In retrospect, that should have been obvious. I'd imagine the
usage pattern here would be a large number of repeated calls between the
same client / server within a short period of time; perfect for KA.
Regards.
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
BTW, java.net.URLConnection is the likely bottleneck there - it stinks performance-wise. The Apache Commons HttpClient is much faster. Try out using JMeter and switch from one connector to the other for an example.
On 9/29/09 4:17 PM, "Doug Cutting" <cu...@apache.org> wrote:
stack wrote:
> So, are we're talking about doing something like following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>
>
> ... or some variation on above on each and every RPC?
More or less. Except we can probably arrange to omit most of those
response headers except Content-Length. Are any others strictly required?
I today implemented a simple HTTP-based transport for Avro:
https://issues.apache.org/jira/browse/AVRO-129
In some simple benchmarks I am able to make over 5000 sequential
RPCs/second, each with ~100 bytes of response payload. Increasing
response payloads to 100kB slows this to around 2500 RPCs/second, giving
throughput of 250MB/second, or 2.5Gbit/s. This is with both client and
server running on my laptop. The client is java.net.URLConnection and
the server is Jetty with its default configuration.
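A rough sketch of this kind of sequential-RPC measurement, using the JDK's built-in HttpServer in place of Jetty and java.net.HttpURLConnection as the client. The numbers will of course differ from those quoted above; the shape of the benchmark is the point, and all names here are illustrative:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.Arrays;

// Sketch: time N sequential HTTP round trips of a fixed-size payload and
// report calls/second.
public class HttpRpcBench {
    // Serve a constant payload of the given size on every request.
    public static HttpServer startServer(int payloadBytes) throws Exception {
        byte[] payload = new byte[payloadBytes];
        Arrays.fill(payload, (byte) 'x');
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, payload.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(payload);
            }
        });
        server.start();
        return server;
    }

    // Issue `calls` sequential requests and return the observed rate.
    public static double callsPerSecond(String url, int calls) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes(); // drain the response payload
            }
        }
        return calls / ((System.nanoTime() - start) / 1e9);
    }
}
```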
Doug
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
stack wrote:
> So, are we're talking about doing something like following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>
>
> ... or some variation on above on each and every RPC?
More or less. Except we can probably arrange to omit most of those
response headers except Content-Length. Are any others strictly required?
I today implemented a simple HTTP-based transport for Avro:
https://issues.apache.org/jira/browse/AVRO-129
In some simple benchmarks I am able to make over 5000 sequential
RPCs/second, each with ~100 bytes of response payload. Increasing
response payloads to 100kB slows this to around 2500 RPCs/second, giving
throughput of 250MB/second, or 2.5Gbit/s. This is with both client and
server running on my laptop. The client is java.net.URLConnection and
the server is Jetty with its default configuration.
Doug
Re: HTTP transport?
Posted by stack <st...@duboce.net>.
On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cu...@apache.org> wrote:
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
> stuff out of Avro's payload into HTTP headers. The downside of that would
> be that, if we still wish to support non-HTTP transports, we'd end up with
> duplicated logic.
>
There would be loads of upside, I'd imagine, if there was a natural mapping of
avro payload specifiers and metadata up into HTTP headers, in terms of
visibility.
So, are we're talking about doing something like following for a
request/response:
GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
Host: www.example.com
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: X-avro/binary
... or some variation on above on each and every RPC?
St.Ack
Re: HTTP transport?
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 29, 2009, at 2:08 PM, Doug Cutting wrote:
> ...
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and
> pull stuff out of Avro's payload into HTTP headers. The downside of
> that would be that, if we still wish to support non-HTTP transports,
> we'd end up with duplicated logic.
>
I would prefer to retain layer independence so that we can use other
transports.
(I am still not sold on HTTP as a transport so far but am listening
with an open mind).
>
>
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Raghu Angadi wrote:
> Does this mean current Avro RPC transport (an improved version of Hadoop
> RPC) can still exist as long as it is supported by developers?
Sure, folks can create new transports for Avro. There is, for example,
in Hadoop Common some code that tunnels Avro RPCs inside Hadoop RPCs.
> Where does security lie : Avro or Transport layer?
That's not yet clear. If we settle on HTTP as the preferred transport,
then the transport should probably handle security, since many security
standards already exist for HTTP and many HTTP servers and clients
already support adding new security mechanisms. I'd rather not
re-invent all this in Avro if we can avoid it.
> If it is part of transport : How does an app get hold of required
> information (e.g. user identity).
Perhaps the way we currently do this in the RPC server, with thread
locals? For example, the Avro RPC servlet could have a static method
that returns the value of
HttpServletRequest#getUserPrincipal().
> May be 'transceiver' can have an interface that can transfer security
> information between transport layer and Avro.
Yes, we could add methods like getPrincipal() to Transceiver, but we'd
still probably need to use a thread local accessed by a static method to
get the Transceiver if we continue to use reflection for server
implementations. Or we could stray from reflection, and make services
implement an interface through which we can pass them things like the
principal.
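The thread-local pattern discussed here might be sketched as follows. The class and method names are hypothetical, not actual Avro or Hadoop APIs:

```java
// Sketch: the transport (e.g. an Avro RPC servlet) stashes the authenticated
// principal for the current request in a thread local, and service code
// retrieves it through a static accessor.
public class CallContext {
    private static final ThreadLocal<String> PRINCIPAL = new ThreadLocal<>();

    // Called by the transport before dispatching, with something like
    // HttpServletRequest#getUserPrincipal().getName().
    public static void setPrincipal(String name) {
        PRINCIPAL.set(name);
    }

    // Called from service implementation code to learn who is calling.
    public static String getPrincipal() {
        return PRINCIPAL.get();
    }

    // Called by the transport after the call completes, so state does not
    // leak across pooled server threads.
    public static void clear() {
        PRINCIPAL.remove();
    }
}
```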
Doug
Re: HTTP transport?
Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Doug Cutting wrote:
> stack wrote:
>> What do you think the path on the first line will look like? Will it be a
>> method
>> name or will it be customizable?
>
> Avro RPC currently includes the message name in the payload, so, unless
> that changes, for Avro RPC, we'd probably use a different URL per
> protocol. As a convention we might use the namespace-qualified protocol
> name as the URL path.
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and
> pull stuff out of Avro's payload into HTTP headers. The downside of
> that would be that, if we still wish to support non-HTTP transports,
> we'd end up with duplicated logic.
Keeping Avro payload independent of transport seems pretty useful, at
least for now. As I understand it, the Avro payload is Avro 'proper' (i.e. it
is supported in all the languages supported by Avro... and other goodies).
I just noticed AVRO-129 and it seems like a great example of using HTTP
transport.
Does this mean current Avro RPC transport (an improved version of Hadoop
RPC) can still exist as long as it is supported by developers?
Where does security lie : Avro or Transport layer?
If it is part of Avro : transport layer does not matter for security.
If it is part of transport : How does an app get hold of required
information (e.g. user identity).
May be 'transceiver' can have an interface that can transfer security
information between transport layer and Avro.
Raghu.
> If we fully embraced HTTP as Avro's primary RPC transport then it might
> make sense to move the message name to the URL and to use the HTTP
> return code to determine whether the response is an error or not. Avro's
> RPC payload also currently includes request and response metadata, which
> are functionally redundant with HTTP headers.
>
>> (In hbase, it might be nice to have path be
>> /tablename/row/family/qualifier etc).
>
> It sounds like you'd perhaps like to be able to put RPC request
> parameters into the URL? I don't see that being done automatically in a
> general way for arbitrary parameter types without the URLs getting
> really ugly and adding a lot of complexity. For this it might be better
> to write a servlet filter that constructs the appropriate Avro-format
> request and forwards it to the RPC url.
>
> Doug
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
stack wrote:
> What do you think the path on the first line will look like? Will it be a method
> name or will it be customizable?
Avro RPC currently includes the message name in the payload, so, unless
that changes, for Avro RPC, we'd probably use a different URL per
protocol. As a convention we might use the namespace-qualified protocol
name as the URL path.
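The suggested convention is simple enough to sketch. The "/avro/" prefix here is illustrative only, not part of Avro:

```java
// Sketch of the convention above: one URL path per protocol, derived from
// the namespace-qualified protocol name.
public class AvroUrls {
    public static String pathFor(String namespace, String protocolName) {
        return "/avro/" + namespace + "." + protocolName;
    }
}
```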
Alternately, we could try to make Avro's RPC more HTTP-friendly, and
pull stuff out of Avro's payload into HTTP headers. The downside of
that would be that, if we still wish to support non-HTTP transports,
we'd end up with duplicated logic.
If we fully embraced HTTP as Avro's primary RPC transport then it might
make sense to move the message name to the URL and to use the HTTP
return code to determine whether the response is an error or not.
Avro's RPC payload also currently includes request and response
metadata, which are functionally redundant with HTTP headers.
> (In hbase, it might be nice to have path be /tablename/row/family/qualifier etc).
It sounds like you'd perhaps like to be able to put RPC request
parameters into the URL? I don't see that being done automatically in a
general way for arbitrary parameter types without the URLs getting
really ugly and adding a lot of complexity. For this it might be better
to write a servlet filter that constructs the appropriate Avro-format
request and forwards it to the RPC url.
Doug
Re: HTTP transport?
Posted by stack <st...@duboce.net>.
On Tue, Sep 29, 2009 at 12:43 PM, Doug Cutting <cu...@apache.org> wrote:
>
> The question I'm asking now is about the wire format, whether we wish to
> precede each RPC request with something like "GET
> /avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with
> "HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g.,
> Content-Type and Content-Length). I think there are great benefits to using
> a single, standard protocol on the wire. Which server and client
> implementations we use will be determined by performance, features, etc.
> But using a standard wire format will greatly simplify things as we attempt
> to support multiple languages. Since we want to provide browser access,
> we're compelled to support HTTP. So the question is, are there compelling
> reasons why HTTP should not be used for other, non-browser, access?
I like the idea of using a proven transport.
The HTTP request and response header verbiage seems profligate if what's
being passed is small.
What do you think the path on the first line will look like? Will it be a method
name or will it be customizable? (In hbase, it might be nice to have path be
/tablename/row/family/qualifier etc).
St.Ack
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> What about out-of-order exchange? Will we be able to support that with
> http transport?
Out-of-order exchange was originally added to Hadoop's RPC when it was a
part of Nutch. It's an important optimization for distributed search,
but it's not clear how important it is currently to Hadoop.
That said, the simple way to deal with this in HTTP is to use a client
library that pools connections, so that, if a second request to the same
service is made by another thread in the same client process before the
first has returned, a second connection is opened. If this is common,
the high-water mark of connections on the server will be higher.
However with an async-io-based server, the number of connections should
not be a primary bottleneck. And again, we don't know how common this is.
Doug
Re: HTTP transport?
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 29, 2009, at 12:43 PM, Doug Cutting wrote:
> Sanjay Radia wrote:
> > Wrt connection pooling/async servers: Can't we use the same
> libraries
> > that Jetty and Tomcat use?
> > Grizzly?
>
> Grizzly also supports HTTP. Choosing Grizzly is independent of
> choosing
> HTTP as a wire transport or choosing a server.
>
Agreed.
Hence the main advantages that remain for http transport are
1) language independent spec for the protocol. The message headers
will be in avro so that is easy and the message exchange should be
fairly straightforward. I see this as a minor advantage for using http
transport.
2) code to implement the transport in multiple languages.
(2) is a significant advantage.
Once we put in the security modifications, will it remain that
portable? We should look at that more closely.
What about out-of-order exchange? Will we be able to support that with
http transport?
sanjay
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> Wrt connection pooling/async servers: Can't we use the same libraries
> that Jetty and Tomcat use?
> Grizzly?
Grizzly also supports HTTP. Choosing Grizzly is independent of choosing
HTTP as a wire transport or choosing a server.
The question I'm asking now is about the wire format, whether we wish to
precede each RPC request with something like "GET
/avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with
"HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g.,
Content-Type and Content-Length). I think there are great benefits to
using a single, standard protocol on the wire. Which server and client
implementations we use will be determined by performance, features, etc.
But using a standard wire format will greatly simplify things as we
attempt to support multiple languages. Since we want to provide browser
access, we're compelled to support HTTP. So the question is, are there
compelling reasons why HTTP should not be used for other, non-browser,
access?
> Yes we are expecting to use encryption down the road.
Do we expect to use something different from TLS? With its 'resume'
feature, is TLS performance unacceptable? Would we implement some other
encryption protocol, or use a non-standards-based encryption protocol?
Doug
Re: HTTP transport?
Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 28, 2009, at 3:42 PM, Doug Cutting wrote:
> Owen O'Malley wrote:
> > I've got concerns about this. Both tactical and strategic. The
> tactical
> > problem is that I need to get security (both Kerberos and token)
> in to
> > 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be
> > done roughly in 5 months. If you switch off of the current RPC
> code base
> > to a completely new RPC code base, I don't see that happening.
>
> What transport do you expect to use with Avro? If wire-
> compatibility is
> a goal, and that includes access from languages besides Java, then we
> must use a transport that's well-specified and Java-independent. HTTP
> is both of these. The existing Hadoop RPC protocol is not.
>
> We could adapt Hadoop's existing RPC transport to be well-specified
> and language independent. This is perhaps not a huge task, but it
> feels
> to me a bit like re-inventing much of what's already in HTTP clients
> and
> servers these days: connection-pooling, async servers, etc.
>
Wrt connection pooling/async servers: Can't we use the same libraries
that Jetty and Tomcat use?
Grizzly?
> grate Kerberos with
> Jetty than with a homegrown protocol and server?
>
>
> > - very expensive on the wire encryption (ssl)
>
> If we don't use HTTP, will we be providing on-wire encryption? If
> not,
> this is moot.
>
Yes we are expecting to use encryption down the road.
>
> Finally, we need to have secure HTTP-based access anyway, right? If we
> use
> HTTP as our RPC transport mightn't we reuse most of that effort?
>
> Doug
>
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Owen O'Malley wrote:
> I've got concerns about this. Both tactical and strategic. The tactical
> problem is that I need to get security (both Kerberos and token) in to
> 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be
> done roughly in 5 months. If you switch off of the current RPC code base
> to a completely new RPC code base, I don't see that happening.
What transport do you expect to use with Avro? If wire-compatibility is
a goal, and that includes access from languages besides Java, then we
must use a transport that's well-specified and Java-independent. HTTP
is both of these. The existing Hadoop RPC protocol is not.
We could adapt Hadoop's existing RPC transport to be well-specified
and language independent. This is perhaps not a huge task, but it feels
to me a bit like re-inventing much of what's already in HTTP clients and
servers these days: connection-pooling, async servers, etc. Plus we
take on the onus of fully specifying the transport, so that it may be
implemented in other languages, and we need to provide some alternate
implementations to demonstrate this.
Do you feel our existing RPC framework's transport is actually more
scalable and reliable than, say, Jetty? Do you think it would be
substantially harder to add, e.g., token-based security to Jetty than to
a homegrown server?
> [ HTTP ] also has a couple of disadvantages:
> - poor integration with kerberos
Do you think it would be substantially harder to integrate Kerberos with
Jetty than with a homegrown protocol and server?
> - very expensive on the wire encryption (ssl)
If we don't use HTTP, will we be providing on-wire encryption? If not,
this is moot.
Finally, we need to have secure HTTP-based access anyway, right? If we use
HTTP as our RPC transport mightn't we reuse most of that effort?
Doug
Re: HTTP transport?
Posted by Owen O'Malley <om...@apache.org>.
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
I've got concerns about this. Both tactical and strategic. The
tactical problem is that I need to get security (both Kerberos and
token) in to 0.22. I'd really like to get Avro RPC into 0.22. I'd like
both to be done roughly in 5 months. If you switch off of the current
RPC code base to a completely new RPC code base, I don't see that
happening.
> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.
It also has a couple of disadvantages:
- poor integration with kerberos
- very expensive on the wire encryption (ssl)
> Tomcat and other servlet containers support async NIO, where a
> thread is not required per connection.
I'm also concerned about the weight of Tomcat. Everything I've read
about it says that it takes a lot more memory and CPU than Jetty. I
think a solution that requires Tomcat may be problematic...
-- Owen
Re: HTTP transport?
Posted by Doug Cutting <cu...@apache.org>.
Scott Carey wrote:
> HTTP is very useful and typically performs very well. It has lots of
> things built-in too. In addition to what you mention, it has a
> caching mechanism built-in, range queries, and all sorts of ways to
> tag along state if needed. To top it off there are a lot of testing
> and debugging tools available for it. So from that front using it is
> very attractive.
Glad you agree!
> However, In my experience zero-copy is not going to be much of a gain
> performance-wise for this sort of application, and will limit what
> can be done. As long as a servlet doesn't transcode data and mostly
> copies, it will be very fast - many multiples of gigabit ethernet
> speed per CPU - far more than most disk setups will handle for a
> while.
In MapReduce, datanodes are also running map and reduce tasks, so we'd
like it if datanodes not only keep up with disks and networks, but also
use minimal CPU to do so. Zero-copy on the datanode has been shown to
significantly help MapReduce benchmarks. That said, zero-copy may or
may not be significantly better than one-copy. I intend to benchmark
that. But the important thing to measure is not just throughput but
also idle CPU.
> Additionally, I'm not sure CRC checking should occur on the
> client. TCP/IP already checksums packets, so network data corruption
> over HTTP is not a primary concern. The big concern is silent data
> corruption on the disk.
I believe that disks are the largest source of data corruption, but I am
not confident they are the only source. HDFS uses end-to-end checksums.
As data is written to HDFS it is immediately checksummed on the
client. This checksum then lives with the data and is validated on the
client immediately before the data is returned to the application. The
goal is to catch corruption wherever it may occur, on disks, on the
network, or while buffered in memory. In addition, the checksum is
validated after data is transmitted to datanodes but before blocks
are stored, so that initial network and memory corruptions are
caught early and the writing process fails, rather than permitting an
application to write corrupt data. Finally, datanodes periodically scan
for corrupt blocks on disks, replacing them with non-corrupt replicas,
decreasing the chance that over time all replicas become corrupt.
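The end-to-end checksum idea described above can be sketched in a few lines. This is an illustrative toy, not HDFS's actual checksum implementation (HDFS checksums fixed-size chunks and stores them in separate metadata files); it just shows the write-path/read-path contract using the JDK's CRC32.

```java
import java.util.zip.CRC32;

// Illustrative sketch of end-to-end checksumming; not HDFS's actual code.
// The checksum is computed once on the write path, travels with the data,
// and is re-verified on the read path before bytes reach the application.
public class EndToEndChecksum {
    // Write path: checksum the data immediately, before it leaves the client.
    static long checksumOnWrite(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Read path: recompute and compare; fail loudly on corruption picked up
    // on disk, on the network, or while buffered in memory along the way.
    static byte[] verifyOnRead(byte[] data, long expectedCrc) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        if (crc.getValue() != expectedCrc) {
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] block = "some block data".getBytes();
        long crc = checksumOnWrite(block);
        verifyOnRead(block, crc);   // clean data passes through
        block[0] ^= 0x01;           // simulate one bit of silent corruption
        boolean caught = false;
        try {
            verifyOnRead(block, crc);
        } catch (IllegalStateException e) {
            caught = true;
        }
        System.out.println(caught ? "corruption caught" : "corruption missed");
    }
}
```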
> Additionally, embedding Tomcat tends to be more tricky than Jetty,
> though that can be overcome. One might argue that we don't even want
> a servlet container, we just want an HTTP connector. The Servlet API
> is familiar, but for a high performance transport it might just be
> overhead and restrictive. Direct access to Tomcat's NIO connector
> might be significantly lighter-weight and more flexible. Tomcat's NIO
> connector implementation works great and I have had great success
> with up to 10K connections with the pure Java connector using
> ordinary byte buffers and about 20 servlet threads.
I hope to start benchmarking bulk data RPC over the next few weeks.
I'll probably start with a servlet using Jetty, then see if I can
increase throughput and decrease CPU utilization through the use of
things like Tomcat's NIO connector, Grizzly, etc.
Doug
Re: HTTP transport?
Posted by Scott Carey <sc...@richrelevance.com>.
Ok, I have some thoughts on this. I might be misinterpreting some use cases here however.
HTTP is very useful and typically performs very well. It has lots of things built-in too. In addition to what you mention, it has a caching mechanism built-in, range queries, and all sorts of ways to tag along state if needed. To top it off there are a lot of testing and debugging tools available for it. So from that front using it is very attractive.
However, in my experience zero-copy is not going to be much of a gain performance-wise for this sort of application, and will limit what can be done. As long as a servlet doesn't transcode data and mostly copies, it will be very fast - many multiples of gigabit ethernet speed per CPU - far more than most disk setups will handle for a while. Furthermore, it is easier to optimize disk requests to be 'sequentially chunky' if it goes through the JVM. And I suspect that for many use cases, optimizing disk I/O is more valuable than a little bit of extra CPU spent copying data into and out of the process.
Additionally, I'm not sure CRC checking should occur on the client. TCP/IP already checksums packets, so network data corruption over HTTP is not a primary concern. The big concern is silent data corruption on the disk. For the DataNode use case, it should find such errors as early as possible, and not rely on clients discovering errors. Then it can coordinate with the NameNode on fixing the block or discarding it. So if it has to check the file integrity anyway, there is no reason to worry about zero-copy. Avoiding the extra request for the CRC data at least partially counters the loss of zero-copy.
Additionally, embedding Tomcat tends to be more tricky than Jetty, though that can be overcome. One might argue that we don't even want a servlet container, we just want an HTTP connector. The Servlet API is familiar, but for a high performance transport it might just be overhead and restrictive. Direct access to Tomcat's NIO connector might be significantly lighter-weight and more flexible.
Tomcat's NIO connector implementation works great and I have had great success with up to 10K connections with the pure Java connector using ordinary byte buffers and about 20 servlet threads. But if a large number of open connections are not needed (less than about 5x the number of CPU core threads) then thread-per-connection servlet containers work ok too. These sorts of implementation details can evolve over time however.
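The "small fixed pool of worker threads" shape described above can be sketched without Tomcat or Jetty using the JDK's built-in HTTP server. This is only an illustration of the threading model (the port, path, and payload are made up), not a claim about how Tomcat's connector is built.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.Executors;

// Illustrative sketch: one bounded pool of worker threads (20 here,
// echoing the "about 20 servlet threads" above) serves every connection,
// instead of dedicating a thread per connection.
public class SmallPoolHttpServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.setExecutor(Executors.newFixedThreadPool(20)); // bounded worker pool
        server.createContext("/data", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        int port = server.getAddress().getPort();

        // Usage: fetch one response from the server we just started.
        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://localhost:" + port + "/data").openConnection();
        System.out.println(new String(conn.getInputStream().readAllBytes()));
        server.stop(0);
    }
}
```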
Just my 2c
-Scott
On 9/11/09 2:41 PM, "Doug Cutting" <cu...@apache.org> wrote:
I'm considering an HTTP-based transport for Avro as the preferred,
high-performance option.
HTTP has lots of advantages. In particular, it already has
- lots of authentication, authorization and encryption support;
- highly optimized servers;
- monitoring, logging, etc.
Tomcat and other servlet containers support async NIO, where a thread is
not required per connection. A servlet can process bulk data with a
single copy to and from the socket (bypassing stream buffers). Calls
can be multiplexed over a single HTTP connection using Comet events.
http://tomcat.apache.org/tomcat-6.0-doc/aio.html
Zero copy is not an option for servlets that generate arbitrary data,
but one can specify a file/start/length tuple and Tomcat will use
sendfile to write the response. That means that while HDFS datanode
file reads could not be done via RPC, they could be done via HTTP with
zero-copy. If authentication and authorization are already done in the
HTTP server, this may not be a big loss. The HDFS client might make two
HTTP requests, one to read a file's data, and another to read its
checksums. The server would then stream the entire block to the client
using sendfile, using TCP flow control as today.
Thoughts?
Doug
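The file/start/length idea above maps directly onto the JDK's FileChannel.transferTo, which can use sendfile on platforms that support it. A minimal sketch (file name and contents invented for illustration; the in-memory sink stands in for a socket channel):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of serving a (file, start, length) tuple: transferTo hands a
// byte range of the file straight to the target channel, letting the OS
// do a zero-copy sendfile when the target is a socket.
public class SendfileSketch {
    public static void main(String[] args) throws IOException {
        Path block = Files.createTempFile("block", ".dat");
        Files.write(block, "0123456789".getBytes());

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        WritableByteChannel out = Channels.newChannel(sink);

        long start = 2, length = 5; // the file/start/length tuple
        try (FileChannel in = FileChannel.open(block)) {
            long sent = 0;
            while (sent < length) { // transferTo may send fewer bytes than asked
                sent += in.transferTo(start + sent, length - sent, out);
            }
        }
        System.out.println(sink.toString()); // the requested byte range
        Files.delete(block);
    }
}
```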