Posted to general@hadoop.apache.org by Doug Cutting <cu...@apache.org> on 2009/09/11 23:41:23 UTC

HTTP transport?

I'm considering an HTTP-based transport for Avro as the preferred, 
high-performance option.

HTTP has lots of advantages.  In particular, it already has
  - lots of authentication, authorization and encryption support;
  - highly optimized servers;
  - monitoring, logging, etc.

Tomcat and other servlet containers support async NIO, where a thread is 
not required per connection.  A servlet can process bulk data with a 
single copy to and from the socket (bypassing stream buffers).  Calls 
can be multiplexed over a single HTTP connection using Comet events.

http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Zero copy is not an option for servlets that generate arbitrary data, 
but one can specify a file/start/length tuple and Tomcat will use 
sendfile to write the response.  That means that while HDFS datanode 
file reads could not be done via RPC, they could be done via HTTP with 
zero-copy.  If authentication and authorization are already done in the 
HTTP server, this may not be a big loss.  The HDFS client might make two 
HTTP requests, one to read a file's data, and another to read its 
checksums.  The server would then stream the entire block to the client 
using sendfile, using TCP flow control as today.
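
For illustration, here is a rough sketch of how a block-serving servlet
might hand Tomcat that tuple.  The sendfile attribute names are from the
Tomcat docs linked above; the block-path lookup is invented:

  import java.io.File;
  import java.io.IOException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Sketch: serve a block region with zero copy via Tomcat's sendfile.
  public class BlockReadServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException {
      // Hypothetical mapping from the request path to a local block file.
      File block = new File("/data/blocks" + req.getPathInfo());
      long start = 0, end = block.length();
      if (Boolean.TRUE.equals(
          req.getAttribute("org.apache.tomcat.sendfile.support"))) {
        // Tomcat writes the byte range itself using sendfile; the servlet
        // only declares the file/start/end tuple and the response length.
        req.setAttribute("org.apache.tomcat.sendfile.filename",
            block.getAbsolutePath());
        req.setAttribute("org.apache.tomcat.sendfile.start", Long.valueOf(start));
        req.setAttribute("org.apache.tomcat.sendfile.end", Long.valueOf(end));
        resp.setContentLength((int) (end - start));
      } else {
        resp.sendError(HttpServletResponse.SC_NOT_IMPLEMENTED);
      }
    }
  }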

Thoughts?

Doug

Re: HTTP transport?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:

> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
>
> HTTP has lots of advantages.  In particular, it already has
>   - lots of authentication, authorization and encryption support;
>   - highly optimized servers;
>   - monitoring, logging, etc.
>

Q. Is this to replace the client-DN data-transfer protocol, or for ALL  
Hadoop RPC?

Q. Was authentication one of your main motivations?
The current plans for authentication are centered around Kerberos.
HTTP does not fit too well into that picture.


sanjay


Re: HTTP transport?

Posted by Patrick Hunt <ph...@apache.org>.
One additional benefit of using HTTP is that people are always working 
to improve its performance, and not only by optimizing servers -- see
Google's SPDY:

http://www.readwriteweb.com/archives/spdy_google_wants_to_speed_up_the_web.php

Multiplexed requests, compressed headers, etc...

Patrick


Re: HTTP transport?

Posted by Eric Sammer <er...@lifeless.net>.
Ryan Rawson wrote:
> That's good to know. I thought keep-alive would help... but I was also
> talking about the overhead of a header where the payload is smaller than
> the framing, e.g. 8-byte requests, not counting the RPC identifier. This
> seems like we could be hurt since the headers are potentially 5x the size
> of our payload/request params.
> 

Oh, got you. That's the classic SOAP problem. ;) I think it's possible,
but to what degree I couldn't be sure. HTTP has that kind of overhead
because of its generality. I think you'd get that with anything that
isn't specifically designed to be wire-efficient. Of course, you wind up
having to do what Doug originally mentioned: rebuilding and maintaining
the original stuff that HTTP (and the supporting clients) already support.

If premature optimization is in fact the root of all evil (which I've heard
once or twice), then maybe it makes sense to start simple and iterate if
necessary. In other words, say screw it, use HTTP, strip unnecessary
headers, but design the code such that Avro's transport is interface
based in case it needs to change. I think prototyping an Avro transport
with HTTP, optimizing, and then dropping to a lower level if necessary is
a better approach than going straight to the latter.

All of that said, I don't have the insight into the code base that some
of the other folks do. This is based on my experience with similar high
throughput systems, but I wouldn't say I'm 100% convinced it applies
here as the payloads in those systems were bigger than 8 bytes.

-- 
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Ryan Rawson <ry...@gmail.com>.
That's good to know. I thought keep-alive would help... but I was also
talking about the overhead of a header where the payload is smaller than
the framing, e.g. 8-byte requests, not counting the RPC identifier. This
seems like we could be hurt since the headers are potentially 5x the size
of our payload/request params.

On Oct 5, 2009 4:54 PM, "Eric Sammer" <er...@lifeless.net> wrote:

Ryan:

Certainly keep alive will help in this case, if that's what you're
referring to. The server holds the socket for N seconds or M requests,
whichever comes first. What you're saving with KA is the connection
setup / tear down. If you have a lot of cases where the client makes a
single request and goes away, then KA hurts because the server holds the
connection for the KA timeout (N seconds). This *really* helps if you're
using TLS due to the additional connection setup overhead.

It's my opinion and experience that KA helps greatly in the case of many
exchanges between a small to medium number of clients and a server such
as RPC. The anti-example is an ad server or web beacon server, for instance.

Regards.

Ryan Rawson wrote:
> I have a question about these headers... will they impact the ability
> to do ...

--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Ryan Rawson <ry...@gmail.com>.
I wanted to chime in on a few things, since Avro is a candidate for
the HBase RPC.

I am not sure that "browser compatibility" is a legitimate requirement
for this kind of thing. It is at odds with high performance in a
number of areas, and isn't the driving factor for using HTTP anyway.

Security - you can get the advantage of security standards, such as
X.509 SSL certs, without actually using HTTPS.

Headers - I don't really think providing a caching mechanism built
into the RPC layer is a top requirement.  We'd then have to build a
GET/POST idempotency flag into the Avro IDL, and everyone would have to
get it right, etc.

Considering my top requirement is "make bulk data access and RPC
rate/sec as high as possible", I'm not sure caching fits in here, and it
can work against that.



Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.


On 9/29/09 2:57 PM, "stack" <st...@duboce.net> wrote:

> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cu...@apache.org> wrote:
> 
>> 
>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>> stuff out of Avro's payload into HTTP headers.  The downside of that would
>> be that, if we still wish to support non-HTTP transports, we'd end up with
>> duplicated logic.
>> 
> 
> 
> There would be loads of upside I'd imagine if there was a natural mapping of
> avro payload specifiers and metadata up into http headers in terms of
> visibility
> 

There are some very serious disadvantages to headers if overused.

I highly advise keeping what goes into the URL and headers very specific to
support well defined features for this specific transport type.  Otherwise,
put it in the data payload for all transports.

A couple header disadvantages:
* Limited character set allowed.  You can't put any data in there you want,
and you can end up with an inefficient encoding mess that is not easy to
read.
* Headers don't take advantage of other transport features.  For example,
Content-Encoding:gzip provides gzip compression support for the data
payload, but you can't compress the headers in HTTP.

On the other hand, custom headers can be handy ways to implement transport-
specific handshakes or advertise capabilities (which helps build in
cross-version compatibility).
But browsers only work with the standard ones, so whatever 'browser
requirement' is out there is going to be a limited subset no matter how you
do it.

This thread brings up the security features.  Payload encryption does not
seem to be a transport feature -- but it could be done via something like
Content-Encoding (X-Avro-Content-Encrypted?).  It seems to fit better IMO
within the payload itself, or at the socket / network level via SSH or a
secure tunnel.

Authentication is a better fit for the transport layer -- but as mentioned
elsewhere if it has to be done for all transports, could it fit in the
payload somehow? 

> 
> So, are we talking about doing something like the following for a
> request/response:
> 
>  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>  Host: www.example.com
> 
> 
>  HTTP/1.1 200 OK
>  Date: Mon, 23 May 2005 22:38:34 GMT
>  Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
>  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>  Etag: "3f80f-1b6-3e1cb03b"
>  Accept-Ranges: bytes
>  Content-Length: 438
>  Connection: close
>  Content-Type: X-avro/binary
> 

It's acceptable to drop a lot of the headers above.  Some of them are useful
to implement extended functionality -- the Etag can be used for caching if
that were desired, for example.  Keep-Alive connections and chunked
responses are nice built-ins too.

> 
> ... or some variation on above on each and every RPC?
> 
> St.Ack
> 


Re: HTTP transport?

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Out of curiosity, do we have such numbers for the current Hadoop RPC?


On 9/29/09 4:17 PM, "Doug Cutting" <cu...@apache.org> wrote:

stack wrote:
> So, are we talking about doing something like the following for a
> request/response:
>
>  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>  Host: www.example.com
>
>
>  HTTP/1.1 200 OK
>  Date: Mon, 23 May 2005 22:38:34 GMT
>  Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
>  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>  Etag: "3f80f-1b6-3e1cb03b"
>  Accept-Ranges: bytes
>  Content-Length: 438
>  Connection: close
>  Content-Type: X-avro/binary
>
>
> ... or some variation on above on each and every RPC?

More or less.  Except we can probably arrange to omit most of those
response headers except Content-Length.  Are any others strictly required?

Today I implemented a simple HTTP-based transport for Avro:

   https://issues.apache.org/jira/browse/AVRO-129

In some simple benchmarks I am able to make over 5000 sequential
RPCs/second, each with ~100 bytes of response payload.  Increasing
response payloads to 100kB slows this to around 2500 RPCs/second, giving
throughput of 250MB/second, or 2Gbit/s.  This is with both client and
server running on my laptop.  The client is java.net.URLConnection and
the server is Jetty with its default configuration.

Doug


Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.


On 10/5/09 1:47 PM, "Ryan Rawson" <ry...@gmail.com> wrote:

> I have a question about these headers... will they impact the ability to do
> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
> rpcs/second. Would this help or hinder?
> 

As long as the HTTP response and request fit in one network packet
(pessimistic - 1KB or so) there is not much overhead.

50k rpcs/sec with gigabit ethernet saturated (~100MB/sec) is ~2KB per
request.

So, on faster networks an extra 100 to 200 bytes or so won't matter.

On the WAN, it will have more of an effect if the bandwidth is low and the
latency also very low, and if the RPC is very 'chatty' and not 'chunky' enough.

However, on most WAN links network latency is going to kill you far, far
more than an extra 200 bytes.
For example, imagine a 20ms latency link.  The max RPC throughput to a
single client is then 50/sec (one per 20ms).  With a 1k payload per request,
that's 50kB/sec max data transfer.  HTTP pipelining could help here --
but isn't as well supported as one would like.

If WAN level RPC is a goal, the main challenges there will be latency
related first, and packet size related second.
On a fast local network (gigabit) I suspect throughput problems of other
sorts to be the issue before bandwidth from slightly larger packets.

Furthermore, it's not like a TCP packet is 0 bytes on its own.  HTTP adds
some overhead, but it can be kept relatively trim. 


Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.
On 10/9/09 10:49 AM, "Doug Cutting" <cu...@apache.org> wrote:

> 
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
> 
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it.  With HttpClient one can limit the number of pooled
> connnections per host:
> 

Also, since HTTP supports in-order pipelining out of the box, it's only the
out-of-order stuff that would require additional connections.

> 
> Doug
> 

Requirements may end up ruling out HTTP, but I doubt that performance (in
the insecure case) will be the cause since there are so many high
performance client and server implementations available.
Consider something lower level than the Servlet API for the server side --
it is baggage-laden and does not allow access to all data in unconverted
form, or any asynchronous I/O.

In this respect, jetty has lower level, light-weight API access points.
http://docs.codehaus.org/display/JETTY/Architecture

If HTTP is not used, I suggest a strong look at Apache MINA for constructing
high-performance NIO clients and servers in Java: http://mina.apache.org/
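
For instance, a minimal MINA 2.x server skeleton looks something like this
(a sketch only -- an Avro transport would decode framed requests in
messageReceived instead of echoing; the port is arbitrary):

  import java.net.InetSocketAddress;
  import org.apache.mina.core.service.IoAcceptor;
  import org.apache.mina.core.service.IoHandlerAdapter;
  import org.apache.mina.core.session.IoSession;
  import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

  // One shared NIO handler; no thread per connection.
  public class MinaEchoServer {
    public static void main(String[] args) throws Exception {
      IoAcceptor acceptor = new NioSocketAcceptor();
      acceptor.setHandler(new IoHandlerAdapter() {
        @Override
        public void messageReceived(IoSession session, Object message) {
          session.write(message);  // echo the raw buffer back to the client
        }
      });
      acceptor.bind(new InetSocketAddress(9090));
    }
  }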


Re: HTTP transport?

Posted by Kan Zhang <ka...@yahoo-inc.com>.


On 10/14/09 9:37 AM, "Doug Cutting" <cu...@apache.org> wrote:

> Kan Zhang wrote:
>> One problem I see with using HTTP is that it's expensive to provide data
>> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
>> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
>> for subsequent communication over the authenticated channel. However, when
>> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
>> DIGEST, respectively), they don't provide data encryption (correct me if I'm
>> wrong). For data encryption over HTTP, one has to use SSL, which is
>> expensive.
> 
> Java supports using Kerberos-based encryption for TLS (nee SSL):
> 
> http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
> 

This addresses part of my concern (the Kerberos part). I wasn't aware Java
already supports it. Thanks for pointing it out.

Kan


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Kan Zhang wrote:
> Thanks for pointing this out. I did a little testing on it. It seems that
> when you use Kerberos cipher suites with SSL, the Kerberos service name for
> a TLS server has to be literally "host". For example, a TLS server running
> on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
> host/mach1.imc.org@IMC.ORG as its Kerberos principal name. I couldn't find a
> way to specify a different service name. Can someone confirm this? This can
> be a limitation since we typically run DN and TT on the same set of nodes.

This is unfortunate.  It looks to be part of the specification.

BTW, I found an approach to Kerberos over HTTP bypassing SPNEGO:

http://beamdocs.fnal.gov/DocDB/0019/001987/001/KMJ3_1-guide.pdf

Starting on page 13, he suggests having an applet that the browser loads 
to create a ticket.  The ticket is created by the user's browser talking 
directly to Kerberos.  Then the ticket can be used in subsequent 
requests to identify the user.  An application using HTTP could 
similarly contact Kerberos directly to create tickets that are sent with 
requests.  No multi-step HTTP handshake is thus required.
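
A rough sketch of that with Java's GSS-API (this assumes a JAAS Kerberos
login is already in effect; the service name reuses the example host from
earlier in this thread, and the header name is made up):

  import org.ietf.jgss.GSSContext;
  import org.ietf.jgss.GSSManager;
  import org.ietf.jgss.GSSName;
  import org.ietf.jgss.Oid;

  // Sketch: obtain a Kerberos service ticket directly and send it with an
  // HTTP request, instead of a multi-step SPNEGO handshake.
  public class KrbTicketClient {
    public static void main(String[] args) throws Exception {
      Oid krb5 = new Oid("1.2.840.113554.1.2.2");  // Kerberos v5 mechanism
      GSSManager manager = GSSManager.getInstance();
      GSSName service = manager.createName("host@mach1.imc.org",
          GSSName.NT_HOSTBASED_SERVICE);
      GSSContext context = manager.createContext(service, krb5, null,
          GSSContext.DEFAULT_LIFETIME);
      byte[] ticket = context.initSecContext(new byte[0], 0, 0);
      // The ticket (base64-encoded) could then accompany each request in a
      // header, e.g. a hypothetical "X-Avro-Krb-Ticket".
      System.out.println("ticket: " + ticket.length + " bytes");
    }
  }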

Doug

Re: HTTP transport?

Posted by Kan Zhang <ka...@yahoo-inc.com>.


On 10/14/09 9:37 AM, "Doug Cutting" <cu...@apache.org> wrote:

> Kan Zhang wrote:
>> One problem I see with using HTTP is that it's expensive to provide data
>> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
>> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
>> for subsequent communication over the authenticated channel. However, when
>> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
>> DIGEST, respectively), they don't provide data encryption (correct me if I'm
>> wrong). For data encryption over HTTP, one has to use SSL, which is
>> expensive.
> 
> Java supports using Kerberos-based encryption for TLS (nee SSL):
> 
> http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
> 
> http://tools.ietf.org/html/rfc2712
> 
Thanks for pointing this out. I did a little testing on it. It seems that
when you use Kerberos cipher suites with SSL, the Kerberos service name for
a TLS server has to be literally "host". For example, a TLS server running
on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
host/mach1.imc.org@IMC.ORG as its Kerberos principal name. I couldn't find a
way to specify a different service name. Can someone confirm this? This can
be a limitation since we typically run DN and TT on the same set of nodes.

Kan


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Kan Zhang wrote:
> One problem I see with using HTTP is that it's expensive to provide data
> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
> for subsequent communication over the authenticated channel. However, when
> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
> DIGEST, respectively), they don't provide data encryption (correct me if I'm
> wrong). For data encryption over HTTP, one has to use SSL, which is
> expensive.

Java supports using Kerberos-based encryption for TLS (nee SSL):

http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB

http://tools.ietf.org/html/rfc2712

There's also a standard way to use tickets over TLS:

http://tools.ietf.org/html/rfc4507

Doug


Re: HTTP transport?

Posted by Kan Zhang <ka...@yahoo-inc.com>.


On 10/9/09 12:56 PM, "Doug Cutting" <cu...@apache.org> wrote:

> Sanjay Radia wrote:
>> Will the RPC over HTTP be transparent, so that we can replace it with a
>> different layer if needed?
> 
> Yes.
> 
>> My worry was the separation of data and checksums; someone had mentioned
>> that one could do this over 2 RPCs - that is not transparent.
> 
> That was suggested as a possibility if we did not want to use RPC for
> data, but rather raw HTTP, e.g., with a separate URL per block.  The
> zerocopy support built into most HTTP servers only supports entire
> responses from a single file, so if we wanted to take advantage of these
> zerocopy implementations we'd not use RPC for block access, but could
> use HTTP and hence share security, etc.  Using raw HTTP for block access
> might also perform better, since it can use TCP flow control, rather
> than RPC call/response.  In my microbenchmarks, RPC call/response was
> fast enough to easily saturate disks and networks, so that might be
> moot, although RPC call/response for file data may use more CPU than
> we'd like.  With our own transport implementation we could get RPC
> call/response to use zerocopy for file data.
> 

One problem I see with using HTTP is that it's expensive to provide data
encryption. We're currently adding 2 authentication mechanisms (Kerberos and
DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
for subsequent communication over the authenticated channel. However, when
similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
DIGEST, respectively), they don't provide data encryption (correct me if I'm
wrong). For data encryption over HTTP, one has to use SSL, which is
expensive.

Kan


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> Will the RPC over HTTP be transparent, so that we can replace it with a
> different layer if needed?

Yes.

> My worry was the separation of data and checksums; someone had mentioned
> that one could do this over 2 RPCs - that is not transparent.

That was suggested as a possibility if we did not want to use RPC for 
data, but rather raw HTTP, e.g., with a separate URL per block.  The 
zerocopy support built into most HTTP servers only supports entire 
responses from a single file, so if we wanted to take advantage of these 
zerocopy implementations we'd not use RPC for block access, but could 
use HTTP and hence share security, etc.  Using raw HTTP for block access 
might also perform better, since it can use TCP flow control, rather 
than RPC call/response.  In my microbenchmarks, RPC call/response was 
fast enough to easily saturate disks and networks, so that might be 
moot, although RPC call/response for file data may use more CPU than 
we'd like.  With our own transport implementation we could get RPC 
call/response to use zerocopy for file data.

> I assume that we are
> going to create a branch that moves the data transfer protocols to RPC and
> test the performance, and if it is good then we commit and move to RPC?

Yes.  We obviously cannot change the file data transfer protocol without 
benchmarking.  Ideally file data transfer can share as much as possible 
with other protocols.  The most optimistic approach would be to use 
HTTP-based RPC call/response, so we ought to benchmark that.  This was 
the purpose of my recently-reported microbenchmarks.

We also need to determine whether both TCP flow-control and zerocopy are 
critical to data file performance.  If both are indeed critical, and 
HTTP proves sufficient for everything else, then we should consider 
using non-RPC HTTP for file data transfer, since it supports both 
zerocopy and TCP-based flow control, and the implementation of security, 
etc. could be shared.  But, on the other hand, if HTTP is deemed 
inappropriate for security and we develop our own RPC transport that 
permits zerocopy, and TCP flow-control over entire blocks is not 
required, then we might use RPC for file data.  What I'm hoping we can 
avoid is, as today, using different transports for different protocols, 
re-implementing security, connection pooling, async request processing, 
etc. for each, requiring separate configuration and ports for each, etc. 
  But even that might be required.  We don't know yet.

I think starting with HTTP as a hypothesis permits us to make progress 
without a lot of up-front investment.

Doug

Re: HTTP transport?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.


On 10/9/09 10:49 AM, "Doug Cutting" <cu...@apache.org> wrote:

> Owen O'Malley wrote:
>> SPNEGO is the 
>> standard method of using Kerberos with HTTP and we are planning to use
>> that for the web UI's.
> 
> Java 6 also supports using SPNEGO for RPC over HTTP out of the box:
> 
> http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
> 
>> I also have serious doubts about performance, but that is hard to answer
>> until we have code to test.
> 
> The good news is that, since the HTTP stuff is already implemented, we
> can test its performance easily.  Performance of insecure access over
> HTTP looks good so far.  It's an open question how much HTTP-based
> security will slow things versus non-HTTP-based security.
> 
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
> 
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it.  With HttpClient one can limit the number of pooled
> connections per host:
> 
> http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
> 
> Connections are not free of course, but Jetty has been benchmarked at
> 20,000 concurrent connections:
> 
> http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/
> 
>> In short, I think that an HTTP transport is great for playing with, but
>> I don't think you can assume it will work as the primary transport.
> 
> I agree, we cannot assume it.  But it's easy to try it and see how it
> fares.  Any investment in getting it working is perhaps not wasted,
> since, besides providing a performance baseline, it also may be useful
> to provide HTTP-based access to services even if a higher-performance
> option is implemented.

Will the RPC over HTTP be transparent, so that we can replace it with a
different layer if needed?
My worry was the separation of data and checksums; someone had mentioned
that one could do this over 2 RPCs - that is not transparent.

Also, the other issue is porting from data-transfer socket streams to RPC -
that port will not be transparent. We cannot afford to lose performance
over that change. Further, moving from streaming sockets to RPC is a very
significant code change to the dfs-client and data nodes. I assume that we
are going to create a branch that moves the data transfer protocols to RPC
and test the performance, and if it is good then we commit and move to RPC?
I am worried about this part - I am surprised that you two are not. Am I
missing something here?



sanjay



> 
> Doug


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Owen O'Malley wrote:
> SPNEGO is the 
> standard method of using Kerberos with HTTP and we are planning to use 
> that for the web UI's.

Java 6 also supports using SPNEGO for RPC over HTTP out of the box:

http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html

> I also have serious doubts about performance, but that is hard to answer 
> until we have code to test.

The good news is that, since the HTTP stuff is already implemented, we 
can test its performance easily.  Performance of insecure access over 
HTTP looks good so far.  It's an open question how much HTTP-based 
security will slow things versus non-HTTP-based security.

> It is an interesting question how much we 
> depend on being able to answer queries out of order. There are some 
> parts of the code where overlapping requests from the same client 
> matter. In particular, the terasort scheduler uses threads to access the 
> namenode. That would stop providing any pipelining, which I believe 
> would be significant.

No, we wouldn't stop any pipelining, we'd just use more connections to 
implement it.  With HttpClient one can limit the number of pooled 
connections per host:

http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
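
For example, a sketch of such a pooled client (the host and URL here are
invented):

  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
  import org.apache.commons.httpclient.methods.PostMethod;

  // Concurrent calls from different threads each get a pooled connection,
  // so overlapping requests are simply spread over more connections.
  public class PooledClient {
    public static void main(String[] args) throws Exception {
      MultiThreadedHttpConnectionManager pool =
          new MultiThreadedHttpConnectionManager();
      pool.getParams().setDefaultMaxConnectionsPerHost(20);  // cap per server
      HttpClient client = new HttpClient(pool);
      PostMethod call = new PostMethod(
          "http://namenode:8080/avro/org.apache.hadoop.hdfs.NameNode");
      try {
        client.executeMethod(call);   // other threads may call concurrently
        byte[] response = call.getResponseBody();
      } finally {
        call.releaseConnection();     // return the connection to the pool
      }
    }
  }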

Connections are not free of course, but Jetty has been benchmarked at 
20,000 concurrent connections:

http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/

> In short, I think that an HTTP transport is great for playing with, but 
> I don't think you can assume it will work as the primary transport.

I agree, we cannot assume it.  But it's easy to try it and see how it 
fares.  Any investment in getting it working is perhaps not wasted, 
since, besides providing a performance baseline, it also may be useful 
to provide HTTP-based access to services even if a higher-performance 
option is implemented.

Doug

Re: HTTP transport?

Posted by Owen O'Malley <om...@apache.org>.
I still don't see how to make this play well with security. Security  
needs to go under the transport layer so that it is easy to add  
encryption on the wire. If you go with HTTP, the only way that is  
portable at all is to use HTTP over SSL. SSL is for when there aren't  
shared keys and Kerberos provides those shared keys. SPNEGO is the  
standard method of using Kerberos with HTTP and we are planning to use  
that for the web UI's. But SPNEGO is very much the least painful of  
the alternatives and I'd rather not force our RPC services into that  
corner.

I also have serious doubts about performance, but that is hard to  
answer until we have code to test. It is an interesting question how  
much we depend on being able to answer queries out of order. There are  
some parts of the code where overlapping requests from the same client  
matter. In particular, the terasort scheduler uses threads to access  
the namenode. That would stop providing any pipelining, which I  
believe would be significant.

In short, I think that an HTTP transport is great for playing with,  
but I don't think you can assume it will work as the primary transport.

-- Owen


Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.
>> With respect to Avro/Hadoop, I suspect requests from clients to be time
>> clustered.
> 
> That was my thought as well. The thing that gets me is that in the case
> of Hadoop (and the related subprojects) the clients utilizing this
> particular HTTP connection are probably going to be pretty small (maybe
> low thousands?). This is even better for keep alive as there's a solid
> chance you're going to have a high reuse rate. Of course, I'm assuming
> we're talking about things like name node to data node, hbase client to
> region servers, and those types of communications. Even if you just used
> Jetty (or any other thin HTTP 1.1 container that supports KA), one
> should easily be able to see good performance.
> 

Absolutely.  
Additionally, I realized one more thing -- when a client knows it isn't
likely to send another request soon, it can send that request with
Connection: close.

Well-behaved clients can help maximize the benefit.
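
For example (a sketch; the URL is made up):

  import java.net.HttpURLConnection;
  import java.net.URL;

  // A client that knows this is its last call can opt out of keep-alive.
  public class LastCall {
    public static void main(String[] args) throws Exception {
      HttpURLConnection conn = (HttpURLConnection)
          new URL("http://server:8080/avro/test").openConnection();
      conn.setRequestProperty("Connection", "close");  // free the socket now
      conn.getInputStream().close();
    }
  }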




> Regards.
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
> 


Re: HTTP transport?

Posted by Eric Sammer <er...@lifeless.net>.
Scott Carey wrote:
> Even in the beacon case, if the browser is likely to send another request
> shortly, it cuts the effective network latency in half.

Which is generally not the case in the beacon / ad server use case. That
was the only point I was making. That's besides the point, though. I
think we both agree that KA for something like Avro transport is
probably good.

> Establishing a TCP
> connection is at minimum one full round trip -- before the request.  If
> latency is important KeepAlive is useful as long as a second request is
> expected in a short enough time.
> 
> As long as the server is not process or thread per connection, one can scale
> up connection count rather high (20k) if necessary.
> 
> With respect to Avro/Hadoop, I suspect requests from clients to be time
> clustered.

That was my thought as well. The thing that gets me is that in the case
of Hadoop (and the related subprojects) the clients utilizing this
particular HTTP connection are probably going to be pretty small (maybe
low thousands?). This is even better for keep alive as there's a solid
chance you're going to have a high reuse rate. Of course, I'm assuming
we're talking about things like name node to data node, hbase client to
region servers, and those types of communications. Even if you just used
Jetty (or any other thin HTTP 1.1 container that supports KA), one
should easily be able to see good performance.

Regards.
-- 
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.
On 10/5/09 1:53 PM, "Eric Sammer" <er...@lifeless.net> wrote:

> Ryan:
> 
> Certainly keep alive will help in this case, if that's what you're
> referring to. The server holds the socket for N seconds or M requests,
> which ever comes first. What you're saving with KA is the connection
> setup / tear down. If you have a lot of cases where the client makes a
> single request and goes away, then KA hurts because the server holds the
> connection for the KA timeout (N seconds). This *really* helps if you're
> using TLS due to the additional connection setup overhead.
> 
> It's my opinion and experience that KA helps greatly in the case of many
> exchanges between a small to medium number of clients and a server such
> as RPC. The anti-example is an ad server or web beacon server, for instance.
> 

Even in the beacon case, if the browser is likely to send another request
shortly, it cuts the effective network latency in half.  Establishing a TCP
connection is at minimum one full round trip -- before the request.  If
latency is important KeepAlive is useful as long as a second request is
expected in a short enough time.

As long as the server is not process or thread per connection, one can scale
up connection count rather high (20k) if necessary.

With respect to Avro/Hadoop, I suspect requests from clients to be time
clustered.


> Regards.
> 
> Ryan Rawson wrote:
>> I have a question about these headers... will they impact the ability to do
>> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
>> rpcs/second. Would this help or hinder?
>> 
>> On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:
>> 
> Doug Cutting wrote:
>> More or less. Except we can probably arrange to omit most of those
>> response...
>> Content-Type and Server are probably unavoidable. Some of the others are
>> extremely helpful during development / debugging / etc. It depends on
>> how "open" you are about HTTP being the transport (i.e. do you let
>> developers augment these headers to support additional features, etc.).
>> This may not make sense in the context of something specialized like
>> Avro transport.
>> 
>>> Today I implemented a simple HTTP-based transport for Avro:
>>> https://issues.apache.org/jira...
>> Just out of curiosity, were you using HTTP keep-alive? During testing
>> on a project a few years ago, I found a huge difference if Keep Alive is
>> supported. In retrospect, that should have been obvious. I'd imagine the
>> usage pattern here would be a large number of repeated calls between the
>> same client / server within a short period of time; perfect for KA.
>> 
>> Regards.
>> --
>> Eric Sammer
>> eric@lifless.net
>> http://esammer.blogspot.com
>> 
> 
> 
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
> 


Re: HTTP transport?

Posted by Eric Sammer <er...@lifeless.net>.
Ryan:

Certainly keep alive will help in this case, if that's what you're
referring to. The server holds the socket for N seconds or M requests,
whichever comes first. What you're saving with KA is the connection
setup / tear down. If you have a lot of cases where the client makes a
single request and goes away, then KA hurts because the server holds the
connection for the KA timeout (N seconds). This *really* helps if you're
using TLS due to the additional connection setup overhead.

It's my opinion and experience that KA helps greatly in the case of many
exchanges between a small to medium number of clients and a server such
as RPC. The anti-example is an ad server or web beacon server, for instance.

Regards.

Ryan Rawson wrote:
> I have a question about these headers... will they impact the ability to do
> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
> rpcs/second. Would this help or hinder?
> 
> On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:
> 
> Doug Cutting wrote:
>> More or less. Except we can probably arrange to omit most of those
>> response...
> Content-Type and Server are probably unavoidable. Some of the others are
> extremely helpful during development / debugging / etc. It depends on
> how "open" you are about HTTP being the transport (i.e. do you let
> developers augment these headers to support additional features, etc.).
> This may not make sense in the context of something specialized like
> Avro transport.
> 
>> Today I implemented a simple HTTP-based transport for Avro:
>> https://issues.apache.org/jira...
> Just out of curiosity, were you using HTTP keep-alive? During testing
> on a project a few years ago, I found a huge difference if Keep Alive is
> supported. In retrospect, that should have been obvious. I'd imagine the
> usage pattern here would be a large number of repeated calls between the
> same client / server within a short period of time; perfect for KA.
> 
> Regards.
> --
> Eric Sammer
> eric@lifless.net
> http://esammer.blogspot.com
> 


-- 
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Ryan Rawson <ry...@gmail.com>.
I have a question about these headers... will they impact the ability to do
many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
rpcs/second. Would this help or hinder?

On Oct 5, 2009 4:44 PM, "Eric Sammer" <er...@lifeless.net> wrote:

Doug Cutting wrote:
> More or less. Except we can probably arrange to omit most of those
> response...
Content-Type and Server are probably unavoidable. Some of the others are
extremely helpful during development / debugging / etc. It depends on
how "open" you are about HTTP being the transport (i.e. do you let
developers augment these headers to support additional features, etc.).
This may not make sense in the context of something specialized like
Avro transport.

> Today I implemented a simple HTTP-based transport for Avro:
> https://issues.apache.org/jira...
Just out of curiosity, were you using HTTP keep-alive? During testing
on a project a few years ago, I found a huge difference if Keep Alive is
supported. In retrospect, that should have been obvious. I'd imagine the
usage pattern here would be a large number of repeated calls between the
same client / server within a short period of time; perfect for KA.

Regards.
--
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Eric Sammer <er...@lifeless.net>.
Doug Cutting wrote:
> More or less.  Except we can probably arrange to omit most of those
> response headers except Content-Length.  Are any others strictly required?

Content-Type and Server are probably unavoidable. Some of the others are
extremely helpful during development / debugging / etc. It depends on
how "open" you are about HTTP being the transport (i.e. do you let
developers augment these headers to support additional features, etc.).
This may not make sense in the context of something specialized like
Avro transport.

> I today implemented a simple HTTP-based transport for Avro:
> 
>   https://issues.apache.org/jira/browse/AVRO-129
> 
> In some simple benchmarks I am able to make over 5000 sequential
> RPCs/second, each with ~100 bytes of response payload.

Just out of curiosity, were you using HTTP keep-alive? During testing
on a project a few years ago, I found a huge difference if Keep Alive is
supported. In retrospect, that should have been obvious. I'd imagine the
usage pattern here would be a large number of repeated calls between the
same client / server within a short period of time; perfect for KA.

Regards.
-- 
Eric Sammer
eric@lifless.net
http://esammer.blogspot.com

Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.
BTW, java.net.URLConnection is the likely bottleneck in that benchmark - it stinks performance-wise.  The Apache Commons HTTP client is much faster.  Try using JMeter and switching from one connector to the other for an example.




Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
stack wrote:
> So, are we talking about doing something like the following for a
> request/response:
> 
>  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>  Host: www.example.com
> 
> 
>  HTTP/1.1 200 OK
>  Date: Mon, 23 May 2005 22:38:34 GMT
>  Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
>  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>  Etag: "3f80f-1b6-3e1cb03b"
>  Accept-Ranges: bytes
>  Content-Length: 438
>  Connection: close
>  Content-Type: X-avro/binary
> 
> 
> ... or some variation on above on each and every RPC?

More or less.  Except we can probably arrange to omit most of those 
response headers except Content-Length.  Are any others strictly required?

Today I implemented a simple HTTP-based transport for Avro:

   https://issues.apache.org/jira/browse/AVRO-129

In some simple benchmarks I am able to make over 5000 sequential 
RPCs/second, each with ~100 bytes of response payload.  Increasing 
response payloads to 100kB slows this to around 2500 RPCs/second, giving 
throughput of 250MB/second, or 2Gbit/s.  This is with both client and 
server running on my laptop.  The client is java.net.URLConnection and 
the server is Jetty with its default configuration.
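
The client side of the benchmark is roughly the following (a sketch, not
the exact AVRO-129 code; the endpoint URL is invented):

  import java.io.InputStream;
  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  // Sequential RPC benchmark over java.net.URLConnection.  Draining each
  // response lets the JDK reuse the keep-alive connection.
  public class HttpRpcBench {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://localhost:8080/avro/test");
      byte[] request = new byte[100];
      byte[] buffer = new byte[8192];
      int calls = 10000;
      long start = System.currentTimeMillis();
      for (int i = 0; i < calls; i++) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);  // POST the binary request payload
        conn.setRequestProperty("Content-Type", "X-avro/binary");
        OutputStream out = conn.getOutputStream();
        out.write(request);
        out.close();
        InputStream in = conn.getInputStream();
        while (in.read(buffer) != -1) { }  // drain the response
        in.close();
      }
      long elapsed = System.currentTimeMillis() - start;
      System.out.println((calls * 1000L / elapsed) + " RPCs/second");
    }
  }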

Doug

Re: HTTP transport?

Posted by stack <st...@duboce.net>.
On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cu...@apache.org> wrote:

>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
> stuff out of Avro's payload into HTTP headers.  The downside of that would
> be that, if we still wish to support non-HTTP transports, we'd end up with
> duplicated logic.
>


There would be loads of upside I'd imagine if there was a natural mapping of
avro payload specifiers and metadata up into http headers in terms of
visibility


So, are we talking about doing something like the following for a
request/response:

 GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
 Host: www.example.com


 HTTP/1.1 200 OK
 Date: Mon, 23 May 2005 22:38:34 GMT
 Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
 Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
 Etag: "3f80f-1b6-3e1cb03b"
 Accept-Ranges: bytes
 Content-Length: 438
 Connection: close
 Content-Type: X-avro/binary


... or some variation on above on each and every RPC?

St.Ack

Re: HTTP transport?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 29, 2009, at 2:08 PM, Doug Cutting wrote:

> ...
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and
> pull stuff out of Avro's payload into HTTP headers.  The downside of
> that would be that, if we still wish to support non-HTTP transports,
> we'd end up with duplicated logic.
>

I would prefer to retain layer independence so that we can use other  
transports.
(I am still not sold on HTTP as a transport so far but am listening  
with an open mind).
>
>


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Raghu Angadi wrote:
> Does this mean current Avro RPC transport (an improved version of Hadoop 
> RPC) can still exist as long as it is supported by developers?

Sure, folks can create new transports for Avro.  There is, for example, 
in Hadoop Common some code that tunnels Avro RPCs inside Hadoop RPCs.

> Where does security lie : Avro or Transport layer?

That's not yet clear.  If we settle on HTTP as the preferred transport, 
then the transport should probably handle security, since many security 
standards already exist for HTTP and many HTTP servers and clients 
already support adding new security mechanisms.  I'd rather not 
re-invent all this in Avro if we can avoid it.

> If it is part of transport : How does an app get hold of required 
> information (e.g. user identity).

Perhaps the way we currently do this in the RPC server, with thread 
locals?  For example, the Avro RPC servlet could have a static method 
that returns the value of 
HttpServletRequest#getUserPrincipal().
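
A sketch of that pattern (class and method names invented):

  import java.security.Principal;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Expose the authenticated caller to service code via a thread local.
  public class AvroRpcServlet extends HttpServlet {
    private static final ThreadLocal<HttpServletRequest> CURRENT =
        new ThreadLocal<HttpServletRequest>();

    // Called by service implementations to learn who is calling.
    public static Principal getCallerPrincipal() {
      return CURRENT.get().getUserPrincipal();  // set by the container
    }

    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
      CURRENT.set(req);
      try {
        // ... decode the Avro request, dispatch it, encode the response ...
      } finally {
        CURRENT.remove();  // don't leak requests across pooled threads
      }
    }
  }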

> May be 'transceiver' can have an interface that can transfer security 
> information between transport layer and Avro.

Yes, we could add methods like getPrincipal() to Transceiver, but we'd 
still probably need to use a thread local accessed by a static method to 
get the Transceiver if we continue to use reflection for server 
implementations.  Or we could stray from reflection, and make services 
implement an interface through which we can pass them things like the 
principal.

Doug

Re: HTTP transport?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Doug Cutting wrote:
> stack wrote:
>> What do you think the path on the first line will look like? Will it be a 
>> method
>> name or will it be customizable?
> 
> Avro RPC currently includes the message name in the payload, so, unless 
> that changes, for Avro RPC, we'd probably use a different URL per 
> protocol.  As a convention we might use the namespace-qualified protocol 
> name as the URL path.
> 
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and 
> pull stuff out of Avro's payload into HTTP headers.  The downside of 
> that would be that, if we still wish to support non-HTTP transports, 
> we'd end up with duplicated logic.

Keeping the Avro payload independent of transport seems pretty useful, at 
least for now. As I understand it, the Avro payload is Avro 'proper' (i.e. it 
is supported in all the languages supported by Avro... and other goodies).

I just noticed AVRO-129 and it seems like a great example of using HTTP 
transport.

Does this mean the current Avro RPC transport (an improved version of Hadoop 
RPC) can still exist, as long as it is supported by developers?

Where does security lie : Avro or Transport layer?
If it is part of Avro : transport layer does not matter for security.
If it is part of transport : How does an app get hold of required 
information (e.g. user identity).
May be 'transceiver' can have an interface that can transfer security 
information between transport layer and Avro.

Raghu.

> If we fully embraced HTTP as Avro's primary RPC transport then it might 
> make sense to move the message name to the URL and to use the HTTP 
> return code to determine whether the response is an error or not. Avro's 
> RPC payload also currently includes request and response metadata, which 
> are functionally redundant with HTTP headers.
> 
>> (In hbase, it might be nice to have path be 
>> /tablename/row/family/qualifier etc).
> 
> It sounds like you'd perhaps like to be able to put RPC request 
> parameters into the URL?  I don't see that being done automatically in a 
> general way for arbitrary parameter types without the URLs getting 
> really ugly and adding a lot of complexity.  For this it might be better 
> to write a servlet filter that constructs the appropriate Avro-format 
> request and forwards it to the RPC url.
> 
> Doug


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
stack wrote:
> What do you think the path on the first line will look like? Will it be a method
> name or will it be customizable?

Avro RPC currently includes the message name in the payload, so, unless 
that changes, for Avro RPC, we'd probably use a different URL per 
protocol.  As a convention we might use the namespace-qualified protocol 
name as the URL path.

Alternately, we could try to make Avro's RPC more HTTP-friendly, and 
pull stuff out of Avro's payload into HTTP headers.  The downside of 
that would be that, if we still wish to support non-HTTP transports, 
we'd end up with duplicated logic.

If we fully embraced HTTP as Avro's primary RPC transport then it might 
make sense to move the message name to the URL and to use the HTTP 
return code to determine whether the response is an error or not. 
Avro's RPC payload also currently includes request and response 
metadata, which are functionally redundant with HTTP headers.
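
Such a mapping might look like the following (the message name is
hypothetical; 200 for normal responses, an error status plus Avro-encoded
error data otherwise):

  POST /avro/org.apache.hadoop.hdfs.NameNode/getBlockLocations HTTP/1.1
  Host: namenode.example.com
  Content-Type: X-avro/binary
  Content-Length: 123

  HTTP/1.1 200 OK
  Content-Type: X-avro/binary
  Content-Length: 456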

> (In hbase, it might be nice to have path be /tablename/row/family/qualifier etc).

It sounds like you'd perhaps like to be able to put RPC request 
parameters into the URL?  I don't see that being done automatically in a 
general way for arbitrary parameter types without the URLs getting 
really ugly and adding a lot of complexity.  For this it might be better 
to write a servlet filter that constructs the appropriate Avro-format 
request and forwards it to the RPC URL.
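
A sketch of such a filter (all names hypothetical):

  import java.io.IOException;
  import javax.servlet.Filter;
  import javax.servlet.FilterChain;
  import javax.servlet.FilterConfig;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;
  import javax.servlet.http.HttpServletRequest;

  // Translate a REST-style URL like /tablename/row/family/qualifier into
  // an Avro-format request and hand it to the RPC servlet.
  public class RestToAvroFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse resp,
        FilterChain chain) throws IOException, ServletException {
      String path = ((HttpServletRequest) req).getPathInfo();
      String[] parts = path.substring(1).split("/");  // table, row, family, ...
      // ... construct the binary Avro request for a "get" message here ...
      req.getRequestDispatcher("/avro/org.apache.hadoop.hbase.RegionServer")
         .forward(req, resp);
    }
  }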

Doug

Re: HTTP transport?

Posted by stack <st...@duboce.net>.
On Tue, Sep 29, 2009 at 12:43 PM, Doug Cutting <cu...@apache.org> wrote:

>
> The question I'm asking now is about the wire format, whether we wish to
> precede each RPC request with something like "GET
> /avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with
> "HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g.,
> Content-Type and Content-Length).  I think there are great benefits to using
> a single, standard protocol on the wire.  Which server and client
> implementations we use will be determined by performance, features, etc.
>  But using a standard wire format will greatly simplify things as we attempt
> to support multiple languages.  Since we want to provide browser access,
> we're compelled to support HTTP.  So the question is, are there compelling
> reasons why HTTP should not be used for other, non-browser, access?



I like the idea of using a proven transport.

The HTTP request and response header verbiage seems profligate if what's
being passed is small.

What do you think the path on the first line will look like? Will it be a method
name or will it be customizable? (In hbase, it might be nice to have path be
/tablename/row/family/qualifier etc).

St.Ack

Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> What about out-of-order exchange? Will we be able to support that with 
> HTTP transport?

Out-of-order exchange was originally added to Hadoop's RPC when it was a 
part of Nutch.  It's an important optimization for distributed search, 
but it's not clear how important it is currently to Hadoop.

That said, the simple way to deal with this in HTTP is to use a client 
library that pools connections, so that, if a second request to the same 
service is made by another thread in the same client process before the 
first has returned, a second connection is opened.  If this is common, 
the high-water mark of connections on the server will be higher. 
However with an async-io-based server, the number of connections should 
not be a primary bottleneck.  And again, we don't know how common this is.

Doug

Re: HTTP transport?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 29, 2009, at 12:43 PM, Doug Cutting wrote:

> Sanjay Radia wrote:
> > Wrt  connection pooling/async servers: Can't we use the same  
> libraries
> > that Jetty and Tomcat use?
> >  Grizzly?
>
> Grizzly also supports HTTP.  Choosing Grizzly is independent of  
> choosing
> HTTP as a wire transport or choosing a server.
>
Agreed.
Hence the main advantages that remain for HTTP transport are
1) a language-independent spec for the protocol. The message headers  
will be in Avro, so that is easy, and the message exchange should be  
fairly straightforward. I see this as a minor advantage for using HTTP  
transport.
2) code to implement the transport in multiple languages.

(2) is a significant advantage.
Once we put in the security modifications, will it remain that  
portable? We should look at that more closely.

What about out-of-order exchange?  Will we be able to support that with 
HTTP transport?

sanjay


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Sanjay Radia wrote:
> Wrt  connection pooling/async servers: Can't we use the same libraries 
> that Jetty and Tomcat use?
>  Grizzly?

Grizzly also supports HTTP.  Choosing Grizzly is independent of choosing 
HTTP as a wire transport or choosing a server.

The question I'm asking now is about the wire format, whether we wish to 
precede each RPC request with something like "GET 
/avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with 
"HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g., 
Content-Type and Content-Length).  I think there are great benefits to 
using a single, standard protocol on the wire.  Which server and client 
implementations we use will be determined by performance, features, etc. 
  But using a standard wire format will greatly simplify things as we 
attempt to support multiple languages.  Since we want to provide browser 
access, we're compelled to support HTTP.  So the question is, are there 
compelling reasons why HTTP should not be used for other, non-browser, 
access?

> Yes we are expecting to use encryption down the road.

Do we expect to use something different from TLS?  With its 'resume' 
feature, is TLS performance unacceptable?  Would we implement some other 
standard encryption protocol, or use a non-standards-based one?

Doug

Re: HTTP transport?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Sep 28, 2009, at 3:42 PM, Doug Cutting wrote:

> Owen O'Malley wrote:
> > I've got concerns about this. Both tactical and strategic. The tactical
> > problem is that I need to get security (both Kerberos and token) in to
> > 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be
> > done roughly in 5 months. If you switch off of the current RPC code base
> > to a completely new RPC code base, I don't see that happening.
>
> What transport do you expect to use with Avro?  If wire-compatibility
> is a goal, and that includes access from languages besides Java, then we
> must use a transport that's well-specified and Java-independent.  HTTP
> is both of these.  The existing Hadoop RPC protocol is not.
>
> We could adapt Hadoop's existing RPC transport to be well-specified
> and language independent.  This is perhaps not a huge task, but it feels
> to me a bit like re-inventing much of what's already in HTTP clients and
> servers these days: connection-pooling, async servers, etc.
>


Wrt  connection pooling/async servers: Can't we use the same libraries  
that Jetty and Tomcat use?
  Grizzly?

> Do you think it would be substantially harder to integrate Kerberos with
> Jetty than with a homegrown protocol and server?
>
>
> >   - very expensive on the wire encryption (ssl)
>
> If we don't use HTTP, will we be providing on-wire encryption?  If  
> not,
> this is moot.
>

Yes we are expecting to use encryption down the road.
>
> Finally, we need to have secure HTTP-based access anyway, right?  If we
> use HTTP as our RPC transport, mightn't we reuse most of that effort?
>
> Doug
>


Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Owen O'Malley wrote:
> I've got concerns about this. Both tactical and strategic. The tactical 
> problem is that I need to get security (both Kerberos and token) in to 
> 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be 
> done roughly in 5 months. If you switch off of the current RPC code base 
> to a completely new RPC code base, I don't see that happening.

What transport do you expect to use with Avro?  If wire-compatibility is 
a goal, and that includes access from languages besides Java, then we 
must use a transport that's well-specified and Java-independent.  HTTP 
is both of these.  The existing Hadoop RPC protocol is not.

We could adapt Hadoop's existing RPC transport to be well-specified 
and language independent.  This is perhaps not a huge task, but it feels 
to me a bit like re-inventing much of what's already in HTTP clients and 
servers these days: connection-pooling, async servers, etc.  Plus we 
take on the onus of fully specifying the transport, so that it may be 
implemented in other languages, and we need to provide some alternate 
implementations to demonstrate this.

Do you feel our existing RPC framework's transport is actually more 
scalable and reliable than, say, Jetty?  Do you think it would be 
substantially harder to add, e.g., token-based security to Jetty than to 
a homegrown server?

> [ HTTP ] also has a couple of disadvantages:
>   - poor integration with kerberos

Do you think it would be substantially harder to integrate Kerberos with 
Jetty than with a homegrown protocol and server?

>   - very expensive on the wire encryption (ssl)

If we don't use HTTP, will we be providing on-wire encryption?  If not, 
this is moot.

Finally, we need to have secure HTTP-based access anyway, right?  If we 
use HTTP as our RPC transport, mightn't we reuse most of that effort?

Doug

Re: HTTP transport?

Posted by Owen O'Malley <om...@apache.org>.
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:

> I'm considering an HTTP-based transport for Avro as the preferred,  
> high-performance option.

I've got concerns about this. Both tactical and strategic. The  
tactical problem is that I need to get security (both Kerberos and  
token) in to 0.22. I'd really like to get Avro RPC into 0.22. I'd like  
both to be done roughly in 5 months. If you switch off of the current  
RPC code base to a completely new RPC code base, I don't see that  
happening.

> HTTP has lots of advantages.  In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.

It also has a couple of disadvantages:
   - poor integration with kerberos
   - very expensive on the wire encryption (ssl)

> Tomcat and other servlet containers support async NIO, where a  
> thread is not required per connection.

I'm also concerned about the weight of Tomcat. Everything I've read  
about it says that it takes a lot more memory and CPU than Jetty. I  
think a solution that requires Tomcat may be problematic...

-- Owen

Re: HTTP transport?

Posted by Doug Cutting <cu...@apache.org>.
Scott Carey wrote:
> HTTP is very useful and typically performs very well.  It has lots of
> things built-in too.  In addition to what you mention, it  has a
> caching mechanism built-in, range queries, and all sorts of ways to
> tag along state if needed.  To top it off there are a lot of testing
> and debugging tools available for it.  So from that front using it is
> very attractive.

Glad you agree!

> However, in my experience zero-copy is not going to be much of a gain
> performance-wise for this sort of application, and will limit what
> can be done.  As long as a servlet doesn't transcode data and mostly
> copies, it will be very fast - many multiples of gigabit ethernet
> speed per CPU - far more than most disk setups will handle for a
> while.

In MapReduce, datanodes are also running map and reduce tasks, so we'd 
like it if datanodes not only keep up with disks and networks, but also 
use minimal CPU to do so.  Zero-copy on the datanode has been shown to 
significantly help MapReduce benchmarks.  That said, zero copy may or 
may not be significantly better than one-copy.  I intend to benchmark 
that.  But the important thing to measure is not just throughput but 
also idle CPU.

> Additionally, I'm not sure CRC checking should occur on the
> client.  TCP/IP already checksums packets, so network data corruption
> over HTTP is not a primary concern.   The big concern is silent data
> corruption on the disk.

I believe that disks are the largest source of data corruption, but I am 
not confident they are the only source.  HDFS uses end-to-end checksums. 
  As data is written to HDFS it is immediately checksummed on the 
client.  This checksum then lives with the data and is validated on the 
client immediately before the data is returned to the application.  The 
goal is to catch corruption wherever it may occur, on disks, on the 
network, or while buffered in memory.  In addition, the checksum is 
validated after data is transmitted to datanodes but before 
blocks are stored, so that initial network and memory corruptions are 
caught early and the writing process fails, rather than permitting an 
application to write corrupt data.  Finally, datanodes periodically scan 
for corrupt blocks on disks, replacing them with non-corrupt replicas, 
decreasing the chance that over time all replicas become corrupt.
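
Schematically, the idea is something like this (an illustration of the 
end-to-end principle only, not HDFS's actual code, which checksums per 
512-byte chunk):

import java.io.IOException;
import java.util.zip.CRC32;

// Illustration only: a checksum computed when the client writes
// travels with the data and is re-verified when a client reads,
// catching corruption introduced anywhere in between (disk, network,
// or memory).
public class EndToEndChecksum {

  static long checksum(byte[] chunk) {
    CRC32 crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    return crc.getValue();
  }

  // Writer side: computed before the data leaves the client.
  static long onWrite(byte[] chunk) {
    return checksum(chunk);
  }

  // Reader side: verified just before data reaches the application.
  static void onRead(byte[] chunk, long expected) throws IOException {
    if (checksum(chunk) != expected) {
      throw new IOException("checksum mismatch: corrupt chunk");
    }
  }
}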

> Additionally, embedding Tomcat tends to be more tricky than Jetty,
> though that can be overcome.  One might argue that we don't even want
> a servlet container, we just want an HTTP connector.  The Servlet API
> is familiar, but for a high performance transport it might just be
> overhead and restrictive.  Direct access to Tomcat's NIO connector
> might be significantly lighter-weight and more flexible. Tomcat's NIO
> connector implementation works great and I have had great success
> with up to 10K connections with the pure Java connector using
> ordinary byte buffers and about 20 servlet threads.

I hope to start benchmarking bulk data RPC over the next few weeks. 
I'll probably start with a servlet using Jetty, then see if I can 
increase throughput and decrease CPU utilization through the use of 
things like Tomcat's NIO connector, Grizzly, etc.
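
The harness will probably be something like this minimal Jetty 6 setup 
(a sketch only; the echo servlet is just a stand-in for a real Avro 
responder):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.servlet.Context;
import org.mortbay.jetty.servlet.ServletHolder;

public class BenchServer {
  public static void main(String[] args) throws Exception {
    Server server = new Server(8080);  // Jetty 6 (org.mortbay)
    Context context = new Context(server, "/", Context.SESSIONS);
    // Echo servlet as a stand-in for a real Avro responder.
    context.addServlet(new ServletHolder(new HttpServlet() {
      protected void doPost(HttpServletRequest req,
                            HttpServletResponse resp) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = req.getInputStream().read(buf)) != -1) {
          resp.getOutputStream().write(buf, 0, n);
        }
      }
    }), "/avro/*");
    server.start();
    server.join();
  }
}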

Doug

Re: HTTP transport?

Posted by Scott Carey <sc...@richrelevance.com>.
Ok, I have some thoughts on this.  I might be misinterpreting some use cases here however.

HTTP is very useful and typically performs very well.  It has lots of things built-in too.  In addition to what you mention, it  has a caching mechanism built-in, range queries, and all sorts of ways to tag along state if needed.  To top it off there are a lot of testing and debugging tools available for it.  So from that front using it is very attractive.

However, in my experience zero-copy is not going to be much of a gain performance-wise for this sort of application, and will limit what can be done.  As long as a servlet doesn't transcode data and mostly copies, it will be very fast - many multiples of gigabit ethernet speed per CPU - far more than most disk setups will handle for a while.  Furthermore, it is easier to optimize disk requests to be 'sequentially chunky' if it goes through the JVM.  And I suspect that for many use cases, optimizing disk I/O is more valuable than a little bit of extra CPU spent copying data into and out of the process.
Additionally, I'm not sure CRC checking should occur on the client.  TCP/IP already checksums packets, so network data corruption over HTTP is not a primary concern.   The big concern is silent data corruption on the disk.  For the DataNode use case, it should find such errors as early as possible, and not rely on clients discovering errors.  Then it can coordinate with the NameNode on fixing the block or discarding it.  So if it has to check the file integrity anyway, there is no reason to worry about zero-copy.  Avoiding the extra request for the CRC data at least partially counters the loss of zero-copy.

Additionally, embedding Tomcat tends to be more tricky than Jetty, though that can be overcome.  One might argue that we don't even want a servlet container, we just want an HTTP connector.  The Servlet API is familiar, but for a high performance transport it might just be overhead and restrictive.  Direct access to Tomcat's NIO connector might be significantly lighter-weight and more flexible.
Tomcat's NIO connector implementation works great and I have had great success with up to 10K connections with the pure Java connector using ordinary byte buffers and about 20 servlet threads.  But if a large number of open connections are not needed (less than about 5x the number of CPU core threads) then thread-per-connection servlet containers work ok too.  These sort of implementation details can evolve over time however.
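
For reference, switching Tomcat 6 to the NIO connector is just a 
server.xml change (the thread count here is illustrative):

  <Connector port="8080"
             protocol="org.apache.coyote.http11.Http11NioProtocol"
             maxThreads="20" />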

Just my 2c

-Scott


