
Posted to dev@couchdb.apache.org by Miles Fidelman <mf...@meetinghouse.net> on 2010/08/16 19:54:04 UTC

why erlang?

Hi Folks,

I wonder if someone might share some insight into why Erlang was chosen 
for CouchDB.

Don't get me wrong, I think Erlang is a really cool 
language/environment; I'm a big fan of designs that spawn lots of 
independent processes, and communicating via messages.  But... it 
doesn't seem like CouchDB takes advantage of all that much of Erlang's 
unique capabilities.

Hence, I'm sort of wondering why Erlang for CouchDB, and if there are 
any visions of taking more advantage of Erlang down the road.

Thanks,

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Paul,

Thanks for the detailed and thoughtful reply.

Paul Davis wrote:
> The biggest thing that an HTTP replicator has going for it is its
> simplicity. The entire protocol can be summed up in as little as "open
> an HTTP connection, stream documents edited after the last
> replication."
Unfortunately, not so simple for someone who wants to deploy lots of nodes.
> Switching the replicator to a more advanced protocol I think isn't
> really in the cards for the problem that the current replication
> scheme is meant to solve. I think that implementing a solution that
> uses P2P/UUCP/multicast discovery would be an awesome feature, but not
> something I would see going into the 'core' CouchDB distribution until
> someone steps up with a long term commitment to supporting it.
>    
I keep looking at doing something like this.  Unfortunately, I lost the 
funding source that I thought was going to pay for my time, sigh...
> Whether the replicator breaks HTTP is rather more of a philosophical
> debate best left for when I've had a few beers. I don't discount your
> points that SOAP/XML-RPC suck hard, but I don't think they have any
> bearing on the replication protocol given how it's implemented.
>    
Not so much that it breaks HTTP, as that HTTP imposes some serious 
constraints on the replication approach that other protocols don't.

Thanks again,

Miles


Re: why erlang?

Posted by Paul Davis <pa...@gmail.com>.

On Mon, Aug 16, 2010 at 1:54 PM, Miles Fidelman
<mf...@meetinghouse.net> wrote:
> Hi Folks,
>
> I wonder if someone might share some insight into why Erlang was chosen for
> CouchDB.
>
> Don't get me wrong, I think Erlang is a really cool language/environment;
> I'm a big fan of designs that spawn lots of independent processes, and
> communicating via messages.  But... it doesn't seem like CouchDB takes
> advantage of all that much of Erlang's unique capabilities.
>
> Hence, I'm sort of wondering why Erlang for CouchDB, and if there are any
> visions of taking more advantage of Erlang down the road.
>
> Thanks,
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In<fnord>  practice, there is.   .... Yogi Berra
>
>
>

Miles,

First, I'd like to re-emphasize that CouchDB does use Erlang in very
Erlangy ways. There's quite a bit more to the language than just
message passing. That said, this thread has ended up focusing on the
use of message passing (or rather, the lack thereof) in the
replication protocol.

I can't speak for Damien on why exactly he decided to use HTTP for the
replicator, but I can say that if I were going to design it from
scratch I would probably make very similar choices. Partly that's for
reasons others have given: it's ubiquitous and handles firewall
traversal very well. But those aren't the main reasons by a long shot.

The biggest thing that an HTTP replicator has going for it is its
simplicity. The entire protocol can be summed up in as little as "open
an HTTP connection, stream documents edited after the last
replication." Even with that simple idea there's a very large amount
of engineering that has gone into it. We have to take into account
Erlang's memory model, exponential backoff when links go wonky,
resumption when they come back, tracking replication histories,
filtered replication, continuous replication, authentication, etc. And
those are just the points I know from listening to the discussion. I
bet Adam Kocoloski and Filipe Manana could go on for hours on the
details I just glossed over.
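Paul's one-line summary of the protocol, plus the resumption and back-off he mentions, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not CouchDB's replicator: `FakeDatabase`, `replicate_once` and `with_backoff` are hypothetical names, the in-memory edit log stands in for CouchDB's changes feed, and a plain dict stands in for the replication history where the checkpoint is recorded.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Hypothetical in-memory stand-in for a database; not a CouchDB client."""
    docs: dict = field(default_factory=dict)  # doc id -> latest body
    log: list = field(default_factory=list)   # (seq, doc id), one entry per edit

    def put(self, doc_id, body):
        self.docs[doc_id] = body
        self.log.append((len(self.log) + 1, doc_id))

    def changes_since(self, since):
        """Yield (seq, doc id) for every edit made after sequence `since`."""
        for seq, doc_id in self.log:
            if seq > since:
                yield seq, doc_id


def replicate_once(source, target, checkpoints, rep_id):
    """Stream documents edited after the last checkpoint, advancing it as we go."""
    for seq, doc_id in source.changes_since(checkpoints.get(rep_id, 0)):
        target.put(doc_id, source.docs[doc_id])
        checkpoints[rep_id] = seq  # a dropped link resumes from here, not from 0
    return checkpoints.get(rep_id, 0)


def with_backoff(fn, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying on ConnectionError after delays of base, 2*base, 4*base, ..."""
    for attempt in range(retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Wrapping a pass as `with_backoff(lambda: replicate_once(src, tgt, cps, "src->tgt"))` retries it after a failure; because the checkpoint advances per document, a retried pass continues where the failed one stopped. A real changes feed also collapses repeated edits of one document into its latest revision, which this sketch does not attempt.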

Switching the replicator to a more advanced protocol I think isn't
really in the cards for the problem that the current replication
scheme is meant to solve. I think that implementing a solution that
uses P2P/UUCP/multicast discovery would be an awesome feature, but not
something I would see going into the 'core' CouchDB distribution until
someone steps up with a long-term commitment to supporting it.

Also of interest is that once you get to the scale of hundreds or
thousands of nodes, you're probably not going to want to use Erlang's
native message passing. Either you're going to be in a datacenter,
which means you'll want fine-grained control over network utilization,
or you're going to be distributed, in which case epmd/messages will
have the usual firewall/NAT issues. Some other interesting points are
mentioned in a recent thread [1] on erlang-questions.

Whether the replicator breaks HTTP is rather more of a philosophical
debate best left for when I've had a few beers. I don't discount your
points that SOAP/XML-RPC suck hard, but I don't think they have any
bearing on the replication protocol given how it's implemented.

HTH,
Paul Davis

[1] http://www.erlang.org/cgi-bin/ezmlm-cgi?4:msp:52886:ecobpklllbhjdniiklhn

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Jan Lehnardt wrote:
> Hi Miles,
>
> since this question comes up every once in a while, I compiled a list of links that directly or indirectly address the reasons for the choice: http://wiki.couchone.com/page/why-erlang
>
>    
Great stuff.  Thanks!

Miles


Re: why erlang?

Posted by Jan Lehnardt <ja...@apache.org>.

Hi Miles,

since this question comes up every once in a while, I compiled a list of links that directly or indirectly address the reasons for the choice: http://wiki.couchone.com/page/why-erlang


Cheers
Jan
-- 

On 16 Aug 2010, at 19:54, Miles Fidelman wrote:

> Hi Folks,
> 
> I wonder if someone might share some insight into why Erlang was chosen for CouchDB.
> 
> Don't get me wrong, I think Erlang is a really cool language/environment; I'm a big fan of designs that spawn lots of independent processes, and communicating via messages.  But... it doesn't seem like CouchDB takes advantage of all that much of Erlang's unique capabilities.
> 
> Hence, I'm sort of wondering why Erlang for CouchDB, and if there are any visions of taking more advantage of Erlang down the road.
> 
> Thanks,
> 
> Miles Fidelman
> 
> -- 
> In theory, there is no difference between theory and practice.
> In<fnord>  practice, there is.   .... Yogi Berra
> 
>

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Noah Slater wrote:
> On 16 Aug 2010, at 21:26, Miles Fidelman wrote:
>
>    
>>> My reply would be to state that the Web subsumes the Internet in many ways.
>>>
>>>        
>> My reply would be that I sure hope not.  The trend toward pushing lower level functionality on top of application layer protocols really breaks a lot of the resiliency and flexibility that comes from layering.
>>      
> Oops, my bad. You are right of course. I meant to illustrate that the Web is built on top of things, huge things. Like the Internet, and the telephone networks, or anything else you can shove TCP/IP over. I guess I think of it as being bigger than them because of that, if not technically, then conceptually. I'm very probably biased though.
I'm probably biased too - though in the other direction.

For what it's worth, I tend to think in terms of subsets.  Web traffic 
is a subset of IP traffic, email is another subset, XMPP (twitter) 
traffic is another, VoIP is another.  The superset is more complex than 
any of the subsets.

I used to like pointing out that email traffic dwarfs web traffic - not 
(for a long time) in terms of bandwidth, but in terms of individual 
transactions (particularly these days - just think of the spam that 
accumulates while reading one web page).  These days, though, video and 
VoIP dwarf both web and email in bandwidth, and I expect that tweets and 
SMS messages dwarf email in terms of message counts.  (Come to think of 
it, a twitter channel might make a nice vehicle for Couch replication. :-)


Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 21:26, Miles Fidelman wrote:

>> My reply would be to state that the Web subsumes the Internet in many ways.
>>   
> My reply would be that I sure hope not.  The trend toward pushing lower level functionality on top of application layer protocols really breaks a lot of the resiliency and flexibility that comes from layering.

Oops, my bad. You are right of course. I meant to illustrate that the Web is built on top of things, huge things. Like the Internet, and the telephone networks, or anything else you can shove TCP/IP over. I guess I think of it as being bigger than them because of that, if not technically, then conceptually. I'm very probably biased though.

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Noah Slater wrote:
> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
>    
>> Actually, I'd dispute that.  The INTERNET is perhaps the largest system ever built, the web rides on top of a lot of lower level infrastructure.  There's a lot of other stuff riding on top of the underlying IP infrastructure - email, VoIP, chat, etc. - which don't rely on HTTP.  (Note: I speak as someone who dates back to almost the beginning - I spent a good part of my career at BBN, just as we were transitioning the ARPANET to TCP/IP, and it was serving as the hub of the then fledgling Internet).
>>      
> I was anticipating this response. :)
>    
;-)
> My reply would be to state that the Web subsumes the Internet in many ways.
>    
My reply would be that I sure hope not.  The trend toward pushing lower 
level functionality on top of application layer protocols really breaks 
a lot of the resiliency and flexibility that comes from layering.
>> True.  Though, it has also led to (IMHO) abortions such as SOAP - which Dave Winer initially wrote as a way to use HTTP to tunnel traffic through firewalls.
>>      
> LOL
>
> I think you mean XML-RPC, but they're both as bad as each other.
>    
Fair point.
> In either case, they are so hilariously against everything the Web stands for, it's not really applicable!
Unfortunately, lots of people are violating what the web stands for 
these days.  Just take a look at "apps" - silos by another name.


Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 21:19, Jan Lehnardt wrote:

> I'd like to add that Miles does have a point, but we have good reasons to have HTTP for now and in the future. It doesn't mean that applying specializations where applicable is not an option (double negative :).

Oh, I agree. :)

I just LUUUURVE talking about HTTP.

Re: why erlang?

Posted by J Chris Anderson <jc...@apache.org>.

On Aug 17, 2010, at 1:31 PM, J Chris Anderson wrote:

> 
> On Aug 16, 2010, at 1:28 PM, Robert Newson wrote:
> 
>> Just one point from me. The distributed goop in Erlang is pretty much
>> just for the everyone-connected-to-everyone-else old school cluster
>> model. I don't think it's useful for the kind of scale I associate
>> with CouchDB at all.
>> 
> 
> Just my 1 cent:
> 
> CouchDB replication is intentionally not special. That is, it is just another web client. It is designed and intended that other non-CouchDB / non-Erlang software can replicate with Couch.
> 
> Keeping everything in HTTP makes it much easier to reason about security and application logic. Eg: replication is subject to the same policy as direct client access. This takes some time to wrap your head around, but once you do, you'll realize that any other way would lead to madness.
> 
> That said, I'm not against more-efficient transports for the existing semantics. They just seem to be optimizing the wrong thing, as the HTTP overhead doesn't matter in real life.
> 
> Also, see for instance Cloudant's code, which uses Erlang transport for clustering of the same logical Couch. Replication is for bridging multiple logical Couches. However you want to build a single big Couch, any old transport is fine. Lounge is extra awesome because it's living proof that you can build a big Couch out of smaller Couches.
> 

BTW added this to a wiki here: http://wiki.couchone.com/page/http-replication


> Chris
> 
>> B.
>> 
>> On Mon, Aug 16, 2010 at 9:25 PM, Randall Leeds <ra...@gmail.com> wrote:
>>> There is no reason I see why HTTP is not a valid transport for a DHT nor any
>>> reason why it is not possible to gossip over HTTP. I think it's confusing
>>> the issue to blame HTTP for any problem Couch has with distribution.
>>> 
>>> Enlighten me if I'm wrong, of course.
>>> 
>>> On Aug 16, 2010 1:19 PM, "Jan Lehnardt" <ja...@apache.org> wrote:
>>> 
>>> 
>>> On 16 Aug 2010, at 22:11, Noah Slater wrote:
>>> 
>>>> 
>>>> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
>>> ...
>>> I'd like to add that Miles does have a point, but we have good reasons to
>>> have HTTP for now and in the future. It doesn't mean that applying
>>> specializations where applicable is not an option (double negative :).
>>> 
>>> Cheers
>>> Jan
>>> --
>>> 
>

Re: why erlang?

Posted by J Chris Anderson <jc...@apache.org>.

On Aug 16, 2010, at 1:28 PM, Robert Newson wrote:

> Just one point from me. The distributed goop in Erlang is pretty much
> just for the everyone-connected-to-everyone-else old school cluster
> model. I don't think it's useful for the kind of scale I associate
> with CouchDB at all.
> 

Just my 1 cent:

CouchDB replication is intentionally not special. That is, it is just another web client. It is designed and intended that other non-CouchDB / non-Erlang software can replicate with Couch.

Keeping everything in HTTP makes it much easier to reason about security and application logic. Eg: replication is subject to the same policy as direct client access. This takes some time to wrap your head around, but once you do, you'll realize that any other way would lead to madness.
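To make "just another web client" concrete, here is a sketch that only builds (and does not send) the two kinds of request a pull replicator needs, using Python's standard `urllib.request.Request`. The `_changes` feed with `since` and `_bulk_docs` with `"new_edits": false` reflect my understanding of CouchDB's HTTP API and may differ between versions; host names, database names and credentials are placeholders, and `changes_request` / `bulk_write_request` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def changes_request(source_db, since):
    """The same GET any web client could issue to read edits after `since`."""
    query = urlencode({"since": since})
    return Request(f"{source_db}/_changes?{query}", method="GET",
                   headers={"Accept": "application/json"})


def bulk_write_request(target_db, docs, auth_header=None):
    """POST already-revisioned docs; new_edits=false asks the target to keep their revs."""
    body = json.dumps({"docs": docs, "new_edits": False}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if auth_header:
        # Same credentials, and so the same access policy, as a direct client request.
        headers["Authorization"] = auth_header
    return Request(f"{target_db}/_bulk_docs", data=body, method="POST",
                   headers=headers)
```

Because the write carries the same `Authorization` header a browser or script would send, the target can apply the same access rules to the replicator as to any other client, which is the property Chris is describing.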

That said, I'm not against more-efficient transports for the existing semantics. They just seem to be optimizing the wrong thing, as the HTTP overhead doesn't matter in real life.

Also, see for instance Cloudant's code, which uses Erlang transport for clustering of the same logical Couch. Replication is for bridging multiple logical Couches. However you want to build a single big Couch, any old transport is fine. Lounge is extra awesome because it's living proof that you can build a big Couch out of smaller Couches.

Chris

> B.
> 
> On Mon, Aug 16, 2010 at 9:25 PM, Randall Leeds <ra...@gmail.com> wrote:
>> There is no reason I see why HTTP is not a valid transport for a DHT nor any
>> reason why it is not possible to gossip over HTTP. I think it's confusing
>> the issue to blame HTTP for any problem Couch has with distribution.
>> 
>> Enlighten me if I'm wrong, of course.
>> 
>> On Aug 16, 2010 1:19 PM, "Jan Lehnardt" <ja...@apache.org> wrote:
>> 
>> 
>> On 16 Aug 2010, at 22:11, Noah Slater wrote:
>> 
>>> 
>>> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
>> ...
>> I'd like to add that Miles does have a point, but we have good reasons to
>> have HTTP for now and in the future. It doesn't mean that applying
>> specializations where applicable is not an option (double negative :).
>> 
>> Cheers
>> Jan
>> --
>>

Re: why erlang?

Posted by Robert Newson <ro...@gmail.com>.

Just one point from me. The distributed goop in Erlang is pretty much
just for the everyone-connected-to-everyone-else old school cluster
model. I don't think it's useful for the kind of scale I associate
with CouchDB at all.
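The scaling concern with the everyone-connected-to-everyone-else model is easy to quantify: a full mesh of n nodes needs n(n-1)/2 connections, compared with n-1 for a single hub. A quick sketch of the arithmetic (the function names are illustrative only):

```python
def full_mesh_links(n):
    """Connections needed when every node connects to every other node."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n):
    """Connections needed when every node connects only to one central hub."""
    return max(n - 1, 0)


for n in (10, 100, 1000):
    print(n, full_mesh_links(n), hub_and_spoke_links(n))
# 10 45 9
# 100 4950 99
# 1000 499500 999
```

The mesh grows quadratically, so at a thousand nodes it is roughly half a million links, which is the scale at which Robert's point bites.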

B.

On Mon, Aug 16, 2010 at 9:25 PM, Randall Leeds <ra...@gmail.com> wrote:
> There is no reason I see why HTTP is not a valid transport for a DHT nor any
> reason why it is not possible to gossip over HTTP. I think it's confusing
> the issue to blame HTTP for any problem Couch has with distribution.
>
> Enlighten me if I'm wrong, of course.
>
> On Aug 16, 2010 1:19 PM, "Jan Lehnardt" <ja...@apache.org> wrote:
>
>
> On 16 Aug 2010, at 22:11, Noah Slater wrote:
>
>>
>> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
> ...
> I'd like to add that Miles does have a point, but we have good reasons to
> have HTTP for now and in the future. It doesn't mean that applying
> specializations where applicable is not an option (double negative :).
>
> Cheers
> Jan
> --
>

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Noah Slater wrote:
> On 16 Aug 2010, at 21:38, Miles Fidelman wrote:
>
>    
>> There have been some of these built on top of HTTP, but the purist in me really dislikes violating layering.
>>      
> I'm a little out of my depth here, but is that really the case? The things you're talking about sound like patterns to me, not layer-specific details. What makes that pattern any less applicable at a higher level in the network stack?
Well, the whole point of protocol layering is to isolate different 
functions in different layers - as soon as the same function is 
available in multiple layers, things just get very confusing and brittle.

a simple example that I think will illustrate the difference (note - 
it's easier to read stacks from the bottom up)

a RESTful stack that doesn't violate layering:

RSS - specific encoding of data
HTTP - client-server protocol - deals primarily with addressing data 
items and encodings
TLS - adds security to a connection
TCP - reliable connections
IP - unreliable datagram transport across multiple networks (glues 
individual networks into an internet)
802.x - unreliable data over a single local network
USB (or whatever) - hardware level device connection

vs.

WS-* protocols - try to do everything, including things that are already 
available at lower layers (addressing, security, connections, ...)
SOAP
HTTP
TLS
TCP
IP
802.x
USB

essentially, a lot of what the WS-* protocols do is reinvent things that 
have been worked out and incorporated into lower layer protocols - the 
result is that:

1. the WS-* protocols tend to do a poor, and complicated, job of 
functions that are available in lower layers of the network stack, and 
simply ignore those capabilities, and/or,

2. an awful lot of SOAP-based applications generate transactions that 
contain huge numbers of empty XML headers

The impact, in practice:  We recently wrote an application that 
translated OpenSearch formatted queries into SOAP formatted queries, for 
execution by an ebRIM Registry/Repository, and then translated the 
responses back into OpenSearch format (actually RSS).  When you looked 
inside the transactions, you'd see:

a 1-line OpenSearch query turns into 1 line of SQL, surrounded by about 
100 lines of empty XML headers

the response starts as 100+ lines of XML, mostly empty headers, that can 
be compressed into about 10 lines of XML in RSS format

Layering ends up being about managing complexity.  Doing things at the 
right layer simplifies things, doing it at the wrong layer makes things 
intractable.

Miles






Re: why erlang?

Posted by Randall Leeds <ra...@gmail.com>.

On Mon, Aug 16, 2010 at 13:44, Noah Slater <ns...@apache.org> wrote:
>
> On 16 Aug 2010, at 21:38, Miles Fidelman wrote:
>
>> There have been some of these built on top of HTTP, but the purist in me really dislikes violating layering.
>
> I'm a little out of my depth here, but is that really the case? The things you're talking about sound like patterns to me, not layer-specific details. What makes that pattern any less applicable at a higher level in the network stack?

Exactly, Noah. The purist in me dislikes the fact that HTTP
implementations assume their URIs are IP addresses and that the
transport layer is TCP. No reason the former shouldn't be node ids in
an overlay nor that the HTTP request/response headers shouldn't be
forwarded across said overlay.
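Randall's idea of treating the URI authority as an overlay node id, while forwarding the HTTP request itself unchanged, might look roughly like this. The routing table, node ids and addresses are invented for illustration, and `forward_target` is a hypothetical helper; only the URI is rewritten here, and the caller is assumed to resend the original method, headers and body as-is.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical overlay routing table: node id -> next-hop address.
NEXT_HOP = {
    "node-7f3a": "10.0.0.5:5984",
    "node-91c2": "10.0.0.9:5984",
}


def forward_target(uri, routing_table=NEXT_HOP):
    """Read the URI host as an overlay node id and swap in its next-hop address,
    keeping scheme, path, query and fragment exactly as they were."""
    parts = urlsplit(uri)
    node_id = parts.hostname
    if node_id not in routing_table:
        raise LookupError(f"no route to overlay node {node_id!r}")
    return urlunsplit((parts.scheme, routing_table[node_id],
                       parts.path, parts.query, parts.fragment))
```

For example, `forward_target("http://node-7f3a/db/_changes?since=5")` yields a URL pointing at `10.0.0.5:5984` with the same path and query, so nothing above the addressing step needs to know an overlay is involved.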

Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 21:38, Miles Fidelman wrote:

> There have been some of these built on top of HTTP, but the purist in me really dislikes violating layering.

I'm a little out of my depth here, but is that really the case? The things you're talking about sound like patterns to me, not layer-specific details. What makes that pattern any less applicable at a higher level in the network stack?

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Randall Leeds wrote:
> There is no reason I see why HTTP is not a valid transport for a DHT nor any
> reason why it is not possible to gossip over HTTP. I think it's confusing
> the issue to blame HTTP for any problem Couch has with distribution.
>
> Enlighten me if I'm wrong, of course.
>    
I guess that's a fair point.  The thing is, though, that the problem has 
a lot more to do with routing than transport.  Something has to keep 
track of which nodes are up/down, as well as lowest-cost paths for 
moving data along.

The extreme cases are:

1. hub-and-spokes: every change ripples through a central node - no 
routing to worry about, but not very robust

2. point-to-point replication, where everything eventually gets to where 
it's going: problems are that large networks require manual 
configuration of lots of pairwise links, and things get brittle if the 
wrong link goes down

Elements of something a bit more reliable and requiring less (no) manual 
configuration:

3.  multi-cast or broadcast protocols: send an update into the ether, 
everyone gets it - but... works for stuff like streaming voice or video, 
where a lost packet doesn't matter; doesn't work for transactions, where 
you need an acknowledge-and-retransmit mechanism to make sure that data 
eventually gets everywhere (though there are some experimental reliable 
multicast protocols floating around)

4. a protocol/data replication mechanism that includes some kind of 
self-tuning routing mechanism - examples include: UUCP, DHTs, etc.
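As one concrete example of the self-tuning routing in point 4, many DHTs place nodes on a hash ring and give each key to the first node clockwise from the key's hash, so a node leaving only reassigns the keys it held. The sketch below is a generic consistent-hashing ring swapped in for illustration; it is not something CouchDB or UUCP does, and the node names and the `vnodes=32` replica count are arbitrary.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=32):
        self.vnodes = vnodes
        self._ring = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def owner(self, key):
        """Return the first node clockwise from the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]
```

No node keeps a list of pairwise links; each one only needs to know the current ring membership, which is the kind of reduced manual configuration the list above is after.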

There have been some of these built on top of HTTP, but the purist in me 
really dislikes violating layering.

Miles


Re: why erlang?

Posted by Randall Leeds <ra...@gmail.com>.

I see no reason why HTTP isn't a valid transport for a DHT, nor any
reason why it isn't possible to gossip over HTTP. I think it confuses
the issue to blame HTTP for any problem Couch has with distribution.

Enlighten me if I'm wrong, of course.
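The claim that gossip can run over HTTP is easy to believe once you see how little a gossip round needs: each node exchanges what it knows with one random peer, which over HTTP could be a GET of the peer's updates plus a POST of its own. This simulation is a hypothetical sketch with in-memory sets in place of HTTP calls; the node and update names and the seed are arbitrary.

```python
import random


def gossip_round(nodes, rng):
    """One push-pull round: each node merges its updates with one random peer's."""
    names = list(nodes)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = nodes[name] | nodes[peer]
        nodes[name] = set(merged)
        nodes[peer] = set(merged)


def gossip_until_converged(nodes, seed=0, max_rounds=100):
    """Run rounds until all nodes hold identical update sets; return the round count."""
    if len(nodes) < 2:
        return 0
    rng = random.Random(seed)
    for rounds in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        first = next(iter(nodes.values()))
        if all(updates == first for updates in nodes.values()):
            return rounds
    raise RuntimeError(f"no convergence after {max_rounds} rounds")
```

No node needs a global view of the network or a configured set of pairwise links; the price is some redundant traffic, since peers often re-send updates the other side already has.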

On Aug 16, 2010 1:19 PM, "Jan Lehnardt" <ja...@apache.org> wrote:

On 16 Aug 2010, at 22:11, Noah Slater wrote:

>
> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
...
I'd like to add that Miles does have a point, but we have good reasons to
have HTTP for now and in the future. It doesn't mean that applying
specializations where applicable is not an option (double negative :).

Cheers
Jan
--

Re: why erlang?

Posted by Jan Lehnardt <ja...@apache.org>.

On 16 Aug 2010, at 22:11, Noah Slater wrote:

> 
> On 16 Aug 2010, at 20:52, Miles Fidelman wrote:
> 
>> Actually, I'd dispute that.  The INTERNET is perhaps the largest system ever built, the web rides on top of a lot of lower level infrastructure.  There's a lot of other stuff riding on top of the underlying IP infrastructure - email, VoIP, chat, etc. - which don't rely on HTTP.  (Note: I speak as someone who dates back to almost the beginning - I spent a good part of my career at BBN, just as we were transitioning the ARPANET to TCP/IP, and it was serving as the hub of the then fledgling Internet).
> 
> I was anticipating this response. :)
> 
> My reply would be to state that the Web subsumes the Internet in many ways.
> 
>> True.  Though, it has also led to (IMHO) abortions such as SOAP - which Dave Winer initially wrote as a way to use HTTP to tunnel traffic through firewalls.
> 
> LOL
> 
> I think you mean XML-RPC, but they're both as bad as each other.
> 
> In either case, they are so hilariously against everything the Web stands for, it's not really applicable!

I'd like to add that Miles does have a point, but we have good reasons to have HTTP for now and in the future. It doesn't mean that applying specializations where applicable is not an option (double negative :).

Cheers
Jan
--

Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 20:52, Miles Fidelman wrote:

> Actually, I'd dispute that.  The INTERNET is perhaps the largest system ever built, the web rides on top of a lot of lower level infrastructure.  There's a lot of other stuff riding on top of the underlying IP infrastructure - email, VoIP, chat, etc. - which don't rely on HTTP.  (Note: I speak as someone who dates back to almost the beginning - I spent a good part of my career at BBN, just as we were transitioning the ARPANET to TCP/IP, and it was serving as the hub of the then fledgling Internet).

I was anticipating this response. :)

My reply would be to state that the Web subsumes the Internet in many ways.

> True.  Though, it has also led to (IMHO) abortions such as SOAP - which Dave Winer initially wrote as a way to use HTTP to tunnel traffic through firewalls.

LOL

I think you mean XML-RPC, but they're both as bad as each other.

In either case, they are so hilariously against everything the Web stands for, it's not really applicable!

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Noah Slater wrote:
> On 16 Aug 2010, at 19:55, Miles Fidelman wrote:
>    
>> True, but... HTTP is not necessarily an ideal protocol for many-to-many replication, nor is HTTP 30 years old.  There's a lot of experience that dates back further - for example, UUCP is probably a much better protocol for large-scale eventual consistency than the pair-wise approach currently used by CouchDB.
>>      
> The web is the largest technological system mankind has ever built. It's not perfect, but it works. Not only does it work, but it comes with the largest selection of middleware components imaginable.
>
> It is both ubiquitous and commodity, in almost every respect. From network and firewall support, through libraries, to clients. Being able to talk to CouchDB from your web browser is easily (for me) the best thing about CouchDB, and far outweighs any drawbacks, such as protocol overhead.
>    
Actually, I'd dispute that.  The INTERNET is perhaps the largest system 
ever built; the web rides on top of a lot of lower-level 
infrastructure.  There's a lot of other stuff riding on top of the 
underlying IP infrastructure - email, VoIP, chat, etc. - which don't 
rely on HTTP.  (Note: I speak as someone who dates back to almost the 
beginning - I spent a good part of my career at BBN, just as we were 
transitioning the ARPANET to TCP/IP, and it was serving as the hub of 
the then fledgling Internet).
> On 16 Aug 2010, at 20:02, Jan Lehnardt wrote:
>    
>> But we have a rock solid implementation (*cough*) that works today :) — Also "better" is hardly objective :) — Noah was using exaggeration as a device to point out that HTTP is indeed very awesome for many reasons including tooling, firewall "support" (hehe) and many more.
>>      
True.  Though, it has also led to (IMHO) abortions such as SOAP - which 
Dave Winer initially wrote as a way to use HTTP to tunnel traffic 
through firewalls.



Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 19:55, Miles Fidelman wrote:

> True, but... HTTP is not necessarily an ideal protocol for many-to-many replication, nor is HTTP 30 years old.  There's a lot of experience that dates back further - for example, UUCP is probably a much better protocol for large-scale eventual consistency than the pair-wise approach currently used by CouchDB.

The web is the largest technological system mankind has ever built. It's not perfect, but it works. Not only does it work, but it comes with the largest selection of middleware components imaginable.

It is both ubiquitous and commodity, in almost every respect. From network and firewall support, through libraries, to clients. Being able to talk to CouchDB from your web browser is easily (for me) the best thing about CouchDB, and far outweighs any drawbacks, such as protocol overhead.

On 16 Aug 2010, at 20:02, Jan Lehnardt wrote:

> But we have a rock solid implementation (*cough*) that works today :) — Also "better" is hardly objective :) — Noah was using exaggeration as a device to point out that HTTP is indeed very awesome for many reasons including tooling, firewall "support" (hehe) and many more.

Actually, I just made up a number that sounded good.

HTTP is really old, yo.

Re: why erlang?

Posted by Jan Lehnardt <ja...@apache.org>.

On 16 Aug 2010, at 20:55, Miles Fidelman wrote:

> Noah Slater wrote:
>> On 16 Aug 2010, at 19:22, Miles Fidelman wrote:
>> 
>>   
>>> I guess I'm just a little surprised that the replication features seem to be independent of Erlang's underlying inter-node communications capabilities.
>>>     
>> I'm going to hazard an answer here. CouchDB is, primarily, a database built for the web. That means speaking HTTP. All communication between CouchDB and other agents is, and should be, done via that route. There has been talk, in the past, about communicating with CouchDB from within Erlang itself, but that is not a priority for the main project. Pushing things over HTTP lets us take advantage of 30 years worth of caching, proxying, authentication, and other Web-stack middleware.
> True, but... HTTP is not necessarily an ideal protocol for many-to-many replication, nor is HTTP 30 years old.  There's a lot of experience that dates back further - for example, UUCP is probably a much better protocol for large-scale eventual consistency than the pair-wise approach currently used by CouchDB.

But we have a rock solid implementation (*cough*) that works today :) — Also "better" is hardly objective :) — Noah was using exaggeration as a device to point out that HTTP is indeed very awesome for many reasons including tooling, firewall "support" (hehe) and many more.

Cheers
Jan
--

Re: why erlang?

Posted by Randall Leeds <ra...@gmail.com>.

I was actually thinking about inter-node communication yesterday. As was
pointed out, HTTP's inefficiency is mainly in TCP setup and request parsing. I
think the cleanest way to get a boost would not be to add a new API, but to
use the Accept and Content-Type headers with a custom MIME type like
beam/eterm to indicate that Erlang binary terms can be sent directly. This
would eliminate JSON overhead and reduce the bytes on the wire while keeping
Couch well embedded in the web ecosystem, with all the middleware advantages
that come with it. Furthermore, authentication can stay pluggable: it would
not require Erlang cookies, a fully connected network, or epmd ports, as
distributed OTP does, nor would we have to write our own gen_tcp-based
protocol.
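A minimal sketch of the negotiation half of this idea, in Python. Assumptions: "beam/eterm" is the hypothetical type named above (not a registered MIME type or an existing CouchDB feature), the server lists JSON first so clients sending */* keep getting JSON, and only the type selection is shown, not the term encoding itself.

```python
# Hypothetical sketch: Accept-header negotiation between JSON and the
# "beam/eterm" type proposed above. Not CouchDB code.

# Server preference order. JSON first, so wildcard clients (browsers) get
# JSON and only clients that explicitly ask for beam/eterm get binary terms.
SUPPORTED = ["application/json", "beam/eterm"]


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def specificity(media_range):
    """*/* < type/* < type/subtype; the most specific match decides q."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, content_type):
    rtype, _, rsub = media_range.partition("/")
    ctype, _, csub = content_type.partition("/")
    return (rtype == "*" or rtype == ctype) and (rsub == "*" or rsub == csub)


def negotiate(accept_header, supported=SUPPORTED):
    """Return the content type to send, or None (i.e. 406 Not Acceptable)."""
    if not accept_header:
        return supported[0]  # no Accept header: default to JSON
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for ctype in supported:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, ctype)]
        if not hits:
            continue
        q = max(hits)[1]  # most specific range wins; ties take the higher q
        if q > best_q:  # strict '>' keeps server order on equal q
            best, best_q = ctype, q
    return best
```

A replicator opting in would send "Accept: beam/eterm"; browsers and existing middleware would keep seeing plain JSON, which is the point of doing this through headers rather than a separate protocol.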

-Randall

On Aug 16, 2010 12:55 PM, "Miles Fidelman" <mf...@meetinghouse.net>
wrote:

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Klaus Trainer wrote:
> I believe the greatest advantage of HTTP over other more efficient
> protocols is that you can easily integrate it with any other web stuff.
> For example browsers.
>    
absolutely
> Of course, there might be way more efficient protocols to support P2P
> replication, but the tradeoff is reasonable, I think.
>    
This is precisely where I see a different protocol being more useful.  
I've avoided using Couch in a couple of large, distributed applications 
because I really don't want to bite off the management of large numbers 
of 2-party links for replication.  A P2P or multi-cast protocol would 
avoid that problem.
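To put rough numbers on those 2-party links: CouchDB replication runs one way, from a source to a target, so keeping every pair of N nodes in sync in a full mesh takes N(N-1) replications, versus 2(N-1) if every node syncs both ways with a single hub. A small illustration (the topologies and function names are made up for this example):

```python
# Counts of one-way replication tasks for two hypothetical topologies.

def full_mesh_tasks(n):
    """Every node replicates to every other node: n(n-1) one-way tasks."""
    return n * (n - 1)


def hub_and_spoke_tasks(n):
    """Each of the n-1 leaves replicates to and from one hub: 2(n-1)."""
    return 2 * (n - 1)


for n in (3, 10, 100):
    print(n, full_mesh_tasks(n), hub_and_spoke_tasks(n))
# 3 6 4
# 10 90 18
# 100 9900 198
```

The hub layout needs far fewer links but funnels every update through one node; the mesh avoids that but grows quadratically.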

Miles


Re: why erlang?

Posted by Klaus Trainer <kl...@web.de>.

I believe the greatest advantage of HTTP over other more efficient
protocols is that you can easily integrate it with any other web stuff.
For example browsers.

You can build CouchApps (http://couchapp.org/page/index) without having
any layer in between. As an example, see Toast
(http://github.com/jchris/toast). It uses CouchDB's continuous
replication in order to build a super-simple chat application. All you
need for that is CouchDB and a browser.

Of course, there might be way more efficient protocols to support P2P
replication, but the tradeoff is reasonable, I think.

I wonder whether, under certain circumstances, the HTTP overhead is
negligible anyway. For example, once you have an open socket listening
on the changes feed (that's what Toast does), you normally won't close
the connection intentionally. Instead, all the data is streamed over it,
and there is no further per-request HTTP header overhead.
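A minimal Python sketch of that pattern, assuming a CouchDB server on a host and database name you supply: one GET to /db/_changes?feed=continuous, after which each change arrives as one JSON object per line on the same socket, with blank lines as heartbeats. The request and response headers are paid for once, not once per change.

```python
import http.client
import json


def iter_changes(lines):
    """Yield change rows from a continuous _changes feed.

    Each non-empty line is one JSON object; empty lines are heartbeats
    sent to keep the idle connection open.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # heartbeat
        yield json.loads(line)


def follow_changes(host, db, since=0, port=5984):
    """Open one HTTP connection and stream changes until it closes."""
    conn = http.client.HTTPConnection(host, port)
    path = f"/{db}/_changes?feed=continuous&heartbeat=10000&since={since}"
    conn.request("GET", path)  # headers are exchanged once, up front
    resp = conn.getresponse()
    try:
        # HTTPResponse is file-like: iterating yields body lines as bytes,
        # with chunked transfer-encoding already decoded.
        yield from iter_changes(resp)
    finally:
        conn.close()
```

Usage, with a hypothetical database name: for change in follow_changes("localhost", "chat"): print(change). Splitting the line parser from the connection keeps the parsing testable without a running server.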

- Klaus

On Mon, 2010-08-16 at 14:55 -0400, Miles Fidelman wrote:
> Noah Slater wrote:
> > On 16 Aug 2010, at 19:22, Miles Fidelman wrote:
> >
> >    
> >> I guess I'm just a little surprised that the replication features seem to be independent of Erlang's underlying inter-node communications capabilities.
> >>      
> > I'm going to hazard an answer here. CouchDB is, primarily, a database built for the web. That means speaking HTTP. All communication between CouchDB and other agents is, and should be, done via that route. There has been talk, in the past, about communicating with CouchDB from within Erlang itself, but that is not a priority for the main project. Pushing things over HTTP lets us take advantage of 30 years' worth of caching, proxying, authentication, and other Web-stack middleware.
> True, but... HTTP is not necessarily an ideal protocol for many-to-many 
> replication, nor is HTTP 30 years old.  There's a lot of experience that 
> dates back further - for example, UUCP is probably a much better 
> protocol for large-scale eventual consistency than the pair-wise 
> approach currently used by CouchDB.
> 
> Miles
>

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Noah Slater wrote:
> On 16 Aug 2010, at 19:22, Miles Fidelman wrote:
>
>    
>> I guess I'm just a little surprised that the replication features seem to be independent of Erlang's underlying inter-node communications capabilities.
>>      
> I'm going to hazard an answer here. CouchDB is, primarily, a database built for the web. That means speaking HTTP. All communication between CouchDB and other agents is, and should be, done via that route. There has been talk, in the past, about communicating with CouchDB from within Erlang itself, but that is not a priority for the main project. Pushing things over HTTP lets us take advantage of 30 years' worth of caching, proxying, authentication, and other Web-stack middleware.
True, but... HTTP is not necessarily an ideal protocol for many-to-many 
replication, nor is HTTP 30 years old.  There's a lot of experience that 
dates back further - for example, UUCP is probably a much better 
protocol for large-scale eventual consistency than the pair-wise 
approach currently used by CouchDB.

Miles

Re: why erlang?

Posted by Noah Slater <ns...@apache.org>.

On 16 Aug 2010, at 19:22, Miles Fidelman wrote:

> I guess I'm just a little surprised that the replication features seem to be independent of Erlang's underlying inter-node communications capabilities.

I'm going to hazard an answer here. CouchDB is, primarily, a database built for the web. That means speaking HTTP. All communication between CouchDB and other agents is, and should be, done via that route. There has been talk, in the past, about communicating with CouchDB from within Erlang itself, but that is not a priority for the main project. Pushing things over HTTP lets us take advantage of 30 years' worth of caching, proxying, authentication, and other Web-stack middleware.

Re: why erlang?

Posted by Miles Fidelman <mf...@meetinghouse.net>.

Randall,

Thanks for the quick answer.
> Couch absolutely takes advantage of things like supervisor trees and process
> spawning. Handling each client in a separate process is not something only
> done in erlang, but erlang makes it easier. The supervisor trees and pattern
> matching make error handling pretty nice, ensuring that errors
> bubble up appropriately so proper return codes can reach the client.
>
> In the future, I suspect we'll see even more as clustering features start to
> work their way into core couchdb.
>
> Anything else in particular you had in mind?
>    
I guess I'm just a little surprised that the replication features seem 
to be independent of Erlang's underlying inter-node communications 
capabilities.  Seems like the clustering features would have been an 
obvious place to start for high-reliability operations.

Miles

Re: why erlang?

Posted by Randall Leeds <ra...@gmail.com>.

My (short) answers:

Couch absolutely takes advantage of things like supervisor trees and process
spawning. Handling each client in a separate process is not something only
done in erlang, but erlang makes it easier. The supervisor trees and pattern
matching make error handling pretty nice, ensuring that errors
bubble up appropriately so proper return codes can reach the client.
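A rough Python analogue of that error path, for readers who don't write Erlang. In Erlang this would be function clauses matching on tuples such as {not_found, Reason}; here a lookup table stands in for the clauses, and a try/except stands in for the per-request process that is allowed to crash. The error names and status codes mirror CouchDB's JSON error responses (404 not_found, 409 conflict, and so on), but the functions themselves are illustrative, not CouchDB's code.

```python
# Illustrative only: maps error terms to HTTP responses the way an Erlang
# handler might match on {Kind, Reason} tuples. Not CouchDB's actual code.

STATUS_BY_ERROR = {
    "bad_request": 400,
    "unauthorized": 401,
    "not_found": 404,
    "conflict": 409,
}


class RequestError(Exception):
    """Carries an Erlang-style (kind, reason) error term."""

    def __init__(self, kind, reason):
        super().__init__(kind, reason)
        self.term = (kind, reason)


def error_to_response(term):
    """Known (kind, reason) terms get their status; anything else is a 500."""
    if isinstance(term, tuple) and len(term) == 2 and term[0] in STATUS_BY_ERROR:
        kind, reason = term
        return STATUS_BY_ERROR[kind], {"error": kind, "reason": reason}
    return 500, {"error": "unknown_error", "reason": repr(term)}


def handle(request_fn):
    """Run one request in isolation; a failure becomes an HTTP error
    for this client only instead of taking the server down."""
    try:
        return 200, request_fn()
    except RequestError as err:
        return error_to_response(err.term)
    except Exception as err:  # unexpected crash, like an exiting process
        return error_to_response(("crash", repr(err)))
```

What Erlang adds over this sketch is that the isolation is enforced by the runtime: each client's process has its own heap, and a supervisor can restart failed workers rather than relying on every handler remembering its try/except.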

In the future, I suspect we'll see even more as clustering features start to
work their way into core couchdb.

Anything else in particular you had in mind?

On Aug 16, 2010 10:54 AM, "Miles Fidelman" <mf...@meetinghouse.net>
wrote:

Hi Folks,

I wonder if someone might share some insight into why Erlang was chosen for
CouchDB.

Don't get me wrong, I think Erlang is a really cool language/environment;
I'm a big fan of designs that spawn lots of independent processes, and
communicating via messages.  But... it doesn't seem like CouchDB takes
advantage of all that much of Erlang's unique capabilities.

Hence, I'm sort of wondering why Erlang for CouchDB, and if there are any
visions of taking more advantage of Erlang down the road.

Thanks,

Miles Fidelman
