Posted to dev@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2008/07/02 09:08:14 UTC

CouchDB 0.9 and 1.0

Hello everybody,
this thread is meant to collect missing work items (features and
bugs) for our 1.0 release and to discuss how to split
them up between 0.9 and 1.0.

Take it away: Damien.

Cheers
Jan
--

Re: Custom file driver for OS X and Windows Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 2, 2008, at 19:01, Damien Katz wrote:

> So Erlang still has some file driver troubles. The old issue of a
> 2gig limit is long gone, however two other issues remain:
>
> Erlang's built-in file API for disk sync/flush doesn't work on all
> platforms. The two main ones I know of where it doesn't work are
> Windows (at least it didn't when I looked over a year ago) and OS X
> (perhaps other BSDs?). On OS X, the fix seems to be as simple as
> passing the F_FULLFSYNC flag to fcntl. Without this disk sync, there
> is no way CouchDB can safely store data. This means we either need
> the Erlang folks to fix their drivers, or create our own file driver
> for the problem platforms. Blech.
>
> Windows has the same fsync problem, plus it also doesn't pass the
> flags to allow the renaming of our own open files, needed during
> file compaction. Again, this can be fixed in core Erlang or in our
> own drivers.

I have forwarded this to the Erlang folks to get their insight on this  
issue.
I feel the only sensible way to fix this is to fix the file driver.

--

On a related note: At the Erlang eXchange last week I talked to Klacke
Wikström (the author of (d)ets, mnesia, the IO system and many other
parts of core Erlang) and he mentioned a product of his where they
bypass the Erlang file driver completely for speed reasons. I don't
think this is relevant for us now, but we should keep in mind
that we might want to do the same eventually.

Cheers
Jan
--

Re: Custom file driver for OS X and Windows Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 2, 2008, at 19:01, Damien Katz wrote:

> So Erlang still has some file driver troubles. The old issue of a
> 2gig limit is long gone, however two other issues remain:
>
> Erlang's built-in file API for disk sync/flush doesn't work on all
> platforms. The two main ones I know of where it doesn't work are
> Windows (at least it didn't when I looked over a year ago) and OS X
> (perhaps other BSDs?). On OS X, the fix seems to be as simple as
> passing the F_FULLFSYNC flag to fcntl. Without this disk sync, there
> is no way CouchDB can safely store data. This means we either need
> the Erlang folks to fix their drivers, or create our own file driver
> for the problem platforms. Blech.

I'm not a C hacker, but would this patch solve the issue on
Darwin / Mac OS X?

I worked solely off documentation:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/fsync.2.html#//apple_ref/doc/man/2/fsync
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/fcntl.2.html

The diff is against OTP R12B-3:

--- erts/emulator/drivers/unix/unix_efile.c	2008-07-17 03:06:13.000000000 +0200
+++ erts/emulator/drivers/unix/unix_efile_new.c	2008-07-17 03:06:57.000000000 +0200
@@ -44,11 +44,20 @@
  #endif
  #endif /* _OSE_ */

+#if defined(__APPLE__) && defined(__MACH__) && !defined(__DARWIN__)
+#define DARWIN 1
+#endif
+
+#ifdef DARWIN
+#include <fcntl.h>
+#endif /* DARWIN */
+
  #ifdef VXWORKS
  #include <ioLib.h>
  #include <dosFsLib.h>
  #include <nfsLib.h>
  #include <sys/stat.h>
+
  /*
  ** Not nice to include usrLib.h as MANY normal variable names get reported
  ** as shadowing globals, like 'i' for example.
@@ -818,7 +827,11 @@
    undefined fsync
  #endif /* VXWORKS */
  #else
+#ifdef DARWIN
+    return check_error(fcntl(fd, F_FULLFSYNC), errInfo);
+#else /* all other unix systems */
      return check_error(fsync(fd), errInfo);
+#endif /* DARWIN */
  #endif /* NO_FSYNC */
  }

If that looks okay, I'll forward it to the erlang-patches mailing list  
for inclusion.

Cheers
Jan
--

Custom file driver for OS X and Windows Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@gmail.com>.

So Erlang still has some file driver troubles. The old issue of a 2gig
limit is long gone, however two other issues remain:

Erlang's built-in file API for disk sync/flush doesn't work on all
platforms. The two main ones I know of where it doesn't work are Windows
(at least it didn't when I looked over a year ago) and OS X (perhaps
other BSDs?). On OS X, the fix seems to be as simple as passing the
F_FULLFSYNC flag to fcntl. Without this disk sync, there is no way
CouchDB can safely store data. This means we either need the Erlang
folks to fix their drivers, or create our own file driver for the
problem platforms. Blech.

Windows has the same fsync problem, plus it also doesn't pass the
flags to allow the renaming of our own open files, needed during file
compaction. Again, this can be fixed in core Erlang or in our own drivers.
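
A minimal sketch of the write-then-sync and compact-then-rename ordering
described above. It is written in JavaScript for Node.js rather than Erlang,
so it only illustrates the sequence of steps and says nothing about how
CouchDB's file driver is or should be implemented. The names appendDurably
and compactByRename, the ".compact" suffix and the one-record-per-line layout
are assumptions made for this example. Whether fs.fsyncSync really forces
data onto the physical disk depends on the platform and runtime, which is
exactly the concern raised here, and the final rename is the step that can
fail on Windows while the original file is still open.

```javascript
var fs = require("fs");

// Append one record and return only after the OS reports the file as synced.
function appendDurably(file, record) {
  var fd = fs.openSync(file, "a");
  try {
    fs.writeSync(fd, record + "\n");
    fs.fsyncSync(fd); // ask the OS to flush this file's data to disk
  } finally {
    fs.closeSync(fd);
  }
}

// Write the records accepted by keep() to a new file, sync it, and only
// then rename it over the original file.
function compactByRename(file, keep) {
  var tmp = file + ".compact"; // hypothetical temporary file name
  var kept = fs.readFileSync(file, "utf8")
    .split("\n")
    .filter(function (line) { return line !== "" && keep(line); });
  var fd = fs.openSync(tmp, "w");
  try {
    fs.writeSync(fd, kept.map(function (line) { return line + "\n"; }).join(""));
    fs.fsyncSync(fd); // the new file must be on disk before it replaces the old one
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, file);
}
```

The point of the ordering is that the rename happens only after the sync has
returned, so a crash in between leaves the old file in place.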

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by David Pitman <me...@davidpitman.name>.

Thanks for your thoughts there, I'll definitely keep that in mind!
Personally I find the MySQL model overly annoying too, but I thought it was
a good place to start since it's "popular".  OK, well maybe not then :) ...
If you (or anyone here) know of a database whose security model you DO
like, then feel free to post it here so I can take a look at that for
further inspiration ...

Thanks again and I will certainly be posting my results when I have
something that seems worthwhile.

David.




On Mon, Jul 7, 2008 at 6:16 PM, Jan Lehnardt <ja...@apache.org> wrote:

> Disclaimer: It is Monday, I overslept and haven't had coffee yet. If I
> sound overly grumpy, that is the reason :)
>
> On Jul 7, 2008, at 05:29, David Pitman wrote:
>
>> Just to let you know that I have been working on an "out-of-the-box
>> solution for 1)" for a few weeks (in my spare time), mainly at this
>> stage mapping out various schemes for how this could work and learning
>> more about other databases' authentication frameworks.  I figure if it
>> is conceptually similar (as far as convenient) to existing
>> authentication frameworks such as what's used by MySQL, then developers
>> will have an even easier learning curve and find CouchDB yet more
>> attractive.
>
> Please do _NOT_ model a security model after the MySQL security model.
> There are so many things wrong with it that I don't even know where to
> begin. Well, maybe it is not that bad, but it is not easy to use, and as
> a result everybody implements their own custom security scheme on top of
> MySQL. As a result, shared hosters give out only single MySQL accounts,
> because that's what everybody needs, and as a direct result you can't
> share or move a userbase between two applications because, well, they
> have their own system. And as a bonus point: all security systems need
> to do the same thing over and over again and will introduce the same
> bugs over and over again.
>
> I don't want that to happen to CouchDB :) It would be nice if CouchDB's
> security system were exposed to users & applications so they have a
> framework to do logins and permissions that all CouchDB applications can
> share. This would avoid a) duplication of effort by writing yet another
> user management and permission system, b) introduction of another two
> billion security systems that all have one bug or another, and c) having
> to maintain two separate userbases for two separate applications, which
> is a PITA for either the user or the admin.
>
> Yes, out of the box would be nice; yes, LDAP and other backends should
> be pluggable; and yes, this involves a lot of work, and that's why we
> (at least I) want to keep that off 1.0.
>
>> At the moment I'm prototyping in PHP and C++ (fast and easy), but once
>> I've established how I want it to work, I'm planning to start working
>> with Erlang (I'm new to that).  I'll post up some details of my ideas
>> once I've got a nice fleshed-out concept that seems to work for me
>> nicely.
>>
>> I'm thinking of a kind of "out-of-the-box" plugin for CouchDB which
>> adds in the authentication layer, but which is not required by CouchDB
>> to work. Will let people know more when I've got something useful to
>> show for my efforts ...
>
> Ignoring the above: It would be really nice to see what you come up
> with here, please share your results :)
>
> Cheers
> Jan
> --

-- 
David Pitman
www.davidpitman.name

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

Disclaimer: It is Monday, I overslept and haven't had coffee yet. If I
sound overly grumpy, that is the reason :)

On Jul 7, 2008, at 05:29, David Pitman wrote:

> Just to let you know that I have been working on an "out-of-the-box  
> solution
> for 1)" for a few weeks (in my spare time), mainly at this stage  
> mapping out
> various schemes for how this could work and learning more about other
> databases' authentication frameworks.  I figure if it is conceptually
> similar (as far as convenient) to existing authentication frameworks  
> such as
> what's used by mySQL, then developers will have an even easier  
> learning
> curve and find CouchDB yet more attractive.

Please do _NOT_ model a security model after the MySQL security model.
There are so many things wrong with it that I don't even know where to
begin. Well, maybe it is not that bad, but it is not easy to use, and as
a result everybody implements their own custom security scheme on top of
MySQL. As a result, shared hosters give out only single MySQL accounts,
because that's what everybody needs, and as a direct result you can't
share or move a userbase between two applications because, well, they
have their own system. And as a bonus point: all security systems need to
do the same thing over and over again and will introduce the same bugs
over and over again.

I don't want that to happen to CouchDB :) It would be nice if CouchDB's
security system were exposed to users & applications so they have a
framework to do logins and permissions that all CouchDB applications can
share. This would avoid a) duplication of effort by writing yet another
user management and permission system, b) introduction of another two
billion security systems that all have one bug or another, and c) having
to maintain two separate userbases for two separate applications, which
is a PITA for either the user or the admin.

Yes, out of the box would be nice; yes, LDAP and other backends should be
pluggable; and yes, this involves a lot of work, and that's why we (at
least I) want to keep that off 1.0.


> At the moment I'm prototyping in php and c++ (fast and easy), but  
> once I've
> established how I want it to work, I'm planning to start working  
> with Erlang
> (I'm new to that).  I'll post up some details of my ideas once I've  
> got a
> nice fleshed-out concept that seems to work for me nicely.
>
> I'm thinking of a kind of "out-of-the-box" plugin to the CouchDB  
> which adds
> in the authentication layer, but which is not required by CouchDB to  
> work.
> Will let people know more when I've got something useful to show for  
> my
> efforts ...

Ignoring the above: It would be really nice to see what you come up  
with here,
please share your results :)

Cheers
Jan
--

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by David Pitman <me...@davidpitman.name>.

Just to let you know that I have been working on an "out-of-the-box solution
for 1)" for a few weeks (in my spare time), mainly at this stage mapping out
various schemes for how this could work and learning more about other
databases' authentication frameworks.  I figure if it is conceptually
similar (as far as convenient) to existing authentication frameworks such as
what's used by MySQL, then developers will have an even easier learning
curve and find CouchDB yet more attractive.

At the moment I'm prototyping in PHP and C++ (fast and easy), but once I've
established how I want it to work, I'm planning to start working with Erlang
(I'm new to that).  I'll post up some details of my ideas once I've got a
nice fleshed-out concept that seems to work for me nicely.

I'm thinking of a kind of "out-of-the-box" plugin for CouchDB which adds
in the authentication layer, but which is not required by CouchDB to work.
Will let people know more when I've got something useful to show for my
efforts ...

Thanks.

David.

On Thu, Jul 3, 2008 at 6:47 PM, Jan Lehnardt <ja...@apache.org> wrote:

> On Jul 2, 2008, at 20:13, Robert Fischer wrote:
>
>> Two points.
>>
>> 1) I'd encourage the CouchDB group to stick to authorization and leave
>> authentication to proxies at this point.  If you have some free time in
>> the future, maybe you can think about integrating an authentication
>> layer -- but there's a lot more critical functionality needed, and an
>> HTTP proxy can handle it just fine for the time being.  If you consider
>> that username/password authentication is inherently evil, and "real"
>> authentication servers are built off of LDAP, Kerberos, or the like,
>> then the massive amount of work involved in doing authentication should
>> be clear.  And this isn't even getting into the likelihood that a new
>> authentication implementation will probably get some stuff wrong in
>> non-trivial, non-obvious ways.  So, please, let authentication be
>> handled by proxies.
>>
>> 2) In terms of authorization, it would be nice if there was a concept
>> of "read only" and "read-write" permissions at the database level.
>> MySQL goes a bit nuts with their permissions possibly going all the way
>> down to the column level, but it's nice to have that distinction at the
>> database level.  This means I can guarantee I don't accidentally modify
>> something when I just mean to be querying it: this kind of
>> functionality has saved my butt a number of times in the past ("Why is
>> this update failing on my dev box?  Oh...wait...that's my production
>> terminal window!"), and it would be sad to see it left out.
>
> +1 on both accounts.
>
> For the long term, it'd be nice to have an out-of-the-box
> solution for 1), but we shouldn't focus on this now.
>
> Cheers
> Jan
> --

-- 
David Pitman
www.davidpitman.name

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 2, 2008, at 20:13, Robert Fischer wrote:

> Two points.
>
> 1) I'd encourage the CouchDB group to stick to authorization and leave
> authentication to proxies at this point.  If you have some free time in
> the future, maybe you can think about integrating an authentication
> layer -- but there's a lot more critical functionality needed, and an
> HTTP proxy can handle it just fine for the time being.  If you consider
> that username/password authentication is inherently evil, and "real"
> authentication servers are built off of LDAP, Kerberos, or the like,
> then the massive amount of work involved in doing authentication should
> be clear.  And this isn't even getting into the likelihood that a new
> authentication implementation will probably get some stuff wrong in
> non-trivial, non-obvious ways.  So, please, let authentication be
> handled by proxies.
>
> 2) In terms of authorization, it would be nice if there was a concept
> of "read only" and "read-write" permissions at the database level.
> MySQL goes a bit nuts with their permissions possibly going all the way
> down to the column level, but it's nice to have that distinction at the
> database level.  This means I can guarantee I don't accidentally modify
> something when I just mean to be querying it: this kind of
> functionality has saved my butt a number of times in the past ("Why is
> this update failing on my dev box?  Oh...wait...that's my production
> terminal window!"), and it would be sad to see it left out.

+1 on both accounts.

For the long term, it'd be nice to have an out-of-the-box
solution for 1), but we shouldn't focus on this now.

Cheers
Jan
--


>
>
> Of course, I could do that kind of permission setting at the Apache  
> level, too, by defining the
> routes as locations and setting permissions -- but it'd probably be  
> both cleaner and more
> appropriate to be done in the DB itself.
>
> ~~ Robert.
>
> Noah Slater wrote:
>> Perhaps we could rely on standard HTTP auth either:
>>
>> * as passed back through a proxy
>> * as negotiated by CouchDB using a similar method to Apache httpd
>>
>> This doesn't seem too hard, Mochiweb might even support it natively.
>>
>> On Wed, Jul 02, 2008 at 12:56:44PM -0400, Damien Katz wrote:
>>> We need to implement a CouchDB security model. I think at a high
>>> level it should be as simple as possible. Also I think we won't do
>>> authentication, that should be handled by an authenticating proxy, or
>>> application code.
>>>
>>> I'm thinking our model looks something like this:
>>>
>>> We'll have server wide admin accounts, and dbadmin accounts. Db  
>>> Admins
>>> can create dbs and admin their own dbs. Server admins are like
>>> superusers. Only admins are allowed to update design documents in
>>> databases.
>>>
>>> The per-database customized module will be supported by custom
>>> validation functions contained in the database's design documents.
>>> When a document is updated, either via replication or new edit, these
>>> validation functions are evaluated with the provided context.
>>>
>>> Here is a very simplistic validation routine:
>>>
>>> function (doc, ctx) {
>>>      if (doc.type == "topic" && doc.subject == undefined) {
>>>              throw "Error, a subject is required for all topics.";
>>>      }
>>> }
>>>
>>> Something that looks at previous revisions:
>>>
>>> function (doc, ctx) {
>>>      var prev = ctx.get_local_doc();
>>>      if (prev != null && prev.author != ctx.user_name()) {
>>>              throw "Error, update by non-author.";
>>>      }
>>> }
>>>
>>> It should also be possible to modify the document while it's being
>>> saved, but this might only be allowable when it's a new edit, vs a
>>> replicated update or backup restore.
>>>
>>> All further security schemes would be handled by the customized
>>> functions, and through APIs to do database or external LDAP queries.
>>>
>>> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>>>
>>>> Hello everybody,
>>>> this thread is meant to collect missing work items (features and
>>>> bugs) for our 1.0 release and a discussion about how to split
>>>> them up between 0.9 and 1.0.
>>>>
>>>> Take it away: Damien.
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>
>

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by Robert Fischer <ro...@smokejumperit.com>.

Two points.

1) I'd encourage the CouchDB group to stick to authorization and leave authentication to proxies at
this point.  If you have some free time in the future, maybe you can think about integrating an
authentication layer -- but there's a lot more critical functionality needed, and an HTTP proxy can
handle it just fine for the time being.  If you consider that username/password authentication is
inherently evil, and "real" authentication servers are built off of LDAP, kerberos, or the like,
then the massive amount of work involved in doing authentication should be clear.  And this isn't
even getting into the likelihood that a new authentication implementation will probably get some
stuff wrong in non-trivial, non-obvious ways.  So, please, let authentication be handled by proxies.

2) In terms of authorization, it would be nice if there was a concept of "read only" and
"read-write" permissions at the database level.  MySQL goes a bit nuts with their permissions
possibly going all the way down to the column level, but it's nice to have that distinction at the
database level.  This means I can guarantee I don't accidentally modify something when I just mean to
be querying it: this kind of functionality has saved my butt a number of times in the past ("Why is
this update failing on my dev box?  Oh...wait...that's my production terminal window!"), and it
would be sad to see it left out.

Of course, I could do that kind of permission setting at the Apache level, too, by defining the
routes as locations and setting permissions -- but it'd probably be both cleaner and more
appropriate to be done in the DB itself.
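
As a rough illustration only, a database-level "read only" restriction could be approximated with the validation functions Damien proposes (quoted below). This is a sketch under assumptions: makeReadOnlyGuard and the readOnlyUsers list are made-up names, and ctx.user_name() is taken from the proposal, not from any shipped API.

```javascript
// Hypothetical sketch: a database-level "read only" check expressed as a
// validation function in the style proposed in this thread.
// makeReadOnlyGuard and readOnlyUsers are illustrative names only;
// ctx.user_name() comes from the proposal and is not a shipped API.
function makeReadOnlyGuard(readOnlyUsers) {
  return function (doc, ctx) {
    if (readOnlyUsers.indexOf(ctx.user_name()) !== -1) {
      throw "Error, " + ctx.user_name() + " has read-only access to this database.";
    }
  };
}

// Usage: a stand-in ctx for a user who may only query.
var guard = makeReadOnlyGuard(["reporting"]);
var rejected = false;
try {
  guard({ type: "topic" }, { user_name: function () { return "reporting"; } });
} catch (e) {
  rejected = true; // the update is refused before anything is written
}

// A read-write user passes without an exception.
guard({ type: "topic" }, { user_name: function () { return "robert"; } });
```

A guard like this only covers updates; restricting who may read at all would still have to happen elsewhere, for example in the proxy.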

~~ Robert.

Noah Slater wrote:
> Perhaps we could rely on standard HTTP auth either:
> 
>  * as passed back through a proxy
>  * as negotiated by CouchDB using a similar method to Apache httpd
> 
> This doesn't seem too hard, Mochiweb might even support it natively.
> 
> On Wed, Jul 02, 2008 at 12:56:44PM -0400, Damien Katz wrote:
>> We need to implement a couchdb security model. I think at a high level
>> it should be simple as possible. Also I think we won't do
>> authentication, that should be handled by a authenticating proxy, or
>> application code.
>>
>> I'm thinking our model looks something like this:
>>
>> We'll have server wide admin accounts, and dbadmin accounts. Db Admins
>> can create dbs and admin their own dbs. Server admins are like
>> superusers. Only admins are allowed to update design documents in
>> databases.
>>
>> The per-database customized module will be supported by custom
>> validation functions contained in databases design documents.  When a
>> document is updated, either via replication or new edit, these
>> validation functions are evaluate with provided context.
>>
>> Here is a very simplistic validation routine:
>>
>> function (doc, ctx) {
>>       if (doc.type == "topic" && doc.subject == undefined) {
>>               throw "Error, a subject is required for all topics.";
>>       }
>> }
>>
>> Something that looks at previous revisions:
>>
>> function (doc, ctx) {
>>       var prev = ctx.get_local_doc();
>>       if (prev != null && prev.author != ctx.user_name()) {
>>               throw "Error, update by non-author.";
>>       }
>> }
>>
>> It should also be possible modify the document while it's being saved,
>> but this might only be allowable when its a new edit, vs a replicated
>> update or backup restore.
>>
>> All further security schemes would be handled the customized functions,
>> and though APIs to do database or external ldap queries.
>> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>>
>>> Hello everybody,
>>> this thread is meant to collect missing work items (features and
>>> bugs) for for our 1.0 release and a discussion about how to split
>>> them up between 0.9 and 1.0.
>>>
>>> Take it away: Damien.
>>>
>>> Cheers
>>> Jan
>>> --
>

Re: Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by Noah Slater <ns...@apache.org>.

Perhaps we could rely on standard HTTP auth either:

 * as passed back through a proxy
 * as negotiated by CouchDB using a similar method to Apache httpd

This doesn't seem too hard, Mochiweb might even support it natively.

On Wed, Jul 02, 2008 at 12:56:44PM -0400, Damien Katz wrote:
> We need to implement a couchdb security model. I think at a high level
> it should be simple as possible. Also I think we won't do
> authentication, that should be handled by a authenticating proxy, or
> application code.
>
> I'm thinking our model looks something like this:
>
> We'll have server wide admin accounts, and dbadmin accounts. Db Admins
> can create dbs and admin their own dbs. Server admins are like
> superusers. Only admins are allowed to update design documents in
> databases.
>
> The per-database customized module will be supported by custom
> validation functions contained in databases design documents.  When a
> document is updated, either via replication or new edit, these
> validation functions are evaluate with provided context.
>
> Here is a very simplistic validation routine:
>
> function (doc, ctx) {
>       if (doc.type == "topic" && doc.subject == undefined) {
>               throw "Error, a subject is required for all topics.";
>       }
> }
>
> Something that looks at previous revisions:
>
> function (doc, ctx) {
>       var prev = ctx.get_local_doc();
>       if (prev != null && prev.author != ctx.user_name()) {
>               throw "Error, update by non-author.";
>       }
> }
>
> It should also be possible modify the document while it's being saved,
> but this might only be allowable when its a new edit, vs a replicated
> update or backup restore.
>
> All further security schemes would be handled the customized functions,
> and though APIs to do database or external ldap queries.
> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>
>> Hello everybody,
>> this thread is meant to collect missing work items (features and
>> bugs) for for our 1.0 release and a discussion about how to split
>> them up between 0.9 and 1.0.
>>
>> Take it away: Damien.
>>
>> Cheers
>> Jan
>> --
>

-- 
Noah Slater, http://people.apache.org/~nslater/

Security and Validation - Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@gmail.com>.

We need to implement a CouchDB security model. I think at a high level
it should be as simple as possible. Also I think we won't do
authentication; that should be handled by an authenticating proxy, or
application code.

I'm thinking our model looks something like this:

We'll have server-wide admin accounts, and dbadmin accounts. Db admins
can create dbs and admin their own dbs. Server admins are like
superusers. Only admins are allowed to update design documents in
databases.

The per-database customized model will be supported by custom
validation functions contained in a database's design documents.  When a
document is updated, either via replication or new edit, these
validation functions are evaluated with the provided context.

Here is a very simplistic validation routine:

function (doc, ctx) {
	if (doc.type == "topic" && doc.subject == undefined) {
		throw "Error, a subject is required for all topics.";
	}
}

Something that looks at previous revisions:

function (doc, ctx) {
	var prev = ctx.get_local_doc();
	if (prev != null && prev.author != ctx.user_name()) {
		throw "Error, update by non-author.";
	}
}
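
To make the evaluation step concrete, here is a minimal sketch of how a server might run the two routines above against an incoming update. It is an illustration under assumptions, not the planned implementation: runValidations and makeContext are hypothetical names, and only get_local_doc() and user_name() come from the proposal itself.

```javascript
// Hypothetical sketch of evaluating validation functions for one update.
// runValidations and makeContext are illustrative names; only
// get_local_doc() and user_name() come from the proposal.
function makeContext(userName, storedDoc) {
  return {
    user_name: function () { return userName; },
    get_local_doc: function () { return storedDoc; } // null for a new document
  };
}

// Runs every validator; returns null on success or the first thrown message.
function runValidations(validators, doc, ctx) {
  for (var i = 0; i < validators.length; i++) {
    try {
      validators[i](doc, ctx);
    } catch (err) {
      return String(err);
    }
  }
  return null;
}

var validators = [
  function (doc, ctx) {
    if (doc.type == "topic" && doc.subject == undefined) {
      throw "Error, a subject is required for all topics.";
    }
  },
  function (doc, ctx) {
    var prev = ctx.get_local_doc();
    if (prev != null && prev.author != ctx.user_name()) {
      throw "Error, update by non-author.";
    }
  }
];

var stored = { type: "topic", subject: "Security", author: "damien" };
var edit = { type: "topic", subject: "Security model", author: "damien" };

var byAuthor = runValidations(validators, edit, makeContext("damien", stored));
var byOther = runValidations(validators, edit, makeContext("jan", stored));
var noSubject = runValidations(validators, { type: "topic" }, makeContext("damien", null));
```

Returning the first error message keeps the sketch short; a real server would also have to decide how that message is reported back over HTTP or during replication.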

It should also be possible to modify the document while it's being saved,
but this might only be allowable when it's a new edit, vs a replicated
update or backup restore.

All further security schemes would be handled by the customized
functions, and through APIs to do database or external LDAP queries.

On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:

> Hello everybody,
> this thread is meant to collect missing work items (features and
> bugs) for for our 1.0 release and a discussion about how to split
> them up between 0.9 and 1.0.
>
> Take it away: Damien.
>
> Cheers
> Jan
> --

Re: Performance and Statistics Module Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 3, 2008, at 21:34, Damien Katz wrote:

> Optimizations of core CouchDB code hasn't begun yet. I did spend a  
> small time profiling CouchDB using the built-in Erlang tools, but  
> using the Erlang tools collecting the data into something coherent  
> was a challenge.

I posted about the traces and coverage analysis earlier today and I  
hope that we get a few of the Erlang folks to have a look at that. Joe  
announced interest in these numbers but I don't know if he will give  
us any feedback :-)

> One reason not to use DTRACE, is we want the performance stats  
> available to admins in production settings.

As far as I understand, DTrace is designed to be just that, a tool for
admins (and developers of course). A reason not to use DTrace is
support for platforms that don't have it, namely Linux and
Windows. So I think an internal module that gathers runtime
statistics is still a good idea, so we can see high-level things like
queries executed, caches used, data transferred and so on.

I'd be willing to help write such a module since I've done a
conceptually similar thing already for the runtimeconfig branch.

Cheers
Jan
--

> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>
>> Hello everybody,
>> this thread is meant to collect missing work items (features and
>> bugs) for for our 1.0 release and a discussion about how to split
>> them up between 0.9 and 1.0.
>>
>> Take it away: Damien.
>>
>> Cheers
>> Jan
>> --
>
>

Performance and Statistics Module Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@gmail.com>.

So we need ways to measure and monitor the performance of various  
CouchDB components.

Optimization of core CouchDB code hasn't begun yet. I did spend a
little time profiling CouchDB using the built-in Erlang tools, but
collecting the data into something coherent with those tools was
a challenge.

I think we need to add performance measuring code in our source. What
this will require is probes in the code to generate statistics (time
taken, values processed, bytes written, errors encountered, etc.),
emitting the values to a statistics collection process. I'm thinking
we can use DTrace, but if not, we can build an Erlang module pretty
easily to do something similar.

Then as we optimize code, we place probes in the high levels of code,  
then measure, change, add probes, measure, change... recursing down  
through the lower levels of code.
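
A minimal sketch of the probe-and-collector idea, to show the shape rather than the implementation: createStats, record and time are hypothetical names, and a real version would be an Erlang process receiving messages, not a JavaScript object.

```javascript
// Hypothetical sketch of a statistics collector fed by probes.
// createStats, record and time are illustrative names, not CouchDB APIs.
function createStats() {
  var counters = {};
  return {
    // Add a value (count, bytes, milliseconds, ...) to a named statistic.
    record: function (name, value) {
      var s = counters[name] || (counters[name] = { count: 0, total: 0 });
      s.count += 1;
      s.total += value;
    },
    // Probe: run fn, record its elapsed time, and count any error it throws.
    time: function (name, fn) {
      var start = Date.now();
      try {
        return fn();
      } catch (err) {
        this.record(name + ".errors", 1);
        throw err;
      } finally {
        this.record(name + ".ms", Date.now() - start);
      }
    },
    // Copy of the current numbers, e.g. for an admin-facing status page.
    snapshot: function () {
      return JSON.parse(JSON.stringify(counters));
    }
  };
}

// Usage: a high-level probe around a write, with a nested byte counter.
var stats = createStats();
var written = stats.time("doc_write", function () {
  var bytes = JSON.stringify({ _id: "a", subject: "hello" }).length;
  stats.record("bytes_written", bytes);
  return bytes;
});
```

Starting with one coarse probe per operation and adding finer ones inside it follows the measure, change, add probes loop described above.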

One reason not to use DTrace is that we want the performance stats
available to admins in production settings. Maybe we need both
low-level DTrace profiling and high-level performance monitoring, though
it would be nice if we could combine them.

On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:

> Hello everybody,
> this thread is meant to collect missing work items (features and
> bugs) for for our 1.0 release and a discussion about how to split
> them up between 0.9 and 1.0.
>
> Take it away: Damien.
>
> Cheers
> Jan
> --

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 4, 2008, at 15:43 , Kevin Jackson wrote:

> Hi,
>
>>> More developers would be excellent, though it's still very early  
>>> days. :)
>
> I'm a budding Erlang developer (although my apache interest is with
> Ant), and I'd be interested in helping out.  At the moment I have very
> little time (even for working on Ant :( ), but I do have a few Erlang
> projects under my belt and I know the language well enough to start
> contributing - my only problem is time commitment.
>
> Is there an area of the code base which a n00b could get acquainted
> with with minimal fuss/effort?

I could have sworn I answered this, but my records don't show that.

The best place to jump into CouchDB is couch_httpd.erl. It defines
CouchDB's public API and contains the code that you are most
likely to be familiar with from the outside. Most other things are
called from there.

May I point you to the "HTTP and daemon plugin architecture"
thread for more details on what we could use? :)

Cheers
Jan
--

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Kevin Jackson <fo...@gmail.com>.

Hi,

>> More developers would be excellent, though it's still very early days. :)

I'm a budding Erlang developer (although my apache interest is with
Ant), and I'd be interested in helping out.  At the moment I have very
little time (even for working on Ant :( ), but I do have a few Erlang
projects under my belt and I know the language well enough to start
contributing - my only problem is time commitment.

Is there an area of the code base which a n00b could get acquainted
with, with minimal fuss/effort?

Kev

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 3, 2008, at 20:23, Noah Slater wrote:

> On Thu, Jul 03, 2008 at 07:55:31PM +0200, Jan Lehnardt wrote:
>> This might be me, but a native Erlang API would be nice.
>> For the simple reason to attract Erlang developers for the
>> project.
>
> We talked about this before, I think.
>
> Didn't we decide that it was overly complex with little ROI?

We might have misjudged the R regarding getting Erlang
developers on board back then. We might have not, though.


>> There is another reason beside testing to get more Erlang  
>> developers to look
>> at the code. The ASF Incubator is about growing a (developer-)  
>> community
>> around a project. So far, we are still the four original developers  
>> which
>> might suggest that we better look for more support from within the  
>> Erlang
>> community (which I personally find to be nice and supportive) to  
>> ensure that
>> we can graduate from the Incubator.
>
> More developers would be excellent, though it's still very early  
> days. :)

I don't have the feeling that this is 'very early' and I don't know why
that would be a reason not to attract more developers? :)

Cheers
Jan
--

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Noah Slater <ns...@apache.org>.

On Thu, Jul 03, 2008 at 07:55:31PM +0200, Jan Lehnardt wrote:
> This might be me, but a native Erlang API would be nice.
> For the simple reason to attract Erlang developers for the
> project.

We talked about this before, I think.

Didn't we decide that it was overly complex with little ROI?

> There is another reason beside testing to get more Erlang developers to look
> at the code. The ASF Incubator is about growing a (developer-) community
> around a project. So far, we are still the four original developers which
> might suggest that we better look for more support from within the Erlang
> community (which I personally find to be nice and supportive) to ensure that
> we can graduate from the Incubator.

More developers would be excellent, though it's still very early days. :)

Best,

-- 
Noah Slater, http://people.apache.org/~nslater/

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Chris Anderson <jc...@grabb.it>.

On Thu, Jul 3, 2008 at 10:55 AM, Jan Lehnardt <ja...@apache.org> wrote:
>  We might want to consider making a native Erlang
> API and put the HTTP API on top of that to ensure feature
> parity. This is, however one more level of indirection for the
> folks that use the HTTP API which might be the majority of
> users.

Could we build the native Erlang API as a wrapper on top of the HTTP
API? Not as an Erlang client library that uses HTTP transport, but as
a set of exposed hooks into couch_httpd, maybe with friendly names?
This might be a good way that neither API falls behind.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by José Manuel Peña <jo...@gmail.com>.

+1

2008/7/4 Jan Lehnardt <ja...@apache.org>:
> This might be me, but a native Erlang API would be nice.
> For the simple reason to attract Erlang developers for the
> project. They don't want to go through HTTP and they have
> systems in place that are at a scale that we can only dream
> of having available for testing. So we better give them what
> they want. We might want to consider making a native Erlang
> API and put the HTTP API on top of that to ensure feature
> parity. This is, however one more level of indirection for the
> folks that use the HTTP API which might be the majority of
> users. Replication I think, should still go through HTTP, at
> least initially.
>
> There is another reason beside testing to get more Erlang
> developers to look at the code. The ASF Incubator is about
> growing a (developer-) community around a project. So far,
> we are still the four original developers which might suggest
> that we better look for more support from within the Erlang
> community (which I personally find to be nice and supportive)
> to ensure that we can graduate from the Incubator.
>
> I hope you agree with me.
>
> Cheers
> Jan
> --
>
> On Jul 2, 2008, at 09:08, Jan Lehnardt wrote:
>
>> Hello everybody,
>> this thread is meant to collect missing work items (features and
>> bugs) for for our 1.0 release and a discussion about how to split
>> them up between 0.9 and 1.0.
>>
>> Take it away: Damien.
>>
>> Cheers
>> Jan
>> --
>>
>
>



-- 
Saludos,

José Manuel Peña

Re: Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 3, 2008, at 18:55 , Jan Lehnardt wrote:

> This might be me, but a native Erlang API would be nice.

Looks like we are getting this with Damien's proposed
refactoring of the internals. Woohoo!

[Thread Closed]

Cheers
Jan
--

>
> For the simple reason to attract Erlang developers for the
> project. They don't want to go through HTTP and they have
> systems in place that are at a scale that we can only dream
> of having available for testing. So we better give them what
> they want. We might want to consider making a native Erlang
> API and put the HTTP API on top of that to ensure feature
> parity. This is, however one more level of indirection for the
> folks that use the HTTP API which might be the majority of
> users. Replication I think, should still go through HTTP, at
> least initially.
>
> There is another reason beside testing to get more Erlang
> developers to look at the code. The ASF Incubator is about
> growing a (developer-) community around a project. So far,
> we are still the four original developers which might suggest
> that we better look for more support from within the Erlang
> community (which I personally find to be nice and supportive)
> to ensure that we can graduate from the Incubator.
>
> I hope you agree with me.
>
> Cheers
> Jan
> --
>
> On Jul 2, 2008, at 09:08, Jan Lehnardt wrote:
>
>> Hello everybody,
>> this thread is meant to collect missing work items (features and
>> bugs) for for our 1.0 release and a discussion about how to split
>> them up between 0.9 and 1.0.
>>
>> Take it away: Damien.
>>
>> Cheers
>> Jan
>> --
>>
>
>

Native Erlang API - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

This might be just me, but a native Erlang API would be nice,
for the simple reason that it would attract Erlang developers
to the project. They don't want to go through HTTP and they have
systems in place that are at a scale that we can only dream
of having available for testing. So we had better give them what
they want. We might want to consider making a native Erlang
API and putting the HTTP API on top of that to ensure feature
parity. This is, however, one more level of indirection for the
folks that use the HTTP API, who might be the majority of
users. Replication, I think, should still go through HTTP, at
least initially.

There is another reason beside testing to get more Erlang
developers to look at the code. The ASF Incubator is about
growing a (developer-) community around a project. So far,
we are still the four original developers which might suggest
that we better look for more support from within the Erlang
community (which I personally find to be nice and supportive)
to ensure that we can graduate from the Incubator.

I hope you agree with me.

Cheers
Jan
--

On Jul 2, 2008, at 09:08, Jan Lehnardt wrote:

> Hello everybody,
> this thread is meant to collect missing work items (features and
> bugs) for for our 1.0 release and a discussion about how to split
> them up between 0.9 and 1.0.
>
> Take it away: Damien.
>
> Cheers
> Jan
> --
>

Re: Integrated Full Text Indexing and Reporting Re: CouchDB 0.9 and 1.0

Posted by Paul Davis <pa...@gmail.com>.

The patch for Issue74 only affects the line protocol between the
external processes. I think that the biggest show stopper to getting
full text searching right now is the fluidity of how CouchDB is going
to start interfacing with external software. Whether things move
towards having some sort of plugin interface etc should probably be
settled before doing too much work on this. (Assuming that most of the
FTI work will be involved in the integration step.)

Also the note on intersecting views with FTI search results is
interesting, but I'm not certain how that would work implementation
wise. I could see some pretty harsh run time characteristics come into
play when attempting to merge between indices that are in and out of
couchdb.

Not to say it wouldn't be a kick ass feature, but it almost seems like
something that wouldn't be feasible without an Erlang FTI engine. In
other news, implementing intersections for arbitrary views might be an
entirely separate feature to implement.

Paul

On Sat, Jul 12, 2008 at 5:24 PM, Jan Lehnardt <ja...@apache.org> wrote:
>
> On Jul 11, 2008, at 22:29 , Damien Katz wrote:
>
>> CouchDB needs integrate full-text indexing support. We should be able to
>> support multiple full text engines, but our reference implementation will be
>> Apache Lucene.
>>
>> Initially (I'm hoping for 0.9.0)  we should be able to index all documents
>> and their attachments (for types that lucene can index anyway) and return
>> queries against that index via. Jan has begun this work and I think someone
>> has this mostly working now somewhere, but its not in trunk?
>
> we have a patch that improves the API here:
> https://issues.apache.org/jira/browse/COUCHDB-74
> and there is the
> http://svn.apache.org/repos/asf/incubator/couchdb/branches/lucene-search/
> branch that this patch should be applied to. Further work should be
> continued there. At this
> point the only difference between trunk and the branch is the addition of
> the /db/_search
> API call. The branch also might need to be brought up to trunk. It has no
> current maintainer,
> although Paul Davis voiced interest in pushing this forward. Also, there
> were attempts at adding
> other search engines but they never surfaced. If I remember correctly, the
> problem that views
> can not be searched without expanding the view server, stopped most work.
>
>
>> By 1.0, we should also do a view intersections with full text results. At
>> query time, CouchDB gets back a list of matching documents and then finds
>> the emited view rows from those documents,  and returns them sorted by
>> relevance score. This will require some enhancements to the internal view
>> API, but the data and required index (views keys by doc id) already exist to
>> make this efficient.
>
> I opened a bug report for this.
>
>
> --
>
> Since I started the work on Lucene I am by open source work definition
> somewhat responsible for the life of this. But I'd rather not, at least for
> the Java side of things. If somebody (heya Paul, still in?) wants to take
> this over, that'd be mighty cool.
>
>
> Cheers
> Jan
> --
>
>> Perhaps not initially, but eventually the integration of the fulltext
>> engine will be as proper couchdb HTTP and daemon plug-ins (once those apis
>> are established).
>>
>> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>>
>>> Hello everybody,
>>> this thread is meant to collect missing work items (features and
>>> bugs) for for our 1.0 release and a discussion about how to split
>>> them up between 0.9 and 1.0.
>>>
>>> Take it away: Damien.
>>>
>>> Cheers
>>> Jan
>>> --
>>
>>
>
>

Re: Integrated Full Text Indexing and Reporting Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

On Jul 11, 2008, at 22:29 , Damien Katz wrote:

> CouchDB needs integrate full-text indexing support. We should be  
> able to support multiple full text engines, but our reference  
> implementation will be Apache Lucene.
>
> Initially (I'm hoping for 0.9.0)  we should be able to index all  
> documents and their attachments (for types that lucene can index  
> anyway) and return queries against that index via. Jan has begun  
> this work and I think someone has this mostly working now somewhere,  
> but its not in trunk?

We have a patch that improves the API here:
https://issues.apache.org/jira/browse/COUCHDB-74
and there is the
http://svn.apache.org/repos/asf/incubator/couchdb/branches/lucene-search/
branch that this patch should be applied to. Further work should be
continued there. At this point the only difference between trunk and
the branch is the addition of the /db/_search API call. The branch
also might need to be brought up to date with trunk. It has no current
maintainer, although Paul Davis voiced interest in pushing this
forward. Also, there were attempts at adding other search engines but
they never surfaced. If I remember correctly, the problem that views
cannot be searched without expanding the view server stopped most
work.

> By 1.0, we should also do a view intersections with full text  
> results. At query time, CouchDB gets back a list of matching  
> documents and then finds the emited view rows from those documents,   
> and returns them sorted by relevance score. This will require some  
> enhancements to the internal view API, but the data and required  
> index (views keys by doc id) already exist to make this efficient.

I opened a bug report for this.

--

Since I started the work on Lucene I am by open source work definition  
somewhat responsible for the life of this. But I'd rather not, at  
least for the Java side of things. If somebody (heya Paul, still in?)  
wants to take this over, that'd be mighty cool.

Cheers
Jan
--

> Perhaps not initially, but eventually the integration of the  
> fulltext engine will be as proper couchdb HTTP and daemon plug-ins  
> (once those apis are established).
>
> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>
>> Hello everybody,
>> this thread is meant to collect missing work items (features and
>> bugs) for for our 1.0 release and a discussion about how to split
>> them up between 0.9 and 1.0.
>>
>> Take it away: Damien.
>>
>> Cheers
>> Jan
>> --
>
>

Integrated Full Text Indexing and Reporting Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@apache.org>.

CouchDB needs integrated full-text indexing support. We should be able
to support multiple full-text engines, but our reference
implementation will be Apache Lucene.

Initially (I'm hoping for 0.9.0) we should be able to index all
documents and their attachments (for types that Lucene can index
anyway) and return queries against that index via. Jan has begun this
work and I think someone has this mostly working now somewhere, but
it's not in trunk?

By 1.0, we should also do view intersections with full-text results.
At query time, CouchDB gets back a list of matching documents and then
finds the emitted view rows from those documents, and returns them
sorted by relevance score. This will require some enhancements to the
internal view API, but the data and required index (view keys by doc
id) already exist to make this efficient.
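
The intersection step could look roughly like the following. This is a sketch under stated assumptions: the shape of the search hits (id plus score) and the rowsByDocId lookup standing in for the "view keys by doc id" index are both invented for illustration, as is the function name intersectWithView.

```javascript
// Hypothetical sketch of intersecting full-text hits with view rows.
// ftiHits stands in for what a search engine might return; rowsByDocId
// stands in for the existing "view keys by doc id" index. Both shapes
// are assumptions for illustration.
function intersectWithView(ftiHits, rowsByDocId) {
  var results = [];
  ftiHits.forEach(function (hit) {
    var rows = rowsByDocId[hit.id] || []; // docs that emitted nothing drop out
    rows.forEach(function (row) {
      results.push({ id: hit.id, key: row.key, value: row.value, score: hit.score });
    });
  });
  // Highest relevance first.
  results.sort(function (a, b) { return b.score - a.score; });
  return results;
}

var hits = [
  { id: "doc1", score: 0.4 },
  { id: "doc2", score: 0.9 },
  { id: "doc3", score: 0.7 }
];
var viewIndex = {
  doc1: [{ key: "couchdb", value: 1 }],
  doc2: [{ key: "erlang", value: 1 }, { key: "lucene", value: 2 }]
};
var rows = intersectWithView(hits, viewIndex);
```

Here doc3 matches the search but emitted no view rows, so it does not appear in the result; the cost is one index lookup per hit, which is why having the by-doc-id index already in place matters.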

Perhaps not initially, but eventually the integration of the fulltext  
engine will be as proper couchdb HTTP and daemon plug-ins (once those  
apis are established).

On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:

> Hello everybody,
> this thread is meant to collect missing work items (features and
> bugs) for for our 1.0 release and a discussion about how to split
> them up between 0.9 and 1.0.
>
> Take it away: Damien.
>
> Cheers
> Jan
> --

Re: HTTP and daemon plug-in architecture Re: CouchDB 0.9 and 1.0

Posted by Chris Anderson <jc...@grabb.it>.

On Fri, Jul 11, 2008 at 2:47 PM, Damien Katz <da...@apache.org> wrote:
> We really need a plug-in architecture for Couchdb for the both front end
> HTTP server so anyone can create a custom Erlang handler.
>
> These handlers would be written in Erlang but might delegate the actual work
> to external processes via a piped interface or whatever.
>
> Jan's work on the configuration API is also important here, as we need a
> standardized way to support plug-ins each with their own configuration
> settings.

I've done a little work in this vein for my Action Servers experiment.
I should have time in the next couple of weeks to deploy some apps
using Action Servers. Then I'll be ready to start refactoring it with
an eye toward making it a good guinea-pig plugin. I'm excited to have
a way to maintain them that is more modular than mere patches to
CouchDB's trunk.

Are there other Erlang projects with a plugin architecture that might
be a good example of how to manage it from a building/linking
perspective?

-- 
Chris Anderson
http://jchris.mfdz.com

HTTP and daemon plug-in architecture Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@apache.org>.

We really need a plug-in architecture for CouchDB's front-end
HTTP server so anyone can create a custom Erlang handler. Already
our internal handlers are getting too numerous and would benefit from
better modularization.

These handlers would be written in Erlang but might delegate the  
actual work to external processes via a piped interface or whatever.

Also we need daemon add-in support too, for caching and indexing  
processes. Erlang makes this easy, we just need to actually expose it  
in a standardized manner.

Jan's work on the configuration API is also important here, as we need  
a standardized way to support plug-ins each with their own  
configuration settings.
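
To illustrate the shape of such a plug-in interface (and nothing more), here is a small sketch of a handler registry where each handler receives its own configuration section. All names are hypothetical (createRegistry, register, dispatch), the real handlers would be Erlang modules, and /db/_search is used only because it is the one extension URL mentioned elsewhere in this thread.

```javascript
// Hypothetical sketch of a handler registry with per-plug-in configuration.
// createRegistry, register and dispatch are illustrative names; the real
// handlers would be Erlang modules, this only shows the shape.
function createRegistry(config) {
  var handlers = {};
  return {
    // Associate a URL segment such as "_search" with a handler function.
    register: function (prefix, handler) {
      handlers[prefix] = handler;
    },
    // Route a request path like "/db/_search" to the matching handler,
    // passing along that plug-in's own configuration section.
    dispatch: function (path, request) {
      var parts = path.split("/").filter(function (p) { return p !== ""; });
      var prefix = parts[1];
      var known = Object.prototype.hasOwnProperty.call(handlers, prefix);
      if (!known) {
        return { status: 404, body: "no handler for " + path };
      }
      return handlers[prefix](request, config[prefix] || {});
    }
  };
}

// Usage: one plug-in with its own settings.
var registry = createRegistry({ _search: { engine: "lucene" } });
registry.register("_search", function (req, cfg) {
  return { status: 200, body: cfg.engine + ":" + req.query };
});
var found = registry.dispatch("/db/_search", { query: "couch" });
var missing = registry.dispatch("/db/_unknown", {});
```

Keeping configuration keyed by the same name as the handler is one simple way to give every plug-in its own settings without the core knowing what they mean.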

We'll also need a stable, documented internal Erlang API to make this  
happen. I think the internals of CouchDB are finally stabilizing to  
the point where this is possible.

Also, internal code modularization. I think overall, the
modularization of the CouchDB internals is correct, that is, the
boundaries between modules are mostly well defined and
decoupled. However, the internal documentation is nonexistent and the
modularization isn't really granular enough. Too many source files are
simply too long; the code within needs to be broken out into further
sub-modules. Only there really isn't good support for sub-modules,
which I guess is me blaming Erlang for the way I wrote the code, but I
really do think Erlang needs another level of code organization. Or
maybe we need more Erlang module/package thingies, instead of just one
big couchdb module. Ideas welcome.

I'm going to spend a lot of time on this for the next release. A
heavy-duty refactoring is what's called for. But mostly I just plan to
reorder, organize and comment the code, more than rewrite anything.
Fortunately the codebase is small enough to make that a relatively
small job. Small amounts of work here can have a huge impact on others'
ability to work with the code and submit patches.

On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:

> Hello everybody,
> this thread is meant to collect missing work items (features and
> bugs) for for our 1.0 release and a discussion about how to split
> them up between 0.9 and 1.0.
>
> Take it away: Damien.
>
> Cheers
> Jan
> --

Re: Runtime Configuration - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

In addition to the runtime configuration system,
this branch comes with an Erlang-based unit
test system that can be run at installation time
without the browser. This is, for now, to unit-test
internals and not the HTTP API, for which the
Futon-based test suite works just fine.

I have only added a few unit tests for the runtime
config so far, but I plan to expand them. The test
system can also be used by other modules within
CouchDB. For the time being, this is a very simple
Erlang module, but we might want to switch to
eunit soonish, which is a more comprehensive
solution to unit testing. Since unit testing usually
involves declaring functions that are then called by
the testing framework, I'd say it is not that hard to
switch from the simple one we have now to eunit.
So it doesn't matter if we wait with the eunit
switch and instead start writing more tests against
the current suite. (Well, I can't call it a framework
or suite, it is just a few lines of code that I stole
from David Reid (not the Apache one)).
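
To illustrate why the switch should be cheap, here is
a minimal sketch of the "declare functions, let the
runner call them" shape. It is written in Python
purely for illustration and is not the Erlang module
on the branch. The names run_tests, add_test and
broken_test are made up; the "_test" suffix mirrors
the naming convention eunit uses to find simple test
functions.

```python
def run_tests(namespace):
    """Call every function whose name ends in "_test" and record the outcome."""
    results = {}
    for name, fn in sorted(namespace.items()):
        if name.endswith("_test") and callable(fn):
            try:
                fn()
                results[name] = "ok"
            except AssertionError as exc:
                results[name] = f"failed: {exc}"
    return results


# Hypothetical tests: plain functions that assert something.
def add_test():
    assert 1 + 1 == 2


def broken_test():
    assert 1 + 1 == 3, "arithmetic is off"


if __name__ == "__main__":
    for test_name, outcome in run_tests(dict(globals())).items():
        print(test_name, outcome)
```

Because the tests are just functions that follow a
naming convention, moving them to another runner is
mostly a matter of renaming and re-exporting them.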

Additionally, Noah was kind enough to hook up edoc
generation to the build system. edoc is Javadoc
for Erlang. The build system scans our .erl files
and generates a nice HTML API reference. The
runtime config modules are already completely
documented with edoc comments and I think we
should make the same true for the rest of our
modules as we go along. edoc is extremely simple
to understand, so this should be easy.

Both things will come with the runtime config
branch and are "only" additions to the codebase
that don't introduce new features or otherwise
interfere with the existing code. So I think it is
no big deal if they are in a transient state for a
release, and I don't think we need to make either
of them a priority for either release, although
they are certainly nice to have.

Cheers
Jan
--

On Jul 3, 2008, at 10:45, Jan Lehnardt wrote:

> The runtime configuration is nearly complete
> and just needs a bit of polishing. I'd ask if we
> can merge the branch back to trunk soon to
> get it more widely exposed and tested and
> hopefully stable for the 0.9 release.
>
> Since it changes the names of configuration
> variables it must come with big warning signs
> or a smooth migration path (that we have to
> come up with, yet).
>
> Cheers
> Jan
> --

Re: Runtime Configuration - Re: CouchDB 0.9 and 1.0

Posted by Noah Slater <ns...@apache.org>.

On Thu, Jul 03, 2008 at 10:45:50AM +0200, Jan Lehnardt wrote:
> Since it changes the names of configuration variables it must come with big
> warning signs or a smooth migration path (that we have to come up with, yet).

I don't think we need to worry about this until we release 0.9.

I say, merge to trunk and put a notice in NEWS.

-- 
Noah Slater, http://people.apache.org/~nslater/

Runtime Configuration - Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

The runtime configuration is nearly complete
and just needs a bit of polishing. I'd ask if we
can merge the branch back to trunk soon to
get it more widely exposed and tested and
hopefully stable for the 0.9 release.

Since it changes the names of configuration
variables it must come with big warning signs
or a smooth migration path (which we have yet
to come up with).
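
One possible shape for such a migration path, as a
rough sketch only: translate old variable names to
new ones through a lookup table and warn for each
rename. This is Python for illustration, not code
from the branch. The variable names in RENAMED and
the function name migrate are placeholders, not the
actual old or new CouchDB names.

```python
import warnings

# Placeholder mapping: old flat name -> (new section, new key).
# These names are illustrative assumptions only.
RENAMED = {
    "DbRootDir": ("couchdb", "database_dir"),
    "Port": ("httpd", "port"),
}


def migrate(old_settings):
    """Translate old-style settings into {section: {key: value}}.

    Returns the translated settings plus a list of old names that
    have no known replacement, so the caller can report them.
    """
    new_settings = {}
    unknown = []
    for name, value in old_settings.items():
        if name in RENAMED:
            section, key = RENAMED[name]
            new_settings.setdefault(section, {})[key] = value
            # The "big warning sign": tell the admin what changed.
            warnings.warn(
                f"config variable {name!r} is now [{section}] {key}",
                DeprecationWarning,
                stacklevel=2,
            )
        else:
            unknown.append(name)
    return new_settings, unknown
```

Returning the unknown names instead of dropping them
lets the installer decide whether to abort or just
print a notice.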

Cheers
Jan
--

Re: View index compaction Re: CouchDB 0.9 and 1.0

Posted by Jan Lehnardt <ja...@apache.org>.

I opened a bug report for this.

On Jul 2, 2008, at 17:57 , Damien Katz wrote:

> Right now, view indexes just grow and grow with each new index  
> update. Since they are just indexes, and not the data itself,  
> compaction is simply a matter of deleting the index files.
>
> Also, the current Btree implementation isn't completely self
> balancing. It misses a balancing condition, partially for
> efficiency (it's an expensive balancing operation), and for
> expediency. It was easier to not implement it and get the general
> case performance boost.
>
> The thing about this is, the btree code can remain as is if the  
> indexing compaction just recopies the map values (and back indexes)  
> and recomputes the reduction values. That's a very simple design,  
> however, if the btree is completely self balancing, then the btree  
> can be copied on a node by node basis, instead of a value by value  
> basis, and the reduction values need not be recomputed at all. This
> will make the compaction significantly faster overall.

View index compaction Re: CouchDB 0.9 and 1.0

Posted by Damien Katz <da...@gmail.com>.

Right now, view indexes just grow and grow with each new index update.  
Since they are just indexes, and not the data itself, compaction is  
simply a matter of deleting the index files.

Also, the current Btree implementation isn't completely
self-balancing. It misses a balancing condition, partly for efficiency
(it's an expensive balancing operation) and partly for expediency: it
was easier not to implement it, and skipping it gives a performance
boost in the general case.

The thing about this is, the btree code can remain as is if the
indexing compaction just recopies the map values (and back indexes)
and recomputes the reduction values. That's a very simple design.
However, if the btree is completely self-balancing, then the btree can
be copied on a node-by-node basis instead of a value-by-value basis,
and the reduction values need not be recomputed at all. This will make
the compaction significantly faster overall.
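
To make the difference concrete, here is a toy sketch of the two
compaction strategies. It is Python for illustration only, not the
CouchDB btree code: Node, build, compact_by_value, compact_by_node and
CountingSum are all made-up names, and the reduction is a plain sum.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # A leaf holds (key, value) pairs; an inner node holds child nodes.
    kvs: list = field(default_factory=list)
    children: list = field(default_factory=list)
    reduction: int = 0  # cached reduction over the whole subtree


class CountingSum:
    """Sum reducer that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, values):
        self.calls += 1
        return sum(values)


def build(kvs, reduce_fn, fanout=4):
    """Bulk-build a balanced tree from sorted pairs, computing reductions."""
    if not kvs:
        return Node()
    level = [
        Node(kvs=kvs[i:i + fanout],
             reduction=reduce_fn([v for _, v in kvs[i:i + fanout]]))
        for i in range(0, len(kvs), fanout)
    ]
    while len(level) > 1:
        level = [
            Node(children=level[i:i + fanout],
                 reduction=reduce_fn([c.reduction for c in level[i:i + fanout]]))
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def iter_kvs(node):
    """Yield all (key, value) pairs in key order."""
    if node.children:
        for child in node.children:
            yield from iter_kvs(child)
    else:
        yield from node.kvs


def compact_by_value(root, reduce_fn, fanout=4):
    """Recopy every value and recompute every reduction."""
    return build(list(iter_kvs(root)), reduce_fn, fanout)


def compact_by_node(node):
    """Copy nodes as they are, reusing the cached reductions."""
    return Node(kvs=list(node.kvs),
                children=[compact_by_node(c) for c in node.children],
                reduction=node.reduction)
```

The node-by-node copy never calls the reduce function, but it also
keeps whatever shape the old tree had, which is why it only makes
sense once the btree is fully self-balancing.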