
Posted to dev@couchdb.apache.org by Martin Scholl <ms...@diskware.net> on 2009/01/31 15:49:16 UTC

couch_gen_btree: pluggable storage / tree engines

Hello all,


Although I am still reviewing CouchDB to better understand its
design, I would like to ask for comments on a small idea.
I would like to add another index structure to CouchDB (a Merkle tree)
and have been asking myself what the best way of doing this would be.
I have only a rough idea of how closely couch_btree is knit into CouchDB,
so I would like to hear comments from you experienced developers
on some of my ideas:

My suggestion is a MySQL-ly approach (pluggable engines) for CouchDB,
that is to factor out several components into generic behaviours:
e.g.
- a couch_gen_tree:
	abstracts access to couch_btree

Maybe even a
- couch_gen_storage
	e.g. file system, file storage access, etc.

- couch_gen_replicator
	an imperative approach to tree / storage replication.

As I said: I am new to CouchDB's code, so I cannot really judge what
the current layering looks like, or whether we can even split
out the three components.

Imho there would be several benefits in having the flexibility brought
by couch_gen_*:
- new use-cases for CouchDB:
	- R-trees: for adding another way of querying
	  documents (e.g. nearest neighbour search)
	- genome databases
	- a special datastructure for indexing tags
	- ...
- with a flexible storage layer, CouchDB could run on top of other
infrastructures and products: like S3, SimpleDB, AppEngine, etc.
- (the following is just a guess:) a cleaner CouchDB codebase with a
clear layering and separation of components
- possibly (again, just a guess): with the plugin approach, we can more
easily support advanced indexing and db management schemes, like
distributed storage access, distributed transactions, etc.
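
To make the couch_gen_tree idea a little more concrete, here is a
minimal sketch of a pluggable tree interface. It is written in Ruby only
for brevity; in CouchDB itself this would presumably be an Erlang
behaviour. Every name below (TreeRegistry, SortedArrayTree, the
insert/lookup/fold callbacks) is hypothetical and not part of CouchDB.

```ruby
# Hypothetical sketch: a registry of interchangeable "tree engines".
# None of these names exist in CouchDB.

# A trivial engine: key/value pairs kept in a sorted array.
class SortedArrayTree
  def initialize
    @pairs = [] # always sorted by key
  end

  # Insert or overwrite a key; returns self so calls can be chained.
  def insert(key, value)
    idx = @pairs.bsearch_index { |k, _| k >= key } || @pairs.size
    if idx < @pairs.size && @pairs[idx][0] == key
      @pairs[idx] = [key, value]
    else
      @pairs.insert(idx, [key, value])
    end
    self
  end

  # Return the value stored under key, or nil if absent.
  def lookup(key)
    pair = @pairs.bsearch { |k, _| k >= key }
    pair && pair[0] == key ? pair[1] : nil
  end

  # Fold over all pairs in key order.
  def fold(acc)
    @pairs.each { |k, v| acc = yield(acc, k, v) }
    acc
  end
end

# The "behaviour": an engine must provide these callbacks.
class TreeRegistry
  REQUIRED = [:insert, :lookup, :fold].freeze

  def initialize
    @engines = {}
  end

  def register(name, klass)
    missing = REQUIRED.reject { |m| klass.method_defined?(m) }
    unless missing.empty?
      raise ArgumentError, "#{klass} lacks #{missing.join(', ')}"
    end
    @engines[name] = klass
  end

  def create(name)
    @engines.fetch(name).new
  end
end
```

An R-tree or Merkle tree engine would then be registered under its own
name and selected per index, without the caller changing.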


Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Jan Lehnardt <ja...@apache.org>.

On 1 Feb 2009, at 12:18, Antony Blakey wrote:

> On 01/02/2009, at 8:44 PM, Jan Lehnardt wrote:
>
>> To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master
>
> Yeah, sorry.
>
> I think I screwed my github when the upstream switched layouts (IIRC
> it was a full svn copy then moved to trunk). My git-fu is git-fucked.
> I've recently locally cloned from halorgium in preparation for redoing
> the filename mods. At least the external filename v1 stuff is there.

I tried reviewing the patch but the fucked-upness of the  
escaped_filenames branch doesn't let me clone or view it. Is it  
possible for you to put the patch up somewhere?

Cheers
Jan
--

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Antony Blakey <an...@gmail.com>.

Thanks Jan,

On 01/02/2009, at 10:00 PM, Jan Lehnardt wrote:

> To save the others from searching: http://mail-archives.apache.org/mod_mbox/couchdb-user/200812.mbox/%3c27BFB81F-9AE2-419C-8911-3F26D82A7A44@gmail.com%3e 
>  :-)

Note that if you want to do this, please read the thread because a)  
that code isn't strictly correct and b) there are some other issues to  
be aware of (and I'm guilty of some misinformation partway through  
that thread). Furthermore, it's only a proof-of-concept test example.

In summary, alternate forms of indexing in an _external are easy. An
_external is single-threaded, which turns out to be lucky because it
guarantees that the _external sees a monotonically increasing
update_seq, without any tricky coordination issues when updating your
index.
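
A minimal sketch of why the monotonic update_seq matters: the indexer
only has to remember the last sequence number it processed and catch up
from there on each request. This is not the code from the linked thread;
IncrementalIndexer and the fetch_changes callable are hypothetical
stand-ins for however the indexer asks CouchDB for documents changed
since a given sequence number.

```ruby
# Hypothetical sketch of an incremental external indexer.
class IncrementalIndexer
  attr_reader :last_seq, :index

  # fetch_changes: callable taking a sequence number and returning
  # [[seq, doc_id, terms], ...] for every change after it, in seq order
  # (an assumed interface, not a CouchDB API).
  def initialize(fetch_changes)
    @fetch_changes = fetch_changes
    @last_seq = 0
    @index = Hash.new { |h, term| h[term] = [] }
  end

  # Called once per request with the update_seq reported for the database.
  # Returns the number of changes applied.
  def catch_up(update_seq)
    raise ArgumentError, "update_seq went backwards" if update_seq < @last_seq
    return 0 if update_seq == @last_seq

    changes = @fetch_changes.call(@last_seq)
                            .select { |seq, _, _| seq <= update_seq }
    changes.each do |_seq, doc_id, terms|
      terms.each do |term|
        @index[term] << doc_id unless @index[term].include?(doc_id)
      end
    end
    @last_seq = update_seq
    changes.size
  end
end
```

Because requests arrive one at a time, no locking is needed around
@last_seq in this sketch.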

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

What can be done with fewer [assumptions] is done in vain with more
   -- William of Ockham (ca. 1285-1349)

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Jan Lehnardt <ja...@apache.org>.

Thanks again.

On 1 Feb 2009, at 12:18, Antony Blakey wrote:
>
> I think I screwed my github when the upstream switched layouts (IIRC
> it was a full svn copy then moved to trunk). My git-fu is git-fucked.
> I've recently locally cloned from halorgium in preparation for redoing
> the filename mods. At least the external filename v1 stuff is there.
>
> I think the patch for external erlang modules has been lost. In any
> case, it's trivial, involving nothing more than changes to
> bin/couchdb.tpl.in to allow a plugins directory, and add each plugin to
> the erlang load path and read its .ini file. I custom built my
> plugin, so I didn't need to mod the build system.
>
> The ruby source and subsequent discussion for the external indexer  
> was posted to couchdb-user on 21 December 2008 in a thread called  
> 'couchdb' (!) started by Tim Parkin.

To save the others from searching: http://mail-archives.apache.org/mod_mbox/couchdb-user/200812.mbox/%3c27BFB81F-9AE2-419C-8911-3F26D82A7A44@gmail.com%3e 
  :-)

Cheers
Jan
--

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Antony Blakey <an...@gmail.com>.

On 01/02/2009, at 8:44 PM, Jan Lehnardt wrote:

> To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master

Yeah, sorry.

I think I screwed my github when the upstream switched layouts (IIRC  
it was a full svn copy then moved to trunk). My git-fu is git-fucked.  
I've recently locally cloned from halorgium in preparation for redoing  
the filename mods. At least the external filename v1 stuff is there.

I think the patch for external erlang modules has been lost. In any
case, it's trivial, involving nothing more than changes to
bin/couchdb.tpl.in to allow a plugins directory, and add each plugin to
the erlang load path and read its .ini file. I custom built my
plugin, so I didn't need to mod the build system.

The ruby source and subsequent discussion for the external indexer was  
posted to couchdb-user on 21 December 2008 in a thread called  
'couchdb' (!) started by Tim Parkin.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

In anything at all, perfection is finally attained not when there is  
no longer anything to add, but when there is no longer anything to  
take away.
   -- Antoine de Saint-Exupery

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 8:56 PM, Noah Slater wrote:

> I've found this thread a little hard to follow so far, but why not configure the
> plugins from /etc/couchdb/local.ini itself, including the configuration that
> tells CouchDB which plugins to load. I'm not sure I'm comfortable with the idea
> of CouchDB scanning a list of directories each time, loading things automatically.

I did it this way to avoid touching the CouchDB code, and because I
didn't want to impose another file format or do parsing in shell script.
Especially given that I have an imminent requirement for CouchDB/Win32
where the shell is complete shit. OTOH I may have to replace the start
script with C for that reason.

It might be better to have an erlang plugin manager that can deal with  
plugin selection, dependencies and ordering requirements, but that's a  
more intrusive solution.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The greatest challenge to any thinker is stating the problem in a way  
that will allow a solution
   -- Bertrand Russell

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Noah Slater <ns...@apache.org>.

On Mon, Feb 02, 2009 at 06:07:37PM +1030, Antony Blakey wrote:
> My concern is how to allow plugins to be updated without stomping on
> user configuration, which is why I suggest having the local plugin
> configuration be outside of the plugin's directory. Maybe that's a
> deployment requirement that's unlikely to happen, but I look forward to a
> time when couch plugins are like rubygems and regular updating is
> common.

I've found this thread a little hard to follow so far, but why not configure the
plugins from /etc/couchdb/local.ini itself, including the configuration that
tells CouchDB which plugins to load. I'm not sure I'm comfortable with the idea
of CouchDB scanning a list of directories each time, loading things automatically.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 5:36 PM, Chris Anderson wrote:

> Maybe each plugin could have a default.ini and a local.ini, so  
> something like:
>
> plugins/
>    geo_index/
>        Makefile.am.in
>        src/
>            geo_index.erl
>            geo_httpd.erl
>            geo_manager.erl
>        default.ini
>        local.ini
>    ...
>
>
> Then CouchDB could automatically pickup plugin config from each
> directory, running all the defaults before all the locals.

I think we may be talking at cross-purposes here - I'm thinking  
particularly of plugins that are separately compiled without being  
included in the canonical sources, while I gather you're focussed on a  
refactored core or a way to handle less tightly coupled contributions.  
I'm interested in seeing those things happen, but I'll be guided by
you as to what I should do for that. In particular I know sfa about
automake et al.

OTOH, I think top level default.ini + local.ini + local.*.ini would be  
a good idea regardless, and a simple change.

> I'm not invested in these specifics, just trying to think of ways it
> could look that would give new developers the least amount of
> head-scratching.

I think head-scratching would be best mitigated by some good docs. If  
we want to implement this then I'll commit to writing a spec/tutorial  
on the wiki.

My concern is how to allow plugins to be updated without stomping on
user configuration, which is why I suggest having the local plugin
configuration be outside of the plugin's directory. Maybe that's a
deployment requirement that's unlikely to happen, but I look forward to a
time when couch plugins are like rubygems and regular updating is
common.

Somewhat OT: plugins are also a mechanism for deploying generic
non-couch functionality for a deployment scenario like mine where every
user runs a Couch instance ala Notes client e.g. distributed voting
algorithms and scalaris-like functionality for emergent clusters
within wider p2p meshes.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Plurality is not to be assumed without necessity
   -- William of Ockham (ca. 1285-1349)

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Chris Anderson <jc...@apache.org>.

On Sun, Feb 1, 2009 at 10:39 PM, Antony Blakey <an...@gmail.com> wrote:
>
> I imagined that the ini file in the plugin directory wouldn't be modified by
> users i.e. that's "part of the plugin". Users would modify the local.ini
> file to change the user-configurable parts of that. That makes me think that
> the startup script should load local.*.ini, so you can separate your
> per-plugin user-configurable bits. The reason for not modifying the
> plugin.ini directly is so that you can update them easily.
>

There's a bunch of ways we could do this. The simplest overall system
I can think of would be a `make plugins` target that builds whatever
is in the plugins directory.

Maybe each plugin could have a default.ini and a local.ini, so something like:

plugins/
    geo_index/
        Makefile.am.in
        src/
            geo_index.erl
            geo_httpd.erl
            geo_manager.erl
        default.ini
        local.ini
    ...

Then CouchDB could automatically pickup plugin config from each
directory, running all the defaults before all the locals.
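
A minimal sketch of that "all the defaults before all the locals"
ordering, assuming the plugins/ layout above. The function name
plugin_ini_order is made up for illustration, and the sketch only
computes the order in which the ini files would be read; it does not
parse them.

```ruby
# Hypothetical sketch: compute the ini load order for a plugins directory.
# Every plugin's default.ini comes first, then every plugin's local.ini,
# with plugins visited in alphabetical order within each pass.
def plugin_ini_order(plugins_root)
  plugin_dirs = Dir.children(plugins_root).sort
                   .map { |name| File.join(plugins_root, name) }
                   .select { |path| File.directory?(path) }

  %w[default.ini local.ini].flat_map do |ini|
    plugin_dirs.map { |dir| File.join(dir, ini) }
               .select { |file| File.file?(file) }
  end
end
```

Plugins without one of the two files are simply skipped for that pass.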

I'm not invested in these specifics, just trying to think of ways it
could look that would give new developers the least amount of
head-scratching.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 4:46 PM, Chris Anderson wrote:

> On Sun, Feb 1, 2009 at 9:40 PM, Antony Blakey <an...@gmail.com> wrote:
>> There are command line options for manipulating the list of plugins. By
>> default it will load plugins from /etc/couchdb/plugins/ or whatever your
>> localconfdir is - mine is always relative - but this can be controlled by
>> the options:
>
>> -C      clean the plugins list
>> -D      add all the plugins in $DEFAULT_PLUGIN_DIR
>> -G DIR  add all the plugins in DIR
>> -P DIR  add the plugin DIR
>> Bit light on testing, I wanted to show how trivial it is.
>
> Thanks for getting started on this. It looks like with a make plugins
> target, this will be ready to roll.

I think there are multiple levels of modularity possible - some core
features could be plugins, purely in a separation-of-concerns kind of
way. Then there would be third-party plugins. This could be a
convenient way to play with some new contributions without having to
merge into a single code directory (assuming each plugin is
separate in the source). And finally there would be per-user plugins
that customize couch on a per-app basis.

I assume you mean a make plugins target for core plugins?

> One thing I don't understand is the plugins list and the additional
> command-line options to launch script. I wonder if there's a
> non-ephemeral way to control the list of loaded plugins.

Well, there is the default location. If a plugin were to have an  
extension e.g. '.plugin' then the activation could be controlled via  
renaming. OTOH I like the Apache (httpd) mechanism of having  
'available/active' dirs, which seems cleaner.

I use the command line options to the startup script because I have
couchdb/erlang/icu etc packaged as ruby gems that deploy couch using
dependencies, and the class that controls couchdb allows multiple
instances in self-contained locations. It looks a bit like this:

---------------------------------------

require 'memetic'
require 'memetic-erlang'
require 'memetic-spidermonkey'
require 'memetic-icu'

module Memetic

   class CouchDB

     DIRECTORY = %x{cd #{File.dirname(File.dirname(__FILE__)) + "/BUILD"} ; pwd}.split()[0]

     def CouchDB.configure_environment(env)
       Erlang.configure_environment(env)
       Spidermonkey.configure_environment(env)
       ICU.configure_environment(env)
     end

     def initialize(local_directory)
       @local_directory = local_directory
       Dir.mkdir(@local_directory) unless File.exist?(@local_directory)

       @log_directory = "#{@local_directory}/log"
       Dir.mkdir(@log_directory) unless File.exist?(@log_directory)

       @data_directory = "#{@local_directory}/data"
       Dir.mkdir(@data_directory) unless File.exist?(@data_directory)

       File.open("#{@local_directory}/configuration.ini", "w") do |f|
         f << configuration()
       end

     end

     def start
       env = Environment.new
       CouchDB.configure_environment(env)
       cmd = env.as_command_prefix
       cmd << "( cd '#{DIRECTORY}' ; "
       cmd << "./bin/couchdb -b "
       cmd << "-o '#{@log_directory}/couchdb.stdout' "
       cmd << "-e '#{@log_directory}/couchdb.stderr' "
       cmd << "-p '#{@local_directory}/pid' "
       cmd << "-c './etc/couchdb/default.ini' "
       cmd << "-c '#{@local_directory}/configuration.ini' "
       #cmd << "-G '#{@local_directory}/plugins' "
       cmd << " )"
       IO.popen(cmd) do |p|
         s = p.gets
         s.chomp if s
       end
     end

     def status
       env = Environment.new
       CouchDB.configure_environment(env)
       IO.popen("#{env.as_command_prefix} #{DIRECTORY}/bin/couchdb -p #{@local_directory}/pid -s") do |p|
         s = p.gets
         s.chomp if s
       end
     end

     def stop
       env = Environment.new
       CouchDB.configure_environment(env)
       IO.popen("#{env.as_command_prefix} #{DIRECTORY}/bin/couchdb -p #{@local_directory}/pid -d") do |p|
         s = p.gets
         s.chomp if s
       end
     end

     def configuration
       return <<END_OF_STRING
[couchdb]
database_dir = #{@data_directory}

[log]
file = #{@log_directory}/couch.log
level = debug
END_OF_STRING
     end

   end

end

---------------------------------------

> As far as configuration, users will likely also want to configure
> their API endpoints as http_db_handlers or http_global_handlers. These
> allow you to hook an Erlang function directly to a CouchDB path.

Sure.

I imagined that the ini file in the plugin directory wouldn't be  
modified by users i.e. that's "part of the plugin". Users would modify  
the local.ini file to change the user-configurable parts of that. That  
makes me think that the startup script should load local.*.ini, so you  
can separate your per-plugin user-configurable bits. The reason for  
not modifying the plugin.ini directly is so that you can update them  
easily.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A reasonable man adapts himself to suit his environment. An  
unreasonable man persists in attempting to adapt his environment to  
suit himself. Therefore, all progress depends on the unreasonable man.
   -- George Bernard Shaw

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Chris Anderson <jc...@apache.org>.

On Sun, Feb 1, 2009 at 9:40 PM, Antony Blakey <an...@gmail.com> wrote:
> There are command line options for manipulating the list of plugins. By
> default it will load plugins from /etc/couchdb/plugins/ or whatever your
> localconfdir is - mine is always relative - but this can be controlled by
> the options:

>  -C      clean the plugins list
>  -D      add all the plugins in $DEFAULT_PLUGIN_DIR
>  -G DIR  add all the plugins in DIR
>  -P DIR  add the plugin DIR
> Bit light on testing, I wanted to show how trivial it is.

Thanks for getting started on this. It looks like with a make plugins
target, this will be ready to roll.

One thing I don't understand is the plugins list and the additional
command-line options to launch script. I wonder if there's a
non-ephemeral way to control the list of loaded plugins.

At this point `utils/run` is still a copy-paste job from `couchdb`.
Now might be a good opportunity to make that more sensible.

As far as configuration, users will likely also want to configure
their API endpoints as http_db_handlers or http_global_handlers. These
allow you to hook an Erlang function directly to a CouchDB path.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Ben Browning <be...@gmail.com>.

To spark some additional conversation around plugins, is there a clear
idea how we would handle plugin X that also uses plugin Y? For
example, I could see the Erlang API, the partitioning/clustering
functionality, and individual pieces of the clustering as additional
plugins (consistent hashing algorithm being the immediately obvious
one). The partitioning would depend on the Erlang API and a consistent
hashing plugin being configured.
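
One way to answer the "plugin X uses plugin Y" question is to have each
plugin declare its dependencies and load them in topological order. The
following is only a sketch under that assumption: PluginGraph and the
plugin names are hypothetical, and it uses Ruby's standard TSort module
rather than anything in CouchDB.

```ruby
require 'tsort'

# Hypothetical sketch: order plugins so that dependencies load first.
class PluginGraph
  include TSort

  # deps maps a plugin name to the names of the plugins it depends on.
  def initialize(deps)
    @deps = deps
  end

  # TSort callback: enumerate every plugin.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # TSort callback: enumerate the dependencies of one plugin.
  # Raises KeyError if a dependency was never declared.
  def tsort_each_child(name, &block)
    @deps.fetch(name).each(&block)
  end

  # Dependencies come before the plugins that need them.
  def load_order
    tsort
  rescue TSort::Cyclic => e
    raise ArgumentError, "circular plugin dependency: #{e.message}"
  end
end
```

For the example above, the partitioning plugin would declare the Erlang
API and consistent hashing plugins as dependencies and therefore load
after both of them.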

For now I'm just throwing additional modules in the src/couchdb
directory in my git branch but there are enough modules in there it
makes me hesitate to add more. We could separate these into multiple
folders by using Erlang packages, but a plugin system could accomplish
the same goal and provide additional benefit for 3rd-party plugins.

It's not a pressing priority, but getting a plugin system in-place
would better allow for 3rd-party development around CouchDB and make
it easier to release and version functionality separately from the
core app.


Ben

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Chris Anderson <jc...@apache.org>.

On Mon, Feb 2, 2009 at 4:39 AM, Antony Blakey <an...@gmail.com> wrote:
>
> On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:
>
>>
>> On 2 Feb 2009, at 12:45, Antony Blakey wrote:
>>
>>>
>>> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>>>
>>>> That's fine with me. (Says the man who's not going to develop it.)
>>>
>>> Unfortunately requiring a bigger commitment to this feature than I can
>>> manage at this time.
>>
>> I hope you can push for an agreed-upon design and give a rough todo list
>> and maybe start with it and leave open work items with a good description
>> in JIRA for others to pick up. Would that work?
>
> It's just a matter of doing in erlang what I did in the script, finding out
> how to extend the load path. Hopefully my implementation shows how
> conceptually trivial it is. I agree with Noah about the configuration issue.
> Probably the only trick is that you need to load the config to find the
> plugins and then load more config from each plugin.
>
> I need to focus on the transactional _bulk_docs issue, and then the filename
> patch - that's a 0.9 blocker.
>

Yes, I think having this thread open which lays out the possibilities
is good work as long as we can avoid losing them to history by the
time we are ready to implement the plugin thing.

I think dropping a patch in Jira with a someday/maybe priority is the
best way to let us pick back up when the time comes.



-- 
Chris Anderson
http://jchris.mfdz.com

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Jan Lehnardt <ja...@apache.org>.

On 2 Feb 2009, at 13:39, Antony Blakey wrote:

>
> On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:
>
>> I hope you can push for an agreed-upon design and give a rough todo list
>> and maybe start with it and leave open work items with a good description
>> in JIRA for others to pick up. Would that work?
>
> It's just a matter of doing in erlang what I did in the script,  
> finding out how to extend the load path. Hopefully my implementation  
> shows how conceptually trivial it is. I agree with Noah about the  
> configuration issue. Probably the only trick is that you need to  
> load the config to find the plugins and then load more config from  
> each plugin.
>
> I need to focus on the transactional _bulk_docs issue, and then the  
> filename patch - that's a 0.9 blocker.

Great, thanks. Focus is good.

Cheers
Jan
--

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:

>
> On 2 Feb 2009, at 12:45, Antony Blakey wrote:
>
>>
>> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>>
>>> That's fine with me. (Says the man who's not going to develop it.)
>>
>> Unfortunately requiring a bigger commitment to this feature than I  
>> can manage at this time.
>
> I hope you can push for an agreed-upon design and give a rough todo list
> and maybe start with it and leave open work items with a good description
> in JIRA for others to pick up. Would that work?

It's just a matter of doing in erlang what I did in the script,  
finding out how to extend the load path. Hopefully my implementation  
shows how conceptually trivial it is. I agree with Noah about the  
configuration issue. Probably the only trick is that you need to load  
the config to find the plugins and then load more config from each  
plugin.

I need to focus on the transactional _bulk_docs issue, and then the  
filename patch - that's a 0.9 blocker.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a  
faithful servant. We have created a society that honours the servant  
and has forgotten the gift.
   -- Albert Einstein

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Jan Lehnardt <ja...@apache.org>.

On 2 Feb 2009, at 12:45, Antony Blakey wrote:

>
> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>
>> That's fine with me. (Says the man who's not going to develop it.)
>
> Unfortunately requiring a bigger commitment to this feature than I  
> can manage at this time.

I hope you can push for an agreed-upon design and give a rough todo list
and maybe start with it and leave open work items with a good description
in JIRA for others to pick up. Would that work?

Cheers
Jan
--

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 10:09 PM, Noah Slater wrote:

> That's fine with me. (Says the man who's not going to develop it.)

Unfortunately, that requires a bigger commitment to this feature than I
can manage at this time.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Buddhist walks up to a hot-dog stand and says, "Make me one with  
everything". He then pays the vendor and asks for change. The vendor  
says, "Change comes from within".

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Noah Slater <ns...@apache.org>.

On Mon, Feb 02, 2009 at 10:06:48PM +1030, Antony Blakey wrote:
>
> On 02/02/2009, at 9:48 PM, Noah Slater wrote:
>
>> I think I agree with a comment earlier in this thread. I don't think
>> command
>> options are the place for this type of configuration. Ideally, you'd
>> want all
>> the configuration in the ini file.
>
> That means that the plugin system needs to be written in CouchDB itself,
> rather than being a lightweight launch-time facility.

That's fine with me. (Says the man who's not going to develop it.)

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 9:48 PM, Noah Slater wrote:

> I think I agree with a comment earlier in this thread. I don't think command
> options are the place for this type of configuration. Ideally, you'd want all
> the configuration in the ini file.

That means that the plugin system needs to be written in CouchDB  
itself, rather than being a lightweight launch-time facility.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The ultimate measure of a man is not where he stands in moments of  
comfort and convenience, but where he stands at times of challenge and  
controversy.
   -- Martin Luther King

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Noah Slater <ns...@apache.org>.

On Mon, Feb 02, 2009 at 09:31:36PM +1030, Antony Blakey wrote:
>>>  -D      add all the plugins in $DEFAULT_PLUGIN_DIR
>>>  -G DIR  add all the plugins in DIR
>>>  -P DIR  add the plugin DIR
>>
>> Why have you chosen all capitals?
>
> Because all the obvious lower case options have been used. I wanted to
> use long options, but the shell getopts doesn't allow it.

I think I agree with a comment earlier in this thread. I don't think command
options are the place for this type of configuration. Ideally, you'd want all
the configuration in the ini file. If you really need to override some ini file
setting, maybe using an environment variable would be the way forward.

Bottom line: I want to keep the number of command line options small.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 8:51 PM, Noah Slater wrote:

> This sounds pretty good.
>
> On Mon, Feb 02, 2009 at 04:10:56PM +1030, Antony Blakey wrote:
>>  -C      clean the plugins list
>
> What does this do? I think it needs more explanation.

clear the plugin list.

The plugin list is by default all the plugins in $DEFAULT_PLUGIN_DIR.
-C clears the list, either to run without plugins, or as a precursor to
using -D -G -P in some order to define a specific plugin ordering
and/or load additional plugins from non-default locations.
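
The option semantics described above can be sketched as follows. This is
an illustration in Ruby, not the shell code in the branch;
build_plugin_list and its list_dir parameter are hypothetical, and
dropping duplicates at the end is an assumption of this sketch rather
than documented behaviour.

```ruby
# Hypothetical sketch of how -C/-D/-G/-P build the plugin list.
# opts is an ordered list of [flag, argument] pairs as given on the
# command line; list_dir returns the plugin directories inside a directory.
def build_plugin_list(opts, default_dir,
                      list_dir: ->(dir) { Dir.children(dir).sort.map { |n| File.join(dir, n) } })
  # By default the list holds every plugin in the default directory.
  plugins = list_dir.call(default_dir)

  opts.each do |flag, arg|
    case flag
    when "-C" then plugins = []                           # clear the list
    when "-D" then plugins += list_dir.call(default_dir)  # add the defaults
    when "-G" then plugins += list_dir.call(arg)          # add all plugins in DIR
    when "-P" then plugins << arg                         # add the single plugin DIR
    else raise ArgumentError, "unknown option #{flag}"
    end
  end

  plugins.uniq # assumption: a plugin named twice is loaded once
end
```

Because the options are applied left to right, "-C -P DIR -D" puts the
plugin in DIR ahead of the default plugins, which is how a specific
ordering would be expressed.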

>>  -D      add all the plugins in $DEFAULT_PLUGIN_DIR
>>  -G DIR  add all the plugins in DIR
>>  -P DIR  add the plugin DIR
>
> Why have you chosen all capitals?

Because all the obvious lower case options have been used. I wanted to  
use long options, but the shell getopts doesn't allow it.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

There are two ways of constructing a software design: One way is to  
make it so simple that there are obviously no deficiencies, and the  
other way is to make it so complicated that there are no obvious  
deficiencies.
   -- C. A. R. Hoare

Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Noah Slater <ns...@apache.org>.

This sounds pretty good.

On Mon, Feb 02, 2009 at 04:10:56PM +1030, Antony Blakey wrote:
>   -C      clean the plugins list

What does this do? I think it needs more explanation.

>   -D      add all the plugins in $DEFAULT_PLUGIN_DIR
>   -G DIR  add all the plugins in DIR
>   -P DIR  add the plugin DIR

Why have you chosen all capitals?

-- 
Noah Slater, http://tumbolia.org/nslater

Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

Posted by Antony Blakey <an...@gmail.com>.

I've recreated my simple erlang plugins system

   http://github.com/AntonyBlakey/couchdb/tree/simple-erlang-plugins

A plugin is a directory. It can contain a plugin.ini file and an  
erlang directory.

The erlang directory is added to the erlang load path - you put your  
beam files in there.

The plugin.ini file is loaded after the default.ini, but before  
local.ini. If you supply your own ini files on the command line then  
it comes before them.
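
A small sketch of why that load order matters, assuming later files
override earlier ones key by key: a setting in plugin.ini overrides
default.ini, and the user's local.ini overrides both. The parser below
handles only [section] headers and key = value lines, and the section
names, keys and values are placeholders, not real CouchDB settings.

```ruby
# Hypothetical sketch: merge ini texts in load order, later files winning.

# Parse a minimal ini text into {section => {key => value}}.
def parse_ini(text)
  section = nil
  text.each_line.with_object({}) do |line, conf|
    line = line.strip
    next if line.empty? || line.start_with?(";") # skip blanks and comments
    if (m = line.match(/\A\[(.+)\]\z/))
      section = m[1]
      conf[section] ||= {}
    elsif section && (m = line.match(/\A([^=]+?)\s*=\s*(.*)\z/))
      conf[section][m[1]] = m[2]
    end
  end
end

# Merge ini texts given in load order (default, plugin(s), local).
def merge_inis(texts)
  texts.each_with_object({}) do |text, merged|
    parse_ini(text).each do |section, pairs|
      (merged[section] ||= {}).merge!(pairs)
    end
  end
end
```

With this ordering a plugin can add its own [daemons] entry while users
keep their overrides in local.ini, so updating the plugin never touches
user settings.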

To start your plugin you probably want to add to the [daemons] section  
of your plugin.ini file. Check start_secondary_services in  
couch_server_sup.erl to see what happens with daemons, and you can see  
examples in default.ini (.tpl.in in the source).

If plugins need to be integrated in a different fashion, or  
dependencies are needed, then I'd suggest that a different config  
section be created to handle richer specification/invocation option  
format than [daemons].

There are command line options for manipulating the list of plugins.  
By default it will load plugins from /etc/couchdb/plugins/ or whatever  
your localconfdir is - mine is always relative - but this can be  
controlled by the options:
   -C      clean the plugins list
   -D      add all the plugins in $DEFAULT_PLUGIN_DIR
   -G DIR  add all the plugins in DIR
   -P DIR  add the plugin DIR
Bit light on testing, I wanted to show how trivial it is.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

There are two ways of constructing a software design: One way is to  
make it so simple that there are obviously no deficiencies, and the  
other way is to make it so complicated that there are no obvious  
deficiencies.
   -- C. A. R. Hoare

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Jan Lehnardt <ja...@apache.org>.

Thanks Antony!

On 1 Feb 2009, at 07:37, Antony Blakey wrote:

> On 01/02/2009, at 4:33 PM, Chris Anderson wrote:
>
>> You keep talking about these modifications.
>
> I supplied a github branch with the patch for plugins, and I  
> provided the ruby source for an external indexer. Hopefully "put up  
> or shut up" is satisfied :)

To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master


Cheers
Jan
--

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Antony Blakey <an...@gmail.com>.

On 01/02/2009, at 4:33 PM, Chris Anderson wrote:

> You keep talking about these modifications.

I supplied a github branch with the patch for plugins, and I provided  
the ruby source for an external indexer. Hopefully "put up or shut up"  
is satisfied :)

> We're anxious to see your
> db-name patch

I supplied a github branch with that patch. The view server was  
changed after I did it, which requires re-working the patch, and there  
were some other requests which I agreed to (slugs & slashes). The  
transaction issue is more important to me, so I'm waiting to see if I  
need to support a private branch for that before I revisit the unicode- 
file-name code.

> , and we've already added seq id to external.

When I posted the ruby indexer I noted that the full db-info is now  
provided per request - subsequent discussion prompted the db UUID  
thread idea. I haven't checked the code to determine the semantics of  
the purge_seq field - it may be enough to *catch* purges, although I  
think a purge hook would be a very useful optimization to ensure that  
externals don't require a full sweep.

> DB uuids
> are an interesting topic as well, but more interesting is code.

I was providing a perspective for Martin Scholl, my point being that  
what he wants is doable now, without having to make the canonical  
store pluggable (which IMO is neither feasible nor desirable).

Even without DB uuids, external indexing is still a very useful model.

> It's pretty flexible already, maybe just some path globs for `make
> plugins` is all it would take to get a plugins/helpers convention into
> the build. We could add starting of apps (crypto, ibrowse, etc) to
> default.ini. Then you'd probably have everything you need to add new
> modules to the couch system.

Yes, as I say, I supplied a github branch for using such modules. Even  
with no other changes, it's useful right now, and laughably trivial.

As far as core modularity is concerned, I don't know if it's feasible  
to patch that from outside the committers group because of the  
coordination required. I may be wrong.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

He who would make his own liberty secure, must guard even his enemy  
from repression.
   -- Thomas Paine

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Chris Anderson <jc...@apache.org>.

On Sat, Jan 31, 2009 at 9:02 PM, Antony Blakey <an...@gmail.com> wrote:
>
> On 01/02/2009, at 2:26 PM, Paul Davis wrote:
>
>> Very cool ideas. We've been discussing erlang plugins. The
>> conversation has generally gotten as far as, "erlang plugins... yeah
>> we should have those."
>
> The mechanical side of building and deploying standalone erlang plugins that
> extend couch, including extending the configuration locations, is very
> simple, roughly equivalent to the classpath/load_path issues in e.g.
> java/ruby, modulo the issue of config-file writeback from within CouchDB.
> I've done and deployed this, although the appropriate 'hooks' aren't always
> available, which I guess is the essence of the OP plugin system.
>
> I've built external indexers using both this mechanism, and _external, as
> have others (GeoCouch, FTI). At one stage I was able to execute SQL queries
> against Couch data that was replicated into SQLite databases, but I decided
> not to maintain it. As long as your indexer architecture acknowledges the
> fundamental Couch model, and is prepared to do a full regeneration at any
> time, you can replicate the operational semantics of inbuilt views quite
> easily and with high performance (e.g. no per-request hit to Couch). There
> are a few edge cases, namely purging and db re-creation that need a small
> mod to the core to allow 100% correct plugins.

You keep talking about these modifications. We're anxious to see your
db-name patch, and we've already added seq id to external. DB uuids
are an interesting topic as well, but more interesting is code.

>
> My $0.02 ... I think the existing build system would benefit from a more
> modular approach.

It's pretty flexible already, maybe just some path globs for `make
plugins` is all it would take to get a plugins/helpers convention into
the build. We could add starting of apps (crypto, ibrowse, etc) to
default.ini. Then you'd probably have everything you need to add new
modules to the couch system.

Patches welcome. :)

-- 
Chris Anderson
http://jchris.mfdz.com

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Antony Blakey <an...@gmail.com>.

On 01/02/2009, at 2:26 PM, Paul Davis wrote:

> Very cool ideas. We've been discussing erlang plugins. The
> conversation has generally gotten as far as, "erlang plugins... yeah
> we should have those."

The mechanical side of building and deploying standalone erlang  
plugins that extend couch, including extending the configuration  
locations, is very simple, roughly equivalent to the classpath/ 
load_path issues in e.g. java/ruby, modulo the issue of config-file  
writeback from within CouchDB. I've done and deployed this, although  
the appropriate 'hooks' aren't always available, which I guess is the  
essence of the OP plugin system.

I've built external indexers using both this mechanism, and _external,  
as have others (GeoCouch, FTI). At one stage I was able to execute SQL  
queries against Couch data that was replicated into SQLite databases,  
but I decided not to maintain it. As long as your indexer architecture  
acknowledges the fundamental Couch model, and is prepared to do a full  
regeneration at any time, you can replicate the operational semantics  
of inbuilt views quite easily and with high performance (e.g. no per- 
request hit to Couch). There are a few edge cases, namely purging and  
db re-creation that need a small mod to the core to allow 100% correct  
plugins.
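
A minimal sketch of the "be prepared to do a full regeneration at any time" rule, in Python rather than the Ruby or Erlang used by the indexers mentioned above. The class name, the callback parameters (all_docs, changes_since) and the in-memory dict are hypothetical; update_seq and purge_seq are the db-info fields discussed in this thread. The recreation check is only a heuristic: a re-created database whose sequence has already overtaken the old one goes unnoticed, which is the edge case described above.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalIndex:
    """Toy external index that mirrors documents keyed by id (hypothetical)."""
    last_seq: int = 0
    last_purge_seq: int = 0
    entries: dict = field(default_factory=dict)

    def sync(self, db_info, all_docs, changes_since):
        """Bring the index up to date; returns 'rebuild' or 'incremental'."""
        # A sequence number that moved backwards suggests the db was deleted
        # and re-created; a changed purge_seq suggests documents were purged.
        # In both cases the safe response is a full sweep.
        recreated = db_info["update_seq"] < self.last_seq
        purged = db_info["purge_seq"] != self.last_purge_seq
        if recreated or purged:
            self.entries = dict(all_docs())
            mode = "rebuild"
        else:
            # Apply only what changed since the last sequence we saw.
            for doc_id, doc in changes_since(self.last_seq):
                if doc is None:          # None marks a deleted document
                    self.entries.pop(doc_id, None)
                else:
                    self.entries[doc_id] = doc
            mode = "incremental"
        self.last_seq = db_info["update_seq"]
        self.last_purge_seq = db_info["purge_seq"]
        return mode
```

The caller supplies all_docs and changes_since as plain callables, so the same logic works whether the documents arrive over HTTP or from inside the VM.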

My $0.02 ... I think the existing build system would benefit from a  
more modular approach because it would reify the coupling between, and  
layering of, existing features e.g. the upcoming stats module would  
consist of a capture core and an analysis/access plugin.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Human beings, who are almost unique in having the ability to learn  
from the experience of others, are also remarkable for their apparent  
disinclination to do so.
   -- Douglas Adams

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Zachary Zolton <za...@gmail.com>.

Do you think that a library of higher-order functions for the
JavaScript view layer could be provided to help codify the emerging
"best practices"? I mean, we should be able to help newbies emit their
keys and reduce those related documents into one object graph...

On Fri, Feb 6, 2009 at 10:21 AM, Kerr Rainey <ke...@gmail.com> wrote:
>> Great question, I'd say no it runs entirely against the grain of what
>> CouchDB is. Documents aren't supposed to be related to one another. But
>> relational databases don't handle this kind of thing either so I figure why
>> not CouchDB as it offers other features that solve lots of problems.
>
> I think in most practical apps that Couch targets there are
> relationships between docs.  The canonical Blog example has comments
> that are children of a parent post.  I think the emphasis on reminding
> people that CouchDB is not relational tends to lead people to the
> conclusion that the documents won't be related in some way.  Of course
> CouchDB doesn't have any built in way to specify those relationships.
>
> I've been playing around with conventions for specifying doc
> relationships and how an app layer would make those relationships easy
> to handle.  It seems to be quite useful so far. It would be
> interesting to see if it was possible / sensible to push some of that
> further down into CouchDB.
>
> --
> Kerr
>

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Kerr Rainey <ke...@gmail.com>.

> Great question, I'd say no it runs entirely against the grain of what
> CouchDB is. Documents aren't supposed to be related to one another. But
> relational databases don't handle this kind of thing either so I figure why
> not CouchDB as it offers other features that solve lots of problems.

I think in most practical apps that Couch targets there are
relationships between docs.  The canonical Blog example has comments
that are children of a parent post.  I think the emphasis on reminding
people that CouchDB is not relational tends to lead people to the
conclusion that the documents won't be related in some way.  Of course
CouchDB doesn't have any built in way to specify those relationships.

I've been playing around with conventions for specifying doc
relationships and how an app layer would make those relationships easy
to handle.  It seems to be quite useful so far. It would be
interesting to see if it was possible / sensible to push some of that
further down into CouchDB.
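
One possible convention for the blog example, sketched in Python to simulate how sorted view keys can place comments directly after their parent post. The field names (type, post_id, created_at) and both function names are assumptions made for illustration, not an existing CouchDB or app-layer API.

```python
def map_doc(doc):
    """Emit (key, value) rows the way a map function might (hypothetical)."""
    if doc["type"] == "post":
        # The 0 sorts the post ahead of its comments.
        yield [doc["_id"], 0], doc
    elif doc["type"] == "comment":
        # The 1 groups comments after the post, ordered by creation time.
        yield [doc["post_id"], 1, doc["created_at"]], doc


def post_with_comments(docs, post_id):
    """Return (post, comments) by scanning the sorted rows for one post id.

    Assumes the post document itself is present in docs.
    """
    rows = sorted(
        (row for doc in docs for row in map_doc(doc)),
        key=lambda row: row[0],
    )
    matching = [value for key, value in rows if key[0] == post_id]
    return matching[0], matching[1:]
```

The relationship lives entirely in the emitted keys, so nothing in the stored documents beyond the post_id field has to know about it.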

--
Kerr

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Robert Dionne <di...@dionne-associates.com>.

Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961



On Feb 1, 2009, at 8:52 AM, Martin Scholl wrote:

> Hello Robert,
>
>
> Robert Dionne wrote:
>> Martin,
>>
>>   I'm very keen on relationships between documents. Coming from the
>> description logic community, I'd like to allow users to declare  
>> certain
>> fields that relate documents and then compute transitive closures  
>> over
>> dags whose nodes are documents and whose arcs the fields of interest.
>> This goes against the grain of  couchdb as collections of unrelated
>> documents, I know, but it's what I want to do as couchdb's schema- 
>> less
>> design offers many advantages over relational databases. Relational
>> databases aren't that great for storing graphs either.
> I like the idea (reminds me of an RDF DB btw), especially when used
> together with views.

It does, though I'm convinced the OWL/RDF community is laboring under
a delusion that the semantic web can be enabled if we just get enough
ontologies out there and can federate them. Even RDF is overkill for
most applications.


>
>>
>>   I don't need to run full classification algorithms in the document
>> store, but would like to just maintain relationships (user- 
>> defined) and
>> transitive closures of them. Inferencing would perhaps be better done
>> externally similar to the hypercouch work. So this would best be  
>> served
>> by pluggable indexing and maybe pluggable storage, though I think I
>> could live without the latter for now.
> With Antony's latest hints (thank you Antony!) in mind, I think I will
> implement first sketches in an external way first. FTI is  
> implemented in
> the same way afair.
>
>>
>>   So I'm very excited about your ideas. I too have been reviewing the
>> code with this in mind and I would agree with others that it's  
>> perhaps a
>> post 1.0 task. From the little time I've spent chasing down a  
>> couple of
>> bugs I've seen there are a few subtle aspects to it. I've also  
>> noticed
>> that the style of design in this community is more bottoms up,  
>> which is
>> how it should be when building something new, so prototypes are  
>> perhaps
>> better for fleshing out ideas. Anyway I'm very happy to help and
>> collaborate on this as I can.
> Great! I will just publish my results on github. I hope, others will
> join then.
>
> What worries me most, is that I am still unsure in how to differ  
> between
> design docs and indexing schemes, and when to use which  
> infrastructure.
> Applied to the doc-relationship example you gave: how should
> "intermediate results" of the dag processing be treated? As documents?
> Should they be put into view functions? Should views be able to hint,
> which indexing scheme is to be used? Depending on the index type,
> indexing and doc / view-processing can become inherently coupled and
> complex. Is this still CouchDB then?

Great question. I'd say no; it runs entirely against the grain of what
CouchDB is. Documents aren't supposed to be related to one another.
But relational databases don't handle this kind of thing either, so I
figure why not CouchDB, as it offers other features that solve lots of
problems. Here's a typical use case (quoted words are documents,
those with asterisks are fields between documents):

"heart disease"  is *located_in* the "heart"
"myopathy" is *located_in* the "myocardium"
"myocardium" is *part_of* the "heart"

A reasoner might allow one to compose two relations, e.g.
*located_in* composed with *part_of* is equal to *located_in*, and
thus conclude that myopathy is a disease of the heart.
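
The composition rule can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, relations are modeled as plain sets of (source, target) pairs, and the whole closure is recomputed rather than maintained incrementally.

```python
def compose(first, second):
    """Return pairs (a, c) where (a, b) is in first and (b, c) is in second."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def close_located_in(located_in, part_of):
    """Extend located_in with 'located_in then part_of' until nothing new appears."""
    result = set(located_in)
    while True:
        derived = compose(result, part_of) - result
        if not derived:
            return result
        result |= derived


# The three facts from the example above.
located_in = {("heart disease", "heart"), ("myopathy", "myocardium")}
part_of = {("myocardium", "heart")}

inferred = close_located_in(located_in, part_of)
# inferred now also contains ("myopathy", "heart")
```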

So these transitive closures of links between documents would need to
be incrementally computed and treated the same as views. I think this
would be best implemented with plugins in the same VM? This kind of
processing seems to require a tighter coupling than something like
full text indexing.

regards,

Bob



>
>
> Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Antony Blakey <an...@gmail.com>.

On 02/02/2009, at 12:22 AM, Martin Scholl wrote:

> What worries me most, is that I am still unsure in how to differ  
> between
> design docs and indexing schemes, and when to use which  
> infrastructure.
> Applied to the doc-relationship example you gave: how should
> "intermediate results" of the dag processing be treated? As documents?
> Should they be put into view functions? Should views be able to hint,
> which indexing scheme is to be used? Depending on the index type,
> indexing and doc / view-processing can become inherently coupled and
> complex. Is this still CouchDB then?

Not sure if this is exactly what you're talking about, but the way I
was thinking (which I picked up from GeoCouch) was to have design docs
that define no couch view, but that allow you to configure the external,
e.g. in my case, which was for returning the TC of
user/role/permissions graphs, that might look like this:

{
   "_id": ...,
   "_rev": ...,

   "transitive_closure_generator": {
     "User": { "from": "_id", "to": "[roles]" },
     "Role": { "from": "_id", "to": "[permissions]" }
   }
}

Then, in my external, I process such docs in my update loop to generate
the configuration of the external, i.e. I monitor changes to design
docs with a 'transitive_closure_generator' element, obviously caching
and persisting that configuration. You also need to catch design doc
updates that remove that element, and the deletion of monitored docs
as well.

Because I use CouchRest, I can use the 'couchrest-type' key that it  
inserts to identify which document's fields I need to process e.g.  
'User'/'Role'.
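
A rough Python sketch of how an external might turn such a configuration into a transitive closure. It assumes that a "to" value wrapped in square brackets (such as "[roles]") names a list-valued field, which is a guess at the notation above; the function names and the sample documents are hypothetical.

```python
def edges_from_docs(config, docs):
    """Yield (source, target) arcs for docs whose type appears in config."""
    for doc in docs:
        rule = config.get(doc.get("couchrest-type"))
        if rule is None:
            continue  # this document type is not monitored
        source = doc[rule["from"]]
        target_field = rule["to"]
        if target_field.startswith("[") and target_field.endswith("]"):
            # Assumed meaning of "[name]": the field holds a list of targets.
            targets = doc.get(target_field[1:-1], [])
        else:
            targets = [doc[target_field]] if target_field in doc else []
        for target in targets:
            yield source, target


def transitive_closure(edges):
    """Map each source node to the set of all nodes reachable from it."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
    closure = {}
    for start in graph:
        seen, stack = set(), list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        closure[start] = seen
    return closure


config = {
    "User": {"from": "_id", "to": "[roles]"},
    "Role": {"from": "_id", "to": "[permissions]"},
}
docs = [
    {"_id": "alice", "couchrest-type": "User", "roles": ["editor"]},
    {"_id": "editor", "couchrest-type": "Role", "permissions": ["read", "write"]},
]
closure = transitive_closure(edges_from_docs(config, docs))
```

This recomputes everything from scratch; an external that follows the update loop would instead adjust only the arcs touched by each changed document.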

As an aside, it might be nice if you could use that doc to define the
endpoint for your externals, so you could completely simulate a
built-in view. Putting external configuration into design docs isn't a
good idea, but maybe it could be indirect, e.g.

{
   "_id": "my_design",
   "_rev": ...,

   "transitive_closure_generator": { ... },
   "externals": {
     "view_name": "tc",
     "external": <the id of the external mapping in the .ini>
   }
}

which would then allow you to access your external via
'_view/my_design/tc'. External views mapped like this would need to
adhere to the view contract, e.g. params etc.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Always have a vision. Why spend your life making other people’s dreams?
  -- Orson Welles (1915-1985)

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Martin Scholl <ms...@diskware.net>.

Hello Robert,


Robert Dionne wrote:
> Martin,
> 
>   I'm very keen on relationships between documents. Coming from the
> description logic community, I'd like to allow users to declare certain
> fields that relate documents and then compute transitive closures over
> dags whose nodes are documents and whose arcs the fields of interest.
> This goes against the grain of  couchdb as collections of unrelated
> documents, I know, but it's what I want to do as couchdb's schema-less
> design offers many advantages over relational databases. Relational
> databases aren't that great for storing graphs either.
I like the idea (reminds me of an RDF DB btw), especially when used
together with views.

> 
>   I don't need to run full classification algorithms in the document
> store, but would like to just maintain relationships (user-defined) and
> transitive closures of them. Inferencing would perhaps be better done
> externally similar to the hypercouch work. So this would best be served
> by pluggable indexing and maybe pluggable storage, though I think I
> could live without the latter for now.
With Antony's latest hints (thank you Antony!) in mind, I think I will
implement my first sketches as externals. FTI is implemented in
the same way, afair.

> 
>   So I'm very excited about your ideas. I too have been reviewing the
> code with this in mind and I would agree with others that it's perhaps a
> post 1.0 task. From the little time I've spent chasing down a couple of
> bugs I've seen there are a few subtle aspects to it. I've also noticed
> that the style of design in this community is more bottoms up, which is
> how it should be when building something new, so prototypes are perhaps
> better for fleshing out ideas. Anyway I'm very happy to help and
> collaborate on this as I can.
Great! I will just publish my results on github. I hope others will
join then.

What worries me most is that I am still unsure how to distinguish between
design docs and indexing schemes, and when to use which infrastructure.
Applied to the doc-relationship example you gave: how should
"intermediate results" of the dag processing be treated? As documents?
Should they be put into view functions? Should views be able to hint
which indexing scheme is to be used? Depending on the index type,
indexing and doc / view-processing can become inherently coupled and
complex. Is this still CouchDB then?


Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Robert Dionne <di...@dionne-associates.com>.

Martin,

   I'm very keen on relationships between documents. Coming from the  
description logic community, I'd like to allow users to declare  
certain fields that relate documents and then compute transitive  
closures over dags whose nodes are documents and whose arcs the  
fields of interest. This goes against the grain of  couchdb as  
collections of unrelated documents, I know, but it's what I want to  
do as couchdb's schema-less design offers many advantages over  
relational databases. Relational databases aren't that great for  
storing graphs either.

   I don't need to run full classification algorithms in the document  
store, but would like to just maintain relationships (user-defined)  
and transitive closures of them. Inferencing would perhaps be better  
done externally similar to the hypercouch work. So this would best be  
served by pluggable indexing and maybe pluggable storage, though I  
think I could live without the latter for now.

   So I'm very excited about your ideas. I too have been reviewing  
the code with this in mind and I would agree with others that it's  
perhaps a post 1.0 task. From the little time I've spent chasing down  
a couple of bugs I've seen there are a few subtle aspects to it. I've  
also noticed that the style of design in this community is more
bottom-up, which is how it should be when building something new, so
prototypes are perhaps better for fleshing out ideas. Anyway I'm very
happy to help and collaborate on this as I can.

Cheers,

Bob

Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961

On Feb 1, 2009, at 7:51 AM, Martin Scholl wrote:

> Chris Anderson wrote:
>> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis  
>> <pa...@gmail.com> wrote:
>>> Martin,
>>>
>>> Very cool ideas. We've been discussing erlang plugins. The
>>> conversation has generally gotten as far as, "erlang plugins... yeah
>>> we should have those."
>>
>> I agree this is cool, but I think it would be healthier for the
>> project to wait until we release a rock-solid 1.0.
>>
>> There are some incredibly non-obvious things happening inside, and a
>> big disruption right now wouldn't necessarily keep them all in
>> balance. Once we've met 1.0, we'll have a solid basis for comparison,
>> of any alternate implementations.
>>
>> Then, let the fun begin. :)
>>
>> Martin, I'd very much like to hear more about the sorts of indexers
>> you'd build. Sounds exciting.
> I'd like to experiment with Merkle trees, because these could turn out
> to be a good foundation for several use-cases:
> - index/tree-synchronization: replication is trivial with merkle  
> trees,
> only changed parts of the tree get replicated in a secure manner.
> - secure document storage: modified documents (disc corruption, sw
> failure or even the "bad cracker"-case)
> - by using GPG/PGP-signatures probably even cryptographical secure
> design doc code signing, e.g. "safe applications"
>
> Furthermore, there are a lot of other clever map data structures
> available (not in the sense of a->b , but a<->b) which could become
> quite handy to store document relationships. I'm sure, the database  
> ppl
> out here have many more ideas about what could be added to CouchDB.
>
>
> Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Martin Scholl <ms...@diskware.net>.

Martin Scholl wrote:
> Chris Anderson wrote:
>> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
>>> Martin,
>>>
>>> Very cool ideas. We've been discussing erlang plugins. The
>>> conversation has generally gotten as far as, "erlang plugins... yeah
>>> we should have those."
>> I agree this is cool, but I think it would be healthier for the
>> project to wait until we release a rock-solid 1.0.
>>
>> There are some incredibly non-obvious things happening inside, and a
>> big disruption right now wouldn't necessarily keep them all in
>> balance. Once we've met 1.0, we'll have a solid basis for comparison,
>> of any alternate implementations.
>>
>> Then, let the fun begin. :)
>>
>> Martin, I'd very much like to hear more about the sorts of indexers
>> you'd build. Sounds exciting.
> I'd like to experiment with Merkle trees, because these could turn out
> to be a good foundation for several use-cases:
> - index/tree-synchronization: replication is trivial with merkle trees,
> only changed parts of the tree get replicated in a secure manner.
> - secure document storage: modified documents (disc corruption, sw
> failure or even the "bad cracker"-case)
The important information is missing here:  `-> "external modifications
to documents get detected". Why can this be important? To use CouchDB
for compliance or archiving tasks.
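
As an illustration of how such detection could work, here is a small Python sketch that reduces a set of documents to a single Merkle root; any out-of-band change to a document body changes the root. The hashing layout (SHA-256, the "leaf:" and "node:" prefixes, duplicating the last hash on odd levels) is an arbitrary choice for this example and not something CouchDB implements.

```python
import hashlib


def leaf_hash(doc_id, body):
    """Hash one document; body is the raw bytes as stored."""
    return hashlib.sha256(b"leaf:" + doc_id.encode() + b"\x00" + body).digest()


def merkle_root(leaves):
    """Fold a list of leaf hashes pairwise into a single root hash."""
    if not leaves:
        return hashlib.sha256(b"empty").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256(b"node:" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def root_for(docs):
    """Root over all documents, ordered by id so the result is stable."""
    return merkle_root([leaf_hash(doc_id, docs[doc_id]) for doc_id in sorted(docs)])
```

For an archiving use, the trusted root would be stored somewhere the storage layer cannot write to, and compared against root_for(...) of the current contents during an audit.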



Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Martin Scholl <ms...@diskware.net>.

Chris Anderson wrote:
> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
>> Martin,
>>
>> Very cool ideas. We've been discussing erlang plugins. The
>> conversation has generally gotten as far as, "erlang plugins... yeah
>> we should have those."
> 
> I agree this is cool, but I think it would be healthier for the
> project to wait until we release a rock-solid 1.0.
> 
> There are some incredibly non-obvious things happening inside, and a
> big disruption right now wouldn't necessarily keep them all in
> balance. Once we've met 1.0, we'll have a solid basis for comparison,
> of any alternate implementations.
> 
> Then, let the fun begin. :)
> 
> Martin, I'd very much like to hear more about the sorts of indexers
> you'd build. Sounds exciting.
I'd like to experiment with Merkle trees, because these could turn out
to be a good foundation for several use-cases:
- index/tree-synchronization: replication is trivial with Merkle trees,
only changed parts of the tree get replicated in a secure manner.
- secure document storage: modified documents (disc corruption, sw
failure or even the "bad cracker"-case)
- by using GPG/PGP-signatures probably even cryptographically secure
design doc code signing, e.g. "safe applications"
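
To make the synchronization point concrete, the following Python sketch compares two Merkle trees level by level and descends only into subtrees whose hashes differ, so unchanged parts are never visited. It assumes both sides hold the same number of leaves in the same order, which real replication would not guarantee; the function names are hypothetical.

```python
import hashlib


def build_levels(leaf_hashes):
    """Return the tree as a list of levels, leaves first and root last."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:
            prev = prev + [prev[-1]]  # pad a copy; the stored level is unchanged
        levels.append([
            hashlib.sha256(prev[i] + prev[i + 1]).digest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def differing_leaves(local, remote):
    """Indices of leaves that differ, visiting only mismatching subtrees."""
    if len(local[0]) != len(remote[0]):
        raise ValueError("sketch assumes equally sized trees")
    if not local[0]:
        return []
    top = len(local) - 1
    suspects = [0] if local[top][0] != remote[top][0] else []
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if (child < len(local[depth])
                        and local[depth][child] != remote[depth][child]):
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects
```

With n leaves and one changed document, the comparison touches on the order of log n hashes instead of all n.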

Furthermore, there are a lot of other clever map data structures
available (not in the sense of a->b , but a<->b) which could become
quite handy to store document relationships. I'm sure, the database ppl
out here have many more ideas about what could be added to CouchDB.


Martin

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Chris Anderson <jc...@apache.org>.

On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
> Martin,
>
> Very cool ideas. We've been discussing erlang plugins. The
> conversation has generally gotten as far as, "erlang plugins... yeah
> we should have those."

I agree this is cool, but I think it would be healthier for the
project to wait until we release a rock-solid 1.0.

There are some incredibly non-obvious things happening inside, and a
big disruption right now wouldn't necessarily keep them all in
balance. Once we've met 1.0, we'll have a solid basis for comparison,
of any alternate implementations.

Then, let the fun begin. :)

Martin, I'd very much like to hear more about the sorts of indexers
you'd build. Sounds exciting.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: couch_gen_btree: pluggable storage / tree engines

Posted by Paul Davis <pa...@gmail.com>.

Martin,

Very cool ideas. We've been discussing erlang plugins. The
conversation has generally gotten as far as, "erlang plugins... yeah
we should have those."

As you've noticed, the couch sources depend fairly non-transparently
on the couch_btree implementation. Here's my take on the situation:

1. Erlang plugins are a Good Idea™
2. We should probably focus on at least two types of plugins: storage
and indexing.
3. MySQL is repulsive.

1. I'm going to assume consensus.
2. I'm pretty sure that the two types of plugins are quite different
and will even require different semantics in the _design docs all the
way down to calls to emit.
3. Well, I'm going to assume consensus here too.

Anyway, I think the general steps forward are probably to figure out
what exactly we want to abstract, how to abstract it, and then start
abstracting. Abstractions FTW!

In terms of storage, I'm not entirely certain on the specifics.
There's quite a bit of the API in the raw database layer that I haven't
dealt with yet. It'd be super fun to have specific file systems for
things like multiple servers on a single rack vs the distributed
CouchDB for distant servers etc.

In terms of view indexing, my mind is already racing on the
possibilities of where to push things around to make indexes
selectable in the _design docs etc. This could really be a fun
project.

Keep up with the reading and thinking. This is something I look
forward to seeing implemented.

HTH,
Paul Davis

On Sat, Jan 31, 2009 at 9:49 AM, Martin Scholl <ms...@diskware.net> wrote:
> Hello all,
>
>
> although I am still doing a CouchDB review to better understand its
> design, I like to ask for comments for a tiny idea.
> I would like to add another index structure to CouchDB (a Merkle-Tree)
> and come up with asking myself what the best way of doing this would be.
> I have a rough guess of how closely couch_btree is knit into CouchDB.
> Therefore I would like to hear from you experienced developers comments
> on some of my ideas:
>
> My suggestion is a MySQL-ly approach (pluggable engines) for CouchDB,
> that is to factor out several components into generic behaviours:
> e.g.
> - a couch_gen_tree:
>        abstracts access to couch_btree
>
> Maybe even a
> - couch_gen_storage
>        e.g. file system, file storage access, etc.
>
> - couch_gen_replicator
>        an imperative approach to tree / storage replication.
>
> As I said: I am new to CouchDB's code so I cannot really estimate how
> the current layering approach looks like, and whether we can even split
> out the 3 components.
>
> Imho there would be several benefits in having this flexibility brough
> by couch_gen_*:
> - new use-cases for CouchDB:
>        - R-trees: for adding another way of querrying
>          documents (e.g. nearest neighbour search)
>        - genome databases
>        - a special datastructure for indexing tags
>        - ...
> - with a flexible storage layer, CouchDB could ran on top of other
> infrastructures and products: like S3, SimpleDB, AppEngine, etc.
> - (the following is just a guess:) a cleaner CouchDB codebase with a
> clear layering and separation of components
> - possibly (again, just a guess): with the plugin approach, we can more
> easily support advanced indexing and db management schemes, like
> distributed storage access, distributed transactions, etc.
>
>
> Martin
>