Posted to dev@couchdb.apache.org by Martin Scholl <ms...@diskware.net> on 2009/01/31 15:49:16 UTC
couch_gen_btree: pluggable storage / tree engines
Hello all,
Although I am still reviewing CouchDB to better understand its
design, I would like to ask for comments on a small idea.
I would like to add another index structure to CouchDB (a Merkle tree)
and have been asking myself what the best way of doing this would be.
I only have a rough guess of how tightly couch_btree is knit into CouchDB.
Therefore I would like to hear comments from you experienced developers
on some of my ideas:
My suggestion is a MySQL-like approach (pluggable engines) for CouchDB,
that is, to factor out several components into generic behaviours,
e.g.:
- couch_gen_tree:
    abstracts access to couch_btree
and maybe even:
- couch_gen_storage:
    e.g. file system, file storage access, etc.
- couch_gen_replicator:
    an imperative approach to tree / storage replication.
As I said, I am new to CouchDB's code, so I cannot really estimate what
the current layering looks like, or whether we can even split out
these three components.
IMHO there would be several benefits to the flexibility brought
by couch_gen_*:
- new use-cases for CouchDB:
    - R-trees: another way of querying
      documents (e.g. nearest-neighbour search)
    - genome databases
    - a special data structure for indexing tags
    - ...
- with a flexible storage layer, CouchDB could run on top of other
  infrastructures and products like S3, SimpleDB, AppEngine, etc.
- (the following is just a guess:) a cleaner CouchDB codebase with
  clear layering and separation of components
- possibly (again, just a guess): with the plugin approach, we could more
  easily support advanced indexing and db management schemes, like
  distributed storage access, distributed transactions, etc.
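To make the "generic behaviour" idea concrete, here is a minimal Ruby sketch (Ruby only because it is the one language with code in this thread; in CouchDB itself this would be an Erlang behaviour). It assumes an engine is anything that answers insert/lookup, and that a registry maps a configured engine name to an implementation. TreeEngines, SortedEngine and MerkleEngine are hypothetical names, not CouchDB modules. The Merkle engine computes a root hash over the stored pairs, one of the index structures mentioned above.

```ruby
require 'digest'

# Hypothetical registry: maps an engine name (e.g. from config) to a class.
module TreeEngines
  REGISTRY = {}

  def self.register(name, klass)
    REGISTRY[name] = klass
  end

  def self.build(name)
    klass = REGISTRY.fetch(name) { raise ArgumentError, "unknown engine: #{name}" }
    klass.new
  end
end

# Stand-in for couch_btree: a key/value map iterated in key order.
class SortedEngine
  def initialize
    @data = {}
  end

  def insert(key, value)
    @data[key] = value
    self
  end

  def lookup(key)
    @data[key]
  end

  def keys
    @data.keys.sort
  end
end

# Same interface plus a Merkle root: leaves hash "key=value" in key order,
# and each level hashes adjacent pairs (an odd node out is hashed on its
# own) until a single digest remains.
class MerkleEngine < SortedEngine
  def merkle_root
    level = keys.map { |k| Digest::SHA256.hexdigest("#{k}=#{@data[k]}") }
    return Digest::SHA256.hexdigest("") if level.empty?
    while level.size > 1
      level = level.each_slice(2).map { |pair| Digest::SHA256.hexdigest(pair.join) }
    end
    level.first
  end
end

TreeEngines.register(:btree, SortedEngine)
TreeEngines.register(:merkle, MerkleEngine)
```

Callers only ever talk to TreeEngines.build(name), so adding an R-tree would mean one more class and one more register call.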
Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Jan Lehnardt <ja...@apache.org>.
On 1 Feb 2009, at 12:18, Antony Blakey wrote:
> On 01/02/2009, at 8:44 PM, Jan Lehnardt wrote:
>
>> To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master
>
> Yeah, sorry.
>
> I think I screwed my github when the upstream switched layouts (IIRC
> it was a full svn copy then moved to trunk). My git-fu is git-
> fucked. I've recently locally cloned from halorgium in preparation
> for redoing the filename mods. At least the external filename v1
> stuff is there.
I tried reviewing the patch but the fucked-upness of the
escaped_filenames branch doesn't let me clone or view it. Is it
possible for you to put the patch up somewhere?
Cheers
Jan
--
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Antony Blakey <an...@gmail.com>.
Thanks Jan,
On 01/02/2009, at 10:00 PM, Jan Lehnardt wrote:
> To save the others from searching: http://mail-archives.apache.org/mod_mbox/couchdb-user/200812.mbox/%3c27BFB81F-9AE2-419C-8911-3F26D82A7A44@gmail.com%3e
> :-)
Note that if you want to do this, please read the thread because a)
that code isn't strictly correct and b) there are some other issues to
be aware of (and I'm guilty of some misinformation partway through
that thread). Furthermore, it's only a proof-of-concept test example.
In summary, alternate forms of indexing via an _external are easy,
although an _external is single-threaded, which turns out to be lucky
because it guarantees that the _external sees a monotonically increasing
update_seq without any tricky coordination issues when updating your
index.
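A minimal sketch of why single-threading helps, assuming (as described in this thread) that every _external request carries the database info, including update_seq, as one JSON object per line. ExternalIndexer and its catch-up callback are hypothetical, and the {"code", "json"} response shape is an assumption about the _external protocol. Because requests are handled one at a time, the indexer only has to compare the incoming update_seq with the last one it indexed.

```ruby
require 'json'

# Hypothetical _external indexer. Assumes each request arrives as one JSON
# object per line with the db info under "info" (including "update_seq").
class ExternalIndexer
  attr_reader :indexed_seq

  # The block is called with (from_seq, to_seq) to index the changes in between.
  def initialize(&catch_up)
    @indexed_seq = 0
    @catch_up = catch_up
  end

  # Requests are handled strictly one at a time, so the update_seq values
  # seen here never go backwards and no locking is needed around the index.
  def handle(line)
    request = JSON.parse(line)
    seq = request.fetch("info").fetch("update_seq")
    if seq > @indexed_seq
      @catch_up.call(@indexed_seq, seq)
      @indexed_seq = seq
    end
    # Response shape assumed: {"code": ..., "json": ...}.
    JSON.generate({ "code" => 200, "json" => { "indexed_seq" => @indexed_seq } })
  end

  # Read requests from stdin and answer each on its own line.
  def run(input = $stdin, output = $stdout)
    input.each_line do |line|
      output.puts(handle(line))
      output.flush
    end
  end
end
```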
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
What can be done with fewer [assumptions] is done in vain with more
-- William of Ockham (ca. 1285-1349)
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Jan Lehnardt <ja...@apache.org>.
Thanks again.
On 1 Feb 2009, at 12:18, Antony Blakey wrote:
>
> I think I screwed my github when the upstream switched layouts (IIRC
> it was a full svn copy then moved to trunk). My git-fu is git-
> fucked. I've recently locally cloned from halorgium in preparation
> for redoing the filename mods. At least the external filename v1
> stuff is there.
>
> I think the patch for external erlang modules has been lost. In any
> case, it's trivial, involving nothing more than changes to
> bin/couchdb.tpl.in to allow a plugins directory, add each plugin to
> the erlang load path and read its .ini file. I custom built my
> plugin, so I didn't need to mod the build system.
>
> The ruby source and subsequent discussion for the external indexer
> was posted to couchdb-user on 21 December 2008 in a thread called
> 'couchdb' (!) started by Tim Parkin.
To save the others from searching: http://mail-archives.apache.org/mod_mbox/couchdb-user/200812.mbox/%3c27BFB81F-9AE2-419C-8911-3F26D82A7A44@gmail.com%3e
:-)
Cheers
Jan
--
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Antony Blakey <an...@gmail.com>.
On 01/02/2009, at 8:44 PM, Jan Lehnardt wrote:
> To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master
Yeah, sorry.
I think I screwed my github when the upstream switched layouts (IIRC
it was a full svn copy then moved to trunk). My git-fu is git-fucked.
I've recently locally cloned from halorgium in preparation for redoing
the filename mods. At least the external filename v1 stuff is there.
I think the patch for external erlang modules has been lost. In any
case, it's trivial, involving nothing more than changes to
bin/couchdb.tpl.in to allow a plugins directory, add each plugin to
the erlang load path and read its .ini file. I custom built my
plugin, so I didn't need to mod the build system.
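The start-script change described above amounts to roughly the following, sketched in Ruby rather than shell. It assumes the layout used by the later simple-erlang-plugins branch (each plugin directory holds an optional plugin.ini and an erlang/ directory of .beam files); plugin_launch_args is a hypothetical helper. The load-path part relies on erl's -pa flag, which prepends directories to the code path.

```ruby
# Hypothetical helper: for each plugin directory, collect its erlang/ dir
# for the code path and its plugin.ini for the config chain.
def plugin_launch_args(plugin_dirs)
  code_paths = []
  ini_files = []
  plugin_dirs.each do |dir|
    beam_dir = File.join(dir, "erlang")
    code_paths << beam_dir if File.directory?(beam_dir)
    ini = File.join(dir, "plugin.ini")
    ini_files << ini if File.file?(ini)
  end
  # "erl -pa Dir1 Dir2 ..." prepends the directories to the code path.
  erl_args = code_paths.empty? ? [] : ["-pa", *code_paths]
  { erl_args: erl_args, ini_files: ini_files }
end
```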
The ruby source and subsequent discussion for the external indexer was
posted to couchdb-user on 21 December 2008 in a thread called
'couchdb' (!) started by Tim Parkin.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 8:56 PM, Noah Slater wrote:
> I've found this thread a little hard to follow so far, but why not
> configure the plugins from /etc/couchdb/local.ini itself, including the
> configuration that tells CouchDB which plugins to work. I'm not sure I'm
> comfortable with the idea of CouchDB scanning a list of directories each
> time, loading things automatically.
I did it this way to avoid touching the CouchDB code, and because I
didn't want to impose another file format or do parsing in shell script,
especially given that I have an imminent requirement for CouchDB/Win32,
where the shell is complete shit. OTOH I may have to replace the start
script with C for that reason.
It might be better to have an erlang plugin manager that can deal with
plugin selection, dependencies and ordering requirements, but that's a
more intrusive solution.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 02, 2009 at 06:07:37PM +1030, Antony Blakey wrote:
> My concern is how to allow plugins to be updated without stomping on
> user configuration, which is why I suggest having the local plugin
> configuration be outside of the plugin's directory. Maybe that's a
> deployment requirement that's unlikely to happen, but I look forward to a
> time when couch plugins are like rubygems and regular updating is
> common.
I've found this thread a little hard to follow so far, but why not configure the
plugins from /etc/couchdb/local.ini itself, including the configuration that
tells CouchDB which plugins to work. I'm not sure I'm comfortable with the idea
of CouchDB scanning a list of directories each time, loading things automatically.
--
Noah Slater, http://tumbolia.org/nslater
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 5:36 PM, Chris Anderson wrote:
> Maybe each plugin could have a default.ini and a local.ini, so
> something like:
>
> plugins/
>   geo_index/
>     Makefile.am.in
>     src/
>       geo_index.erl
>       geo_httpd.erl
>       geo_manager.erl
>     default.ini
>     local.ini
>   ...
>
>
> Then CouchDB could automatically pick up plugin config from each
> directory, running all the defaults before all the locals.
I think we may be talking at cross-purposes here - I'm thinking
particularly of plugins that are separately compiled without being
included in the canonical sources, while I gather you're focussed on a
refactored core or a way to handle less tightly coupled contributions.
I'm interested in seeing those things happen, but I'll be guided by
you as to what I should do for that. In particular, I know sfa about
automake et al.
OTOH, I think top level default.ini + local.ini + local.*.ini would be
a good idea regardless, and a simple change.
> I'm not invested in these specifics, just trying to think of ways it
> could look that would give new developers the least amount of
> head-scratching.
I think head-scratching would be best mitigated by some good docs. If
we want to implement this then I'll commit to writing a spec/tutorial
on the wiki.
My concern is how to allow plugins to be updated without stomping on
user configuration, which is why I suggest having the local plugin
configuration be outside of the plugin's directory. Maybe that's a
deployment requirement that's unlikely to happen, but I look forward to a
time when couch plugins are like rubygems and regular updating is
common.
Somewhat OT: plugins are also a mechanism for deploying generic non-
couch functionality for a deployment scenario like mine where every
user runs a Couch instance ala Notes client e.g. distributed voting
algorithms and scalaris-like functionality for emergent clusters
within wider p2p meshes.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Chris Anderson <jc...@apache.org>.
On Sun, Feb 1, 2009 at 10:39 PM, Antony Blakey <an...@gmail.com> wrote:
>
> I imagined that the ini file in the plugin directory wouldn't be modified by
> users i.e. that's "part of the plugin". Users would modify the local.ini
> file to change the user-configurable parts of that. That makes me think that
> the startup script should load local.*.ini, so you can separate your
> per-plugin user-configurable bits. The reason for not modifying the
> plugin.ini directly is so that you can update them easily.
>
There are a bunch of ways we could do this. The simplest overall system
I can think of would be a `make plugins` target that builds whatever
is in the plugins directory.
Maybe each plugin could have a default.ini and a local.ini, so something like:
plugins/
  geo_index/
    Makefile.am.in
    src/
      geo_index.erl
      geo_httpd.erl
      geo_manager.erl
    default.ini
    local.ini
  ...
Then CouchDB could automatically pick up plugin config from each
directory, running all the defaults before all the locals.
I'm not invested in these specifics, just trying to think of ways it
could look that would give new developers the least amount of
head-scratching.
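A `make plugins` step for the layout above could be as small as this hypothetical Ruby sketch. It assumes compiled modules go into an ebin/ directory next to src/ (the usual OTP convention, not something specified here) and returns erlc command lines rather than running them; plugin_build_commands and plugin_ini_order are invented names. plugin_ini_order shows the "all the defaults before all the locals" ordering.

```ruby
require 'shellwords'

# Build commands for plugins/<name>/src/*.erl -> plugins/<name>/ebin/.
def plugin_build_commands(plugins_root)
  Dir.glob(File.join(plugins_root, "*", "src")).sort.filter_map do |src_dir|
    sources = Dir.glob(File.join(src_dir, "*.erl")).sort
    next if sources.empty?   # nothing to compile for this plugin
    ebin = Shellwords.escape(File.join(File.dirname(src_dir), "ebin"))
    files = sources.map { |s| Shellwords.escape(s) }.join(" ")
    "mkdir -p #{ebin} && erlc -o #{ebin} #{files}"
  end
end

# Every plugin's default.ini (in plugin-name order), then every local.ini.
def plugin_ini_order(plugins_root)
  plugins = Dir.glob(File.join(plugins_root, "*")).select { |d| File.directory?(d) }.sort
  defaults = plugins.map { |p| File.join(p, "default.ini") }.select { |f| File.file?(f) }
  locals = plugins.map { |p| File.join(p, "local.ini") }.select { |f| File.file?(f) }
  defaults + locals
end
```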
--
Chris Anderson
http://jchris.mfdz.com
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 4:46 PM, Chris Anderson wrote:
> On Sun, Feb 1, 2009 at 9:40 PM, Antony Blakey <an...@gmail.com> wrote:
>> There are command line options for manipulating the list of plugins. By
>> default it will load plugins from /etc/couchdb/plugins/ or whatever your
>> localconfdir is - mine is always relative - but this can be controlled by
>> the options:
>
>> -C clean the plugins list
>> -D add all the plugins in $DEFAULT_PLUGIN_DIR
>> -G DIR add all the plugins in DIR
>> -P DIR add the plugin DIR
>> Bit light on testing, I wanted to show how trivial it is.
>
> Thanks for getting started on this. It looks like with a make plugins
> target, this will be ready to roll.
I think there are multiple levels of modularity possible - some core
features could be plugins, purely in a separation-of-concerns kind of
way. Then there would be third-party plugins. This could be a
convenient way to play with some new contributions without having to
merge into a single code directory (assuming each plugin is
separate in the source). And finally there would be per-user plugins
that customize couch on a per-app basis.
I assume you mean a make plugins target for core plugins?
> One thing I don't understand is the plugins list and the additional
> command-line options to launch script. I wonder if there's a
> non-ephemeral way to control the list of loaded plugins.
Well, there is the default location. If a plugin were to have an
extension e.g. '.plugin' then the activation could be controlled via
renaming. OTOH I like the Apache (httpd) mechanism of having
'available/active' dirs, which seems cleaner.
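The Apache-style 'available/active' arrangement could look roughly like this. It is a hypothetical sketch: the directory names plugins-available and plugins-active and the PluginDirs class are invented, and activation is just a symlink, so the start script would only need to scan the active directory.

```ruby
require 'fileutils'

# Installed plugins live in plugins-available/; a plugin is active when a
# symlink to it exists in plugins-active/.
class PluginDirs
  def initialize(root)
    @available = File.join(root, "plugins-available")
    @active = File.join(root, "plugins-active")
    FileUtils.mkdir_p([@available, @active])
  end

  def activate(name)
    target = File.join(@available, name)
    raise ArgumentError, "not installed: #{name}" unless File.directory?(target)
    link = File.join(@active, name)
    FileUtils.ln_s(target, link) unless File.symlink?(link)   # idempotent
  end

  def deactivate(name)
    link = File.join(@active, name)
    File.unlink(link) if File.symlink?(link)   # the plugin itself stays installed
  end

  # Names of active plugins, i.e. what the start script would load.
  def active
    Dir.children(@active).sort
  end
end
```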
I use the command line options to the startup script because I have
couchdb/erlang/icu etc. packaged as Ruby gems that deploy Couch using
dependencies, and the class that controls CouchDB allows multiple
instances in self-contained locations. It looks a bit like this:
---------------------------------------
require 'memetic'
require 'memetic-erlang'
require 'memetic-spidermonkey'
require 'memetic-icu'
module Memetic
  class CouchDB
    # Absolute path of the CouchDB build bundled with the gem: the BUILD
    # directory inside the parent of this file's directory.
    DIRECTORY = File.expand_path("BUILD", File.dirname(File.dirname(__FILE__)))
    # Add the Erlang, SpiderMonkey and ICU settings to the environment.
    def CouchDB.configure_environment(env)
      Erlang.configure_environment(env)
      Spidermonkey.configure_environment(env)
      ICU.configure_environment(env)
    end
    # Create a self-contained instance directory (log/, data/ and a
    # generated configuration.ini).
    def initialize(local_directory)
      @local_directory = local_directory
      Dir.mkdir(@local_directory) unless File.exist?(@local_directory)
      @log_directory = "#{@local_directory}/log"
      Dir.mkdir(@log_directory) unless File.exist?(@log_directory)
      @data_directory = "#{@local_directory}/data"
      Dir.mkdir(@data_directory) unless File.exist?(@data_directory)
      File.open("#{@local_directory}/configuration.ini", "w") do |f|
        f << configuration()
      end
    end
    # Launch CouchDB in the background (-b) with its own stdout/stderr
    # logs, pid file and config chain; returns the script's first line.
    def start
      env = Environment.new
      CouchDB.configure_environment(env)
      cmd = env.as_command_prefix
      cmd << "( cd '#{DIRECTORY}' ; "
      cmd << "./bin/couchdb -b "
      cmd << "-o '#{@log_directory}/couchdb.stdout' "
      cmd << "-e '#{@log_directory}/couchdb.stderr' "
      cmd << "-p '#{@local_directory}/pid' "
      cmd << "-c './etc/couchdb/default.ini' "
      cmd << "-c '#{@local_directory}/configuration.ini' "
      #cmd << "-G '#{@local_directory}/plugins' "
      cmd << " )"
      IO.popen(cmd) do |p|
        s = p.gets
        s.chomp if s
      end
    end
    # Ask the instance identified by its pid file for its status (-s).
    def status
      env = Environment.new
      CouchDB.configure_environment(env)
      IO.popen("#{env.as_command_prefix} '#{DIRECTORY}/bin/couchdb' -p '#{@local_directory}/pid' -s") do |p|
        s = p.gets
        s.chomp if s
      end
    end
    # Shut down the instance identified by its pid file (-d).
    def stop
      env = Environment.new
      CouchDB.configure_environment(env)
      IO.popen("#{env.as_command_prefix} '#{DIRECTORY}/bin/couchdb' -p '#{@local_directory}/pid' -d") do |p|
        s = p.gets
        s.chomp if s
      end
    end
    # Instance-specific settings, loaded after default.ini by start.
    def configuration
      <<~END_OF_STRING
        [couchdb]
        database_dir = #{@data_directory}
        [log]
        file = #{@log_directory}/couch.log
        level = debug
      END_OF_STRING
    end
  end
end
---------------------------------------
> As far as configuration, users will likely also want to configure
> their API endpoints as http_db_handlers or http_global_handlers. These
> allow you to hook an Erlang function directly to a CouchDB path.
Sure.
I imagined that the ini file in the plugin directory wouldn't be
modified by users i.e. that's "part of the plugin". Users would modify
the local.ini file to change the user-configurable parts of that. That
makes me think that the startup script should load local.*.ini, so you
can separate your per-plugin user-configurable bits. The reason for
not modifying the plugin.ini directly is so that you can update them
easily.
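A minimal sketch of why loading local.*.ini after the plugin's own ini protects user settings: files are merged in order and later keys win, so replacing the plugin's ini during an update never touches the user's overrides. parse_ini and merge_ini are hypothetical helpers that handle only simple [section] and key = value lines (comments start with ';'); this is not CouchDB's own config parser.

```ruby
# Parse simple INI text into { "section" => { "key" => "value" } }.
def parse_ini(text)
  config = {}
  section = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?(";")
    if (m = line.match(/\A\[(.+)\]\z/))
      section = m[1].strip
    elsif section && (idx = line.index("="))
      key = line[0...idx].strip
      value = line[(idx + 1)..].strip   # split on the first '=' only
      (config[section] ||= {})[key] = value
    end
  end
  config
end

# Merge INI texts in load order; a key in a later file overrides earlier ones.
def merge_ini(texts)
  texts.each_with_object({}) do |text, merged|
    parse_ini(text).each do |section, pairs|
      (merged[section] ||= {}).merge!(pairs)
    end
  end
end
```

For example, merging a plugin's "[geo] level = info" with a user's local.geo.ini containing "[geo] level = debug" leaves level = debug, whatever the plugin ships next time.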
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Chris Anderson <jc...@apache.org>.
On Sun, Feb 1, 2009 at 9:40 PM, Antony Blakey <an...@gmail.com> wrote:
> There are command line options for manipulating the list of plugins. By
> default it will load plugins from /etc/couchdb/plugins/ or whatever your
> localconfdir is - mine is always relative - but this can be controlled by
> the options:
> -C clean the plugins list
> -D add all the plugins in $DEFAULT_PLUGIN_DIR
> -G DIR add all the plugins in DIR
> -P DIR add the plugin DIR
> Bit light on testing, I wanted to show how trivial it is.
Thanks for getting started on this. It looks like with a make plugins
target, this will be ready to roll.
One thing I don't understand is the plugins list and the additional
command-line options to the launch script. I wonder if there's a
non-ephemeral way to control the list of loaded plugins.
At this point `utils/run` is still a copy-paste job from `couchdb`.
Now might be a good opportunity to make that more sensible.
As far as configuration, users will likely also want to configure
their API endpoints as http_db_handlers or http_global_handlers. These
allow you to hook an Erlang function directly to a CouchDB path.
--
Chris Anderson
http://jchris.mfdz.com
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Ben Browning <be...@gmail.com>.
To spark some additional conversation around plugins, is there a clear
idea of how we would handle plugin X that also uses plugin Y? For
example, I could see the Erlang API, the partitioning/clustering
functionality, and individual pieces of the clustering as additional
plugins (consistent hashing algorithm being the immediately obvious
one). The partitioning would depend on the Erlang API and a consistent
hashing plugin being configured.
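One way to answer the X-uses-Y question is to let each plugin declare its dependencies and start plugins in topological order. This hypothetical sketch uses TSort from Ruby's standard library, which raises TSort::Cyclic on circular dependencies; the plugin names mirror the example above and the dependency-hash format is an assumption.

```ruby
require 'tsort'

# Plugins declare the plugins they depend on; tsort yields dependencies
# before the plugins that need them.
class PluginGraph
  include TSort

  def initialize(deps)
    @deps = deps   # { "plugin" => ["dependency", ...] }
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node) { raise KeyError, "missing plugin: #{node}" }.each(&block)
  end
end

deps = {
  "partitioning"       => ["erlang_api", "consistent_hashing"],
  "erlang_api"         => [],
  "consistent_hashing" => []
}
start_order = PluginGraph.new(deps).tsort
```

A missing dependency surfaces as a KeyError naming the plugin, which is the kind of error a plugin manager would want to report at startup.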
For now I'm just throwing additional modules in the src/couchdb
directory in my git branch, but there are enough modules in there that
it makes me hesitate to add more. We could separate these into multiple
folders by using Erlang packages, but a plugin system could accomplish
the same goal and provide additional benefit for 3rd-party plugins.
It's not a pressing priority, but getting a plugin system in-place
would better allow for 3rd-party development around CouchDB and make
it easier to release and version functionality separately from the
core app.
Ben
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Chris Anderson <jc...@apache.org>.
On Mon, Feb 2, 2009 at 4:39 AM, Antony Blakey <an...@gmail.com> wrote:
>
> On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:
>
>>
>> On 2 Feb 2009, at 12:45, Antony Blakey wrote:
>>
>>>
>>> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>>>
>>>> That's fine with me. (Says the man who's not going to develop it.)
>>>
>>> Unfortunately requiring a bigger commitment to this feature than I can
>>> manage at this time.
>>
>> I hope you can push for an agreed-upon design and give a rough todo list
>> and maybe start with it and leave open work items with a good description
>> in JIRA for others to pick up. Would that work?
>
> It's just a matter of doing in erlang what I did in the script, finding out
> how to extend the load path. Hopefully my implementation shows how
> conceptually trivial it is. I agree with Noah about the configuration issue.
> Probably the only trick is that you need to load the config to find the
> plugins and then load more config from each plugin.
>
> I need to focus on the transactional _bulk_docs issue, and then the filename
> patch - that's a 0.9 blocker.
>
Yes, I think having this thread open which lays out the possibilities
is good work as long as we can avoid losing them to history by the
time we are ready to implement the plugin thing.
I think dropping a patch in Jira with a someday/maybe priority is the
best way to let us pick it back up when the time comes.
--
Chris Anderson
http://jchris.mfdz.com
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Jan Lehnardt <ja...@apache.org>.
On 2 Feb 2009, at 13:39, Antony Blakey wrote:
>
> On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:
>
>> I hope you can push for an agreed-upon design and give a rough todo
>> list and maybe start with it and leave open work items with a good
>> description in JIRA for others to pick up. Would that work?
>
> It's just a matter of doing in erlang what I did in the script,
> finding out how to extend the load path. Hopefully my implementation
> shows how conceptually trivial it is. I agree with Noah about the
> configuration issue. Probably the only trick is that you need to
> load the config to find the plugins and then load more config from
> each plugin.
>
> I need to focus on the transactional _bulk_docs issue, and then the
> filename patch - that's a 0.9 blocker.
Great, thanks. Focus is good.
Cheers
Jan
--
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 10:34 PM, Jan Lehnardt wrote:
>
> On 2 Feb 2009, at 12:45, Antony Blakey wrote:
>
>>
>> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>>
>>> That's fine with me. (Says the man who's not going to develop it.)
>>
>> Unfortunately requiring a bigger commitment to this feature than I
>> can manage at this time.
>
> I hope you can push for an agreed-upon design and give a rough todo
> list and maybe start with it and leave open work items with a good
> description in JIRA for others to pick up. Would that work?
It's just a matter of doing in erlang what I did in the script,
finding out how to extend the load path. Hopefully my implementation
shows how conceptually trivial it is. I agree with Noah about the
configuration issue. Probably the only trick is that you need to load
the config to find the plugins and then load more config from each
plugin.
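The two-phase load described above (read the config to find the plugins, then read more config from each plugin) might look like this hypothetical sketch. The [plugins] section and its comma-separated "directories" key are invented for illustration, and already-parsed config is passed in as nested hashes. Plugin settings are layered between the defaults and the local settings, matching the plugin.ini ordering described in the simple-erlang-plugins announcement.

```ruby
# defaults/locals: parsed config hashes; read_plugin_ini: callable that
# returns the parsed plugin.ini for a plugin directory.
def two_phase_config(defaults, locals, read_plugin_ini)
  merge = lambda do |*layers|
    layers.each_with_object({}) do |layer, acc|
      layer.each { |section, pairs| (acc[section] ||= {}).merge!(pairs) }
    end
  end
  # Phase 1: just enough configuration to discover the plugins.
  boot = merge.(defaults, locals)
  dirs = boot.dig("plugins", "directories").to_s.split(",").map(&:strip).reject(&:empty?)
  # Phase 2: rebuild with plugin settings between defaults and local settings.
  plugin_layers = dirs.map { |dir| read_plugin_ini.call(dir) }
  { plugin_dirs: dirs, config: merge.(defaults, *plugin_layers, locals) }
end
```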
I need to focus on the transactional _bulk_docs issue, and then the
filename patch - that's a 0.9 blocker.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Jan Lehnardt <ja...@apache.org>.
On 2 Feb 2009, at 12:45, Antony Blakey wrote:
>
> On 02/02/2009, at 10:09 PM, Noah Slater wrote:
>
>> That's fine with me. (Says the man who's not going to develop it.)
>
> Unfortunately requiring a bigger commitment to this feature than I
> can manage at this time.
I hope you can push for an agreed-upon design and give a rough todo list
and maybe start with it and leave open work items with a good
description in JIRA for others to pick up. Would that work?
Cheers
Jan
--
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 10:09 PM, Noah Slater wrote:
> That's fine with me. (Says the man who's not going to develop it.)
Unfortunately, that requires a bigger commitment to this feature than I
can manage at this time.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 02, 2009 at 10:06:48PM +1030, Antony Blakey wrote:
>
> On 02/02/2009, at 9:48 PM, Noah Slater wrote:
>
>> I think I agree with a comment earlier in this thread. I don't think
>> command options are the place for this type of configuration. Ideally,
>> you'd want all the configuration in the ini file.
>
> That means that the plugin system needs to be written in CouchDB itself,
> rather than being a lightweight launch-time facility.
That's fine with me. (Says the man who's not going to develop it.)
--
Noah Slater, http://tumbolia.org/nslater
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 9:48 PM, Noah Slater wrote:
> I think I agree with a comment earlier in this thread. I don't think
> command options are the place for this type of configuration. Ideally,
> you'd want all the configuration in the ini file.
That means that the plugin system needs to be written in CouchDB
itself, rather than being a lightweight launch-time facility.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Noah Slater <ns...@apache.org>.
On Mon, Feb 02, 2009 at 09:31:36PM +1030, Antony Blakey wrote:
>>> -D add all the plugins in $DEFAULT_PLUGIN_DIR
>>> -G DIR add all the plugins in DIR
>>> -P DIR add the plugin DIR
>>
>> Why have you chosen all capitals?
>
> Because all the obvious lower case options have been used. I wanted to
> use long options, but the shell getopts doesn't allow it.
I think I agree with a comment earlier in this thread. I don't think command
options are the place for this type of configuration. Ideally, you'd want all
the configuration in the ini file. If you really need to override some ini file
setting, maybe using an environment variable would be the way forward.
Bottom line: I want to keep the number of command line options small.
--
Noah Slater, http://tumbolia.org/nslater
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 8:51 PM, Noah Slater wrote:
> This sounds pretty good.
>
> On Mon, Feb 02, 2009 at 04:10:56PM +1030, Antony Blakey wrote:
>> -C clean the plugins list
>
> What does this do? I think it needs more explanation.
It clears the plugin list.
The plugin list is by default all the plugins in $DEFAULT_PLUGIN_DIR.
-C clears the list, either to run without plugins, or as a precursor to
using -D, -G and -P in some order to define a specific plugin ordering
and/or load additional plugins from non-default locations.
>> -D add all the plugins in $DEFAULT_PLUGIN_DIR
>> -G DIR add all the plugins in DIR
>> -P DIR add the plugin DIR
>
> Why have you chosen all capitals?
Because all the obvious lower case options have been used. I wanted to
use long options, but the shell getopts doesn't allow it.
Antony Blakey
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Noah Slater <ns...@apache.org>.
This sounds pretty good.
On Mon, Feb 02, 2009 at 04:10:56PM +1030, Antony Blakey wrote:
> -C clean the plugins list
What does this do? I think it needs more explanation.
> -D add all the plugins in $DEFAULT_PLUGIN_DIR
> -G DIR add all the plugins in DIR
> -P DIR add the plugin DIR
Why have you chosen all capitals?
--
Noah Slater, http://tumbolia.org/nslater
Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
Posted by Antony Blakey <an...@gmail.com>.
I've recreated my simple erlang plugins system
http://github.com/AntonyBlakey/couchdb/tree/simple-erlang-plugins
A plugin is a directory. It can contain a plugin.ini file and an
erlang directory.
The erlang directory is added to the erlang load path - you put your
beam files in there.
The plugin.ini file is loaded after the default.ini, but before
local.ini. If you supply your own ini files on the command line then
it comes before them.
To start your plugin you probably want to add to the [daemons] section
of your plugin.ini file. Check start_secondary_services in
couch_server_sup.erl to see what happens with daemons, and you can see
examples in default.ini (.tpl.in in the source).
If plugins need to be integrated in a different fashion, or
dependencies are needed, then I'd suggest that a different config
section be created to handle richer specification/invocation option
format than [daemons].
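The resulting configuration chain can be summarised in a few lines of Ruby. This is a sketch under the assumptions stated in the message above (optional plugin.ini per plugin, plugin inis between default.ini and local.ini, command-line inis last); config_chain and its parameters are hypothetical, and the [daemons] entry in the comment is modelled on the {Module, Function, Args} entries in default.ini, with a made-up module name.

```ruby
# Later files override earlier ones, so local.ini and command-line files can
# override any plugin setting.
#
# A plugin.ini would typically register its process under [daemons], e.g.:
#   [daemons]
#   geo_manager={geo_manager, start_link, []}
def config_chain(etc_dir, plugin_dirs, command_line_inis = [])
  plugin_inis = plugin_dirs
    .map { |dir| File.join(dir, "plugin.ini") }
    .select { |path| File.file?(path) }   # plugin.ini is optional
  [File.join(etc_dir, "default.ini"),
   *plugin_inis,
   File.join(etc_dir, "local.ini"),
   *command_line_inis]
end
```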
There are command line options for manipulating the list of plugins.
By default it will load plugins from /etc/couchdb/plugins/ or whatever
your localconfdir is - mine is always relative - but this can be
controlled by the options:
-C clean the plugins list
-D add all the plugins in $DEFAULT_PLUGIN_DIR
-G DIR add all the plugins in DIR
-P DIR add the plugin DIR
It's a bit light on testing; I wanted to show how trivial it is.
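The option semantics can be modelled with Ruby's OptionParser, which invokes each handler in the order the options appear on the command line, so "-C -P x -D" puts plugin x ahead of the defaults. This is a hypothetical sketch of the behaviour, not the getopts code in the branch; plugin_list and its arguments are invented.

```ruby
require 'optparse'

# Returns the ordered plugin list for the given arguments. Without options
# the list is every plugin directory found in default_dir.
def plugin_list(argv, default_dir)
  list_dir = ->(dir) { Dir.glob(File.join(dir, "*")).select { |d| File.directory?(d) }.sort }
  plugins = list_dir.(default_dir)
  OptionParser.new do |opts|
    opts.on("-C", "clear the plugin list") { plugins = [] }
    opts.on("-D", "add all the plugins in the default dir") { plugins.concat(list_dir.(default_dir)) }
    opts.on("-G DIR", "add all the plugins in DIR") { |dir| plugins.concat(list_dir.(dir)) }
    opts.on("-P DIR", "add the plugin DIR") { |dir| plugins << dir }
  end.parse(argv)
  plugins
end
```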
Antony Blakey
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Jan Lehnardt <ja...@apache.org>.
Thanks Antony!
On 1 Feb 2009, at 07:37, Antony Blakey wrote:
> On 01/02/2009, at 4:33 PM, Chris Anderson wrote:
>
>> You keep talking about these modifications.
>
> I supplied a github branch with the patch for plugins, and I
> provided the ruby source for an external indexer. Hopefully "put up
> or shut up" is satisfied :)
To save others from searching: http://github.com/AntonyBlakey/couchdb/tree/master
Cheers
Jan
--
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Antony Blakey <an...@gmail.com>.
On 01/02/2009, at 4:33 PM, Chris Anderson wrote:
> You keep talking about these modifications.
I supplied a github branch with the patch for plugins, and I provided
the ruby source for an external indexer. Hopefully "put up or shut up"
is satisfied :)
> We're anxious to see your
> db-name patch
I supplied a github branch with that patch. The view server was
changed after I did it, which requires re-working the patch, and there
were some other requests which I agreed to (slugs & slashes). The
transaction issue is more important to me, so I'm waiting to see if I
need to support a private branch for that before I revisit the unicode-
file-name code.
> , and we've already added seq id to external.
When I posted the ruby indexer I noted that the full db-info is now
provided per request - subsequent discussion prompted the db UUID
thread idea. I haven't checked the code to determine the semantics of
the purge_seq field - it may be enough to *catch* purges, although I
think a purge hook would be a very useful optimization to ensure that
externals don't require a full sweep.
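A sketch of how an external indexer could use purge_seq to catch purges without a purge hook, assuming the db info it receives includes update_seq and purge_seq. IndexState and its :rebuild/:incremental/:up_to_date results are hypothetical; a rebuild is the conservative response because the indexer cannot tell which documents were purged, which is exactly the full sweep a purge hook would avoid.

```ruby
# Remembers the sequence numbers the index was last built against.
class IndexState
  attr_reader :update_seq, :purge_seq

  def initialize
    @update_seq = 0
    @purge_seq = nil   # unknown until the first record
  end

  # Decide what to do for the db info hash received with a request.
  def plan(info)
    if !@purge_seq.nil? && info["purge_seq"] != @purge_seq
      :rebuild        # something was purged since we last indexed
    elsif info["update_seq"] > @update_seq
      :incremental    # index changes after @update_seq
    else
      :up_to_date
    end
  end

  # Call after indexing (or rebuilding) up to this db info.
  def record(info)
    @update_seq = info["update_seq"]
    @purge_seq = info["purge_seq"]
  end
end
```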
> DB uuids
> are an interesting topic as well, but more interesting is code.
I was providing a perspective for Martin Scholl, my point being that
what he wants is doable now, without having to make the canonical
store pluggable (which IMO is neither feasible nor desirable).
Even without DB uuids, external indexing is still a very useful model.
> It's pretty flexible already, maybe just some path globs for `make
> plugins` is all it would take to get a plugins/helpers convention into
> the build. We could add starting of apps (crypto, ibrowse, etc) to
> default.ini. Then you'd probably have everything you need to add new
> modules to the couch system.
Yes, as I say, I supplied a github branch for using such modules. Even
with no other changes, it's useful right now, and laughably trivial.
As far as core modularity is concerned, I don't know if it's feasible
to patch that from outside the committers group because of the
coordination required. I may be wrong.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
He who would make his own liberty secure, must guard even his enemy
from repression.
-- Thomas Paine
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Chris Anderson <jc...@apache.org>.
On Sat, Jan 31, 2009 at 9:02 PM, Antony Blakey <an...@gmail.com> wrote:
>
> On 01/02/2009, at 2:26 PM, Paul Davis wrote:
>
>> Very cool ideas. We've been discussing erlang plugins. The
>> conversation has generally gotten as far as, "erlang plugins... yeah
>> we should have those."
>
> The mechanical side of building and deploying standalone erlang plugins that
> extend couch, including extending the configuration locations, is very
> simple, roughly equivalent to the classpath/load_path issues in e.g.
> java/ruby, modulo the issue of config-file writeback from within CouchDB.
> I've done and deployed this, although the appropriate 'hooks' aren't always
> available, which I guess is the essence of the OP plugin system.
>
> I've built external indexers using both this mechanism, and _external, as
> have others (GeoCouch, FTI). At one stage I was able to execute SQL queries
> against Couch data that was replicated into SQLite databases, but I decided
> not to maintain it. As long as your indexer architecture acknowledges the
> fundamental Couch model, and is prepared to do a full regeneration at any
> time, you can replicate the operational semantics of inbuilt views quite
> easily and with high performance (e.g. no per-request hit to Couch). There
> are a few edge cases, namely purging and db re-creation that need a small
> mod to the core to allow 100% correct plugins.
You keep talking about these modifications. We're anxious to see your
db-name patch, and we've already added seq id to external. DB uuids
are an interesting topic as well, but more interesting is code.
>
> My $0.02 ... I think the existing build system would benefit from a more
> modular approach.
It's pretty flexible already, maybe just some path globs for `make
plugins` is all it would take to get a plugins/helpers convention into
the build. We could add starting of apps (crypto, ibrowse, etc) to
default.ini. Then you'd probably have everything you need to add new
modules to the couch system.
Patches welcome. :)
--
Chris Anderson
http://jchris.mfdz.com
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Antony Blakey <an...@gmail.com>.
On 01/02/2009, at 2:26 PM, Paul Davis wrote:
> Very cool ideas. We've been discussing erlang plugins. The
> conversation has generally gotten as far as, "erlang plugins... yeah
> we should have those."
The mechanical side of building and deploying standalone erlang
plugins that extend couch, including extending the configuration
locations, is very simple, roughly equivalent to the classpath/
load_path issues in e.g. java/ruby, modulo the issue of config-file
writeback from within CouchDB. I've done and deployed this, although
the appropriate 'hooks' aren't always available, which I guess is the
essence of the OP plugin system.
I've built external indexers using both this mechanism, and _external,
as have others (GeoCouch, FTI). At one stage I was able to execute SQL
queries against Couch data that was replicated into SQLite databases,
but I decided not to maintain it. As long as your indexer architecture
acknowledges the fundamental Couch model, and is prepared to do a full
regeneration at any time, you can replicate the operational semantics
of inbuilt views quite easily and with high performance (e.g. no per-
request hit to Couch). There are a few edge cases, namely purging and
db re-creation that need a small mod to the core to allow 100% correct
plugins.
My $0.02 ... I think the existing build system would benefit from a
more modular approach because it would reify the coupling between, and
layering of, existing features e.g. the upcoming stats module would
consist of a capture core and an analysis/access plugin.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Human beings, who are almost unique in having the ability to learn
from the experience of others, are also remarkable for their apparent
disinclination to do so.
-- Douglas Adams
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Zachary Zolton <za...@gmail.com>.
Do you think that a library of higher-order functions for the
JavaScript view layer could be provided to help codify the emerging
"best practices"? I mean, we should be able to help newbies emit their
keys and reduce those related documents into one object graph...
On Fri, Feb 6, 2009 at 10:21 AM, Kerr Rainey <ke...@gmail.com> wrote:
>> Great question, I'd say no it runs entirely against the grain of what
>> CouchDB is. Documents aren't supposed to be related to one another. But
>> relational databases don't handle this kind of thing either so I figure why
>> not CouchDB as it offers other features that solve lots of problems.
>
> I think in most practical apps that Couch targets there are
> relationships between docs. The canonical Blog example has comments
> that are children of a parent post. I think the emphasis on reminding
> people that CouchDB is not relational tends to lead people to the
> conclusion that the documents won't be related in some way. Of course
> CouchDB doesn't have any built in way to specify those relationships.
>
> I've been playing around with conventions for specifying doc
> relationships and how an app layer would make those relationships easy
> to handle. It seems to be quite useful so far. It would be
> interesting to see if it was possible / sensible to push some of that
> further down into CouchDB.
>
> --
> Kerr
>
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Kerr Rainey <ke...@gmail.com>.
> Great question, I'd say no it runs entirely against the grain of what
> CouchDB is. Documents aren't supposed to be related to one another. But
> relational databases don't handle this kind of thing either so I figure why
> not CouchDB as it offers other features that solve lots of problems.
I think in most practical apps that Couch targets there are
relationships between docs. The canonical Blog example has comments
that are children of a parent post. I think the emphasis on reminding
people that CouchDB is not relational tends to lead people to the
conclusion that the documents won't be related in some way. Of course
CouchDB doesn't have any built-in way to specify those relationships.
I've been playing around with conventions for specifying doc
relationships and how an app layer would make those relationships easy
to handle. It seems to be quite useful so far. It would be
interesting to see if it was possible / sensible to push some of that
further down into CouchDB.
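For the post/comments case, a common convention is view collation: the map function emits a compound key that sorts each post immediately before its own comments, so one key-range query returns a post together with its children. The sketch below simulates that in Python. The document fields (`type`, `post_id`, `created_at`) are hypothetical, and Python's tuple ordering only approximates CouchDB's key collation (string ordering differs in general, since CouchDB uses Unicode collation).

```python
from itertools import groupby


def map_doc(doc):
    """Emulates a CouchDB map function: yields (key, value) pairs."""
    if doc["type"] == "post":
        yield (doc["_id"], 0), doc  # the post sorts first under its own id
    elif doc["type"] == "comment":
        yield (doc["post_id"], 1, doc["created_at"]), doc  # then its comments, by time


def posts_with_comments(docs):
    """Run the map, sort rows by key as a view index would, regroup per post id."""
    rows = sorted((row for doc in docs for row in map_doc(doc)), key=lambda r: r[0])
    result = {}
    for post_id, group in groupby(rows, key=lambda r: r[0][0]):
        values = [value for _, value in group]
        post = values[0] if values[0]["type"] == "post" else None  # post may be missing
        comments = [v for v in values if v["type"] == "comment"]
        result[post_id] = {"post": post, "comments": comments}
    return result
```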
--
Kerr
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Robert Dionne <di...@dionne-associates.com>.
Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961
On Feb 1, 2009, at 8:52 AM, Martin Scholl wrote:
> Hello Robert,
>
>
> Robert Dionne wrote:
>> Martin,
>>
>> I'm very keen on relationships between documents. Coming from the
>> description logic community, I'd like to allow users to declare
>> certain
>> fields that relate documents and then compute transitive closures
>> over
>> dags whose nodes are documents and whose arcs the fields of interest.
>> This goes against the grain of couchdb as collections of unrelated
>> documents, I know, but it's what I want to do as couchdb's schema-
>> less
>> design offers many advantages over relational databases. Relational
>> databases aren't that great for storing graphs either.
> I like the idea (reminds me of an RDF DB btw), especially when used
> together with views.
it does, though I'm convinced the OWL/RDF community is laboring under
a delusion that the semantic web can be enabled if we just get enough
ontologies out there and can federate them. Even RDF is overkill for
most applications.
>
>>
>> I don't need to run full classification algorithms in the document
>> store, but would like to just maintain relationships (user-
>> defined) and
>> transitive closures of them. Inferencing would perhaps be better done
>> externally similar to the hypercouch work. So this would best be
>> served
>> by pluggable indexing and maybe pluggable storage, though I think I
>> could live without the latter for now.
> With Antony's latest hints (thank you Antony!) in mind, I think I will
> implement first sketches in an external way first. FTI is
> implemented in
> the same way afair.
>
>>
>> So I'm very excited about your ideas. I too have been reviewing the
>> code with this in mind and I would agree with others that it's
>> perhaps a
>> post 1.0 task. From the little time I've spent chasing down a
>> couple of
>> bugs I've seen there are a few subtle aspects to it. I've also
>> noticed
>> that the style of design in this community is more bottoms up,
>> which is
>> how it should be when building something new, so prototypes are
>> perhaps
>> better for fleshing out ideas. Anyway I'm very happy to help and
>> collaborate on this as I can.
> Great! I will just publish my results on github. I hope, others will
> join then.
>
> What worries me most, is that I am still unsure in how to differ
> between
> design docs and indexing schemes, and when to use which
> infrastructure.
> Applied to the doc-relationship example you gave: how should
> "intermediate results" of the dag processing be treated? As documents?
> Should they be put into view functions? Should views be able to hint,
> which indexing scheme is to be used? Depending on the index type,
> indexing and doc / view-processing can become inherently coupled and
> complex. Is this still CouchDB then?
Great question, I'd say no it runs entirely against the grain of what
CouchDB is. Documents aren't supposed to be related to one another.
But relational databases don't handle this kind of thing either so I
figure why not CouchDB as it offers other features that solve lots of
problems. Here's a typical use case (quoted words are documents,
those with asterisks are fields between documents)"
"heart disease" is *located_in* the "heart"
"myopathy" is *located_in* the "myocardium"
"myocardium" is *part_of* the "heart"
A reasoner might allow one to compose two relations, e.g.
*located_in* composed with *part_of* is equal to *located_in*, and
thus conclude that myopathy is a disease of the heart.
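The inference in that example can be sketched as a fixpoint computation over (subject, relation, object) triples with one composition rule (located_in followed by part_of yields located_in), plus transitivity of part_of. The rule table and triple encoding are illustrative assumptions, not a description of an existing reasoner, and the naive pairwise join is only meant for small graphs.

```python
# Hypothetical rule table: (r1, r2) -> r3 means "x r1 y" and "y r2 z" imply "x r3 z".
RULES = {
    ("located_in", "part_of"): "located_in",
    ("part_of", "part_of"): "part_of",  # part_of treated as transitive
}


def infer(triples):
    """Apply RULES repeatedly until no new (subject, relation, object) triple appears."""
    facts = set(triples)
    while True:
        new = {
            (x, RULES[(r1, r2)], z)
            for (x, r1, y) in facts
            for (y2, r2, z) in facts
            if y == y2 and (r1, r2) in RULES
        } - facts
        if not new:
            return facts
        facts |= new
```

Recomputing this from scratch on every change is exactly what an incremental, view-like implementation would need to avoid.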
So these transitive closures of links between documents would need to
be incrementally computed and treated the same as views. Would this
be best implemented with plugins in the same VM? This kind of
processing seems to require a tighter coupling than something like
full text indexing.
regards,
Bob
>
>
> Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Antony Blakey <an...@gmail.com>.
On 02/02/2009, at 12:22 AM, Martin Scholl wrote:
> What worries me most, is that I am still unsure in how to differ
> between
> design docs and indexing schemes, and when to use which
> infrastructure.
> Applied to the doc-relationship example you gave: how should
> "intermediate results" of the dag processing be treated? As documents?
> Should they be put into view functions? Should views be able to hint,
> which indexing scheme is to be used? Depending on the index type,
> indexing and doc / view-processing can become inherently coupled and
> complex. Is this still CouchDB then?
Not sure if this is exactly what you're talking about, but the way I
was thinking (which I picked up from GeoCouch) was to have design docs
that define no Couch view but instead configure the external. In my
case, which was for returning the transitive closure (TC) of
user/role/permissions graphs, that might look like this:
{
  "_id": ...,
  "_rev": ...,
  "transitive_closure_generator": {
    "User": { "from": "_id", "to": "[roles]" },
    "Role": { "from": "_id", "to": "[permissions]" }
  }
}
Then, in my external I process such docs in my update-loop to generate
the configuration of the external i.e. I monitor changes to design
docs with a 'transitive_closure_generator' element, obviously caching
and persisting that configuration. You also need to catch design doc
updates that remove that element, as well as the deletion of monitored
docs.
Because I use CouchRest, I can use the 'couchrest-type' key that it
inserts to identify which document's fields I need to process e.g.
'User'/'Role'.
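A minimal sketch of what such an external might compute once it has read the `transitive_closure_generator` configuration. It assumes that `"to": "[roles]"` means "the `roles` field holds a list of target document ids", that `"from"` is always `_id` (so it is ignored here), and that documents carry the `couchrest-type` key mentioned above. `parse_config` and `transitive_closure` are hypothetical helpers, not CouchDB or CouchRest APIs.

```python
def parse_config(generator):
    """Turn {"User": {"from": "_id", "to": "[roles]"}, ...} into {"User": "roles", ...}.

    Assumes "[field]" means "field holds a list of target document ids".
    """
    return {doc_type: rule["to"].strip("[]") for doc_type, rule in generator.items()}


def transitive_closure(docs_by_id, config, start_id):
    """Ids reachable from start_id by following the configured list fields."""
    seen, stack = set(), [start_id]
    while stack:
        doc = docs_by_id.get(stack.pop())
        if doc is None:
            continue  # no such document (e.g. a permission name), nothing to follow
        field = config.get(doc.get("couchrest-type"))
        targets = doc.get(field, []) if field else []
        for target in targets:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Caching these results per start document, and invalidating them when a monitored doc changes, is where the real work of an incremental external lies.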
As an aside, it might be nice if you could use that doc to define the
endpoint for your externals, so you could completely simulate a
built-in view. Putting external configuration into design docs isn't a
good idea, but maybe it could be done indirectly, e.g.
{
  "_id": "_design/my_design",
  "_rev": ...,
  "transitive_closure_generator": { ... },
  "externals": {
    "view_name": "tc",
    "external": <the id of the external mapping in the .ini>
  }
}
which would then allow you to access your external via
'_view/my_design/tc'. External views mapped like this would need to
adhere to the view contract (e.g. query parameters).
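A minimal sketch of the dispatch this proposal implies: when a request for '_view/my_design/tc' arrives, look up the design doc and, if its `externals` element names that view, hand the request to the configured external instead of the built-in view engine. It assumes, as CouchDB does, that the design doc's id is `_design/my_design`; `resolve_view` and the shape of `externals` follow the proposal above and are hypothetical, not an existing CouchDB feature.

```python
def resolve_view(path, design_docs):
    """Map '_view/<design>/<view>' to an external id, or None for a normal view."""
    prefix, design, view = path.split("/")  # ValueError if the path has another shape
    if prefix != "_view":
        raise ValueError(f"not a view path: {path!r}")
    ddoc = design_docs.get("_design/" + design)
    if ddoc is None:
        raise KeyError(f"no design doc {design!r}")
    ext = ddoc.get("externals")
    if ext and ext.get("view_name") == view:
        return ext["external"]  # the id of the external mapping in the .ini
    return None  # fall through to the built-in view engine
```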
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Always have a vision. Why spend your life making other people’s dreams?
-- Orson Welles (1915-1985)
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Martin Scholl <ms...@diskware.net>.
Hello Robert,
Robert Dionne wrote:
> Martin,
>
> I'm very keen on relationships between documents. Coming from the
> description logic community, I'd like to allow users to declare certain
> fields that relate documents and then compute transitive closures over
> dags whose nodes are documents and whose arcs the fields of interest.
> This goes against the grain of couchdb as collections of unrelated
> documents, I know, but it's what I want to do as couchdb's schema-less
> design offers many advantages over relational databases. Relational
> databases aren't that great for storing graphs either.
I like the idea (reminds me of an RDF DB btw), especially when used
together with views.
>
> I don't need to run full classification algorithms in the document
> store, but would like to just maintain relationships (user-defined) and
> transitive closures of them. Inferencing would perhaps be better done
> externally similar to the hypercouch work. So this would best be served
> by pluggable indexing and maybe pluggable storage, though I think I
> could live without the latter for now.
With Antony's latest hints (thank you Antony!) in mind, I think I will
implement the first sketches as an external. FTI is implemented the
same way, afair.
>
> So I'm very excited about your ideas. I too have been reviewing the
> code with this in mind and I would agree with others that it's perhaps a
> post 1.0 task. From the little time I've spent chasing down a couple of
> bugs I've seen there are a few subtle aspects to it. I've also noticed
> that the style of design in this community is more bottoms up, which is
> how it should be when building something new, so prototypes are perhaps
> better for fleshing out ideas. Anyway I'm very happy to help and
> collaborate on this as I can.
Great! I will just publish my results on github. I hope others will
join then.
What worries me most is that I am still unsure how to distinguish between
design docs and indexing schemes, and when to use which infrastructure.
Applied to the doc-relationship example you gave: how should
"intermediate results" of the DAG processing be treated? As documents?
Should they be put into view functions? Should views be able to hint
which indexing scheme is to be used? Depending on the index type,
indexing and doc/view processing can become inherently coupled and
complex. Is this still CouchDB then?
Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Robert Dionne <di...@dionne-associates.com>.
Martin,
I'm very keen on relationships between documents. Coming from the
description logic community, I'd like to allow users to declare
certain fields that relate documents and then compute transitive
closures over DAGs whose nodes are documents and whose arcs are the
fields of interest. This goes against the grain of CouchDB as
collections of unrelated documents, I know, but it's what I want to
do as couchdb's schema-less design offers many advantages over
relational databases. Relational databases aren't that great for
storing graphs either.
I don't need to run full classification algorithms in the document
store, but would like to just maintain relationships (user-defined)
and transitive closures of them. Inferencing would perhaps be better
done externally similar to the hypercouch work. So this would best be
served by pluggable indexing and maybe pluggable storage, though I
think I could live without the latter for now.
So I'm very excited about your ideas. I too have been reviewing
the code with this in mind and I would agree with others that it's
perhaps a post 1.0 task. From the little time I've spent chasing down
a couple of bugs I've seen there are a few subtle aspects to it. I've
also noticed that the style of design in this community is more
bottom-up, which is how it should be when building something new, so
prototypes are perhaps better for fleshing out ideas. Anyway I'm very
happy to help and collaborate on this as I can.
Cheers,
Bob
Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961
On Feb 1, 2009, at 7:51 AM, Martin Scholl wrote:
> Chris Anderson wrote:
>> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis
>> <pa...@gmail.com> wrote:
>>> Martin,
>>>
>>> Very cool ideas. We've been discussing erlang plugins. The
>>> conversation has generally gotten as far as, "erlang plugins... yeah
>>> we should have those."
>>
>> I agree this is cool, but I think it would be healthier for the
>> project to wait until we release a rock-solid 1.0.
>>
>> There are some incredibly non-obvious things happening inside, and a
>> big disruption right now wouldn't necessarily keep them all in
>> balance. Once we've met 1.0, we'll have a solid basis for comparison,
>> of any alternate implementations.
>>
>> Then, let the fun begin. :)
>>
>> Martin, I'd very much like to hear more about the sorts of indexers
>> you'd build. Sounds exciting.
> I'd like to experiment with Merkle trees, because these could turn out
> to be a good foundation for several use-cases:
> - index/tree-synchronization: replication is trivial with merkle
> trees,
> only changed parts of the tree get replicated in a secure manner.
> - secure document storage: modified documents (disc corruption, sw
> failure or even the "bad cracker"-case)
> - by using GPG/PGP-signatures probably even cryptographical secure
> design doc code signing, e.g. "safe applications"
>
> Furthermore, there are a lot of other clever map data structures
> available (not in the sense of a->b , but a<->b) which could become
> quite handy to store document relationships. I'm sure, the database
> ppl
> out here have many more ideas about what could be added to CouchDB.
>
>
> Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Martin Scholl <ms...@diskware.net>.
Martin Scholl wrote:
> Chris Anderson wrote:
>> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
>>> Martin,
>>>
>>> Very cool ideas. We've been discussing erlang plugins. The
>>> conversation has generally gotten as far as, "erlang plugins... yeah
>>> we should have those."
>> I agree this is cool, but I think it would be healthier for the
>> project to wait until we release a rock-solid 1.0.
>>
>> There are some incredibly non-obvious things happening inside, and a
>> big disruption right now wouldn't necessarily keep them all in
>> balance. Once we've met 1.0, we'll have a solid basis for comparison,
>> of any alternate implementations.
>>
>> Then, let the fun begin. :)
>>
>> Martin, I'd very much like to hear more about the sorts of indexers
>> you'd build. Sounds exciting.
> I'd like to experiment with Merkle trees, because these could turn out
> to be a good foundation for several use-cases:
> - index/tree-synchronization: replication is trivial with merkle trees,
> only changed parts of the tree get replicated in a secure manner.
> - secure document storage: modified documents (disc corruption, sw
> failure or even the "bad cracker"-case)
The important information is missing here: -> "external modifications
to documents get detected". Why can this be important? It would allow
CouchDB to be used for compliance or archiving tasks.
Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Martin Scholl <ms...@diskware.net>.
Chris Anderson wrote:
> On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
>> Martin,
>>
>> Very cool ideas. We've been discussing erlang plugins. The
>> conversation has generally gotten as far as, "erlang plugins... yeah
>> we should have those."
>
> I agree this is cool, but I think it would be healthier for the
> project to wait until we release a rock-solid 1.0.
>
> There are some incredibly non-obvious things happening inside, and a
> big disruption right now wouldn't necessarily keep them all in
> balance. Once we've met 1.0, we'll have a solid basis for comparison,
> of any alternate implementations.
>
> Then, let the fun begin. :)
>
> Martin, I'd very much like to hear more about the sorts of indexers
> you'd build. Sounds exciting.
I'd like to experiment with Merkle trees, because these could turn out
to be a good foundation for several use-cases:
- index/tree synchronization: replication is trivial with Merkle trees;
only changed parts of the tree get replicated, in a secure manner.
- secure document storage: modified documents (disc corruption, sw
failure or even the "bad cracker" case)
- by using GPG/PGP signatures, probably even cryptographically secure
design doc code signing, e.g. "safe applications"
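A minimal Merkle tree sketch covering the first two points: storing the root hash lets you detect any modification by recomputing it, and comparing two trees top-down only descends into subtrees whose hashes differ, which is what makes synchronization cheap. It uses SHA-256 from Python's `hashlib`, assumes a non-empty list of leaves (e.g. serialized doc id plus revision) in a fixed order such as sorted by doc id, and assumes both trees cover the same number of leaves; handling insertions, which shift leaf positions, and separating leaf and inner-node hashes are left out.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return the tree's levels bottom-up; levels[-1][0] is the root hash."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a, b, level=None, index=0):
    """Indexes of differing leaves, descending only where subtree hashes differ.

    Assumes both trees were built over the same number of leaves.
    """
    if level is None:
        level = len(a) - 1  # start at the root
    if index >= len(a[level]) or a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    return (diff_leaves(a, b, level - 1, 2 * index)
            + diff_leaves(a, b, level - 1, 2 * index + 1))
```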
Furthermore, there are a lot of other clever map data structures
available (not in the sense of a->b, but a<->b) which could become
quite handy for storing document relationships. I'm sure the database
people out here have many more ideas about what could be added to CouchDB.
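As a small illustration of the a<->b idea, here is a many-to-many index that can be queried from either side, e.g. "which tags does this post have" and "which posts carry this tag". `BiIndex` is a hypothetical in-memory structure for illustration only; a persistent version would keep both directions in the same on-disk tree or in two trees updated together.

```python
from collections import defaultdict


class BiIndex:
    """Many-to-many relation queryable from either side (a <-> b)."""

    def __init__(self):
        self._fwd = defaultdict(set)  # a -> {b, ...}
        self._rev = defaultdict(set)  # b -> {a, ...}

    def add(self, a, b):
        self._fwd[a].add(b)
        self._rev[b].add(a)

    def remove(self, a, b):
        self._fwd[a].discard(b)
        self._rev[b].discard(a)

    def targets(self, a):
        return set(self._fwd.get(a, ()))  # copy, so callers cannot mutate the index

    def sources(self, b):
        return set(self._rev.get(b, ()))
```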
Martin
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Chris Anderson <jc...@apache.org>.
On Sat, Jan 31, 2009 at 7:56 PM, Paul Davis <pa...@gmail.com> wrote:
> Martin,
>
> Very cool ideas. We've been discussing erlang plugins. The
> conversation has generally gotten as far as, "erlang plugins... yeah
> we should have those."
I agree this is cool, but I think it would be healthier for the
project to wait until we release a rock-solid 1.0.
There are some incredibly non-obvious things happening inside, and a
big disruption right now wouldn't necessarily keep them all in
balance. Once we've met 1.0, we'll have a solid basis for comparison
of any alternate implementations.
Then, let the fun begin. :)
Martin, I'd very much like to hear more about the sorts of indexers
you'd build. Sounds exciting.
Chris
--
Chris Anderson
http://jchris.mfdz.com
Re: couch_gen_btree: pluggable storage / tree engines
Posted by Paul Davis <pa...@gmail.com>.
Martin,
Very cool ideas. We've been discussing erlang plugins. The
conversation has generally gotten as far as, "erlang plugins... yeah
we should have those."
As you've noticed, the couch sources depend fairly non-transparently
on the couch_btree implementation. Here's my take on the situation:
1. Erlang plugins are a Good Idea ™
2. We should probably focus on at least two types of plugins: storage
and indexing.
3. MySQL is repulsive.
1. I'm going to assume consensus.
2. I'm pretty sure that the two types of plugins are quite different
and will even require different semantics in the _design docs all the
way down to calls to emit.
3. Well, I'm going to assume consensus here too.
Anyway, I think the general steps forward are probably to figure out
what exactly we want to abstract, how to abstract it, and then start
abstracting. Abstractions FTW!
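To make "what exactly we want to abstract" concrete, here is one possible shape of a pluggable index contract. In Erlang this would naturally be a behaviour module (the couch_gen_tree idea) that couch_btree and any alternative engine implement; the sketch expresses the same idea as a Python abstract base class. The three operations and their names are hypothetical, a guess at the minimum a view index needs, and `SortedListIndex` is a toy engine standing in for couch_btree.

```python
import bisect
from abc import ABC, abstractmethod


class GenIndex(ABC):
    """Hypothetical contract a pluggable index engine would implement."""

    @abstractmethod
    def insert(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...

    @abstractmethod
    def range_query(self, start, end):
        """Return (key, value) pairs with start <= key <= end, in key order."""


class SortedListIndex(GenIndex):
    """Toy engine: two parallel sorted lists instead of an on-disk B-tree."""

    def __init__(self):
        self._keys, self._values = [], []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            del self._values[i]

    def range_query(self, start, end):
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Code written against `GenIndex` never touches the engine's internals, so an R-tree or Merkle-tree engine could be swapped in, as long as it can honour the same range semantics; engines that cannot (e.g. nearest-neighbour search) would need an extended contract.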
In terms of storage, I'm not entirely certain about the specifics.
There's quite a bit of the API in the raw database layer that I haven't
dealt with yet. It'd be super fun to have specific file systems for
things like multiple servers on a single rack vs the distributed
CouchDB for distant servers etc.
In terms of view indexing, my mind is already racing on the
possibilities of where to push things around to make indexes
selectable in the _design docs etc. This could really be a fun
project.
Keep up with the reading and thinking. This is something I look
forward to seeing implemented.
HTH,
Paul Davis
On Sat, Jan 31, 2009 at 9:49 AM, Martin Scholl <ms...@diskware.net> wrote:
> Hello all,
>
>
> although I am still doing a CouchDB review to better understand its
> design, I'd like to ask for comments on a tiny idea.
> I would like to add another index structure to CouchDB (a Merkle-Tree)
> and found myself asking what the best way of doing this would be.
> I have a rough guess of how closely couch_btree is knit into CouchDB.
> Therefore I would like to hear from you experienced developers comments
> on some of my ideas:
>
> My suggestion is a MySQL-ly approach (pluggable engines) for CouchDB,
> that is to factor out several components into generic behaviours:
> e.g.
> - a couch_gen_tree:
> abstracts access to couch_btree
>
> Maybe even a
> - couch_gen_storage
> e.g. file system, file storage access, etc.
>
> - couch_gen_replicator
> an imperative approach to tree / storage replication.
>
> As I said: I am new to CouchDB's code so I cannot really estimate what
> the current layering approach looks like, and whether we can even split
> out the 3 components.
>
> Imho there would be several benefits in having this flexibility brought
> by couch_gen_*:
> - new use-cases for CouchDB:
>   - R-trees: for adding another way of querying
>     documents (e.g. nearest neighbour search)
>   - genome databases
>   - a special datastructure for indexing tags
>   - ...
> - with a flexible storage layer, CouchDB could run on top of other
> infrastructures and products: like S3, SimpleDB, AppEngine, etc.
> - (the following is just a guess:) a cleaner CouchDB codebase with a
> clear layering and separation of components
> - possibly (again, just a guess): with the plugin approach, we can more
> easily support advanced indexing and db management schemes, like
> distributed storage access, distributed transactions, etc.
>
>
> Martin
>