Posted to user@couchdb.apache.org by Chris Anderson <jc...@apache.org> on 2008/12/30 08:25:32 UTC

0.9 Progress

While it's exciting to be bike-shedding so vigorously, my mind keeps
coming back to the things we could do to make CouchDB rock:

Full Unicode database and design doc name handling:

I think flexible database urls are a worthy goal. We still need to
figure out how this will look on the file system. (Antony, any luck on
the name stubs?) Here's a related ticket asking for VERSION files
inside the db directories to help Debian warn against incompatible
upgrades: http://issues.apache.org/jira/browse/COUCHDB-44 Also
related: http://issues.apache.org/jira/browse/COUCHDB-168
(configurable derived data storage directory).

OMG error handling:

I think Couch has some good code here - the JSON error messages give
us a strong start. I'm working on normalizing the error handling from
JavaScript processes, so that validation, form and view development
get easier. There's also a call for startup warnings about file
permissions and port availability (oh and the fact that old versions
of Erlang are unusable). And no room to write to disk:
http://issues.apache.org/jira/browse/COUCHDB-164

Perhaps we need some documentation for ourselves about how errors are
bubbled through the system. We're very close, but there are some
inconsistencies from various modules, I'm afraid.
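For reference, CouchDB's HTTP API already reports errors as a JSON object with `error` and `reason` members. Below is an illustrative Python sketch of funnelling every internal error through one function so that all modules produce the same shape; the error kinds and the status table are assumptions, not CouchDB's actual mapping:

```python
import json

# Illustrative table of internal error kinds -> HTTP status codes.
# These names and codes are assumptions, not CouchDB's real mapping.
STATUS_FOR = {
    "bad_request": 400,
    "forbidden": 403,
    "not_found": 404,
    "conflict": 409,
}


def error_response(kind, reason):
    """Render any internal error as (status, JSON body) in one shape."""
    if kind not in STATUS_FOR:
        # Unrecognized errors keep the same shape and become a 500.
        return 500, json.dumps({"error": "unknown_error", "reason": reason})
    return STATUS_FOR[kind], json.dumps({"error": kind, "reason": reason})
```

Having one exit point like this is what makes the bubbling path documentable: every module raises a kind and a reason, and only one place decides what the client sees.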

We need to document forms and the new '%2F'-free urls more thoroughly than just
in couch_tests.js. I was also thinking it'd be slick to set up
redirects from the '%2F' versions back to their '/' counterparts, but
serving the content at both locations works for now.
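An illustrative Python sketch of the redirect idea (CouchDB itself is Erlang). It assumes only the `_design%2F` form needs redirecting, since ordinary document ids may legitimately contain an encoded slash; `redirect_target` and `respond` are hypothetical names:

```python
import re

# An encoded slash right after "_design" in the path, in either hex case.
_DESIGN_ENCODED_SLASH = re.compile(r"/_design%2[Ff]")


def redirect_target(path):
    """Return the '/'-style URL for a '_design%2F' request, or None."""
    if not _DESIGN_ENCODED_SLASH.search(path):
        return None
    return _DESIGN_ENCODED_SLASH.sub("/_design/", path, count=1)


def respond(path):
    """Hypothetical handler: 301 to the canonical URL, otherwise serve it."""
    target = redirect_target(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}
```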

Jan has some other good ones here:

"Another thing is an ignore_errors=true option for bulk operations for
people who can live with failing writes. In addition, it would be nice
when a bulk operation fails, the failing document id would be included
in the response."

As long as we don't try to return the list of all the updates that
would have failed had we tried to continue the operation, returning
the first failed update's id would make amended retries feasible.
There's a big difference between ignore_errors and
transactional=false, so we should be clear on what guarantees CouchDB
will provide.
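To make the two guarantees concrete, here is an illustrative Python sketch over an in-memory store with simplified integer revisions (both assumptions; real CouchDB revisions are strings). The default mode is all-or-nothing and reports only the first failing id, which is enough for an amended retry; `ignore_errors=True` writes whatever it can:

```python
def _conflicts(store, doc):
    # A write conflicts when the client's expected revision is stale.
    # (Simplified: integer revs, None for a brand-new document.)
    return store.get(doc["_id"]) != doc.get("rev")


def _write(store, doc):
    store[doc["_id"]] = store.get(doc["_id"], 0) + 1


def bulk_update(store, docs, ignore_errors=False):
    # Assumes no duplicate ids within one batch and no concurrent writers.
    if not ignore_errors:
        # All-or-nothing: check everything before writing anything.
        for doc in docs:
            if _conflicts(store, doc):
                return {"ok": False, "first_failed_id": doc["_id"]}
        for doc in docs:
            _write(store, doc)
        return {"ok": True}
    # ignore_errors: write what can be written, list what was skipped.
    skipped = []
    for doc in docs:
        if _conflicts(store, doc):
            skipped.append(doc["_id"])
        else:
            _write(store, doc)
    return {"ok": not skipped, "skipped": skipped}
```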

Fixing `descending=true` and the whole startkey & endkey swap. There's
a major n00b WTF there. I think in this case, a surprisingly empty
view result is not the best addition to the learning curve.

Also view etags:

With the new view group state servers we're within striking distance
of view etags, although it's a bit deeper in the codebase than a lot
of this stuff. View etags would have to factor in the db's current seq
(or more cache-efficiently, the view's last-modified-seq, but that can
wait) as well as some details about the request and the design
document.
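An illustrative Python sketch of an ETag built from the inputs listed above: the database's update sequence, the design document (represented here by its revision) and the request's query parameters. The hashing scheme and function names are assumptions; the view's own last-modified seq could later be passed in place of the db seq:

```python
import hashlib
import json


def view_etag(update_seq, design_doc_rev, query_params):
    """Illustrative view ETag over the db update seq, the design doc's
    revision and the request's query parameters."""
    # Sort parameters so ?limit=10&descending=true and
    # ?descending=true&limit=10 produce the same tag.
    canonical_query = json.dumps(sorted(query_params.items()))
    material = f"{update_seq}|{design_doc_rev}|{canonical_query}"
    return '"' + hashlib.md5(material.encode("utf-8")).hexdigest() + '"'


def status_for(if_none_match, etag):
    # Conditional GET: a matching tag means the client's copy is current.
    return 304 if if_none_match == etag else 200
```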

I'm sure there's more than this to do, but this is more than enough to
keep all of us busy if we want.


-- 
Chris Anderson
http://jchris.mfdz.com

Re: 0.9 Progress

Posted by Antony Blakey <an...@gmail.com>.
On 31/12/2008, at 3:43 AM, Damien Katz wrote:

> It's important we don't obfuscate the filenames unnecessarily. I'd  
> love to support any possible database name, but not at the expense
> of making it harder to debug for 99% of the other cases. As of yet,  
> I don't know of anyone who is blocked right now because of this.

I am. I practically can't win a gig in my target market if developers
are forced to use Roman script for the database and design document
names.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The trouble with the world is that the stupid are cocksure and the  
intelligent are full of doubt.
   -- Bertrand Russell



Re: 0.9 Progress

Posted by Antony Blakey <an...@gmail.com>.
On 31/12/2008, at 4:35 AM, Dean Landolt wrote:

> Antony's suggestion seems like it could do a nice job of  
> accommodating all
> cases:
>
> if slugged-filename == original-filename:
>    name-on-disk = original-filename
> else:
>    name-on-disk = slugged-filename+md5-of-original-filename

But I advocate a directory structure rather than just a name,  
otherwise there's no way for an external tool to determine the  
unslugged name of the database or view.
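An illustrative Python sketch of that idea, assuming a `name` file inside each directory holds the original, unslugged name (the layout proposed later in this thread includes such a file). An external tool then reads the name back instead of trying to reverse the slug; the function names are hypothetical:

```python
import os
import tempfile


def create_db_dir(root, dirname, original_name):
    """Create root/dirname and record the unslugged name in a `name` file."""
    path = os.path.join(root, dirname)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "name"), "w", encoding="utf-8") as f:
        f.write(original_name)
    return path


def read_db_name(path):
    """What an external tool would do: read the name, never un-slug it."""
    with open(os.path.join(path, "name"), encoding="utf-8") as f:
        return f.read()


def roundtrip(dirname, original_name):
    # Demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        return read_db_name(create_db_dir(root, dirname, original_name))
```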

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –

Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –

   -– Emily Dickinson 913 (1865)



Re: 0.9 Progress

Posted by Dean Landolt <de...@deanlandolt.com>.
On Tue, Dec 30, 2008 at 12:13 PM, Damien Katz <da...@apache.org> wrote:

>
> On Dec 30, 2008, at 2:25 AM, Chris Anderson wrote:
>
>  While it's exciting to be bike-shedding so vigorously, my mind keeps
>> coming back to the things we could do to make CouchDB rock:
>>
>> Full Unicode database and design doc name handling:
>>
>> I think flexible database urls are a worthy goal. We still need to
>> figure out how this will look on the file system. (Antony, any luck on
>> the name stubs?) Here's a related ticket asking for VERSION files
>> inside the db directories to help Debian warn against incompatible
>> upgrades: http://issues.apache.org/jira/browse/COUCHDB-44 Also
>> related: http://issues.apache.org/jira/browse/COUCHDB-168
>> (configurable derived data storage directory).
>>
>
> It's important we don't obfuscate the filenames unnecessarily. I'd love to
> support any possible database name, but not at the expense of making it
> harder to debug for 99% of the other cases. As of yet, I don't know of
> anyone who is blocked right now because of this.


Antony's suggestion seems like it could do a nice job of accommodating all
cases:

if slugged-filename == original-filename:
    name-on-disk = original-filename
else:
    name-on-disk = slugged-filename+md5-of-original-filename

This way, if you want to ease administrative burden with a one-to-one
correlation between name and name-on-disk, just don't use any special
characters that would get slugged out. Throw the list of slugged chars on
the wiki and presto, everybody's happy...I think...
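A runnable Python version of the pseudocode above, as a sketch only. The slug rule (lowercase, then replace anything outside `a-z0-9_-` with `-`) and the `-` separator before the digest are assumptions; the thread leaves the exact list of slugged characters to the wiki:

```python
import hashlib
import re


def slug(name):
    # Assumed rule: lowercase, then map anything outside a-z, 0-9, '_', '-' to '-'.
    return re.sub(r"[^a-z0-9_-]", "-", name.lower())


def name_on_disk(name):
    slugged = slug(name)
    if slugged == name:
        # Plain names map one-to-one, which keeps debugging easy.
        return name
    # Otherwise append a digest of the original so distinct names that
    # slug to the same string still get distinct files.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{slugged}-{digest}"
```

The digest is what keeps `my/db` and `my:db` apart even though both slug to `my-db`.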

Re: 0.9 Progress

Posted by Chris Anderson <jc...@gmail.com>.
On Tue, Dec 30, 2008 at 9:13 AM, Damien Katz <da...@apache.org> wrote:
>> Fixing `descending=true` and the whole startkey & endkey swap. There's
>> a major n00b WTF there. I think in this case, a surprisingly empty
>> view result is not the best addition to the learning curve.
>
>
> So by swapping the startkey and endkey automatically, I think we'd be
> supporting an incorrect model about how the queries work.
>

You've convinced me.


>
> Perhaps a better way is to put a flag in the result set if the startkey and
> endkey are misordered, so when you get back the results and they are empty,
> there will be a member like "keys_misordered:true".
>

Yes. Better documentation may be the best solution.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: 0.9 Progress

Posted by Damien Katz <da...@apache.org>.
On Dec 30, 2008, at 2:25 AM, Chris Anderson wrote:

> While it's exciting to be bike-shedding so vigorously, my mind keeps
> coming back to the things we could do to make CouchDB rock:
>
> Full Unicode database and design doc name handling:
>
> I think flexible database urls are a worthy goal. We still need to
> figure out how this will look on the file system. (Antony, any luck on
> the name stubs?) Here's a related ticket asking for VERSION files
> inside the db directories to help Debian warn against incompatible
> upgrades: http://issues.apache.org/jira/browse/COUCHDB-44 Also
> related: http://issues.apache.org/jira/browse/COUCHDB-168
> (configurable derived data storage directory).

It's important we don't obfuscate the filenames unnecessarily. I'd  
love to support any possible database name, but not at the expense
of making it harder to debug for 99% of the other cases. As of yet, I  
don't know of anyone who is blocked right now because of this.

>
>
> OMG error handling:
>
> I think Couch has some good code here - the JSON error messages give
> us a strong start. I'm working on normalizing the error handling from
> JavaScript processes, so that validation, form and view development
> get easier. There's also a call for startup warnings about file
> permissions and port availability (oh and the fact that old versions
> of Erlang are unusable). And no room to write to disk:
> http://issues.apache.org/jira/browse/COUCHDB-164
>
> Perhaps we need some documentation for ourselves about how errors are
> bubbled through the system. We're very close, but there are some
> inconsistencies from various modules, I'm afraid.
>
> We need to document forms and the new '%2F'-free urls more thoroughly than just
> in couch_tests.js. I was also thinking it'd be slick to set up
> redirects from the '%2F' versions back to their '/' counterparts, but
> serving the content at both locations works for now.
>
> Jan has some other good ones here:
>
> "Another thing is an ignore_errors=true option for bulk operations for
> people who can live with failing writes. In addition, it would be nice
> when a bulk operation fails, the failing document id would be included
> in the response."

>
>
> As long as we don't try to return the list of all the updates that
> would have failed had we tried to continue the operation, returning
> the first failed update's id would make amended retries feasible.
> There's a big difference between ignore_errors and
> transactional=false, so we should be clear on what guarantees CouchDB
> will provide.

FYI, I am working on this as part of security and replication work.

>
>
> Fixing `descending=true` and the whole startkey & endkey swap. There's
> a major n00b WTF there. I think in this case, a surprisingly empty
> view result is not the best addition to the learning curve.

This is a hairy area, and I think might cause confusion no matter what.

I understand the use case of passing in a startkey and endkey, seeing
the results and then saying "I want this exact output sorted the other
way" with a single flag. However, that's not how it works. For
example, if you set a limit count smaller than the possible result
set, simply switching to descending and swapping keys will actually
give you different rows, because it will start the counting from the
new startkey, which was the endkey previously. The perception problem
I think is that descending=true is thought of as "reverse the final  
output", when it's really like "reverse the whole view before the  
query". If we change the behavior of endkey to be < instead of <=  
(it's been suggested, but I'm ambivalent about such a change), then  
we've made the problem even worse.
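An illustrative Python model of the semantics described above, using plain integer keys in place of CouchDB's view collation (an assumption): `descending=True` reverses the whole view first, then the scan runs from startkey to an inclusive endkey and stops at the limit.

```python
def query(keys, startkey=None, endkey=None, descending=False, limit=None):
    # descending=True reverses the whole view *before* the range scan.
    ordered = sorted(keys, reverse=descending)

    def before_start(k):
        if startkey is None:
            return False
        return k > startkey if descending else k < startkey

    def past_end(k):
        if endkey is None:
            return False
        return k < endkey if descending else k > endkey

    out = []
    for k in ordered:
        if before_start(k):
            continue
        if past_end(k):
            break
        out.append(k)  # endkey is inclusive (<=)
        if limit is not None and len(out) == limit:
            break
    return out
```

With keys 1 to 10, `query(keys, startkey=3, endkey=8, limit=2)` gives `[3, 4]`, while the swapped descending query `query(keys, startkey=8, endkey=3, descending=True, limit=2)` gives `[8, 7]`, not `[4, 3]`. Leaving the keys unswapped with `descending=True` gives `[]`, the surprisingly empty result.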

So by swapping the startkey and endkey automatically, I think we'd be  
supporting an incorrect model about how the queries work.

Another possible solution is to examine the keys given, and if the
keys aren't ordered properly, swap them. However, that would break
callers who intentionally provide keys that cannot return a result
(it often simplifies code to deal with an empty set rather than adding
special logic to detect it up front).

I don't feel too strongly about these issues, but I want to point out  
that the change might lead to new confusions and problems.

Perhaps a better way is to put a flag in the result set if the  
startkey and endkey are misordered, so when you get back the results
and they are empty, there will be a member like "keys_misordered:true".
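An illustrative Python sketch of that flag, using Python comparison as a stand-in for CouchDB's view collation (an assumption). The member is added only when the result is empty and the keys could never match in the scan direction; the function names are hypothetical:

```python
def keys_misordered(startkey, endkey, descending=False):
    """True when the range can never match in the scan direction."""
    return startkey < endkey if descending else startkey > endkey


def view_response(rows, startkey, endkey, descending=False):
    body = {"rows": rows}
    # Annotate only empty results, so normal responses are unchanged.
    if not rows and keys_misordered(startkey, endkey, descending):
        body["keys_misordered"] = True
    return body
```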


>
>
> Also view etags:
>
> With the new view group state servers we're within striking distance
> of view etags, although it's a bit deeper in the codebase than a lot
> of this stuff. View etags would have to factor in the db's current seq
> (or more cache-efficiently, the view's last-modified-seq, but that can
> wait) as well as some details about the request and the design
> document.
>
> I'm sure there's more than this to do, but this is more than enough to
> keep all of us busy if we want.


Re: 0.9 Progress

Posted by Adam Kocoloski <ad...@gmail.com>.
On Dec 31, 2008, at 1:02 AM, Chris Anderson wrote:

> On Tue, Dec 30, 2008 at 9:57 PM, Adam Kocoloski
> <ad...@gmail.com> wrote:       ...
>>
>> +1 for being able to easily move derived data around.  In fact, I
>> think it'd be better if all the derived data on the server shared a
>> common root that was separate from the canonical data.  I opened a
>> ticket (#168) about this a month ago.  Sorry I didn't chime in earlier
>> on the ML to voice that viewpoint.
>>
>
> Adam,
>
> I tried to set up the above so that #168 will work. Since the views
> are in .indexes, and the originals in .couch, you could store them
> together in one directory or separate roots. I'm thinking a config
> item for data_dir plus one for derived_dir ... which can be different,
> but by default are the same. This keeps it closest to the
> current implementation, but with the advantages we want.
>
> I think it works - please look for holes in my reasoning.
>
> Chris

Hi Chris, sorry for my delayed response.  If an index_dir/derived_dir  
config option is included then I'm happy.  Setting it equal to  
database_dir is a sensible default.  Best,

Adam


Re: 0.9 Progress

Posted by Chris Anderson <jc...@gmail.com>.
On Tue, Dec 30, 2008 at 9:57 PM, Adam Kocoloski
<ad...@gmail.com> wrote:       ...
>
> +1 for being able to easily move derived data around.  In fact, I think it'd
> be better if all the derived data on the server shared a common root that
> was separate from the canonical data.  I opened a ticket (#168) about this a
> month ago.  Sorry I didn't chime in earlier on the ML to voice that viewpoint.
>

Adam,

I tried to set up the above so that #168 will work. Since the views
are in .indexes, and the originals in .couch, you could store them
together in one directory or separate roots. I'm thinking a config
item for data_dir plus one for derived_dir ... which can be different,
but by default are the same. This keeps it closest to the
current implementation, but with the advantages we want.
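An illustrative Python sketch of how the two proposed config items might resolve paths, with `derived_dir` falling back to `data_dir`. The config keys follow the names used above; the function and the dict-based config are hypothetical:

```python
import os


def storage_paths(config, db_dirname):
    """Return (canonical .couch path, derived .indexes path) for one db."""
    data_dir = config["data_dir"]
    # derived_dir is optional; by default it is the same as data_dir.
    derived_dir = config.get("derived_dir") or data_dir
    return (
        os.path.join(data_dir, db_dirname + ".couch"),
        os.path.join(derived_dir, db_dirname + ".indexes"),
    )
```

Pointing `derived_dir` at another drive moves every `.indexes` tree there without touching the canonical `.couch` data.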

I think it works - please look for holes in my reasoning.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: 0.9 Progress

Posted by Adam Kocoloski <ad...@gmail.com>.
On Dec 30, 2008, at 9:48 PM, Chris Anderson wrote:

> On Tue, Dec 30, 2008 at 5:34 PM, Antony Blakey <antony.blakey@gmail.com> wrote:
>>
>> I'm looking for feedback on the canonical/derived containment  
>> issue, the
>> idea of having a defined directory for _externals to store data,  
>> and the
>> idea of putting said directory/ies into the dbinfo call (and hence  
>> also each
>> _external request).
>>
>
> So we're looking for something like (modified the dir structure so
> that derived data is easily symlinked/configured to another drive):
>
> my-favorite-db-6c38ab2b68f86d2c9.couch/
>   name
>   data
>
> my-favorite-db-6c38ab2b68f86d2c9.indexes/
>    views/
>       my-design-doc-8ab2b68f866c3d2c9.view/
>          name
>          data
>       my-other-design-doc-c3d2c98ab2b68f866.view/
>          name
>          data
>    my-external-indexer/
>        ...
>        whatever
>        ...
>    my-other-external/
>        ...

+1 for being able to easily move derived data around.  In fact, I  
think it'd be better if all the derived data on the server shared a  
common root that was separate from the canonical data.  I opened a  
ticket (#168) about this a month ago.  Sorry I didn't chime in earlier
on the ML to voice that viewpoint.  Best,

Adam


Re: 0.9 Progress

Posted by Antony Blakey <an...@gmail.com>.
On 31/12/2008, at 1:18 PM, Chris Anderson wrote:

> On Tue, Dec 30, 2008 at 5:34 PM, Antony Blakey <antony.blakey@gmail.com> wrote:
>>
>> I'm looking for feedback on the canonical/derived containment  
>> issue, the
>> idea of having a defined directory for _externals to store data,  
>> and the
>> idea of putting said directory/ies into the dbinfo call (and hence  
>> also each
>> _external request).
>>
>
> So we're looking for something like (modified the dir structure so
> that derived data is easily symlinked/configured to another drive):
>
> my-favorite-db-6c38ab2b68f86d2c9.couch/
>   name
>   data
>
> my-favorite-db-6c38ab2b68f86d2c9.indexes/
>    views/
>       my-design-doc-8ab2b68f866c3d2c9.view/
>          name
>          data
>       my-other-design-doc-c3d2c98ab2b68f866.view/
>          name
>          data
>    my-external-indexer/
>        ...
>        whatever
>        ...
>    my-other-external/
>        ...
>
> where _external scripts can choose whatever non-conflicting directory
> name they'd like, within the derived data part. (and store it however
> they'd like...)

OK, but putting each external dir into the top level namespace opens  
the possibility of collision should couch want to put something else  
into the indexes directory. Maybe that directory should be _view/,  
with an API rule that names with a leading '_' are reserved :)
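An illustrative Python sketch of that reservation rule, assuming externals ask for a directory by name and anything starting with `_` (such as `_view`) is refused. The function name and the extra path-safety checks are hypothetical:

```python
import os


def external_dir(indexes_dir, external_name):
    """Directory an _external may use inside a database's .indexes dir."""
    if not external_name or external_name.startswith("_"):
        # Leading '_' is reserved for CouchDB's own entries such as _view.
        raise ValueError(f"reserved or empty name: {external_name!r}")
    if ("/" in external_name or os.sep in external_name
            or external_name in (".", "..")):
        # Keep each external inside its own subdirectory.
        raise ValueError(f"unsafe name: {external_name!r}")
    return os.path.join(indexes_dir, external_name)
```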

> Also with the added optimization that when slug == the db name we
> don't add the hash to the dirname.
>
> I'm not sure about the db-info call's details (obviously externals
> need the information, but we might not want to send it to http
> clients...)


An external may also want to munge an arbitrary name using the same  
algorithm (it may be driven by multiple design docs). IMO an RPC-like  
HTTP endpoint that munges a name would be sufficient. How about dbinfo  
providing the munged form of its indexes directory name i.e. the
leaf, not the path. The _external can already get the indexes  
directory from the /_config request, presuming that /_config is  
available to the external - #168 should be resolved if the external is  
going to be responsible for constructing the directory. Failing that,  
the external can be given a directory to use on its invocation
command line, but it will still want to be able to munge names.

> Consider yourself approved for landing. The work you have so far at
> http://github.com/AntonyBlakey/couchdb/tree/escaped_filenames seems
> sane, so it's just a matter of polishing it.

Will do.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Lack of will power has caused more failure than lack of intelligence  
or ability.
  -- Flower A. Newhouse


Re: 0.9 Progress

Posted by Chris Anderson <jc...@gmail.com>.
On Tue, Dec 30, 2008 at 5:34 PM, Antony Blakey <an...@gmail.com> wrote:
>
> I'm looking for feedback on the canonical/derived containment issue, the
> idea of having a defined directory for _externals to store data, and the
> idea of putting said directory/ies into the dbinfo call (and hence also each
> _external request).
>

So we're looking for something like (modified the dir structure so
that derived data is easily symlinked/configured to another drive):

my-favorite-db-6c38ab2b68f86d2c9.couch/
   name
   data

my-favorite-db-6c38ab2b68f86d2c9.indexes/
    views/
       my-design-doc-8ab2b68f866c3d2c9.view/
          name
          data
       my-other-design-doc-c3d2c98ab2b68f866.view/
          name
          data
    my-external-indexer/
        ...
        whatever
        ...
    my-other-external/
        ...

where _external scripts can choose whatever non-conflicting directory
name they'd like, within the derived data part. (and store it however
they'd like...)

Also with the added optimization that when slug == the db name we
don't add the hash to the dirname.

I'm not sure about the db-info call's details (obviously externals
need the information, but we might not want to send it to http
clients...)


> I'm happy to do the work required, but given that I've published a proof of
> concept, I'd rather not do more work until I know it's worth it.
>

Consider yourself approved for landing. The work you have so far at
http://github.com/AntonyBlakey/couchdb/tree/escaped_filenames seems
sane, so it's just a matter of polishing it.

Thanks for leading the charge on better Unicode support!

-- 
Chris Anderson
http://jchris.mfdz.com

Re: 0.9 Progress

Posted by Antony Blakey <an...@gmail.com>.
On 30/12/2008, at 5:55 PM, Chris Anderson wrote:

> While it's exciting to be bike-shedding so vigorously, my mind keeps
> coming back to the things we could do to make CouchDB rock:
>
> Full Unicode database and design doc name handling:
>
> I think flexible database urls are a worthy goal. We still need to
> figure out how this will look on the file system. (Antony, any luck on
> the name stubs?

Trivial, but I'm waiting for the gatekeepers to say 'do this work and  
we'll put it in', because I have to move it to HEAD, which, given
testing etc., is some work, and then I'd want to get it as clean as
possible, commented etc.

I'm looking for feedback on the canonical/derived containment issue,  
the idea of having a defined directory for _externals to store data,  
and the idea of putting said directory/ies into the dbinfo call (and  
hence also each _external request).

I'm happy to do the work required, but given that I've published a  
proof of concept, I'd rather not do more work until I know it's worth  
it.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
