Posted to user@couchdb.apache.org by Markus Jelsma <ma...@buyways.nl> on 2010/01/28 00:14:32 UTC

M/R/M, again

Hello list and core developers,


Some six months ago a thread emerged [1] concerning a possible
map/reduce/merge in CouchDB. The question was whether CouchDB would put
this feature on its list of proposals (it still isn't on that list).

Well, I now humbly attempt to bring this topic back to the attention of
the core developers and of users who also feel a need for having such a
feature on board.

As far as I know, M/R/M would be most helpful for returning related
documents with just one key, instead of using the sneaky techniques
described on the wiki [2], which allow us to fetch related documents but
force us to retrieve two or more documents to do so.

Instead of fetching two or more documents, I'd prefer to fetch just a
single document queried with a `complex` key such as ["customer_id",
"product_id"]. This would then return one document containing information
on the customer as well as the number of products he/she ordered (we have
a separate document for each purchase).
Now we need to use the trick described but, as I'm sure most of you would
agree, it would be much easier to have a complete document with all
required information instead of parsing an arbitrary number of documents
as we must do now.
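
For reference, the counting half of this is already possible with plain
map/reduce; it is the customer information that cannot be attached to the
same row. A minimal sketch, assuming hypothetical purchase documents with
type, customer_id and product_id fields. The emit() and sum() definitions
and the groupedCounts() helper are local stand-ins so the sketch runs
outside CouchDB; only map and reduce would go into a design document:

```javascript
// Local stand-ins for the helpers CouchDB's view server provides.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Map: one row per purchase document, keyed by customer and product.
// The field names are assumptions, not taken from a real schema.
function map(doc) {
  if (doc.type === 'purchase') {
    emit([doc.customer_id, doc.product_id], 1);
  }
}

// Reduce: add up the 1s (or the partial sums when rereducing).
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Rough local imitation of querying the view with group=true:
// one reduced value per distinct key.
function groupedCounts(docs) {
  rows = [];
  docs.forEach(function (doc) { map(doc); });
  var groups = {};
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key);
    (groups[k] = groups[k] || []).push(row.value);
  });
  var counts = {};
  Object.keys(groups).forEach(function (k) {
    counts[k] = reduce(null, groups[k], false);
  });
  return counts;
}
```

Querying such a view with group=true and key=["customer_id","product_id"]
would give the number of purchases, but still nothing about the customer.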

To finish this message: I'd first like to apologize if I have understood it
all completely wrong; if so, please enlighten me and others for the sake
of sharing clear information. Secondly, if it actually is a beneficial
feature for many of us, can we `vote` for it to be integrated in CouchDB
instead of ending up as a possible contrib?

Any advanced insight is much appreciated.


Cheers!

[1]: http://osdir.com/ml/couchdb-user/2009-07/msg00002.html
[2]: http://wiki.apache.org/couchdb/EntityRelationship




Re: M/R/M, again

Posted by Markus Jelsma <ma...@buyways.nl>.
Paul,

>Markus,
>
>Sorry it's taken me so long to sit down and tap out a reply.
>So, as to M/R/M, it turns out to be quite a bit harder to keep the
>same semantics of incremental view updates as well as the same reduce
>semantics when moving to a 'pure' implementation inside the view
>engine. Specifically, subsequent maps would need to be updated at the
>same time as the first map or we would need to add an update sequence
>like the main database has. Neither of these is a very good solution
>IMO. Also, the reduce semantics make it hard to hook subsequent M/R
>steps up to a view because of how reduces are implemented. The fix
>would require making reductions be persisted to a B-tree and then we'd
>need to pre-declare group_levels somehow. Quite a bit of work. Also,
>because we aren't a 'Google M/R' implementation that guarantees 1
>unique key after each M/R stage the merge step becomes less trivial
>than in the original M/R/M paper.

Sounds like a lot of tough work for a feature that has some quite decent
- although less elegant - workarounds. I can still choose between executing
two separate requests and merging all the data into one single document.
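
For what it's worth, the two-request workaround comes down to very little
client code. A rough sketch, assuming the first response is a profile
document and the second a view result whose row value is the primaryId;
mergeProfile is just an illustrative name, not part of any library:

```javascript
// Merge the document from request 1 with the view result from request 2.
// Returns a copy, so the fetched profile object itself is left untouched.
function mergeProfile(profileDoc, viewResult) {
  var merged = {};
  Object.keys(profileDoc).forEach(function (field) {
    merged[field] = profileDoc[field];
  });
  // The view is queried with an exact key, so at most one row is expected.
  var row = viewResult.rows[0];
  merged.primaryId = row ? row.value : null;
  return merged;
}
```

It works, but it is exactly the kind of client-side glue that a merged view
row would make unnecessary.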

>These hurdles aren't insurmountable, but the longer I looked at the
>issues the more I thought that I would probably just end up writing a
>new indexer that has a slightly different M/R model to allow for such
>things. And then promptly never got around to it.
>However I have been trying to figure out how to create a CouchDB
>version of Riak's Jaywalker feature. It could do similar things to
>what you're wanting, but there are a couple problems that would put
>the hurt on cluster setups with the initial method I have in mind. And
>it's a fairly decent sized addition so unless the implementation
>suddenly crystalizes into a simple solution I don't think it'll be in
>0.11 and hence 1.0.

It would obviously result in many difficulties in a cluster setup. I am using
a sharded cluster on several machines and it would be quite a task to find out
on which shard and which node a document that needs to be merged resides.

And ... Riak? Jaywalker? Can you explain? What is it, how is it related and 
and and... :)


>
>HTH,
>Paul Davis

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: M/R/M, again

Posted by Zachary Zolton <za...@gmail.com>.
@Markus,

Not sure I quite understood the semantics of the proposed "merge"
concept... Perhaps another example usage?

In the meantime, however, I'd try to solve your initial problem by
joining related docs, from a view, via a _list function:

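function(head, req) {
  // row.doc is only present when the view is queried with include_docs=true.
  var row;
  var customer = null;
  var products = [];
  // getRow() returns null once all rows are consumed, which ends the loop.
  while ((row = getRow())) {
    if (!row.doc) { continue; }
    if (row.doc.type == 'customer') {
      customer = row.doc;
    } else if (row.doc.type == 'product') {
      products.push(row.doc);
    }
  }
  // No customer row in the requested key range: return an empty object.
  if (!customer) {
    return toJSON({});
  }
  customer.products = products;
  return toJSON(customer);
}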

I'm not sure this particular hack is explicitly mentioned in the wiki.
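
The _list function above needs a view that puts a customer and its related
docs next to each other. A sketch of such a map function, assuming product
docs carry a customer_id field pointing at the customer's _id (that field
name is my assumption). The emit() stub and runView() helper only exist so
the sketch runs outside CouchDB, and the sort is a crude imitation of view
collation for these two-element keys:

```javascript
// Local stand-in for CouchDB's emit(); records the id of the current doc.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map: the customer sorts first ([id, 0]), related docs follow ([id, 1]).
function map(doc) {
  if (doc.type === 'customer') {
    emit([doc._id, 0], null);
  } else if (doc.type === 'product') {
    emit([doc.customer_id, 1], null); // customer_id is an assumed field
  }
}

// Run the map over some docs and sort the rows by key, roughly as a view would.
function runView(docs) {
  rows = [];
  docs.forEach(function (doc) {
    currentId = doc._id;
    map(doc);
  });
  return rows.sort(function (a, b) {
    if (a.key[0] !== b.key[0]) { return a.key[0] < b.key[0] ? -1 : 1; }
    return a.key[1] - b.key[1];
  });
}
```

The list would then be requested with something like
/db/_design/<ddoc>/_list/<list>/<view>?startkey=["c1"]&endkey=["c1",{}]&include_docs=true
where the design doc, list and view names are placeholders.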


Cheers,

Zach

On Thu, Jan 28, 2010 at 8:11 PM, Nicholas Orr <ni...@zxgen.net> wrote:
> I could probably use m/r/m - however each time I thought I needed this I
> solved my perceived problem by rethinking what I wanted to achieve in the
> first place and then ended up just merging docs together with references to
> one other doc with extra data copied into my new all in one doc.
>
> I think a detailed use case needs to be supplied with actual solved problems
> and why doing it any other way is not feasible or practical in the
> demonstrated use case.

Re: M/R/M, again

Posted by Nicholas Orr <ni...@zxgen.net>.
I could probably use m/r/m - however each time I thought I needed this I
solved my perceived problem by rethinking what I wanted to achieve in the
first place and then ended up just merging docs together with references to
one other doc with extra data copied into my new all in one doc.

I think a detailed use case needs to be supplied with actual solved problems
and why doing it any other way is not feasible or practical in the
demonstrated use case.

Even for the products, users and purchases example, I have a similar use
case: generating "quotes" (finance industry) based on who did it, what
product, what rates, what lender etc.
When I first did this I did it in MySQL; everything worked great, ideal RDBMS
logic applied. Everything referencing everything else, not much data copied.
Then I moved to CouchDB because at the end of the day I want a document that
is a quote, and I re-arranged what I was doing to suit CouchDB's way of
getting things done. As such I copy a lot of data from my RDBMS
lender/product/rate structure into my final CouchDB quote doc.

Using map/reduce I can query all these docs any way I want.
As they are also quotes they should not be changing because I updated a rate
in the RDBMS rate structure. (I need to move these rate plans into some sort
of CouchDB doc as well, with the latest doc, as defined by some sort of date,
being the one used; the rates change weekly and are documents in either pdf,
excel or word.)

Everything in CouchDB, as far as I can tell, is developed due to an actual
need: without it x is not possible, and implementing the feature provides a
net win for the project. I've actually managed to solve all my issues with
the existing feature set and, ultimately, by figuring out whether CouchDB is
the best fit for the thing I'm trying to do. CouchDB isn't a do-everything
tool.

A detailed need with everything mapped out would go a long way; "making my
life easier" probably doesn't qualify. If you want easy, just go back to
paying M$ or Oracle for an RDBMS product (and associated hardware costs).
Everything is doable and relatively simple with those tools. I've found it's
easier for me to read and learn more than to figure out where I get a chunk
of coin to pay M$ (hardware/hosting costs)...

Cheers,

Nick

2010/1/29 Markus Jelsma <ma...@buyways.nl>

> Nicholas, thanks for your reply!
>
>
> I completely agree with you that CouchDB users need to leave their RDBMS
> knowledge aside, i tell the same to my co-workers but somehow i tend to
> feel a need for keeping separate documents. Perhaps i'm biased by some
> wiki pages that also tend to do the same. You'll recognize the case with
> products and purchases where each purchase is a separate document.
> Although the use-case is not entirely the same, you can get much
> information on purchases in that scenario with simple map/reduce
> functions.
>
> But, if map/reduce/merge can simplify my life and others, why shouldn't it
> be implemented? I'm quite sure many of us will be very grateful if such a
> feature was to be available, don't you agree?

Re: M/R/M, again

Posted by Markus Jelsma <ma...@buyways.nl>.
Nicholas, thanks for your reply!


I completely agree with you that CouchDB users need to leave their RDBMS
knowledge aside. I tell the same to my co-workers, but somehow I tend to
feel a need for keeping separate documents. Perhaps I'm biased by some
wiki pages that also tend to do the same. You'll recognize the case with
products and purchases where each purchase is a separate document.
Although the use case is not entirely the same, you can get much
information on purchases in that scenario with simple map/reduce
functions.

But, if map/reduce/merge can simplify my life and that of others, why
shouldn't it be implemented? I'm quite sure many of us would be very
grateful if such a feature were available, don't you agree?

If it is not being implemented or even suggested as a real proposal - as
the list of proposals on the wiki tells us - I would have no other choice
than to follow the principle you speak about. For my use case, it would
not be such a bad choice because this kind of update is only initiated
by that same user, so there is no worry about the
multiple-document-transaction paradigm, which does not exist in CouchDB.

Nevertheless, I, and possibly the core developers, sure would like to know
if there are more users who'd like to utilize a feature like this.

Perhaps I just need to be either patient or bend the use case even more
towards a pure document-oriented database, which will not allow us to merge
documents in a view.


Cheers,

Nicholas Orr said:
> The other thing you could do is merge the two docs together.
> I've done this recently and boy did it make life simple.
>
> CouchDB requires a different approach than RDBMS.
> Applying RDBMS concepts to CouchDB will leave you wanting RDBMS
> features. Consider stepping back for a moment and forget RDBMS way of
> doing things and solve what you're trying to solve with CouchDB
> (Document) concepts.
>
> From what I can tell you want a username/email to map to a primaryid
> depending on the app in question.
>
> Would this work?
>
> {
> "_id": "a@a.com",
> "username": "adam",
> "type": "profile",
> "apps":
> [{"applicationId":"app2","primaryId":18},{"applicationId":"app1","primaryId":17}]
> }
>
> This usually has far reaching implications when converting from RDBMS to
> Document Store.
> I've made choices like, well if applicationId changes then I'm going to
> have to update somehow, there isn't a cascade or atomic op in CouchDB,
> so how about applicationId is not allowed to change only allowed to
> create new ones, that works :) (then if I want to delete I have a
> active/disabled flag)
>
> Food for thought
>
> Nick
>
> 2010/1/29 Markus Jelsma <ma...@buyways.nl>
>
>> Hi Jan,
>>
>>
>> Thanks for your reply, but I'm afraid that I have provided a lousy
>> explanation of the case I run into. Let me explain with actual
>> examples, for I believe Damien's examples do not fit my use case.
>>
>> I have a tiny database with two types of documents, profile and
>> profileApplication. The profile type has an ID, which is the user's
>> e-mail address, and a simple username field, nothing more (see below
>> for the anatomy of both document types).
>>
>> {
>>   "_id": "markus@buyways.nl",
>>   "_rev": "1-5f7718ae8a627f4cf5b93b63420b7e1f",
>>   "type": "profile",
>>   "username": "markus17"
>> }
>> {
>>   "_id": "1d2d9db700029557666e5d260b2ea038",
>>   "_rev": "2-279daa538abc5cbb4b1524d29ce4ab53",
>>   "type": "profileApplication",
>>   "applicationId": "app2",
>>   "profileId": "markus@buyways.nl",
>>   "primaryId": 18
>> }
>>
>> The documents with the profileApplication type are related to both an
>> application (which I have omitted for now) and a profile. In RDBMS
>> terms their purpose would be that of a common link table.
>>
>> The purpose of this relation is that a single profile can have a
>> different primaryId for different applications. My profile
>> (markus@buyways.nl) would have primaryId=18 for app2 and primaryId=17
>> for app1, etc.
>>
>> The goal would be to retrieve both my profile document _and_ the
>> primaryId that goes with my profile for app1 or app2. Ideally the
>> query would be key=["markus@buyways.nl", "app1"], but this is
>> currently not possible.
>>
>> There are two things I can do now:
>> 1) retrieve the profile first and then fetch the primaryId for the
>> application I need, but this takes two requests and manual merging of
>> the profile data and primaryId;
>>
>>
>> http request 1:
>> http://localhost:5984/test/markus@buyways.nl
>>
>> output:
>> {"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f","type":"profile","username":"markus17"}
>>
>> http request 2:
>> http://localhost:5984/test/_design/profiles/_view/getPrimaryByEmailAndApplication?key=[%22markus@buyways.nl%22,%20%22app1%22]
>>
>> output:
>> {"total_rows":4,"offset":2,"rows":[
>> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
>> ]}
>>
>> It's clear that I need to merge the value of the second request with
>> the document received by the first.
>>
>>
>> 2) fetch the profile and all related primaryIds in one go. This is one
>> single request, but I also get primaryIds for apps that I don't need,
>> so this fetches more data and also needs client-side merging after I
>> have filtered out the app I need.
>>
>>
>> http request 1:
>> http://localhost:5984/test/_design/profiles/_view/getProfileApplications?startkey=[%22markus@buyways.nl%22]&endkey=[%22markus@buyways.nl%22,%20%22zzz%22]
>>
>> output:
>> {"total_rows":6,"offset":3,"rows":[
>> {"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
>> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
>> {"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
>> ]}
>>
>> It's clear that I need to filter for my profile document and the
>> profileApplication document of the app I want (app1). The bad thing
>> here is that I do not get my profile document in the value (although I
>> can emit it but that's), if I include_docs I'll also get a lot of
>> extra data on the documents I don't need; here it's just one document
>> but it can be many.
>>
>>
>>
>> Both techniques work and have their pros and cons. But do you agree
>> that it would be much more convenient if we could simply construct
>> views that carry merged or combined documents using
>> key=["markus@buyways.nl","app1"]?
>>
>> Am I correct to assume I cannot achieve the goal stated above without
>> either Chris' technique or the merging of documents in one single view?
>>
>> Please forgive me if I somehow didn't understand Damien's example, but
>> I believe that deals with arithmetic instead of merging complex data
>> structures.
>> I also don't (yet?) feel that the new 0.11 linked documents feature
>> will help me out here. Also, I wish to keep this data in separate
>> documents; keeping an array within the profile document isn't really
>> the best approach, I think.
>>
>>
>>
>> Cheers,
>>
>>
>> >See http://damienkatz.net/2008/02/incremental_map.html and
>> >http://damienkatz.net/2008/02/incremental_map_1.html and the
>> >comments on both.
>>
>> Markus Jelsma - Technisch Architect - Buyways BV
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350




Re: M/R/M, again

Posted by Nicholas Orr <ni...@zxgen.net>.
The other thing you could do is merge the two docs together.
I've done this recently and boy did it make life simple.

CouchDB requires a different approach than an RDBMS.
Applying RDBMS concepts to CouchDB will leave you wanting RDBMS features.
Consider stepping back for a moment, forgetting the RDBMS way of doing things,
and solving what you're trying to solve with CouchDB (document) concepts.

From what I can tell you want a username/email to map to a primaryId
depending on the app in question.

Would this work?

{
"_id": "a@a.com",
"username": "adam",
"type": "profile",
"apps":
[{"applicationId":"app2","primaryId":18},{"applicationId":"app1","primaryId":17}]
}

This usually has far-reaching implications when converting from an RDBMS to
a document store.
I've made choices like: if applicationId changes then I'm going to have
to update somehow, and there isn't a cascade or atomic op in CouchDB, so how
about applicationId is not allowed to change and you are only allowed to create
new ones? That works :) (Then if I want to delete, I have an active/disabled flag.)
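A rough sketch of how the embedded layout above could be used, in JavaScript. With the apps array inside the profile, one GET of the document is enough and the lookup happens client-side. Alternatively, a view could emit one row per embedded app, so that key=["a@a.com","app1"] returns the username and primaryId together. The names primaryIdFor and mapProfileByApp, the emitted value shape, and the emit stub are all assumptions made for illustration; none of them come from the thread or from CouchDB itself.

```javascript
// The merged profile document from the example above.
const embeddedProfile = {
  _id: "a@a.com",
  username: "adam",
  type: "profile",
  apps: [
    { applicationId: "app2", primaryId: 18 },
    { applicationId: "app1", primaryId: 17 }
  ]
};

// Hypothetical client-side lookup after fetching the single document.
function primaryIdFor(doc, applicationId) {
  const entry = (doc.apps || []).find(function (app) {
    return app.applicationId === applicationId;
  });
  return entry ? entry.primaryId : null;
}

// Stand-in for CouchDB's emit() so the map function can run outside the view server.
const appRows = [];
function emit(key, value) {
  appRows.push({ key: key, value: value });
}

// Hypothetical map function: one row per embedded app, keyed by [email, app].
function mapProfileByApp(doc) {
  if (doc.type !== "profile" || !doc.apps) return;
  doc.apps.forEach(function (app) {
    emit([doc._id, app.applicationId], {
      username: doc.username,
      primaryId: app.primaryId
    });
  });
}

mapProfileByApp(embeddedProfile);
console.log(primaryIdFor(embeddedProfile, "app1"));
console.log(JSON.stringify(appRows));
```

The trade-off is the one described above: every change to an app entry rewrites the whole profile document.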

Food for thought

Nick

2010/1/29 Markus Jelsma <ma...@buyways.nl>

> Hi Jan,
>
>
> Thanks for your reply, but i'm afraid that i have provided a lousy
> explanation of the case i run into. Let me explain with actual examples,
> for i believe Damien's examples do not fit my use case.
>
> I have a tiny database with two types of documents, profile and
> profileApplication. The profile type has an ID which is the user's e-mail
> address and a simple username field, nothing more (see below for anatomy of
> both document types).
>
> {
>   "_id": "markus@buyways.nl",
>   "_rev": "1-5f7718ae8a627f4cf5b93b63420b7e1f",
>   "type": "profile",
>   "username": "markus17"
> }
> {
>   "_id": "1d2d9db700029557666e5d260b2ea038",
>   "_rev": "2-279daa538abc5cbb4b1524d29ce4ab53",
>   "type": "profileApplication",
>   "applicationId": "app2",
>   "profileId": "markus@buyways.nl",
>   "primaryId": 18
> }
>
> The documents with profileApplication type are related to both an
> application (which i have omitted for now) and a profile. In RDBMS terms
> its purpose would be a common link table.
>
> The purpose for this relation is that a single profile can have a different
> primaryId for different applications. My profile (markus@buyways.nl) would
> have primaryId=18 for app2 and primaryId=17 for app1 etc.
>
> The goal would be to retrieve both my profile document _and_ the primaryId
> that goes with my profile for app1 or app2, ideally the query would be
> key=["markus@buyways.nl", "app1"], but this is currently not possible.
>
> There are two things i can do now:
> 1) retrieve the profile first and then fetch the primaryId for the
> application i need, but this takes two requests and manual merging of the
> profile data and primaryId;
>
>
> http request 1:
> http://localhost:5984/test/markus@buyways.nl
>
> output:
> {"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f","type":"profile","username":"markus17"}
>
> http request 2:
>
> http://localhost:5984/test/_design/profiles/_view/getPrimaryByEmailAndApplication?key=[%22markus@buyways.nl%22,%20%22app1%22]
>
> output:
> {"total_rows":4,"offset":2,"rows":[
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
> ]}
>
> It's clear that i need to merge the value of the second request with the
> document received by the first.
>
>
> 2) fetch the profile and all related primaryIds in one go, this is one
> single request but i also get primaryIds for apps that i don't need, so
> this fetches more data and also needs client-side merging after i have
> filtered out the app i need.
>
>
> http request 1:
>
> http://localhost:5984/test/_design/profiles/_view/getProfileApplications?startkey=[%22markus@buyways.nl%22]&endkey=[%22markus@buyways.nl%22,%20%22zzz%22]
>
> output:
> {"total_rows":6,"offset":3,"rows":[
> {"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
> {"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
> ]}
>
> It's clear that i need to filter my profile document and the
> profileApplication document for the app i want (app1). The bad thing here
> is that i do not get my profile document in the value (although i can emit
> it but that's), if i include_docs i'll also get a lot of extra data on the
> documents i don't need, here it's just one document but it can be many.
>
>
>
> Both techniques work and have their pros and cons. But do you agree that it
> would be much more convenient if we could simply construct views that carry
> merged or combined documents using key=["markus@buyways.nl","app1"]?
>
> Am i correct to assume i cannot achieve the goal stated above without
> either Chris' technique or merging of documents in one single view?
>
> Please forgive me if i somehow didn't understand Damien's example but i
> believe that deals with arithmetic instead of merging complex data
> structures. I also didn't (yet?) feel that the new 0.11 linked documents
> feature will help me out here. Also, i wish to keep this data in separate
> documents, keeping an array within the profile document isn't really the
> best approach i think.
>
>
>
> Cheers,
>
>
> >See http://damienkatz.net/2008/02/incremental_map.html and
> >http://damienkatz.net/2008/02/incremental_map_1.html and the
> >comments on both.
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>

Re: M/R/M, again

Posted by Paul Davis <pa...@gmail.com>.
Markus,

Sorry it's taken me so long to sit down and tap out a reply.

So, as to M/R/M, it turns out to be quite a bit harder to keep the
same semantics of incremental view updates as well as the same reduce
semantics when moving to a 'pure' implementation inside the view
engine. Specifically, subsequent maps would need to be updated at the
same time as the first map, or we would need to add an update sequence
like the main database has. Neither of these is a very good solution
IMO. Also, the reduce semantics make it hard to hook subsequent M/R
steps up to a view because of how reduces are implemented. The fix
would require persisting reductions to a B-tree, and then we'd
need to pre-declare group_levels somehow. Quite a bit of work. Also,
because we aren't a 'Google M/R' implementation that guarantees one
unique key after each M/R stage, the merge step becomes less trivial
than in the original M/R/M paper.

These hurdles aren't insurmountable, but the longer I looked at the
issues the more I thought that I would probably just end up writing a
new indexer that has a slightly different M/R model to allow for such
things. And then promptly never got around to it.

However, I have been trying to figure out how to create a CouchDB
version of Riak's Jaywalker feature. It could do similar things to
what you're wanting, but there are a couple of problems that would put
the hurt on cluster setups with the initial method I have in mind. And
it's a fairly decent-sized addition, so unless the implementation
suddenly crystallizes into a simple solution I don't think it'll be in
0.11 and hence 1.0.

HTH,
Paul Davis

On Thu, Jan 28, 2010 at 9:57 AM, Markus Jelsma <ma...@buyways.nl> wrote:
> Hi Jan,
>
>
> Thanks for your reply, but i'm afraid that i have provided a lousy explanation
> of the case i run into. Let me explain with actual examples, for i believe
> Damien's examples do not fit my use case.
>
> I have a tiny database with two types of documents, profile and
> profileApplication. The profile type has an ID which is the user's e-mail
> address and a simple username field, nothing more (see below for anatomy of
> both document types).
>
> {
>   "_id": "markus@buyways.nl",
>   "_rev": "1-5f7718ae8a627f4cf5b93b63420b7e1f",
>   "type": "profile",
>   "username": "markus17"
> }
> {
>   "_id": "1d2d9db700029557666e5d260b2ea038",
>   "_rev": "2-279daa538abc5cbb4b1524d29ce4ab53",
>   "type": "profileApplication",
>   "applicationId": "app2",
>   "profileId": "markus@buyways.nl",
>   "primaryId": 18
> }
>
> The documents with profileApplication type are related to both an application
> (which i have omitted for now) and a profile. In RDBMS terms its purpose would
> be a common link table.
>
> The purpose for this relation is that a single profile can have a different
> primaryId for different applications. My profile (markus@buyways.nl) would
> have primaryId=18 for app2 and primaryId=17 for app1 etc.
>
> The goal would be to retrieve both my profile document _and_ the primaryId
> that goes with my profile for app1 or app2, ideally the query would be
> key=["markus@buyways.nl", "app1"], but this is currently not possible.
>
> There are two things i can do now:
> 1) retrieve the profile first and then fetch the primaryId for the application
> i need, but this takes two requests and manual merging of the profile data
> and primaryId;
>
>
> http request 1:
> http://localhost:5984/test/markus@buyways.nl
>
> output:
> {"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f","type":"profile","username":"markus17"}
>
> http request 2:
> http://localhost:5984/test/_design/profiles/_view/getPrimaryByEmailAndApplication?key=[%22markus@buyways.nl%22,%20%22app1%22]
>
> output:
> {"total_rows":4,"offset":2,"rows":[
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
> ]}
>
> It's clear that i need to merge the value of the second request with the
> document received by the first.
>
>
> 2) fetch the profile and all related primaryIds in one go, this is one single
> request but i also get primaryIds for apps that i don't need, so this fetches
> more data and also needs client-side merging after i have filtered out the app
> i need.
>
>
> http request 1:
> http://localhost:5984/test/_design/profiles/_view/getProfileApplications?startkey=[%22markus@buyways.nl%22]&endkey=[%22markus@buyways.nl%22,%20%22zzz%22]
>
> output:
> {"total_rows":6,"offset":3,"rows":[
> {"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
> {"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
> ]}
>
> It's clear that i need to filter my profile document and the
> profileApplication document for the app i want (app1). The bad thing here is
> that i do not get my profile document in the value (although i can emit it but
> that's), if i include_docs i'll also get a lot of extra data on the documents
> i don't need, here it's just one document but it can be many.
>
>
>
> Both techniques work and have their pros and cons. But do you agree that it
> would be much more convenient if we could simply construct views that carry
> merged or combined documents using key=["markus@buyways.nl","app1"]?
>
> Am i correct to assume i cannot achieve the goal stated above without either
> Chris' technique or merging of documents in one single view?
>
> Please forgive me if i somehow didn't understand Damien's example but i
> believe that deals with arithmetic instead of merging complex data structures.
> I also didn't (yet?) feel that the new 0.11 linked documents feature will help
> me out here. Also, i wish to keep this data in separate documents, keeping an
> array within the profile document isn't really the best approach i think.
>
>
>
> Cheers,
>
>
>>See http://damienkatz.net/2008/02/incremental_map.html and
>>http://damienkatz.net/2008/02/incremental_map_1.html and the
>>comments on both.
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>

Re: M/R/M, again

Posted by Markus Jelsma <ma...@buyways.nl>.
Hi Jan,


Thanks for your reply, but i'm afraid that i have provided a lousy explanation
of the case i run into. Let me explain with actual examples, for i believe
Damien's examples do not fit my use case.

I have a tiny database with two types of documents, profile and
profileApplication. The profile type has an ID which is the user's e-mail
address and a simple username field, nothing more (see below for anatomy of
both document types).

{
   "_id": "markus@buyways.nl",
   "_rev": "1-5f7718ae8a627f4cf5b93b63420b7e1f",
   "type": "profile",
   "username": "markus17"
}
{
   "_id": "1d2d9db700029557666e5d260b2ea038",
   "_rev": "2-279daa538abc5cbb4b1524d29ce4ab53",
   "type": "profileApplication",
   "applicationId": "app2",
   "profileId": "markus@buyways.nl",
   "primaryId": 18
}

The documents with profileApplication type are related to both an application 
(which i have omitted for now) and a profile. In RDBMS terms its purpose would 
be a common link table.

The purpose for this relation is that a single profile can have a different 
primaryId for different applications. My profile (markus@buyways.nl) would 
have primaryId=18 for app2 and primaryId=17 for app1 etc.

The goal would be to retrieve both my profile document _and_ the primaryId 
that goes with my profile for app1 or app2, ideally the query would be 
key=["markus@buyways.nl", "app1"], but this is currently not possible.

There are two things i can do now:
1) retrieve the profile first and then fetch the primaryId for the application
i need, but this takes two requests and manual merging of the profile data
and primaryId;


http request 1:
http://localhost:5984/test/markus@buyways.nl

output:
{"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f","type":"profile","username":"markus17"}

http request 2:
http://localhost:5984/test/_design/profiles/_view/getPrimaryByEmailAndApplication?key=[%22markus@buyways.nl%22,%20%22app1%22]

output:
{"total_rows":4,"offset":2,"rows":[
{"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
]}

It's clear that i need to merge the value of the second request with the 
document received by the first.
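
A minimal sketch of the client-side merge that option 1 requires, in JavaScript. The two response objects are copied from the output above. The helper name mergeProfile and the shape of the merged object are assumptions made for illustration; this is not a CouchDB API.

```javascript
// Response of request 1: the profile document.
const profileDoc = {
  _id: "markus@buyways.nl",
  _rev: "1-5f7718ae8a627f4cf5b93b63420b7e1f",
  type: "profile",
  username: "markus17"
};

// Response of request 2: the view row for ["markus@buyways.nl", "app1"].
const primaryResult = {
  total_rows: 4,
  offset: 2,
  rows: [
    {
      id: "f51b92f4a59de0e28641375637a73050",
      key: ["markus@buyways.nl", "app1"],
      value: 17
    }
  ]
};

// Hypothetical helper: copy the profile and attach the app-specific primaryId.
function mergeProfile(profile, viewResult) {
  const row = viewResult.rows[0];
  if (!row) return null; // no primaryId stored for this application
  return Object.assign({}, profile, {
    applicationId: row.key[1],
    primaryId: row.value
  });
}

const mergedProfile = mergeProfile(profileDoc, primaryResult);
console.log(JSON.stringify(mergedProfile));
```

The merge itself is trivial; the cost of this option is the second round trip.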


2) fetch the profile and all related primaryIds in one go, this is one single
request but i also get primaryIds for apps that i don't need, so this fetches
more data and also needs client-side merging after i have filtered out the app
i need.


http request 1:
http://localhost:5984/test/_design/profiles/_view/getProfileApplications?startkey=[%22markus@buyways.nl%22]&endkey=[%22markus@buyways.nl%22,%20%22zzz%22]

output:
{"total_rows":6,"offset":3,"rows":[
{"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
{"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
{"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
]}

It's clear that i need to filter my profile document and the
profileApplication document for the app i want (app1). The bad thing here is
that i do not get my profile document in the value (although i can emit it but
that's), if i include_docs i'll also get a lot of extra data on the documents
i don't need, here it's just one document but it can be many.
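
The key layout in the output above suggests a map function along the following lines. This is a guess reconstructed from the emitted keys, not the actual getProfileApplications view. The emit stub, the helper pickApplication, and the app1 document (whose fields are inferred from the view rows) are assumptions, added so the sketch runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit() so the map function can run outside the view server.
// Unlike CouchDB, this stub does not sort rows by key.
const emittedRows = [];
let currentDocId = null;
function emit(key, value) {
  emittedRows.push({ id: currentDocId, key: key, value: value });
}

// Assumed map function: profiles emit [email, 1], link documents emit [email, app, 2].
function mapProfileApplications(doc) {
  if (doc.type === "profile") {
    emit([doc._id, 1], null);
  } else if (doc.type === "profileApplication") {
    emit([doc.profileId, doc.applicationId, 2], doc.primaryId);
  }
}

const sampleDocs = [
  { _id: "markus@buyways.nl", type: "profile", username: "markus17" },
  { _id: "f51b92f4a59de0e28641375637a73050", type: "profileApplication",
    applicationId: "app1", profileId: "markus@buyways.nl", primaryId: 17 },
  { _id: "1d2d9db700029557666e5d260b2ea038", type: "profileApplication",
    applicationId: "app2", profileId: "markus@buyways.nl", primaryId: 18 }
];
sampleDocs.forEach(function (doc) {
  currentDocId = doc._id;
  mapProfileApplications(doc);
});

// Hypothetical client-side filter: keep the profile row and the row for one app.
function pickApplication(rows, applicationId) {
  const profileRow = rows.find(function (row) {
    return row.key.length === 2;
  });
  const appRow = rows.find(function (row) {
    return row.key.length === 3 && row.key[1] === applicationId;
  });
  if (!profileRow || !appRow) return null;
  return {
    profileId: profileRow.id,
    applicationId: applicationId,
    primaryId: appRow.value
  };
}

const picked = pickApplication(emittedRows, "app1");
console.log(JSON.stringify(picked));
```

The rows for app2 and any other application still travel over the wire; the filter only hides them from the caller.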



Both techniques work and have their pros and cons. But do you agree that it
would be much more convenient if we could simply construct views that carry
merged or combined documents using key=["markus@buyways.nl","app1"]?

Am i correct to assume i cannot achieve the goal stated above without either 
Chris' technique or merging of documents in one single view?

Please forgive me if i somehow didn't understand Damien's example but i 
believe that deals with arithmetic instead of merging complex data structures. 
I also didn't (yet?) feel that the new 0.11 linked documents feature will help 
me out here. Also, i wish to keep this data in separate documents, keeping an 
array within the profile document isn't really the best approach i think.



Cheers,


>See http://damienkatz.net/2008/02/incremental_map.html and
>http://damienkatz.net/2008/02/incremental_map_1.html and the
>comments on both.

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: M/R/M, again

Posted by Jan Lehnardt <ja...@apache.org>.
On 28 Jan 2010, at 03:44, Markus Jelsma wrote:

> Hi Chris,
> 
> 
> 
> Thanks for your reply. I see that this might work indeed, after giving it some 
> more thought, but i think i have a few issues with such an approach. It would 
> fill the database with data i don't want/need and replication would also copy 
> this extraneous data. Secondly, i cannot simply construct a view that can 
> operate on the data at hand for i would then need to construct a view that 
> operates on previously copied data, is it not?
> 
> I believe, but am not entirely sure, that i would then prefer to use the 
> tricks as described in the wiki.
> 
> What is the issue with adding the merge feature to CouchDB? Is it extremely
> hard to do so? I'd figure the resulting view would still fit in the B-tree as it
> does now, but correct me if i'm wrong :)

See http://damienkatz.net/2008/02/incremental_map.html and 
http://damienkatz.net/2008/02/incremental_map_1.html and the
comments on both.

Cheers
Jan
--



> 
> I understand people are fine with Hadoop offering batch processes instead but 
> it still sounds a bit funky for CouchDB, which is still so clean and neat and 
> would, in my opinion, benefit since it would be easier to query for related 
> documents.
> 
> Again, please correct me if i talk rubbish ;)
> 
> 
> 
> Cheers,
> 
>> 
>> There are a number of ways things like this could be accomplished. One
>> I proposed recently is the facility to have CouchDB automatically copy
>> any HTTP view query result (local or remote) to a database full of
>> documents (one document for each row). This gives you a lot of
>> flexibility, but it is not incremental.
>> 
>> I think it's ok if it's not incremental. Hadoop is a batch process,
>> and people are fine with that.
>> 
>> The cool thing about this is it's in the HTTP domain so it will work
>> on a cluster, can be cached, etc.
>> 
>> Chris
>> 
>> 
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
> 
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
> 


Re: M/R/M, again

Posted by Markus Jelsma <ma...@buyways.nl>.
Hi Chris,



Thanks for your reply. After giving it some more thought, i see that this
might indeed work, but i think i have a few issues with such an approach. It
would fill the database with data i don't want/need and replication would also
copy this extraneous data. Secondly, i cannot simply construct a view that
operates on the data at hand, for i would then need to construct a view that
operates on previously copied data, would i not?

I believe, but am not entirely sure, that i would then prefer to use the 
tricks as described in the wiki.

What is the issue with adding the merge feature to CouchDB? Is it extremely
hard to do so? I'd figure the resulting view would still fit in the B-tree as it
does now, but correct me if i'm wrong :)

I understand people are fine with Hadoop offering batch processes instead, but
it still sounds a bit funky for CouchDB, which is still so clean and neat and
would, in my opinion, benefit from a merge step since it would make it easier
to query for related documents.

Again, please correct me if i talk rubbish ;)



Cheers,

>
>There are a number of ways things like this could be accomplished. One
>I proposed recently is the facility to have CouchDB automatically copy
>any HTTP view query result (local or remote) to a database full of
>documents (one document for each row). This gives you a lot of
>flexibility, but it is not incremental.
>
>I think it's ok if it's not incremental. Hadoop is a batch process,
>and people are fine with that.
>
>The cool thing about this is it's in the HTTP domain so it will work
>on a cluster, can be cached, etc.
>
>Chris
>
>
>--
>Chris Anderson
>http://jchrisa.net
>http://couch.io

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: M/R/M, again

Posted by Chris Anderson <jc...@apache.org>.
On Wed, Jan 27, 2010 at 3:14 PM, Markus Jelsma <ma...@buyways.nl> wrote:
> Hello list and core developers,
>
>
> Some six months ago a thread emerged [1] concerning a possible
> map/reduce/merge in CouchDB. The question was if CouchDB would put this
> feature on its list of proposals (which it still isn't).
>
> Well, i now humbly attempt to get this topic back to the attention of the
> core developers and users that also feel a need for having such a feature
> on board.
>
> As far as i know M/R/M would be most helpful in returning related
> documents with just one key, instead of using sneaky techniques as
> described on the wiki [2] that allows us to fetch related documents but
> force us to retrieve two or more documents to do so.
>
> Instead of fetching two or more documents, i'd prefer to fetch just a
> single document queried with a `complex` key such as ["customer_id",
> "product_id"]. This would then return one document containing information
> on the customer as well as the number of products he/she ordered (we have
> a separate document for each purchase).
> Now we need to use the trick described but, i'm sure most of you would
> agree, it would be much easier to have a complete document with all
> required information instead of parsing an arbitrary number of documents
> as we must do now.
>
> To finish this message; i'd first like to apologize if i understood it all
> completely wrong, if so, please enlighten me and others for the sake of
> sharing clear information. Secondly, if it actually is a most beneficial
> feature for many of us, can we `vote` for it to be integrated in CouchDB
> instead of a possible contrib?

There are a number of ways things like this could be accomplished. One
I proposed recently is the facility to have CouchDB automatically copy
any HTTP view query result (local or remote) to a database full of
documents (one document for each row). This gives you a lot of
flexibility, but it is not incremental.

I think it's ok if it's not incremental. Hadoop is a batch process,
and people are fine with that.

The cool thing about this is it's in the HTTP domain so it will work
on a cluster, can be cached, etc.
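
The facility described here is a proposal. A client-side approximation might turn each row of a view response into a document and POST the result to the target database's _bulk_docs endpoint. In the JavaScript sketch below, the function rowsToBulkDocs and the field names type, sourceId, key and value are assumptions; only the {"docs": [...]} envelope is what _bulk_docs expects. The sample rows are taken from the view output quoted elsewhere in this thread.

```javascript
// Hypothetical helper: one document per view row, wrapped for POST /{db}/_bulk_docs.
// No _id is set, so CouchDB would assign one to each new document.
function rowsToBulkDocs(viewResult) {
  return {
    docs: viewResult.rows.map(function (row) {
      return {
        type: "viewRow",    // assumed marker so later views can select these docs
        sourceId: row.id,   // id of the document that emitted the row
        key: row.key,
        value: row.value
      };
    })
  };
}

const sampleViewResult = {
  total_rows: 6,
  offset: 3,
  rows: [
    { id: "f51b92f4a59de0e28641375637a73050",
      key: ["markus@buyways.nl", "app1", 2], value: 17 },
    { id: "1d2d9db700029557666e5d260b2ea038",
      key: ["markus@buyways.nl", "app2", 2], value: 18 }
  ]
};

const bulkPayload = rowsToBulkDocs(sampleViewResult);
console.log(JSON.stringify(bulkPayload));
```

As noted above, this is not incremental: the copy has to be rerun whenever the source view changes.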

Chris

>
> Any advanced insight is much appreciated.
>
>
> Cheers!
>
> [1]: http://osdir.com/ml/couchdb-user/2009-07/msg00002.html
> [2]: http://wiki.apache.org/couchdb/EntityRelationship
>
>
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io