You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Matthew Buckett <ma...@it.ox.ac.uk> on 2016/04/19 15:06:46 UTC

Reporting on joined data

I'm looking at trying to report on some data held in CouchDB, I have 2
types of object stored, users and activities. An example user id:

{
  "_id": "37453929...",
  userId: "1234",
  type: "user",
  name: "John Smith",
  department: "english"
}

There will be multiple users and there is a small set of possible values
for the department. An example activity is:

{
  "_id": "736489...",
  userId: "1234",
  type: "activity",
  title: "Doing some work.",
  hours: 2
}
each user will have multiple activities.

I would like a view showing the total number of hours per department, but I
can't see how to easily do it. I was initially thinking I could perform a
map-reduce to build map the documents by userId, and then use a reduce to
build a super document that contains the department and the hours together.
So I ended up with something like:

{
  user: {.... department: "english" },
  activites: [
    {.. hours: 2 }, ...
  ]
}

However this resulting document wouldn't be keyed on the department and I
can't perform a further map-reduce to switch it to be keyed off the
department.

Thanks in advance.

Re: Reporting on joined data

Posted by Sebastian Rothbucher <se...@googlemail.com>.
Hi Matthew,

multi-step reduction is (unfortunately) something plain CouchDB can't do at
all. So you're right that there's no easy way of doing it. However, from
the top of my head, you could emit a key [userId, '1_dept'] w/ value
department for each userRecord and a key [userid, '2_act'] for each
activity. When reducing w/ maximum detail and using _sum / _count or
_stats, you should get no sum for the '1_dept' part and the sum/count for
the '2_act' parts. You could then use a list-function to iterate over the
result (which should be small in comparison unless you're a really big
organization - and anyways, you won't ever need it in memory all at once)
and create a hash-table with entries per department. Each time you come
across a new '1_dept' value, you have a new hash key, as long as you get
'2_act' values, keep summing up (as the input to the list is the - ordered
- view result, you're good). Finally, just emit the stringified hashtable
as result of the list function.

Let us know how you did solve it in the end

Best
   Sebastian



On Tue, Apr 19, 2016 at 3:06 PM, Matthew Buckett <
matthew.buckett@it.ox.ac.uk> wrote:

> I'm looking at trying to report on some data held in CouchDB, I have 2
> types of object stored, users and activities. An example user id:
>
> {
>   "_id": "37453929...",
>   userId: "1234",
>   type: "user",
>   name: "John Smith",
>   department: "english"
> }
>
> There will be multiple users and there is a small set of possible values
> for the department. An example activity is:
>
> {
>   "_id": "736489...",
>   userId: "1234",
>   type: "activity",
>   title: "Doing some work.",
>   hours: 2
> }
> each user will have multiple activities.
>
> I would like a view showing the total number of hours per department, but I
> can't see how to easily do it. I was initially thinking I could perform a
> map-reduce to build map the documents by userId, and then use a reduce to
> build a super document that contains the department and the hours together.
> So I ended up with something like:
>
> {
>   user: {.... department: "english" },
>   activites: [
>     {.. hours: 2 }, ...
>   ]
> }
>
> However this resulting document wouldn't be keyed on the department and I
> can't perform a further map-reduce to switch it to be keyed off the
> department.
>
> Thanks in advance.
>

Re: Reporting on joined data

Posted by Sebastian Rothbucher <se...@googlemail.com>.
Hi Matthew,

that's off course another option as well. Thinking about it, it might be
quite worthwhile when people switch departments anyways (they won't switch
age or at least birthdate so often). I'm not sure about performance as you
forward through an already-compacted dataset, but I don't know your exact
case and for sure it's not quite as fast as a stored, pre-computed result
that you have in mind.

Anyways, good luck with it
   Sebastian

P.S.: I've no idea whether multistep reduction is on someone's roadmap,
sorry

On Tue, Apr 19, 2016 at 3:48 PM, Matthew Buckett <
matthew.buckett@it.ox.ac.uk> wrote:

> Thanks Sebastian, is multi-step reduction something that might happen?
>
> I think we may end up having the user's department in all the activities so
> that we can just emit this as a key and then do standard map/reduces.
> However as a general solution this means that we end up embedding anything
> that we want to be able to view our reports by into the activities (eg,
> sex, age, etc).
>
> Using a list function is something I had started to look at and as you say
> for smallish datasets should work, but performance will drop off as our
> database grows.
>
> On Tue, 19 Apr 2016 at 14:24 Sebastian Rothbucher <
> sebastianrothbucher@googlemail.com> wrote:
>
> > Hi Matthew,
> >
> > multi-step reduction is (unfortunately) something plain CouchDB can't do
> at
> > all. So you're right that there's no easy way of doing it. However, from
> > the top of my head, you could emit a key [userId, '1_dept'] w/ value
> > department for each userRecord and a key [userid, '2_act'] for each
> > activity. When reducing w/ maximum detail and using _sum / _count or
> > _stats, you should get no sum for the '1_dept' part and the sum/count for
> > the '2_act' parts. You could then use a list-function to iterate over the
> > result (which should be small in comparison unless you're a really big
> > organization - and anyways, you won't ever need it in memory all at once)
> > and create a hash-table with entries per department. Each time you come
> > across a new '1_dept' value, you have a new hash key, as long as you get
> > '2_act' values, keep summing up (as the input to the list is the -
> ordered
> > - view result, you're good). Finally, just emit the stringified hashtable
> > as result of the list function.
> >
> > Let us know how you did solve it in the end
> >
> > Best
> >    Sebastian
> >
> >
> >
> > On Tue, Apr 19, 2016 at 3:06 PM, Matthew Buckett <
> > matthew.buckett@it.ox.ac.uk> wrote:
> >
> > > I'm looking at trying to report on some data held in CouchDB, I have 2
> > > types of object stored, users and activities. An example user id:
> > >
> > > {
> > >   "_id": "37453929...",
> > >   userId: "1234",
> > >   type: "user",
> > >   name: "John Smith",
> > >   department: "english"
> > > }
> > >
> > > There will be multiple users and there is a small set of possible
> values
> > > for the department. An example activity is:
> > >
> > > {
> > >   "_id": "736489...",
> > >   userId: "1234",
> > >   type: "activity",
> > >   title: "Doing some work.",
> > >   hours: 2
> > > }
> > > each user will have multiple activities.
> > >
> > > I would like a view showing the total number of hours per department,
> > but I
> > > can't see how to easily do it. I was initially thinking I could
> perform a
> > > map-reduce to build map the documents by userId, and then use a reduce
> to
> > > build a super document that contains the department and the hours
> > together.
> > > So I ended up with something like:
> > >
> > > {
> > >   user: {.... department: "english" },
> > >   activites: [
> > >     {.. hours: 2 }, ...
> > >   ]
> > > }
> > >
> > > However this resulting document wouldn't be keyed on the department
> and I
> > > can't perform a further map-reduce to switch it to be keyed off the
> > > department.
> > >
> > > Thanks in advance.
> > >
> >
>

Re: Reporting on joined data

Posted by Matthew Buckett <ma...@it.ox.ac.uk>.
Thanks Sebastian, is multi-step reduction something that might happen?

I think we may end up having the user's department in all the activities so
that we can just emit this as a key and then do standard map/reduces.
However as a general solution this means that we end up embedding anything
that we want to be able to view our reports by into the activities (eg,
sex, age, etc).

Using a list function is something I had started to look at and as you say
for smallish datasets should work, but performance will drop off as our
database grows.

On Tue, 19 Apr 2016 at 14:24 Sebastian Rothbucher <
sebastianrothbucher@googlemail.com> wrote:

> Hi Matthew,
>
> multi-step reduction is (unfortunately) something plain CouchDB can't do at
> all. So you're right that there's no easy way of doing it. However, from
> the top of my head, you could emit a key [userId, '1_dept'] w/ value
> department for each userRecord and a key [userid, '2_act'] for each
> activity. When reducing w/ maximum detail and using _sum / _count or
> _stats, you should get no sum for the '1_dept' part and the sum/count for
> the '2_act' parts. You could then use a list-function to iterate over the
> result (which should be small in comparison unless you're a really big
> organization - and anyways, you won't ever need it in memory all at once)
> and create a hash-table with entries per department. Each time you come
> across a new '1_dept' value, you have a new hash key, as long as you get
> '2_act' values, keep summing up (as the input to the list is the - ordered
> - view result, you're good). Finally, just emit the stringified hashtable
> as result of the list function.
>
> Let us know how you did solve it in the end
>
> Best
>    Sebastian
>
>
>
> On Tue, Apr 19, 2016 at 3:06 PM, Matthew Buckett <
> matthew.buckett@it.ox.ac.uk> wrote:
>
> > I'm looking at trying to report on some data held in CouchDB, I have 2
> > types of object stored, users and activities. An example user id:
> >
> > {
> >   "_id": "37453929...",
> >   userId: "1234",
> >   type: "user",
> >   name: "John Smith",
> >   department: "english"
> > }
> >
> > There will be multiple users and there is a small set of possible values
> > for the department. An example activity is:
> >
> > {
> >   "_id": "736489...",
> >   userId: "1234",
> >   type: "activity",
> >   title: "Doing some work.",
> >   hours: 2
> > }
> > each user will have multiple activities.
> >
> > I would like a view showing the total number of hours per department,
> but I
> > can't see how to easily do it. I was initially thinking I could perform a
> > map-reduce to build map the documents by userId, and then use a reduce to
> > build a super document that contains the department and the hours
> together.
> > So I ended up with something like:
> >
> > {
> >   user: {.... department: "english" },
> >   activites: [
> >     {.. hours: 2 }, ...
> >   ]
> > }
> >
> > However this resulting document wouldn't be keyed on the department and I
> > can't perform a further map-reduce to switch it to be keyed off the
> > department.
> >
> > Thanks in advance.
> >
>