Posted to user@couchdb.apache.org by Amedeo Paglione <am...@gmail.com> on 2011/02/10 16:29:32 UTC

CouchDB and tags

Hi all,

I have been playing with CouchDB for a while and I had to address the
problem of retrieving documents which match a list of tags.

I have documented my approach here:

https://gist.github.com/820412

It works, but I am wondering whether there is an alternative, more
efficient solution to this problem.

Regards,
--
Amedeo

RE: CouchDB and tags

Posted by Steve Yen <st...@couchbase.com>.
Neat -- I think the tags also need to be sorted, if they're not already sorted in the original doc.

Cheers,
Steve
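
A minimal sketch of why the ordering matters, assuming the view emits tag combinations as array keys. An array key only matches when its elements are in the same order, so both the indexed keys and the query key need one canonical ordering. The helper names and the database, design document, and view names below are hypothetical placeholders, not taken from the gist.

```javascript
// Hypothetical helper: build one canonical key for a set of tags.
function canonicalTagKey(tags) {
  // Drop duplicates, then sort with the default string ordering so that
  // ["nosql", "couchdb"] and ["couchdb", "nosql"] yield the same key.
  return Array.from(new Set(tags)).sort();
}

// Hypothetical view path; "mydb", "tags" and "by_combination" are placeholders.
function tagQueryPath(tags) {
  var key = JSON.stringify(canonicalTagKey(tags));
  return "/mydb/_design/tags/_view/by_combination?key=" + encodeURIComponent(key);
}

console.log(canonicalTagKey(["nosql", "couchdb", "nosql"]));
console.log(tagQueryPath(["nosql", "couchdb"]));
```

The same normalization would have to be applied inside the map function, otherwise a document whose tags are stored as ["nosql", "couchdb"] would never match a query for ["couchdb", "nosql"].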

On 10 February 2011 15:57, Zachary Zolton <za...@gmail.com> wrote:
> Amedeo,
>
> If you can afford the disk space, it's a fair tradeoff. I've used a
> similar strategy in the past, and it worked out well for me.
>
> You may want to consider limiting the maximum size of tag combinations
> to index. For example, I changed my view to emit tag combination
> arrays with no more than 4 elements. This could significantly reduce
> the index size if your documents have many tags.
>
>
> Cheers,
>
> Zach

Re: CouchDB and tags

Posted by Amedeo Paglione <am...@gmail.com>.
Cool. It works perfectly now, thanks!

--
Amedeo




On Fri, Feb 11, 2011 at 00:13, Robert Newson <ro...@gmail.com> wrote:

> Aha, yes, subtle.
>
> "A" is a "stop word": a word deemed so common that it's stripped while
> indexing.
>
> Add "analyzer":"keyword" at the same level as "index".
>
> B.

Re: CouchDB and tags

Posted by Robert Newson <ro...@gmail.com>.
Aha, yes, subtle.

"A" is a "stop word": a word deemed so common that it's stripped while indexing.

Add "analyzer":"keyword" at the same level as "index".

B.
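
A minimal sketch of what such a design document could look like, assuming each document carries a "tags" array. The design document name, the index name "by_tag" and the field name "tag" are hypothetical, and the gist's actual index function is not reproduced here. The small Document constructor is only a stand-in so the sketch runs outside couchdb-lucene, which supplies its own Document to index functions.

```javascript
// Stand-in for the Document class that couchdb-lucene provides to index
// functions; defined here only so this sketch can run on its own.
function Document() {
  this.fields = [];
}
Document.prototype.add = function (value, options) {
  this.fields.push({ value: value, options: options || {} });
};

// Index function: add every tag as its own value of a "tag" field.
// The field name "tag" is an assumption made for this example.
var indexTags = function (doc) {
  if (!doc.tags || !doc.tags.length) {
    return null; // nothing to index for this document
  }
  var result = new Document();
  for (var i = 0; i < doc.tags.length; i++) {
    result.add(doc.tags[i], { field: "tag" });
  }
  return result;
};

// Design document: "analyzer" sits next to "index", as suggested above.
var designDoc = {
  _id: "_design/tags",            // hypothetical design document name
  fulltext: {
    by_tag: {                     // hypothetical index name
      analyzer: "keyword",        // keeps each tag as a single, unmodified token
      index: indexTags.toString() // the function is stored as a string
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```

With the keyword analyzer, a tag such as "A" should be indexed as is instead of being dropped as a stop word. The trade-off is that matching becomes exact, including case.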

On 10 February 2011 22:47, Amedeo Paglione <am...@gmail.com> wrote:
> Thanks for the suggestion. My solution did not consider any indexing
> server other than the standard views, but I got curious about the
> Lucene integration, especially since a full-text index can be really
> helpful.
>
> Configuring couchdb-lucene was quite simple, but I got unexpected results on
> some of the queries.
>
> I have documented it in this gist:
>
> https://gist.github.com/fe0fcf29cb38e7df23d1
>
> Is the function described the best way to index an array of tags?
>
> Thanks,
> --
> Amedeo

Re: CouchDB and tags

Posted by Amedeo Paglione <am...@gmail.com>.
Thanks for the suggestion. My solution did not consider any indexing
server other than the standard views, but I got curious about the
Lucene integration, especially since a full-text index can be really
helpful.

Configuring couchdb-lucene was quite simple, but I got unexpected results on
some of the queries.

I have documented it in this gist:

https://gist.github.com/fe0fcf29cb38e7df23d1

Is the function described the best way to index an array of tags?

Thanks,
--
Amedeo




On Thu, Feb 10, 2011 at 17:11, Robert Newson <ro...@gmail.com> wrote:

> couchdb-lucene is an alternative to the combinatorial explosion approach:
>
> https://github.com/rnewson/couchdb-lucene
>
> B.

Re: CouchDB and tags

Posted by Robert Newson <ro...@gmail.com>.
couchdb-lucene is an alternative to the combinatorial explosion approach:

https://github.com/rnewson/couchdb-lucene

B.
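
A rough sketch of how a "match all of these tags" query might be expressed once the tags are in a Lucene index, assuming each tag was indexed as a value of a field named "tag" (a hypothetical name). Instead of precomputing every combination, the query simply ANDs one clause per tag. The resulting string would be URL-encoded and passed as the q parameter of a couchdb-lucene search request; the exact request path depends on the couchdb-lucene version and is left out here.

```javascript
// Quote a tag as a Lucene phrase, escaping backslashes and double quotes.
function quoteTag(tag) {
  return '"' + String(tag).replace(/(["\\])/g, "\\$1") + '"';
}

// Build a query that requires every tag, e.g. tag:"a" AND tag:"b".
// An empty tag list yields an empty string; callers should handle that case.
function allTagsQuery(tags) {
  return tags
    .map(function (tag) { return "tag:" + quoteTag(tag); })
    .join(" AND ");
}

var query = allTagsQuery(["couchdb", "nosql"]);
console.log(query);                     // tag:"couchdb" AND tag:"nosql"
console.log(encodeURIComponent(query)); // form suitable for a q parameter
```

Unlike array keys in a view, the order of the clauses does not matter here, so no sorting of tags is needed on either side.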

On 10 February 2011 15:57, Zachary Zolton <za...@gmail.com> wrote:
> Amedeo,
>
> If you can afford the disk space, it's a fair tradeoff. I've used a
> similar strategy in the past, and it worked out well for me.
>
> You may want to consider limiting the maximum size of tag combinations
> to index. For example, I changed my view to emit tag combination
> arrays with no more than 4 elements. This could significantly reduce
> the index size if your documents have many tags.
>
>
> Cheers,
>
> Zach

Re: CouchDB and tags

Posted by Zachary Zolton <za...@gmail.com>.
Amedeo,

If you can afford the disk space, it's a fair tradeoff. I've used a
similar strategy in the past, and it worked out well for me.

You may want to consider limiting the maximum size of tag combinations
to index. For example, I changed my view to emit tag combination
arrays with no more than 4 elements. This could significantly reduce
the index size if your documents have many tags.


Cheers,

Zach
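
A sketch of a map function that caps the combination size, assuming documents carry a "tags" array of strings. This is not the code from the gist; the constant name and the local emit stand-in are introduced only for illustration. Inside CouchDB, emit is provided by the view server and the stand-in would be dropped.

```javascript
// Local stand-in for CouchDB's emit(), used only to run the sketch outside
// the view server. It records every emitted row.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function: emit every combination of the document's tags that has at
// most MAX_COMBINATION_SIZE elements, each as a sorted array key.
function map(doc) {
  var MAX_COMBINATION_SIZE = 4; // cap taken from the message above
  if (!Array.isArray(doc.tags)) {
    return;
  }
  // Sort, then drop adjacent duplicates, so keys have one canonical order.
  var tags = doc.tags.slice().sort().filter(function (tag, i, all) {
    return i === 0 || tag !== all[i - 1];
  });

  // Depth-first enumeration of combinations in sorted order.
  function extend(start, current) {
    if (current.length > 0) {
      emit(current.slice(), null);
    }
    if (current.length === MAX_COMBINATION_SIZE) {
      return; // do not grow past the cap
    }
    for (var i = start; i < tags.length; i++) {
      current.push(tags[i]);
      extend(i + 1, current);
      current.pop();
    }
  }
  extend(0, []);
}

map({ tags: ["nosql", "couchdb", "tags", "views", "lucene"] });
console.log(emitted.length + " keys emitted");
```

For a document with n tags, the uncapped approach emits 2^n - 1 keys, while a cap of 4 emits only the combinations of size 1 through 4. The price is that queries listing more than 4 tags can no longer be answered by a single key lookup.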


On Thu, Feb 10, 2011 at 9:29 AM, Amedeo Paglione
<am...@gmail.com> wrote:
> Hi all,
>
> I have been playing with CouchDB for a while and I had to address the
> problem of retrieving documents which match a list of tags.
>
> I have documented my approach here:
>
> https://gist.github.com/820412
>
> It works, but I am wondering whether there is an alternative, more
> efficient solution to this problem.
>
> Regards,
> --
> Amedeo
>