Posted to user@nutch.apache.org by chris sleeman <ch...@gmail.com> on 2007/06/02 15:52:57 UTC

Nutch and faceted search

Hi all,

I wanted to do some sort of faceted search with Nutch, but am not able to
figure out a clean and elegant solution for this. Could anyone give me any
sort of pointers on how to achieve this?

I understand that we can use Solr for this, but I just wanted to know how to
achieve the same using only Nutch.
I would really appreciate it if someone could help me with this.

Regards,
Chris

Re: Nutch and faceted search

Posted by Andrzej Bialecki <ab...@getopt.org>.
chris sleeman wrote:
> Pike,
> Thanks for your quick response. However, I was looking for something slightly
> different.
> I understand the concept of query filtering, but what I really need is some
> sort of "category counting" to refine searches.
> 
> For example, my documents can have a field named location, whose value could
> be any city in a country. For each city, I want to display the documents (and
> their count) that match the search query, so that the user can then search
> within the search results. The names of the cities are not known in advance.
> 
> An example of something similar is  -
> http://reviews.cnet.com/4566-6501_7-0.html
> 
> I just wanted to know whether anyone has tried doing this using Nutch, and
> if so, I would be glad to get some pointers.

For the general principle of how to implement it using Lucene, please
see the thread on the Lucene java-user list about "Aggregating category
hits", started on May 15 2006. This subject has been discussed many times.

From the point of view of Nutch, you can implement all the necessary
modifications within org.apache.nutch.searcher.IndexSearcher or
LuceneQueryOptimizer. Then, if you don't want to change the
DistributedSearch protocol, you could extend the o.a.n.s.Hits class to
pass aggregated category info from the back-ends to the front-end.

For a certain project I implemented two methods of faceted search, one 
based on random sampling of search results, the other based on bitset 
intersections. Both methods are reasonably fast, although they trade off
accuracy against speed differently. Unfortunately the code is not public,
but the task is certainly doable, and doesn't require major changes.
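
To illustrate the bitset-intersection idea in general terms, here is a
minimal sketch in plain Java. It is not the non-public code mentioned above,
and it uses neither Nutch nor Lucene classes: the class BitSetFacets, the
methods countFacets and bits, and the sample city names are all hypothetical.
It assumes that one bitset of document ids per facet value has been built in
advance, and that the query result is available as a bitset as well. The
sampling-based method is not shown.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: facet counts via bitset intersections.
// None of these names are Nutch or Lucene APIs.
class BitSetFacets {

    // For each facet value, intersect its precomputed bitset of document ids
    // with the bitset of documents matching the query, then count the set bits.
    static Map<String, Integer> countFacets(BitSet queryHits, Map<String, BitSet> facetBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : facetBits.entrySet()) {
            // Clone first so the cached per-facet bitset is not modified.
            BitSet intersection = (BitSet) entry.getValue().clone();
            intersection.and(queryHits);
            int n = intersection.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n); // skip facet values with no hits
            }
        }
        return counts;
    }

    // Helper for building a bitset from a list of document ids.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Assumed sample data: which documents carry which location value.
        Map<String, BitSet> facetBits = new LinkedHashMap<>();
        facetBits.put("london", bits(0, 1, 4));
        facetBits.put("paris", bits(2, 5));
        facetBits.put("berlin", bits(3));

        BitSet queryHits = bits(1, 2, 4, 5); // documents matching the user query
        System.out.println(countFacets(queryHits, facetBits));
    }
}
```

The counts are exact, at the cost of one intersection per facet value and
the memory needed to keep the per-value bitsets around.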


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Nutch and faceted search

Posted by chris sleeman <ch...@gmail.com>.
Pike,
Thanks for your quick response. However, I was looking for something slightly
different.
I understand the concept of query filtering, but what I really need is some
sort of "category counting" to refine searches.

For example, my documents can have a field named location, whose value could
be any city in a country. For each city, I want to display the documents (and
their count) that match the search query, so that the user can then search
within the search results. The names of the cities are not known in advance.
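
To make the requirement concrete, here is a minimal sketch of such
"category counting" in plain Java. Everything in it is an assumption for
illustration: hits are modelled as simple field-to-value maps rather than
Nutch or Lucene objects, and the class CategoryCounter, the method
countByField and the sample data are hypothetical. The point is only that
the city names are discovered from the hits themselves.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: count search hits per value of one field.
// Hits are plain maps here, not Nutch or Lucene objects.
class CategoryCounter {

    // Walk over the hits and tally how often each value of the field occurs.
    // No list of values is needed up front; they come from the hits.
    static Map<String, Integer> countByField(List<Map<String, String>> hits, String field) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value name
        for (Map<String, String> doc : hits) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample hits for some query.
        List<Map<String, String>> hits = List.of(
            Map.of("title", "Hotel A", "location", "Mumbai"),
            Map.of("title", "Hotel B", "location", "Delhi"),
            Map.of("title", "Hotel C", "location", "Mumbai"));
        System.out.println(countByField(hits, "location"));
    }
}
```

This walks every hit, so it is only practical for small result sets.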

An example of something similar is  -
http://reviews.cnet.com/4566-6501_7-0.html

I just wanted to know whether anyone has tried doing this using Nutch, and
if so, I would be glad to get some pointers.

Regards,
Puneet

On 6/2/07, Pike <pi...@kw.nl> wrote:
>
> Hi
>
> > I wanted to do some sort of faceted search with Nutch, but am not able to
> > figure out a clean and elegant solution for this. Could anyone give me any
> > sort of pointers on how to achieve this?
>
> hope this answers your question:
>
> every field that lucene indexes is a sort of facet. you
> can search within one specific field by passing "fieldname:value"
> as the query. one such field by default is title. searching for
> title:test returns results that contain "test" in
> the "title" field, which was derived from the <title> tag.
>
> you could extend the fields that lucene indexes
> by writing plugins. this
>
> http://office.labforculture.org:8180/search/search.jsp?query=dc_subject:aboriginal
> returns all the urls we have that contain "aboriginal" in
> the <meta name="DC:subject"> field (and some variations on it).
>
> if you'd define your own metadata, and write your own plugin
> to parse that ..
>
> see http://wiki.apache.org/nutch/WritingPluginExample
>
> $2c,
> *pike
>
>

Re: Nutch and faceted search

Posted by Pike <pi...@kw.nl>.
Hi

> I wanted to do some sort of faceted search with Nutch, but am not able to
> figure out a clean and elegant solution for this. Could anyone give me any
> sort of pointers on how to achieve this?

hope this answers your question:

every field that lucene indexes is a sort of facet. you
can search within one specific field by passing "fieldname:value"
as the query. one such field by default is title. searching for
title:test returns results that contain "test" in
the "title" field, which was derived from the <title> tag.
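
as a rough illustration of what a "fieldname:value" query means, here is
a small sketch in plain java. it is not how lucene or nutch implement it:
documents are simple maps, matching is a naive lowercase token comparison,
and the names FieldQuery and search are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a "fieldname:value" lookup over in-memory documents.
// Real Lucene uses analyzers and an inverted index; this only mimics the idea.
class FieldQuery {

    // Return the documents whose given field contains the value as a whole word.
    static List<Map<String, String>> search(List<Map<String, String>> docs, String query) {
        int colon = query.indexOf(':');
        if (colon <= 0 || colon == query.length() - 1) {
            throw new IllegalArgumentException("expected fieldname:value, got: " + query);
        }
        String field = query.substring(0, colon);
        String term = query.substring(colon + 1).toLowerCase(Locale.ROOT);

        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String text = doc.get(field);
            if (text == null) {
                continue; // document has no such field
            }
            // Naive tokenization: lowercase, split on non-word characters.
            List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
            if (tokens.contains(term)) {
                matches.add(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample documents.
        List<Map<String, String>> docs = List.of(
            Map.of("title", "A test page", "url", "http://example.com/a"),
            Map.of("title", "Something else", "url", "http://example.com/test"));
        System.out.println(search(docs, "title:test").size());
    }
}
```

only the first sample document matches, because the second one has "test"
in its url field and not in its title.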

you could extend the fields that lucene indexes
by writing plugins. this
http://office.labforculture.org:8180/search/search.jsp?query=dc_subject:aboriginal
returns all the urls we have that contain "aboriginal" in
the <meta name="DC:subject"> field (and some variations on it).

if you'd define your own metadata, and write your own plugin
to parse that ..

see http://wiki.apache.org/nutch/WritingPluginExample

$2c,
*pike