Posted to dev@nutch.apache.org by Enrico Triolo <en...@gmail.com> on 2006/09/26 16:10:24 UTC
Searching on fields with uppercase letters
Hi all, I'm trying to implement a search plugin to search on the
'subType' field added by the index-more plugin. It's a very simple
plugin, copied almost entirely from query-basic.
The problem is that when I perform a query on that field I get no
results at all. Other fields are handled by the same plugin, and I'm
able to search over them. Moreover, when I query the subType field
with Luke I get the expected results.
Looking at the source code, I found that when a query string is
parsed, all field names are converted to lower case, so the query
'subType:html' becomes 'subtype:html' (see method 'getNextToken' in
org.apache.nutch.analysis.NutchAnalysisTokenManager).
Could this be the cause of the empty result set? Is there a reason
why fields are treated this way?
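To make the suspected mismatch concrete, here is a minimal sketch in plain Java. It does not use the Nutch or Lucene APIs: the map-based "index", the class name FieldCaseDemo and both methods are made up for illustration, and the only assumption is that field names are matched case-sensitively while the parser lowercases them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FieldCaseDemo {
    // Toy stand-in for an index: exact field name -> term -> matching document ids.
    static final Map<String, Map<String, List<Integer>>> INDEX = new HashMap<>();

    static {
        Map<String, List<Integer>> terms = new HashMap<>();
        terms.put("html", List.of(1, 2));
        // Stored with a capital 'T', like the field added by index-more.
        INDEX.put("subType", terms);
    }

    // Case-sensitive field lookup (assumed behaviour of the real index).
    static List<Integer> search(String field, String term) {
        return INDEX.getOrDefault(field, Map.of()).getOrDefault(term, List.of());
    }

    // Hypothetical parser step: split "field:term" and lowercase the field name.
    static List<Integer> searchParsed(String query) {
        String[] parts = query.split(":", 2);
        return search(parts[0].toLowerCase(Locale.ROOT), parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(search("subType", "html"));     // prints [1, 2]
        System.out.println(searchParsed("subType:html"));  // prints []
    }
}
```

The direct lookup corresponds to what I see in Luke, and the parsed one to what I see through the plugin.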
Thanks,
Enrico
Re: Searching on fields with uppercase letters
Posted by Enrico Triolo <en...@gmail.com>.
That sounds ok... So I think we should modify the index-more plugin
(and maybe others?) to add 'primarytype' and 'subtype' fields instead
of 'primaryType' and 'subType'.
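Roughly what I have in mind, as a plain-Java sketch rather than the actual index-more code: the class LowercaseFieldNames, the normalize helper and the map standing in for a document are all hypothetical. The idea is just to lowercase field names at indexing time so they match what the query parser produces.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LowercaseFieldNames {
    // Hypothetical helper: apply the same lowercasing the query parser applies.
    static String normalize(String fieldName) {
        return fieldName.toLowerCase(Locale.ROOT);
    }

    // Toy "document": field name -> value, in insertion order.
    static Map<String, String> buildDocument(String primaryType, String subType) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(normalize("primaryType"), primaryType);
        doc.put(normalize("subType"), subType);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildDocument("text", "html"));
        // prints {primarytype=text, subtype=html}
    }
}
```

Existing indexes would still contain the mixed-case names, so they would need to be rebuilt after such a change.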
Cheers,
Enrico
On 9/26/06, Andrzej Bialecki <ab...@getopt.org> wrote:
> For simplicity and user-friendliness. While in Lucene we can reasonably
> expect that sophisticated users will construct sophisticated queries,
> paying attention to lower/upper-case, we need to lower the barrier for a
> general-purpose search engine frontend.
Re: Searching on fields with uppercase letters
Posted by Andrzej Bialecki <ab...@getopt.org>.
Enrico Triolo wrote:
> Looking at the source code, I found that when a query string is
> parsed, all field names are converted to lower case, so the query
> 'subType:html' becomes 'subtype:html' (see method 'getNextToken' in
> org.apache.nutch.analysis.NutchAnalysisTokenManager).
> Could this be the cause of the empty result set? Is there a reason
> why fields are treated this way?
For simplicity and user-friendliness. While in Lucene we can reasonably
expect that sophisticated users will construct sophisticated queries,
paying attention to lower/upper-case, we need to lower the barrier for a
general-purpose search engine frontend.
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com