Posted to user@nutch.apache.org by David Odmark <da...@moongatetech.com> on 2006/03/08 02:39:13 UTC
Boolean OR QueryFilter
Hi all,
We're trying to implement a Nutch app (version 0.8) that allows for
Boolean OR, e.g. (this OR that) AND (something OR other). I've found some
relevant posts in the mailing list archive, but I think I'm missing
something. For example, here's a snippet from a post from Doug Cutting:
<snip>
that said, one can implement OR as a filter (replacing or altering
BasicQueryFilter) that scans for terms whose text is "OR" in the default
field.
</snip>
The problem I'm finding is that the NutchAnalysis analyzer seems to be
swallowing all boolean terms by the time the QueryFilter is even
executed (perhaps because OR is a stop word?). To wit:
    String queryText = "this OR that";
    org.apache.nutch.searcher.Query query =
        org.apache.nutch.searcher.Query.parse(queryText, conf);
    for (int i = 0; i < query.getTerms().length; i++) {
        System.out.println("Term = " + query.getTerms()[i]);
    }
This results in output that looks like this:
Term = this
Term = that
So am I correct in believing that in order to implement boolean OR using
Nutch search and a QueryFilter, one must also (minimally) hack the
NutchAnalysis.jj file to produce a new analyzer? Also, given that a
Nutch Query object doesn't seem to have a method to add a non-required
Term or Phrase, does that need to be modified as well?
Sorry for the long post, and thanks in advance...
-David Odmark
Re: Boolean OR QueryFilter
Posted by Doug Cutting <cu...@apache.org>.
David Odmark wrote:
> So am I correct in believing that in order to implement boolean OR using
> Nutch search and a QueryFilter, one must also (minimally) hack the
> NutchAnalysis.jj file to produce a new analyzer? Also, given that a
> Nutch Query object doesn't seem to have a method to add a non-required
> Term or Phrase, does that need to be modified as well?
It looks like you might need to make sure that "OR" is not a stop word.
Or use syntax like 'this +OR that', since required words are not
stopped. Or use something like "this operator:OR that".
Doug
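
A minimal sketch of the filter idea discussed in this thread, assuming the "OR" marker survives analysis (for example via the '+OR' form): scan the term list and merge the terms on either side of each marker into one group. This is plain Java with no Nutch dependency. The class name OrGroupingSketch and the method groupByOr are hypothetical, and whether the marker reaches a filter as "OR" or in lowercased form is not verified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: groups query terms around "OR" markers.
final class OrGroupingSketch {

    // Returns groups of alternatives. Terms joined by "OR" share a group;
    // every other term starts a new group. Groups are meant to be ANDed together.
    static List<List<String>> groupByOr(List<String> terms) {
        List<List<String>> groups = new ArrayList<>();
        boolean joinNext = false;
        for (String term : terms) {
            if (term.equals("OR")) {
                // Assumption: the marker arrives as the literal text "OR".
                // A leading "OR" has nothing to join to, so it is ignored.
                joinNext = !groups.isEmpty();
                continue;
            }
            if (joinNext) {
                // Add this term as an alternative in the most recent group.
                groups.get(groups.size() - 1).add(term);
            } else {
                List<String> group = new ArrayList<>();
                group.add(term);
                groups.add(group);
            }
            joinNext = false;
        }
        return groups;
    }
}
```

Each inner list would map to one set of optional clauses, with the groups themselves all required. Wiring that into a real QueryFilter is left out of this sketch.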
RE: Boolean OR QueryFilter
Posted by Alexander Hixon <ad...@aquabeta.net>.
Maybe you could post the code on JIRA, in case anyone else wishes to use
Boolean operators in their search queries? We could probably get a developer
or two to put this in the 0.8 release, since it IS open source. ;)
Just a thought,
Alex
-----Original Message-----
From: Nguyen Ngoc Giang [mailto:giangnn@gmail.com]
Sent: Wednesday, 15 March 2006 3:45 PM
To: nutch-user@lucene.apache.org; david.odmark@moongatetechnologies.com
Subject: Re: Boolean OR QueryFilter
Hi David,
I also did a similar task. In fact, I hacked into jj code to add the
definition for OR and NOT. If you need any help, don't hesitate to contact
me :).
Regards,
Giang
PS: I also believe that a hack to jj code is necessary.
Re: Boolean OR QueryFilter
Posted by Nguyen Ngoc Giang <gi...@gmail.com>.
Hi David,
I also did a similar task. In fact, I hacked into jj code to add the
definition for OR and NOT. If you need any help, don't hesitate to contact
me :).
Regards,
Giang
PS: I also believe that a hack to jj code is necessary.