Posted to user@nutch.apache.org by David Odmark <da...@moongatetech.com> on 2006/03/08 02:39:13 UTC

Boolean OR QueryFilter

Hi all,

We're trying to implement a Nutch app (version 0.8) that allows for 
Boolean OR, e.g. (this OR that) AND (something OR other). I've found some 
relevant posts in the mailing list archive, but I think I'm missing 
something. For example, here's a snippet from a post by Doug Cutting:

<snip>
that said, one can implement OR as a filter (replacing or altering 
BasicQueryFilter) that scans for terms whose text is "OR" in the default 
field.
</snip>
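
For illustration, here is a minimal sketch of the scanning idea in that snippet. It works on plain strings, not on Nutch Query terms, and the class and method names (OrGrouping, group) are hypothetical rather than part of any Nutch API. Terms joined by a literal "OR" end up in one group, and the groups themselves are assumed to be ANDed together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Nutch: it only shows how a filter might
// scan a flat term list for the literal marker "OR".
class OrGrouping {

    // Returns groups of alternatives; separate groups are implicitly ANDed.
    static List<List<String>> group(List<String> terms) {
        List<List<String>> groups = new ArrayList<>();
        boolean joinNext = false; // true right after an "OR" marker
        for (String term : terms) {
            if (term.equals("OR")) {
                // An OR with no left-hand operand is simply ignored.
                joinNext = !groups.isEmpty();
                continue;
            }
            if (joinNext) {
                // Attach this term to the previous group as an alternative.
                groups.get(groups.size() - 1).add(term);
            } else {
                List<String> fresh = new ArrayList<>();
                fresh.add(term);
                groups.add(fresh);
            }
            joinNext = false;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms =
            Arrays.asList("this", "OR", "that", "something", "OR", "other");
        System.out.println(group(terms)); // prints [[this, that], [something, other]]
    }
}
```

A real filter would presumably turn each group into a set of optional clauses inside one required clause, but that step depends on Nutch and Lucene classes and is not shown here.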

The problem I'm finding is that the NutchAnalysis analyzer seems to 
swallow all Boolean terms before the QueryFilter is even 
executed (perhaps because OR is a stop word?). To wit:

String queryText = "this OR that";
org.apache.nutch.searcher.Query query =
    org.apache.nutch.searcher.Query.parse(queryText, conf);
// Print every term that survives parsing.
for (int i = 0; i < query.getTerms().length; i++) {
    System.out.println("Term = " + query.getTerms()[i]);
}

This results in output that looks like this:

Term = this
Term = that

So am I correct in believing that in order to implement boolean OR using 
Nutch search and a QueryFilter, one must also (minimally) hack the 
NutchAnalysis.jj file to produce a new analyzer? Also, given that a 
Nutch Query object doesn't seem to have a method to add a non-required 
Term or Phrase, does that need to be modified as well?

Sorry for the long post, and thanks in advance...

-David Odmark



Re: Boolean OR QueryFilter

Posted by Doug Cutting <cu...@apache.org>.
David Odmark wrote:
> So am I correct in believing that in order to implement boolean OR using 
> Nutch search and a QueryFilter, one must also (minimally) hack the 
> NutchAnalysis.jj file to produce a new analyzer? Also, given that a 
> Nutch Query object doesn't seem to have a method to add a non-required 
> Term or Phrase, does that need to be modified as well?

It looks like you might need to make sure that "OR" is not a stop word.
Or use syntax like 'this +OR that', since required words are not
stopped. Or use something like "this operator:OR that".
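
As a rough illustration of the stop-word behaviour described above, the toy model below drops stop words unless they carry a leading '+'. It is not the NutchAnalysis parser: the class name StopWordToy, the stop list, and the case-insensitive matching are all assumptions made for the example, and fielded syntax such as operator:OR is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only, NOT Nutch code: whitespace tokens, an assumed stop list,
// and the rule from this thread that required (+) words are not stopped.
class StopWordToy {

    // Assumed stop list for illustration; Nutch's real list may differ.
    static final Set<String> STOP =
        new HashSet<>(Arrays.asList("or", "and", "the"));

    static List<String> parse(String queryText) {
        List<String> kept = new ArrayList<>();
        for (String token : queryText.trim().split("\\s+")) {
            boolean required = token.startsWith("+");
            String text = required ? token.substring(1) : token;
            if (text.isEmpty()) {
                continue; // empty input or a bare "+"
            }
            // Non-required stop words are dropped; required ones survive.
            if (!required && STOP.contains(text.toLowerCase(Locale.ROOT))) {
                continue;
            }
            kept.add(text);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(parse("this OR that"));  // prints [this, that]
        System.out.println(parse("this +OR that")); // prints [this, OR, that]
    }
}
```

Under these assumptions the second query keeps "OR" as an ordinary term, which a QueryFilter could then look for.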

Doug

RE: Boolean OR QueryFilter

Posted by Alexander Hixon <ad...@aquabeta.net>.
Maybe you could post the code on JIRA, in case anyone else wishes to use
Boolean operators in their search queries? We could probably get a developer
or two to put this in the 0.8 release, since it IS open source. ;)

Just a thought,
Alex

-----Original Message-----
From: Nguyen Ngoc Giang [mailto:giangnn@gmail.com] 
Sent: Wednesday, 15 March 2006 3:45 PM
To: nutch-user@lucene.apache.org; david.odmark@moongatetechnologies.com
Subject: Re: Boolean OR QueryFilter

  Hi David,

  I did a similar task. In fact, I hacked the .jj code to add
definitions for OR and NOT. If you need any help, don't hesitate to contact
me :).

  Regards,
   Giang

PS: I also believe that a hack to the .jj code is necessary.



Re: Boolean OR QueryFilter

Posted by Nguyen Ngoc Giang <gi...@gmail.com>.
  Hi David,

  I did a similar task. In fact, I hacked the .jj code to add
definitions for OR and NOT. If you need any help, don't hesitate to contact
me :).

  Regards,
   Giang

PS: I also believe that a hack to the .jj code is necessary.
