Posted to java-user@lucene.apache.org by Natalia Connolly <na...@gmail.com> on 2014/03/17 20:02:26 UTC

How to search for terms containing negation

Hi All,

   Is there any way I could construct a query that would not automatically
exclude negation terms (such as "no", "not", etc)?  For example, I need to
find strings like "not happy", "no idea", "never available".   I tried
using a simple analyzer with combinations such as "not AND happy", and
similar patterns, but it does not work.

   Any help would be appreciated!

   Natalia

Re: How to search for terms containing negation

Posted by Jack Krupansky <ja...@basetechnology.com>.
Of course - you need to use the same analyzer for both indexing and querying.
So, just reindex your data with this new analyzer.
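
As a rough illustration of that point, here is a minimal sketch against the
Lucene 4.7 API used in this thread. It indexes a few strings and queries them
with the same StandardAnalyzer built on an empty stop word set. The class
name NegationSearchSketch, the helper countHits, the field name "contents"
and the sample strings are assumptions made up for this example; they are
not taken from Natalia's code.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NegationSearchSketch {

    // Hypothetical field name; use whatever field your indexer writes.
    static final String FIELD = "contents";

    // Indexes the texts into a fresh in-memory index and returns the number
    // of hits for queryText, using the SAME analyzer for both steps.
    public static int countHits(Analyzer analyzer, String[] texts, String queryText)
            throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String text : texts) {
                Document doc = new Document();
                doc.add(new TextField(FIELD, text, Field.Store.NO));
                writer.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser(Version.LUCENE_47, FIELD, analyzer);
            Query query = parser.parse(queryText);
            return searcher.search(query, 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        // An empty stop word set means "no", "not", etc. are kept as terms.
        Analyzer keepAll = new StandardAnalyzer(Version.LUCENE_47, CharArraySet.EMPTY_SET);
        String[] texts = { "I am not happy", "I am happy", "no idea", "never available" };

        // Quoting the words makes the parser build a phrase query.
        System.out.println(countHits(keepAll, texts, "\"not happy\""));
        System.out.println(countHits(keepAll, texts, "no"));
    }
}
```

The important part is that the index is rebuilt with the new analyzer. An
index that was written earlier with the default stop word set does not
contain "no" or "not" as terms, so changing only the query-side analyzer
cannot bring them back.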

-- Jack Krupansky


Re: How to search for terms containing negation

Posted by Natalia Connolly <na...@gmail.com>.
I am afraid this did not work, Tri.  Here's what I tried:

// Build an empty stop word set so that StandardAnalyzer removes nothing.
List<String> words = new ArrayList<String>();
boolean ignoreCase = true;
CharArraySet emptyset = new CharArraySet(Version.LUCENE_47, words, ignoreCase);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47, emptyset);

Here's what happens:

Searching for: no
0 total matching documents
Searching for: not
0 total matching documents

even though I know the documents contain plenty of occurrences of "no" and
"not".

Could the problem be further upstream (i.e., words like these aren't even
indexed)?
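
One way to answer that question directly is to ask the index how many
documents contain each term, which bypasses the query parser and the
analyzer entirely. The following is only a sketch against the Lucene 4.7
API; the class name TermCheckSketch, the helper docFreq, and the
command-line arguments (index path and field name) are assumptions made up
for this example.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TermCheckSketch {

    // Returns how many documents in the index contain the exact term.
    // The text is NOT analyzed here, so pass it in lowercase, the form
    // StandardAnalyzer would have written at index time.
    public static int docFreq(Directory dir, String field, String text) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.docFreq(new Term(field, text));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing index directory (hypothetical)
        // args[1]: name of the indexed field (hypothetical)
        Directory dir = FSDirectory.open(new File(args[0]));
        try {
            for (String word : new String[] { "no", "not", "never" }) {
                System.out.println(word + " -> " + docFreq(dir, args[1], word));
            }
        } finally {
            dir.close();
        }
    }
}
```

A count of 0 for "no" and "not" next to a non-zero count for "never" (which
is not in the default English stop word set) would suggest that the stop
words were dropped when the index was written.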

Thank you,

Natalia




On Mon, Mar 17, 2014 at 3:57 PM, Tri Cao <tm...@me.com> wrote:

> StandardAnalyzer has a constructor that takes a stop word set, so I guess
> you can pass it an empty set:
>
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html#StandardAnalyzer(org.apache.lucene.util.Version, org.apache.lucene.analysis.util.CharArraySet)
>
> QueryParser is probably ok. I rarely use this parser but I don't think it
> recognizes "not" in its grammar.
>
> Hope this helps,
> Tri

Re: How to search for terms containing negation

Posted by Natalia Connolly <na...@gmail.com>.
Hi Tri,

   Thank you so much for your message!

    Yes, it looks like the negation terms have indeed been filtered out;
when I query on "no" or "not", I get no results.  I am just using
StandardAnalyzer and the classic QueryParser:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
QueryParser parser = new QueryParser(Version.LUCENE_47, field, analyzer);

    Which analyzer/parser would you recommend?

    Thank you again,

    Natalia







On Mon, Mar 17, 2014 at 3:35 PM, Tri Cao <tm...@me.com> wrote:

> Natalia,
>
> First make sure that your analyzers (both index and query analyzers) do
> not filter these out as stop words. I think the standard StopFilter list
> has "no" and "not". You can check whether your index has these terms by
> querying for "no" as a TermQuery. If there is no match for that query,
> then you know for sure they have been filtered out.
>
> The next thing to check is your query parser. What query parser are you
> using? Some parsers actually understand the "not" term and rewrite it to a
> negation query.
>
> Hope this helps,
> Tri