Posted to java-user@lucene.apache.org by Otmar Caduff <oc...@gmail.com> on 2016/12/20 13:55:42 UTC

ComplexPhraseQueryParser with wildcards

Hi,

I have an index with a single document whose field "field" has the textual
content "johnny peters", and I am using
org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser to
parse the query:
   field: (john* peter)
Searching with this query returns the document, as expected.
However, with this query:
   field: ("john*" "peter")
I get the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.PrefixQuery" found in phrase query string "john*"
    at org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:268)
    at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:278)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:836)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:886)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:744)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:460)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:489)
    at ComplexQueryTest.main(ComplexQueryTest.java:36)

Note: the exception is not thrown during the parse() method call, but
during the search() method call.

I don't see why ComplexPhraseQueryParser can't handle this. Am I misusing it,
or should I file a bug in JIRA?

I'm on Lucene 5.5.1, but the situation looks the same on 6.3.0. Any help is
appreciated!

Otmar

The code to reproduce my issue:


import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class ComplexQueryTest {

    public static void main(String[] args) throws Throwable {
        // Index a single document whose "field" contains "johnny peters".
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(new StandardAnalyzer()));

        Document doc1 = new Document();
        doc1.add(new TextField("field", "johnny peters", Store.NO));
        writer.addDocument(doc1);

        writer.commit();
        writer.close();

        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        ComplexPhraseQueryParser parser =
                new ComplexPhraseQueryParser("field", new StandardAnalyzer());
        TopDocs topDocs;

        // Unquoted prefix term: parsing and searching both succeed.
        Query queryOk = parser.parse("field: (john* peters)");
        topDocs = searcher.search(queryOk, 2);
        System.out.println("found " + topDocs.totalHits + " docs");

        // Quoted single-word prefix: parse() succeeds, but search() throws
        // the IllegalArgumentException shown above.
        Query queryFail = parser.parse("field: (\"john*\" \"peters\")");
        topDocs = searcher.search(queryFail, 2);
        System.out.println("found " + topDocs.totalHits + " docs");
    }
}
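The parse/search split noted above can be made concrete with a small, self-contained model. It is a sketch under assumptions, not Lucene code: it assumes, from the error text and the stack trace, that parse() merely records the quoted text and that ComplexPhraseQuery.rewrite() re-parses it at search time, passing a lone term through unchanged (consistent with the observation in this thread that "john" works) and turning multi-word contents into a span query, while rejecting any other single query type such as a lone prefix. All names below (PhraseRewriteModel, QuotedPhrase, SinglePrefix, rewrite, and so on) are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of two-phase phrase handling; none of these types are Lucene classes.
public class PhraseRewriteModel {

    /** Stand-in for whatever query the phrase contents parse into. */
    public interface Contents {}

    public static final class SingleTerm implements Contents {
        final String text;
        SingleTerm(String text) { this.text = text; }
    }

    public static final class SinglePrefix implements Contents {
        final String prefix;
        SinglePrefix(String prefix) { this.prefix = prefix; }
    }

    public static final class MultiClause implements Contents {
        final List<Contents> clauses;
        MultiClause(List<Contents> clauses) { this.clauses = clauses; }
    }

    /** Phase 1 (parse time): the quoted text is only stored, never validated. */
    public static final class QuotedPhrase {
        final String text;
        public QuotedPhrase(String text) { this.text = text; }
    }

    // One word becomes a term or a prefix clause ("john*" -> prefix "john").
    static Contents toClause(String word) {
        return word.endsWith("*")
                ? new SinglePrefix(word.substring(0, word.length() - 1))
                : new SingleTerm(word);
    }

    // A single word yields a single clause; several words yield a multi-clause query.
    static Contents parseContents(String text) {
        String[] words = text.trim().split("\\s+");
        if (words.length == 1) {
            return toClause(words[0]);
        }
        List<Contents> clauses = new ArrayList<>();
        for (String word : words) {
            clauses.add(toClause(word));
        }
        return new MultiClause(clauses);
    }

    /** Phase 2 (search time): contents are re-parsed and dispatched on their type. */
    public static String rewrite(QuotedPhrase phrase) {
        Contents contents = parseContents(phrase.text);
        if (contents instanceof SingleTerm) {
            return "term";       // a lone term is passed through unchanged
        }
        if (contents instanceof MultiClause) {
            return "span-near";  // several clauses are combined into a span query
        }
        // Anything else, such as a lone prefix, has no conversion path in this model.
        throw new IllegalArgumentException("Unknown query type \""
                + contents.getClass().getSimpleName()
                + "\" found in phrase query string \"" + phrase.text + "\"");
    }

    public static void main(String[] args) {
        System.out.println(rewrite(new QuotedPhrase("john")));          // term
        System.out.println(rewrite(new QuotedPhrase("john* peters")));  // span-near
        try {
            rewrite(new QuotedPhrase("john*"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main() prints "term", then "span-near", then the model's version of the error for "john*". Because validation happens only in rewrite(), a query that parses cleanly can still fail once it is searched, which matches the behaviour in the reproduction.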

Re: ComplexPhraseQueryParser with wildcards

Posted by Mikhail Khludnev <mk...@apache.org>.
I have raised https://issues.apache.org/jira/browse/LUCENE-7614.

On Tue, Dec 20, 2016 at 4:55 PM, Otmar Caduff <oc...@gmail.com> wrote:


-- 
Sincerely yours
Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

Posted by Mikhail Khludnev <mk...@apache.org>.
It probably deserves a JIRA issue, although it's minor.

On Tue, Dec 20, 2016 at 6:18 PM, Otmar Caduff <oc...@gmail.com> wrote:


-- 
Sincerely yours
Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

Posted by Otmar Caduff <oc...@gmail.com>.
Thanks for your response, Ahmet!

I agree that a meaningful phrase query should have at least two terms. But
then why does the query "john" (without a wildcard) work?

I'm trying to figure out if I can use ComplexPhraseQueryParser as a default
in my application or if I have to handle some cases differently.

Otmar
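One way to "handle some cases differently" while keeping ComplexPhraseQueryParser as the default is to rewrite the query string before parsing it. The sketch below is a hypothetical pre-processing helper (SingleWildcardPhraseUnquoter is not a Lucene API) that removes the quotes around a phrase consisting of a single word containing a wildcard, so "john*" reaches the parser as the unquoted prefix term john*, a form this thread shows already works. It assumes the query contains no escaped quotes and does not track quote pairing; multi-word phrases are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper (not part of Lucene). It removes the quotes
// around a quoted "phrase" that is a single word containing a wildcard (* or ?),
// so that "john*" reaches ComplexPhraseQueryParser as the unquoted term john*.
public final class SingleWildcardPhraseUnquoter {

    // An opening quote, one run of characters that are neither quotes nor
    // whitespace and that contains at least one * or ?, then a closing quote.
    // Limitations: escaped quotes and quote pairing are not tracked.
    private static final Pattern SINGLE_WILDCARD_PHRASE =
            Pattern.compile("\"([^\"\\s]*[*?][^\"\\s]*)\"");

    private SingleWildcardPhraseUnquoter() {
    }

    /** Returns the query with single-word wildcard phrases unquoted; everything else is kept. */
    public static String unquote(String query) {
        Matcher matcher = SINGLE_WILDCARD_PHRASE.matcher(query);
        // "$1" substitutes the captured word (without its quotes) for each match.
        return matcher.replaceAll("$1");
    }

    public static void main(String[] args) {
        // Prints: field: (john* "peters")
        System.out.println(unquote("field: (\"john*\" \"peters\")"));
        // Multi-word phrases are left for ComplexPhraseQueryParser.
        // Prints: field: "john* peters"
        System.out.println(unquote("field: \"john* peters\""));
    }
}
```

The helper only changes how the query string is split into clauses. Whether dropping the quotes is always equivalent for a given analyzer and field is an assumption worth checking against your own index.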

On Tue, Dec 20, 2016 at 3:17 PM, Ahmet Arslan <io...@yahoo.com.invalid>
wrote:


Re: ComplexPhraseQueryParser with wildcards

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Otmar,

A single term inside quotes is meaningless. A phrase query should have at least two terms in it, shouldn't it?

What is your intention with such a "john*" query?

Ahmet


On Tuesday, December 20, 2016 4:56 PM, Otmar Caduff <oc...@gmail.com> wrote:



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org