Posted to java-user@lucene.apache.org by Evert Wagenaar <ev...@gmail.com> on 2016/10/24 16:40:36 UTC

How to get the terms matching a WildCardQuery in Lucene 6.2?

I already asked this on StackOverflow. Unfortunately without any answer for
over a week now.

Therefore again to the real experts:


I downloaded a list of 350,000 English words in a .txt file and indexed it
using the latest Lucene (6.2). I want to apply wildcard queries like
aard???? and then retrieve a list of matches.

I've done this before in an older version of Lucene. There it was pretty
simple: I just had to do a Query.rewrite() and this returned what I needed.
Unfortunately in 6.2 this doesn't work anymore. There is a
Query.rewrite(IndexReader reader) which should return a HashMap of Terms.
In my case there's only one matching Term (aardvark). The Searcher returns
one hit, containing the Document path to the wordlist. The HashMap is
however empty.

When I change the Query to find more than one single match (like aa*) the
HashMap remains empty.

I tried the MatchExtractor too. Unfortunately without result.

The objective of this is to demonstrate the power of Lucene to easily find
words of a particular length, given one or more characters. I'm pretty sure
I can do this using regular expressions in Java, but that would defeat the
purpose.
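
For comparison, the regular-expression route mentioned above could look
roughly like the sketch below. It is plain Java with no Lucene involved; the
class name WildcardWords, its two methods and the four-word list are made up
for illustration, and it assumes '?' means exactly one character and '*'
means any run of characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardWords {

    /** Translates a wildcard pattern ('?' = one char, '*' = any run) into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                // Quote everything else so characters like '.' stay literal.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns the words that match the whole wildcard pattern, in input order. */
    static List<String> match(String wildcard, List<String> words) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).matches()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real 350,000-word list.
        List<String> words = Arrays.asList("aardvark", "aardwolf", "aback", "aardvarks");
        System.out.println(match("aard????", words));
    }
}
```

With this small list the program prints [aardvark, aardwolf]; "aardvarks" is
skipped because it has nine letters and the pattern asks for exactly eight.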

Can anyone tell me why this isn't working? I use the StandardAnalyzer.
Should I use a different Application?

Any help is greatly appreciated.

Thanks.



-- 
Sent from Gmail IPad

Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

Thanks Allison. I will try it.

I'll let you know if it works.

Evert Wagenaar

On Tuesday, 25 October 2016, Allison, Timothy B. <ta...@mitre.org> wrote:

> WildcardQuery is a subclass of MultiTermQuery.  If you are using the
> QueryParser, you need to set the rewrite method on the parser.
>
> Try this, and beware of hitting the max BooleanQuery clause limit (and/or
> raise it):
>
>
>
> BooleanQuery.setMaxClauseCount(numberBigEnoughForYourNeeds);
>
>
>
> import java.util.HashSet;
> import java.util.Set;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.MultiTermQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
>
> public class RewriteTest {
>
>
>
>
>     /** Simple command-line based search demo. */
>     public static void main(String[] args) throws Exception {
>         Analyzer analyzer = new StandardAnalyzer();
>         String field = "contents";
>         Directory directory = new RAMDirectory();
>         IndexWriterConfig config = new IndexWriterConfig(analyzer);
>         IndexWriter indexWriter = new IndexWriter(directory, config);
>         for (int i = 0; i < 100; i++) {
>             Document d = new Document();
>             d.add(new TextField(field, "aard00"+i, Field.Store.YES));
>             indexWriter.addDocument(d);
>         }
>         indexWriter.flush();
>         indexWriter.close();
>
>         String queryString = "aard????";
>
>         IndexReader reader = DirectoryReader.open(directory);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>
>         QueryParser parser = new QueryParser(field, analyzer);
>         parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
>         Query q = parser.parse(queryString);
>         q = q.rewrite(reader);
>         Set<Term> terms = new HashSet<>();
>         Weight weight = q.createWeight(searcher, false);
>         weight.extractTerms(terms);
>         for (Term t : terms) {
>             System.out.println(t);
>         }
>         reader.close();
>     }
>
> }
>
>
> From: Evert Wagenaar [mailto:evert.wagenaar@gmail.com]
> Sent: Tuesday, October 25, 2016 1:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get the terms matching a WildCardQuery in Lucene 6.2?
>
> Hi Allison,
>
> Unfortunately I can't compile the code (see below). Can you tell me what's
> wrong?
> I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
> CONSTANT_SCORE_BOOLEAN_REWRITE
>
> What I don't actually understand is how my Query, which is a wildcard
> query and not a MultiTermQuery, relates to MultiTermQuery.
>
> Can you explain?
>
> Thanks,
>
> Evert Wagenaar
>
>
> [Inline image 1]
>
> Full code of Searcher:
>
>
> package tk.evertwagenaar.lucene;
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.Date;
> import java.util.HashSet;
> import java.util.Set;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.MultiTermQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.store.FSDirectory;
>
> /** Simple command-line based search demo. */
> public class SearchFiles {
>
>     private static IndexReader reader;
>     private static Query q;
>
>     private SearchFiles() {
>     }
>
>     /** Simple command-line based search demo. */
>     public static void main(String[] args) throws Exception {
>         String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles"
>                 + " [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw]"
>                 + " [-paging hitsPerPage]\n\nSee http://lucene.apache.org/core/4_1_0/demo/"
>                 + " for details.";
>         if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
>             System.out.println(usage);
>             System.exit(0);
>         }
>
>         String index = "index";
>         String field = "contents";
>         String queries = null;
>         int repeat = 0;
>         boolean raw = false;
>         String queryString = "aard????";
>         int hitsPerPage = 10;
>
>         reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
>         IndexSearcher searcher = new IndexSearcher(reader);
>         Analyzer analyzer = new StandardAnalyzer();
>
>         BufferedReader in = null;
>
>         QueryParser parser = new QueryParser(field, analyzer);
>         while (true) {
>             if (queries == null && queryString == null) { // prompt the user
>                 System.out.println("Enter query: ");
>             }
>
>             Query q = parser.parse(queryString);
>             System.out.println("Searching for: " + q.toString(field));
>
>             if (repeat > 0) { // repeat & time as benchmark
>                 Date start = new Date();
>                 for (int i = 0; i < repeat; i++) {
>                     searcher.search(q, 100);
>                 }
>                 Date end = new Date();
>                 System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
>                 doPagingSearch(in, searcher, q, hitsPerPage, raw, queries == null && queryString == null);
>
>             MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE
>
>                 q = q.rewrite(reader);
>                 Set<Term> terms = new HashSet<>();
>                 Weight weight = q.createWeight(searcher, false);
>                 terms = weight.extractTerms(terms);
>
>                 System.out.println("Match: " + terms);
>                 reader.close();
>
>             }
>         }
>     }
>
>     /**
>      * Search the Query against the Index
>      */
>     public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query, int hitsPerPage,
>             boolean raw, boolean interactive) throws IOException {
>
>         // Collect enough docs to show 5 pages
>         TopDocs results = searcher.search(query, 5 * hitsPerPage);
>         ScoreDoc[] hits = results.scoreDocs;
>
>         int numTotalHits = results.totalHits;
>         System.out.println(numTotalHits + " total matching documents");
>
>         int start = 0;
>         int end = Math.min(numTotalHits, hitsPerPage);
>
>         hits = searcher.search(query, numTotalHits).scoreDocs;
>         end = Math.min(hits.length, start + hitsPerPage);
>
>         for (int i = start; i < end; i++) {
>             Document doc = searcher.doc(hits[i].doc);
>             String path = doc.get("path");
>             System.out.println((i + 1) + ". " + path);
>             query.rewrite(reader);
>         }
>     }
> }
> Evert  Wagenaar
>
> On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar <evert.wagenaar@gmail.com> wrote:
> Thanks Allison. I will try it.
>
>
> On Monday, 24 October 2016, Allison, Timothy B. <tallison@mitre.org> wrote:
> Make sure to setRewriteMethod on the MultiTermQuery to:
>  MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
>
> Then something like this should work:
>
>         q = q.rewrite(reader);
>
>         Set<Term> terms = new HashSet<>();
>         Weight weight = q.createWeight(searcher, false);
>
>         weight.extractTerms(terms);
>
>
>

-- 
Sent from Gmail IPad

Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

Thanks again Timothy. I'll need to refactor your code to make it work with
my wordlist App but that won't be the biggest problem. I will mention your
name when it's online and of course in my blog too.

You made it possible. Thanks.

On Tuesday, 25 October 2016, Allison, Timothy B. <ta...@mitre.org> wrote:

> WildcardQuery is a subclass of MultiTermQuery.  If you are using the
> QueryParser, you need to set the rewrite method on the parser.
>

-- 
Sent from Gmail IPad

Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

It works! Thanks a lot Timothy!

Evert  Wagenaar

On Tue, Oct 25, 2016 at 9:30 PM, Allison, Timothy B. <ta...@mitre.org>
wrote:

> WildcardQuery is a subclass of MultiTermQuery.  If you are using the
> QueryParser, you need to set the rewrite method on the parser.
>
> Try this…and beware of hitting the max BooleanQuery clause limit…and/or
> reset that
>
>
>
> BooleanQuery.setMaxClauseCount(numberBigEnoughForYourNeeds);
>
>
>
> import java.util.HashSet;
> import java.util.Set;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.MultiTermQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
>
> public class RewriteTest {
>
>
>
>
>     /** Simple command-line based search demo. */
>     public static void main(String[] args) throws Exception {
>         Analyzer analyzer = new StandardAnalyzer();
>         String field = "contents";
>         Directory directory = new RAMDirectory();
>         IndexWriterConfig config = new IndexWriterConfig(analyzer);
>         IndexWriter indexWriter = new IndexWriter(directory, config);
>         for (int i = 0; i < 100; i++) {
>             Document d = new Document();
>             d.add(new TextField(field, "aard00"+i, Field.Store.YES));
>             indexWriter.addDocument(d);
>         }
>         indexWriter.flush();
>         indexWriter.close();
>
>         String queryString = "aard????";
>
>         IndexReader reader = DirectoryReader.open(directory);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>
>         QueryParser parser = new QueryParser(field, analyzer);
>         parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_
> BOOLEAN_REWRITE);
>         Query q = parser.parse(queryString);
>         q = q.rewrite(reader);
>         Set<Term> terms = new HashSet<>();
>         Weight weight = q.createWeight(searcher, false);
>         weight.extractTerms(terms);
>         for (Term t : terms) {
>             System.out.println(t);
>         }
>         reader.close();
>     }
>
> }
>
>
> From: Evert Wagenaar [mailto:evert.wagenaar@gmail.com]
> Sent: Tuesday, October 25, 2016 1:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get the terms matching a WildCardQuery in Lucene 6.2?
>
> Hi Allison,
>
> Unfortunately I can't compile the code (see below). Can you tell me what's
> wrong?
> I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
> CONSTANT_SCORE_BOOLEAN_REWRITE
>
> What I don't understand actually is the relation between my Query (which
> is a wildcard Query and not a MultiTermQuery.
>
> Can you explain?
>
> Thanks,
>
> Evert Wagenaar
>
>
> [Inline image 1]
>
> Full code of Searcher:
>
>
> package tk.evertwagenaar.lucene;
>
>
>
> import java.io.BufferedReader;
>
> import java.io.IOException;
>
> import java.io.InputStreamReader;
>
> import java.nio.charset.StandardCharsets;
>
> import java.nio.file.Files;
>
> import java.nio.file.Paths;
>
> import java.util.Date;
>
> import java.util.HashSet;
>
> import java.util.Set;
>
>
>
> import org.apache.lucene.analysis.Analyzer;
>
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>
> import org.apache.lucene.document.Document;
>
> import org.apache.lucene.index.DirectoryReader;
>
> import org.apache.lucene.index.IndexReader;
>
> import org.apache.lucene.index.Term;
>
> import org.apache.lucene.queryparser.classic.QueryParser;
>
> import org.apache.lucene.search.IndexSearcher;
>
> import org.apache.lucene.search.MultiTermQuery;
>
> import org.apache.lucene.search.Query;
>
> import org.apache.lucene.search.ScoreDoc;
>
> import org.apache.lucene.search.TopDocs;
>
> import org.apache.lucene.search.Weight;
>
> import org.apache.lucene.store.FSDirectory;
>
>
>
> /** Simple command-line based search demo. */
>
> public class SearchFiles {
>
>
>
>        private static IndexReader reader;
>
>        private static Query q;
>
>
>
>        private SearchFiles() {
>
>        }
>
>
>
>        /** Simple command-line based search demo. */
>
>        public static void main(String[] args) throws Exception {
>
>               String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles
> [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw]
> [-paging hitsPerPage]\n\nSee http://lucene.apache.org/core/4_1_0/demo/
> for details.";
>
>               if (args.length > 0 && ("-h".equals(args[0]) ||
> "-help".equals(args[0]))) {
>
>                      System.out.println(usage);
>
>                      System.exit(0);
>
>               }
>
>
>
>               String index = "index";
>
>               String field = "contents";
>
>               String queries = null;
>
>               int repeat = 0;
>
>               boolean raw = false;
>
>               String queryString = "aard????";
>
>               int hitsPerPage = 10;
>
>
>
>               reader = DirectoryReader.open(FSDirectory.open(Paths.get(
> index)));
>
>               IndexSearcher searcher = new IndexSearcher(reader);
>
>               Analyzer analyzer = new StandardAnalyzer();
>
>
>
>               BufferedReader in = null;
>
>
>
>               QueryParser parser = new QueryParser(field, analyzer);
>
>               while (true) {
>
>                      if (queries == null && queryString == null) { //
> prompt the user
>
>                             System.out.println("Enter query: ");
>
>                      }
>
>
>
>                      Query q = parser.parse(queryString);
>
>                      System.out.println("Searching for: " +
> q.toString(field));
>
>
>
>                      if (repeat > 0) { // repeat & time as benchmark
>
>                             Date start = new Date();
>
>                             for (int i = 0; i < repeat; i++) {
>
>                                    searcher.search(q, 100);
>
>                             }
>
>                             Date end = new Date();
>
>                             System.out.println("Time: " + (end.getTime() -
> start.getTime()) + "ms");
>
>                             doPagingSearch(in, searcher, q, hitsPerPage,
> raw, queries == null && queryString == null);
>
>
>
>
>
>                      MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE
>
>
>
>                             q = q.rewrite(reader);
>
>                             Set<Term> terms = new HashSet<>();
>
>                             Weight weight = q.createWeight(searcher,
> false);
>
>                             terms = weight.extractTerms(terms);
>
>
>
>                             System.out.println("Match: " + terms);
>
>                             reader.close();
>
>
>
>                      }
>
>               }
>
>        }
>
>
>
>        /**
>
>        * Search the Query against the Index
>
>        */
>
>        public static void doPagingSearch(BufferedReader in, IndexSearcher
> searcher, Query query, int hitsPerPage,
>
>                      boolean raw, boolean interactive) throws IOException {
>
>
>
>               // Collect enough docs to show 5 pages
>
>               TopDocs results = searcher.search(query, 5 * hitsPerPage);
>
>               ScoreDoc[] hits = results.scoreDocs;
>
>
>
>               int numTotalHits = results.totalHits;
>
>               System.out.println(numTotalHits + " total matching
> documents");
>
>
>
>               int start = 0;
>
>               int end = Math.min(numTotalHits, hitsPerPage);
>
>
>
>               hits = searcher.search(query, numTotalHits).scoreDocs;
>
>               end = Math.min(hits.length, start + hitsPerPage);
>
>
>
>               for (int i = start; i < end; i++) {
>
>                      Document doc = searcher.doc(hits[i].doc);
>
>                      String path = doc.get("path");
>
>                      System.out.println((i + 1) + ". " + path);
>
>                      query.rewrite(reader);
>
>               }
>
>        }
>
> }

RE: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by "Allison, Timothy B." <ta...@mitre.org>.

A WildcardQuery is a subclass of MultiTermQuery. If you are using the QueryParser, you need to set the rewrite method on the parser.

Try the code below, and beware of hitting the maximum BooleanQuery clause limit (1024 by default; each matching term becomes one clause). Raise the limit if necessary:



BooleanQuery.setMaxClauseCount(numberBigEnoughForYourNeeds);



import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class RewriteTest {

    /** Indexes 100 synthetic terms, then prints the terms a wildcard query expands to. */
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        String field = "contents";
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter indexWriter = new IndexWriter(directory, config);
        // One document per term: aard000, aard001, ..., aard0099.
        for (int i = 0; i < 100; i++) {
            Document d = new Document();
            d.add(new TextField(field, "aard00"+i, Field.Store.YES));
            indexWriter.addDocument(d);
        }
        indexWriter.flush();
        indexWriter.close();

        String queryString = "aard????";

        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        QueryParser parser = new QueryParser(field, analyzer);
        // Have the parser build multi-term queries that rewrite to a BooleanQuery of the matching terms.
        parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
        Query q = parser.parse(queryString);
        // Expand the wildcard against the index.
        q = q.rewrite(reader);
        Set<Term> terms = new HashSet<>();
        Weight weight = q.createWeight(searcher, false);
        // extractTerms adds the terms to the set passed in; it does not return them.
        weight.extractTerms(terms);
        for (Term t : terms) {
            System.out.println(t);
        }
        reader.close();
    }

}
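
Note that weight.extractTerms(terms) is called for its side effect: in Lucene 6.x the method is declared void and adds the terms to the Set it is given, so its result cannot be assigned to a variable. The same pattern can be sketched without Lucene; the class and method names below are made up for illustration and are not part of any library:

```java
import java.util.Set;
import java.util.TreeSet;

class OutParameterSketch {

    /** Hypothetical stand-in for a void "extract" method: it fills the caller's set. */
    static void collectTerms(Set<String> out) {
        out.add("aardvark");
        out.add("aardwolf");
    }

    public static void main(String[] args) {
        Set<String> terms = new TreeSet<>();
        // terms = collectTerms(terms);  // would not compile: a void call has no value to assign
        collectTerms(terms);             // correct: the method adds to the set in place
        System.out.println(terms);       // prints [aardvark, aardwolf]
    }
}
```

The caller creates the set, hands it over, and reads it afterwards.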



Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

Hi Allison,

Unfortunately I can't compile the code (see below). Can you tell me what's
wrong?
I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
CONSTANT_SCORE_BOOLEAN_REWRITE.

What I don't understand is how my query (a WildcardQuery, not a
MultiTermQuery) relates to MultiTermQuery.

Can you explain?

Thanks,

Evert Wagenaar


[image: Inline image 1]

*Full code of Searcher:*

package tk.evertwagenaar.lucene;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.FSDirectory;

/** Simple command-line based search demo. */
public class SearchFiles {

    private static IndexReader reader;
    private static Query q;

    private SearchFiles() {
    }

    /** Simple command-line based search demo. */
    public static void main(String[] args) throws Exception {
        String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw] [-paging hitsPerPage]\n\nSee http://lucene.apache.org/core/4_1_0/demo/ for details.";
        if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
            System.out.println(usage);
            System.exit(0);
        }

        String index = "index";
        String field = "contents";
        String queries = null;
        int repeat = 0;
        boolean raw = false;
        String queryString = "aard????";
        int hitsPerPage = 10;

        reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();

        BufferedReader in = null;

        QueryParser parser = new QueryParser(field, analyzer);
        while (true) {
            if (queries == null && queryString == null) { // prompt the user
                System.out.println("Enter query: ");
            }

            Query q = parser.parse(queryString);
            System.out.println("Searching for: " + q.toString(field));

            if (repeat > 0) { // repeat & time as benchmark
                Date start = new Date();
                for (int i = 0; i < repeat; i++) {
                    searcher.search(q, 100);
                }
                Date end = new Date();
                System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
                doPagingSearch(in, searcher, q, hitsPerPage, raw, queries == null && queryString == null);

                MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE

                q = q.rewrite(reader);
                Set<Term> terms = new HashSet<>();
                Weight weight = q.createWeight(searcher, false);
                terms = weight.extractTerms(terms);

                System.out.println("Match: " + terms);
                reader.close();
            }
        }
    }

    /**
     * Search the Query against the Index
     */
    public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query, int hitsPerPage,
            boolean raw, boolean interactive) throws IOException {

        // Collect enough docs to show 5 pages
        TopDocs results = searcher.search(query, 5 * hitsPerPage);
        ScoreDoc[] hits = results.scoreDocs;

        int numTotalHits = results.totalHits;
        System.out.println(numTotalHits + " total matching documents");

        int start = 0;
        int end = Math.min(numTotalHits, hitsPerPage);

        hits = searcher.search(query, numTotalHits).scoreDocs;
        end = Math.min(hits.length, start + hitsPerPage);

        for (int i = start; i < end; i++) {
            Document doc = searcher.doc(hits[i].doc);
            String path = doc.get("path");
            System.out.println((i + 1) + ". " + path);
            query.rewrite(reader);
        }
    }

}
Evert  Wagenaar

On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar <ev...@gmail.com>
wrote:

> Thanks Allison. I will try it.
>
>
> On Monday, October 24, 2016, Allison, Timothy B. <ta...@mitre.org>
> wrote:
>
>> Make sure to setRewriteMethod on the MultiTermQuery to:
>>  MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
>>
>> Then something like this should work:
>>
>>         q = q.rewrite(reader);
>>
>>         Set<Term> terms = new HashSet<>();
>>         Weight weight = q.createWeight(searcher, false);
>>
>>         weight.extractTerms(terms);

Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

Again, this is the code I am trying to use to extract the matching term for
the query "aard????". The query matches one term in my list of 350,000
words, which I indexed using the *StandardAnalyzer*.

As already mentioned, it matches "aardvark".
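
To make the expected result concrete, the wildcard semantics can be sketched in plain Java without Lucene: "?" stands for exactly one character and "*" for any run of characters. This is only an illustration of which words a pattern should select, not how Lucene evaluates it; the class and method names are made up, the word list is a stand-in, and Lucene's backslash escaping is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardSketch {

    /** Converts a wildcard pattern ('?' = one character, '*' = any run) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '?' || c == '*') {
                // Emit any pending literal text, quoted so regex metacharacters stay literal.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the words that match the whole wildcard pattern, in input order. */
    static List<String> filter(String wildcard, List<String> words) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).matches()) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aard", "aardvark", "aardvarks", "abacus");
        System.out.println(filter("aard????", words)); // prints [aardvark]
        System.out.println(filter("aa*", words));      // prints [aard, aardvark, aardvarks]
    }
}
```

With "aard????" only words of exactly eight characters starting with "aard" qualify, which is the behaviour the Lucene query is expected to reproduce.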

What can I do to make this work?


Thanks,

Evert Wagenaar

http://www.evertwagenaar.tk

Evert  Wagenaar

On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar <ev...@gmail.com>
wrote:

> Thanks Allison. I will try it.
>
>
> On Monday, October 24, 2016, Allison, Timothy B. <ta...@mitre.org>
> wrote:
>
>> Make sure to setRewriteMethod on the MultiTermQuery to:
>>  MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
>>
>> Then something like this should work:
>>
>>         q = q.rewrite(reader);
>>
>>         Set<Term> terms = new HashSet<>();
>>         Weight weight = q.createWeight(searcher, false);
>>
>>         weight.extractTerms(terms);

Re: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by Evert Wagenaar <ev...@gmail.com>.

Thanks Allison. I will try it.

On Monday, October 24, 2016, Allison, Timothy B. <ta...@mitre.org>
wrote:

> Make sure to setRewriteMethod on the MultiTermQuery to:
>  MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
>
> Then something like this should work:
>
>         q = q.rewrite(reader);
>
>         Set<Term> terms = new HashSet<>();
>         Weight weight = q.createWeight(searcher, false);
>
>         weight.extractTerms(terms);


-- 
Sent from Gmail IPad

RE: How to get the terms matching a WildCardQuery in Lucene 6.2?

Posted by "Allison, Timothy B." <ta...@mitre.org>.

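Make sure to call setRewriteMethod on the MultiTermQuery with either
MultiTermQuery.SCORING_BOOLEAN_REWRITE or MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE,
so that rewrite() expands the wildcard into a BooleanQuery of the matching terms.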

Then something like this should work:

        q = q.rewrite(reader);

        Set<Term> terms = new HashSet<>();
        Weight weight = q.createWeight(searcher, false);

        weight.extractTerms(terms);


