Posted to java-user@lucene.apache.org by Ankit Murarka <an...@rancoretech.com> on 2013/08/13 09:56:51 UTC
Boolean Query when indexing each line as a document.
Hello All,
I have two different use cases.
I am trying to provide both boolean query and phrase search query in the
application.
Every line of the document I am indexing has content like:
<attribute name="remedial action" value="Checking"/>\
Due to the phrase search requirement, I am indexing each line of the
file as a new document.
Now when I try a phrase query (Did You Mean, infix suggester, phrase
suggester, etc.) this works fine and provides the desired suggestions.
Problem is :
How do I make a boolean query work for this? When I verified the indexes
in Luke, I saw that the whole line is indexed, as expected.
So if the user wishes to perform a boolean query, say containing
"remedialaction" and "Checking", how do I get this document as a hit?
Since I am indexing each line, this seems a bit tricky.
Please guide.
--
Regards
Ankit
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello.
My main aim is following:
a. Index both on a line and a document basis (line basis for providing
phrase/infix suggestions; document basis for firing boolean/wildcard
queries etc.).
b. Yes, for boolean/wildcard queries the user input will be "xxx" and
"yyy", and I will show the document name.
c. What I intend is that once the user clicks on the respective filename
containing XXX and YYY, I should be able to show some 100 lines above
XXX and YYY and 100 lines below XXX and YYY.
Any suggestion/guidance will be appreciated.
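[Editor's note: the 100-lines-above/below requirement does not need Lucene at all once the hit's file and line number are known. A minimal sketch of the windowing logic, stdlib-only; the class and method names (`ContextWindow`, `contextWindow`) are mine, and the idea of storing a line number per line-document is an assumption, not something the thread settled on:]

```java
import java.util.Arrays;
import java.util.List;

public class ContextWindow {
    // Return up to `radius` lines above and below the hit line (hit included).
    // Indices are clamped so the window never runs past the file boundaries.
    static List<String> contextWindow(List<String> lines, int hitLine, int radius) {
        int from = Math.max(0, hitLine - radius);
        int to = Math.min(lines.size(), hitLine + radius + 1);
        return lines.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("l0", "l1", "l2", "l3", "l4");
        // Hit on line 2 with radius 1 -> lines 1..3
        System.out.println(contextWindow(file, 2, 1)); // [l1, l2, l3]
    }
}
```

In practice `radius` would be 100 and `lines` would be re-read from the stored file path of the clicked hit.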
On 8/21/2013 2:39 PM, Roberto Ragusa wrote:
> On 08/21/2013 09:51 AM, Ankit Murarka wrote:
>
>> Yeah..I eventually DID THIS....
>>
>> Just a small question : Knowing that BooleanQuery/PrefixQuery/WildCardQuery might also run fine even if I index the complete document as opposed to doing it Line by Line. Shouldn't I do it this way rather than indexing each line for Boolean/Prefix/Wildcard also. ?
>>
>> What might be the impact on performance. What might be possible pitfalls..
>>
> Do you want to search documents or lines? Do you want a search "xxx AND yyy" to return a document
> where xxx is at line 6 and yyy is at line 324? If yes, index by doc, if not, index by line.
> No particular performance difference (less documents, but same number of term occurrences).
>
>
>> Also Final part is showing user the content above and below the given query to search. Will this be achieveable if I index document by document for Boolean/Prefix/Wildcard...
>>
> Sorry, i don't understand this.
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Roberto Ragusa <ma...@robertoragusa.it>.
On 08/21/2013 09:51 AM, Ankit Murarka wrote:
> Yeah..I eventually DID THIS....
>
> Just a small question : Knowing that BooleanQuery/PrefixQuery/WildCardQuery might also run fine even if I index the complete document as opposed to doing it Line by Line. Shouldn't I do it this way rather than indexing each line for Boolean/Prefix/Wildcard also. ?
>
> What might be the impact on performance. What might be possible pitfalls..
Do you want to search documents or lines? Do you want a search "xxx AND yyy" to return a document
where xxx is at line 6 and yyy is at line 324? If yes, index by doc; if not, index by line.
No particular performance difference (fewer documents, but the same number of term occurrences).
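[Editor's note: the granularity point can be illustrated without Lucene. The sketch below (names and tokenization are mine, plain Java standing in for an analyzed index) shows that a MUST/MUST query only matches a "document" containing both terms, so per-line documents miss terms that sit on different lines of the same file:]

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Granularity {
    // Crude stand-in for an analyzer: lowercase word tokens.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    // MUST/MUST semantics: every term must occur in the same document.
    static boolean matchesAll(String docText, String... terms) {
        Set<String> toks = tokens(docText);
        for (String t : terms) if (!toks.contains(t.toLowerCase())) return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("xxx appears here", "and yyy appears there");
        // Per-line documents: no single line holds both terms -> no hit.
        boolean anyLine = lines.stream().anyMatch(l -> matchesAll(l, "xxx", "yyy"));
        // Per-file document: the whole file holds both terms -> hit.
        boolean wholeDoc = matchesAll(String.join("\n", lines), "xxx", "yyy");
        System.out.println(anyLine + " " + wholeDoc); // false true
    }
}
```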
> Also Final part is showing user the content above and below the given query to search. Will this be achieveable if I index document by document for Boolean/Prefix/Wildcard...
Sorry, I don't understand this.
--
Roberto Ragusa mail at robertoragusa.it
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Yes, I eventually did this.
Just a small question: knowing that
BooleanQuery/PrefixQuery/WildcardQuery might also run fine even if I
index the complete document as opposed to doing it line by line,
shouldn't I do it that way rather than indexing each line for
boolean/prefix/wildcard as well?
What might be the impact on performance? What are the possible pitfalls?
Also, the final part is showing the user the content above and below the
matched query. Will this be achievable if I index document by document
for boolean/prefix/wildcard?
Kindly advise.
On 8/21/2013 12:56 PM, Roberto Ragusa wrote:
> On 08/21/2013 08:38 AM, Ankit Murarka wrote:
>
>> Hello.
>> I tried with
>>
>> doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>
>> The BooleanQuery/PrefixMatch/WildCard all started Running fine..
>>
>> But it broke the Existing code for Phrase Suggestion/InfixSuggester. Now these suggesters are returning Word suggestion instead of Phrase Suggestion which is not serving any purpose.
>>
>> THIS IS NOT DESIRABLE..
>>
>> My PhraseSuggestion/InfixSuggestion etc. is now not working fine. Please guide.. This is complete blocker.
>>
> I do not know how PhraseSuggestion/InfixSuggestion work, but, assuming
> there is no better solution, you could try indexing both a Field and a StringField
> (with different field names) and using the first for searching and the second
> for suggestions.
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Roberto Ragusa <ma...@robertoragusa.it>.
On 08/21/2013 08:38 AM, Ankit Murarka wrote:
> Hello.
> I tried with
>
> doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>
> The BooleanQuery/PrefixMatch/WildCard all started Running fine..
>
> But it broke the Existing code for Phrase Suggestion/InfixSuggester. Now these suggesters are returning Word suggestion instead of Phrase Suggestion which is not serving any purpose.
>
> THIS IS NOT DESIRABLE..
>
> My PhraseSuggestion/InfixSuggestion etc. is now not working fine. Please guide.. This is complete blocker.
I do not know how PhraseSuggestion/InfixSuggestion work but, assuming
there is no better solution, you could try indexing both an analyzed Field
and a StringField (with different field names), using the first for
searching and the second for suggestions.
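[Editor's note: a minimal sketch of this two-field idea, with plain Java standing in for Lucene (in Lucene 4.x the analyzed field would be a TextField and the raw field a StringField; the class, field names `contents`/`suggest`, and tokenization below are mine). The search field is split into word terms for boolean queries, while the suggest field keeps the raw line intact for phrase suggestions:]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoFieldSketch {
    final Set<String> contents = new HashSet<>();  // analyzed: individual word terms (TextField stand-in)
    final List<String> suggest = new ArrayList<>(); // unanalyzed: raw lines (StringField stand-in)

    void addLine(String line) {
        contents.addAll(Arrays.asList(line.split("\\W+"))); // what an analyzer would emit
        suggest.add(line);                                  // what a suggester needs whole
    }

    public static void main(String[] args) {
        TwoFieldSketch doc = new TwoFieldSketch();
        doc.addLine("INSIDE POST OF Listener");
        // Boolean search can hit individual terms...
        System.out.println(doc.contents.contains("INSIDE") && doc.contents.contains("POST")); // true
        // ...while the suggester still sees the whole phrase.
        System.out.println(doc.suggest.get(0)); // INSIDE POST OF Listener
    }
}
```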
--
Roberto Ragusa mail at robertoragusa.it
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello.
I tried with
doc.add(new Field("contents", line, Field.Store.YES, Field.Index.ANALYZED));
The BooleanQuery/prefix match/wildcard queries all started running fine.
But it broke the existing code for phrase suggestion/InfixSuggester: the
suggesters now return word suggestions instead of phrase suggestions,
which serves no purpose.
THIS IS NOT DESIRABLE.
My phrase/infix suggestions are no longer working. Please guide; this is
a complete blocker.
On 8/19/2013 12:28 PM, Roberto Ragusa wrote:
> On 08/19/2013 08:17 AM, Ankit Murarka wrote:
>
>> doc.add(new StringField("contents",line,Field.Store.YES));
>>
> Did you try with:
>
> doc.add(new Field("contents",line,Field.Store.YES));
>
> ?
>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Roberto Ragusa <ma...@robertoragusa.it>.
On 08/19/2013 08:17 AM, Ankit Murarka wrote:
> doc.add(new StringField("contents",line,Field.Store.YES));
Did you try with:
doc.add(new Field("contents",line,Field.Store.YES));
?
--
Roberto Ragusa mail at robertoragusa.it
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello All,
The smallest possible self contained program using RAMDirectory and no
external classes is being mentioned below.kindly assist me in what might
be wrong.
package example;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.flexible.standard.parser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public class MainClassForLine {

    public static void main(String[] args) {
        try {
            Directory dir = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            IndexWriter writer = new IndexWriter(dir, iwc);
            indexDocs(writer);
            writer.close();
            IndexReader reader = DirectoryReader.open(dir); // location where the indexes are
            IndexSearcher searcher = new IndexSearcher(reader);
            searchIndexWithQueryParser(searcher, "+contents:running", analyzer);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    static void indexDocs(IndexWriter writer) throws IOException {
        try {
            Document doc = new Document();
            LineNumberReader lnr = new LineNumberReader(
                    new FileReader("D:\\Helios-WorkSpace\\Alwith4.3\\FileSearch\\Demo1.txt"));
            String line = null;
            while (null != (line = lnr.readLine())) {
                doc.add(new StringField("contents", line, Field.Store.YES));
            }
            if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                System.out.println("adding");
                writer.addDocument(doc);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void searchIndexWithQueryParser(IndexSearcher searcher, String searchString,
            Analyzer analyzer) throws IOException, ParseException {
        try {
            System.out.println("Searching for '" + searchString + "' using QueryParser");
            QueryParser queryParser = new QueryParser(Version.LUCENE_44, "contents", analyzer);
            Query query = queryParser.parse(searchString);
            System.out.println("Type of query: " + query.getClass().getSimpleName());
            displayQuery(query);
            TopDocs results = searcher.search(query, 100);
            ScoreDoc[] hits = results.scoreDocs;
            System.out.println("size of HITS is " + hits.length);
            int numTotalHits = results.totalHits;
            System.out.println(numTotalHits + " total matching documents");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void displayQuery(Query query) {
        System.out.println("Query: " + query.toString());
    }
}
To restate the problem:
Parsing each document line by line to build the index and then firing a
boolean query returns no hits.
However, if the same document is indexed without splitting it into
lines, the boolean query returns hits.
I want boolean/wildcard/prefix queries to return hits even when I index
the files line by line.
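[Editor's note: a plausible explanation for the zero hits, hedged since it is an inference from the posted code rather than something stated in the thread. `StringField` is not analyzed, so each whole line is indexed as a single term, while `QueryParser` with `StandardAnalyzer` lowercases and tokenizes the query, turning `+contents:running` into the single-word term `running`, which can never equal a whole-line term. A stdlib sketch of the mismatch (all names below are mine):]

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StringFieldMismatch {
    public static void main(String[] args) {
        String line = "Still figuring out how to run RUNNING jobs";

        // StringField behaviour: the entire line becomes one indexed term.
        Set<String> stringFieldTerms = new HashSet<>(Arrays.asList(line));

        // TextField + StandardAnalyzer behaviour (approximated): lowercased word terms.
        Set<String> analyzedTerms = new HashSet<>(Arrays.asList(line.toLowerCase().split("\\W+")));

        // QueryParser + StandardAnalyzer reduces "+contents:RUNNING" to the term "running".
        String queryTerm = "running";

        System.out.println(stringFieldTerms.contains(queryTerm)); // false -> zero hits
        System.out.println(analyzedTerms.contains(queryTerm));    // true  -> hits
    }
}
```

Under this reading, indexing each line with an analyzed field (a `TextField` in Lucene 4.x) instead of a `StringField` would make the boolean/prefix/wildcard queries match again.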
On 8/17/2013 1:15 PM, Ankit Murarka wrote:
> Hello. The reference to CustomAnalyzer is what I had mentioned.
>
> I created a custom analyzer from the StandardAnalyzer code. The only
> change I made was to comment out the LowerCaseFilter in StandardAnalyzer.
>
> Yes, I am trying to provide the smallest possible self-contained
> program. Give me some time.
>
> On 8/14/2013 8:39 PM, Ian Lea wrote:
>> If you're using StandardAnalyzer what's the reference to
>> CustomAnalyzerForCaseSensitive all about? Someone else with more
>> patience or better diagnostic skill may well spot your problem but I
>> can't.
>>
>> My final suggestion is that you build and post the smallest possible
>> self-contained program, using RAMDirectory and no external classes.
>> If you are using a custom analyzer try it without - if that works
>> you've got a clue as to where to look next.
>>
>> Good luck.
>>
>>
>> On Wed, Aug 14, 2013 at 3:46 PM, Ankit Murarka
>> <an...@rancoretech.com> wrote:
>>> Hello. I gave the complete code sample so that anyone can try it and
>>> let me know; this issue is really taking a toll on me.
>>> I am so close yet so far.
>>>
>>> Yes, I am using an analyzer to index the documents. The analyzer is
>>> StandardAnalyzer, but I have commented out the LowerCaseFilter code.
>>>
>>> Yes, in my trailing mail I have mentioned the same.
>>>
>>> This is what is present in my file:
>>>
>>> INSIDE POST OF Listener\
>>>
>>> This is what is present in the index:
>>>
>>> INSIDE POST OF Listener\
>>>
>>> The query which I gave to search:
>>>
>>>
>>> Query is +contents:INSIDE contents:POST
>>>
>>>
>>> STILL I AM GETTING NO HIT. But if I index all the documents normally
>>> (without indexing them line by line) I do get hits.
>>>
>>> Still not able to figure out the problem.
>>>
>>>
>>>
>>> On 8/14/2013 8:07 PM, Ian Lea wrote:
>>>> I was rather hoping for something smaller!
>>>>
>>>> One suggestion from a glance is that you're using some analyzer
>>>> somewhere but building a BooleanQuery out of a TermQuery or two. Are
>>>> you sure (test it and prove it) that the strings you pass to the
>>>> TermQuery are EXACTLY what has been indexed?
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Wed, Aug 14, 2013 at 3:29 PM, Ankit Murarka
>>>> <an...@rancoretech.com> wrote:
>>>>
>>>>> Hello. The problem is as follows:
>>>>>
>>>>> I have a document containing information in lines, so I am indexing
>>>>> all files line by line.
>>>>> So if in my document I have
>>>>> INSIDE POST OF SERVER\
>>>>> and in my created index I have
>>>>> INSIDE POST OF SERVER\
>>>>>
>>>>> and I fire a boolean query with INSIDE and POST as MUST/MUST, I get
>>>>> no hit.
>>>>>
>>>>> I am providing the complete CODE I am using to create the INDEX and
>>>>> to SEARCH. Both are drawn from sample code available online.
>>>>>
>>>>> /* INDEX CODE */
>>>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>>>
>>>>> public class CreateIndex {
>>>>>     public static void main(String[] args) {
>>>>>         String indexPath = "D:\\INDEXFORQUERY"; // place where indexes will be created
>>>>>         String docsPath = "Indexed";            // place where the files are kept
>>>>>         boolean create = true;
>>>>>         final File docDir = new File(docsPath);
>>>>>         if (!docDir.exists() || !docDir.canRead()) {
>>>>>             System.exit(1);
>>>>>         }
>>>>>         try {
>>>>>             Directory dir = FSDirectory.open(new File(indexPath));
>>>>>             Analyzer analyzer = new CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>>>             IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
>>>>>             if (create) {
>>>>>                 iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>>             } else {
>>>>>                 System.out.println("Trying to set IWC mode to UPDATE...NOT DESIRED..");
>>>>>             }
>>>>>             IndexWriter writer = new IndexWriter(dir, iwc);
>>>>>             indexDocs(writer, docDir);
>>>>>             writer.close();
>>>>>         } catch (IOException e) {
>>>>>             System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
>>>>>         }
>>>>>     }
>>>>>
>>>>>     static void indexDocs(IndexWriter writer, File file) throws IOException {
>>>>>         if (file.canRead()) {
>>>>>             if (file.isDirectory()) {
>>>>>                 String[] files = file.list();
>>>>>                 if (files != null) {
>>>>>                     for (int i = 0; i < files.length; i++) {
>>>>>                         if (files[i] != null)
>>>>>                             indexDocs(writer, new File(file, files[i]));
>>>>>                     }
>>>>>                 }
>>>>>             } else {
>>>>>                 Document doc = new Document();
>>>>>                 Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
>>>>>                 doc.add(pathField);
>>>>>                 doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
>>>>>                 LineNumberReader lnr = new LineNumberReader(new FileReader(file));
>>>>>                 String line = null;
>>>>>                 while (null != (line = lnr.readLine())) {
>>>>>                     doc.add(new StringField("contents", line, Field.Store.YES));
>>>>>                 }
>>>>>                 if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
>>>>>                     writer.addDocument(doc);
>>>>>                 } else {
>>>>>                     writer.updateDocument(new Term("path", file.getPath()), doc);
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> /* SEARCHING CODE */
>>>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>>>
>>>>> public class SearchFORALLQUERIES {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         String[] argument = new String[20];
>>>>>         argument[0] = "-index";
>>>>>         argument[1] = "D:\\INDEXFORQUERY";
>>>>>         argument[2] = "-field";
>>>>>         argument[3] = "contents"; // field value
>>>>>         argument[4] = "-repeat";
>>>>>         argument[5] = "2"; // repeat value
>>>>>         argument[6] = "-raw";
>>>>>         argument[7] = "-paging";
>>>>>         argument[8] = "300"; // paging value
>>>>>
>>>>>         String index = "index";
>>>>>         String field = "contents";
>>>>>         String queries = null;
>>>>>         int repeat = 0;
>>>>>         boolean raw = false;
>>>>>         String queryString = null;
>>>>>         int hitsPerPage = 10;
>>>>>
>>>>>         for (int i = 0; i < argument.length; i++) {
>>>>>             if ("-index".equals(argument[i])) {
>>>>>                 index = argument[i + 1];
>>>>>                 i++;
>>>>>             } else if ("-field".equals(argument[i])) {
>>>>>                 field = argument[i + 1];
>>>>>                 i++;
>>>>>             } else if ("-queries".equals(argument[i])) {
>>>>>                 queries = argument[i + 1];
>>>>>                 i++;
>>>>>             } else if ("-query".equals(argument[i])) {
>>>>>                 queryString = argument[i + 1];
>>>>>                 i++;
>>>>>             } else if ("-repeat".equals(argument[i])) {
>>>>>                 repeat = Integer.parseInt(argument[i + 1]);
>>>>>                 i++;
>>>>>             } else if ("-raw".equals(argument[i])) {
>>>>>                 raw = true; // true: just display the count; false: also display the file name
>>>>>             } else if ("-paging".equals(argument[i])) {
>>>>>                 hitsPerPage = Integer.parseInt(argument[i + 1]);
>>>>>                 if (hitsPerPage <= 0) {
>>>>>                     System.err.println("There must be at least 1 hit per page.");
>>>>>                     System.exit(1);
>>>>>                 }
>>>>>                 i++;
>>>>>             }
>>>>>         }
>>>>>         System.out.println("processing input");
>>>>>         IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index))); // location where indexes are
>>>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>>>         BufferedReader in = null;
>>>>>         if (queries != null) {
>>>>>             in = new BufferedReader(new InputStreamReader(new FileInputStream(queries), "UTF-8")); // query file as input
>>>>>         } else {
>>>>>             in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
>>>>>         }
>>>>>         while (true) {
>>>>>             if (queries == null && queryString == null) {
>>>>>                 System.out.println("Enter query: "); // if no query is present, prompt the user
>>>>>             }
>>>>>             String line = queryString != null ? queryString : in.readLine();
>>>>>             if (line == null || line.length() == -1) {
>>>>>                 break;
>>>>>             }
>>>>>             line = line.trim();
>>>>>             if (line.length() == 0) {
>>>>>                 break;
>>>>>             }
>>>>>             String[] str = line.split(" ");
>>>>>             System.out.println("queries are " + str[0] + " and is " + str[1]);
>>>>>             Query query1 = new TermQuery(new Term(field, str[0]));
>>>>>             Query query2 = new TermQuery(new Term(field, str[1]));
>>>>>             BooleanQuery booleanQuery = new BooleanQuery();
>>>>>             booleanQuery.add(query1, BooleanClause.Occur.MUST);
>>>>>             booleanQuery.add(query2, BooleanClause.Occur.MUST);
>>>>>             if (repeat > 0) { // repeat=2: repeat and time as benchmark
>>>>>                 Date start = new Date();
>>>>>                 for (int i = 0; i < repeat; i++) {
>>>>>                     searcher.search(booleanQuery, null, 100);
>>>>>                 }
>>>>>                 Date end = new Date();
>>>>>                 System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
>>>>>             }
>>>>>             doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw,
>>>>>                     queries == null && queryString == null);
>>>>>             if (queryString != null) {
>>>>>                 break;
>>>>>             }
>>>>>         }
>>>>>         reader.close();
>>>>>     }
>>>>>
>>>>>     public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query,
>>>>>             int hitsPerPage, boolean raw, boolean interactive) throws IOException {
>>>>>         TopDocs results = searcher.search(query, 5 * hitsPerPage);
>>>>>         ScoreDoc[] hits = results.scoreDocs;
>>>>>         int numTotalHits = results.totalHits;
>>>>>         System.out.println(numTotalHits + " total matching documents");
>>>>>         int start = 0;
>>>>>         int end = Math.min(numTotalHits, hitsPerPage);
>>>>>         while (true) {
>>>>>             if (end > hits.length) {
>>>>>                 System.out.println("Only results 1 - " + hits.length + " of "
>>>>>                         + numTotalHits + " total matching documents collected.");
>>>>>                 System.out.println("Collect more (y/n) ?");
>>>>>                 String line = in.readLine();
>>>>>                 if (line.length() == 0 || line.charAt(0) == 'n') {
>>>>>                     break;
>>>>>                 }
>>>>>                 hits = searcher.search(query, numTotalHits).scoreDocs;
>>>>>             }
>>>>>             end = Math.min(hits.length, start + hitsPerPage);
>>>>>             for (int i = start; i < end; i++) {
>>>>>                 if (raw) {
>>>>>                     System.out.println("doc=" + hits[i].doc + " score=" + hits[i].score);
>>>>>                 }
>>>>>                 Document doc = searcher.doc(hits[i].doc);
>>>>>                 List<IndexableField> filed = doc.getFields();
>>>>>                 filed.size();
>>>>>                 String path = doc.get("path");
>>>>>                 if (path != null) {
>>>>>                     System.out.println((i + 1) + ". " + path);
>>>>>                     String title = doc.get("title");
>>>>>                     if (title != null) {
>>>>>                         System.out.println("   Title: " + doc.get("title"));
>>>>>                     }
>>>>>                 } else {
>>>>>                     System.out.println((i + 1) + ". " + "No path for this document");
>>>>>                 }
>>>>>             }
>>>>>             if (!interactive || end == 0) {
>>>>>                 break;
>>>>>             }
>>>>>             if (numTotalHits >= end) {
>>>>>                 boolean quit = false;
>>>>>                 while (true) {
>>>>>                     System.out.print("Press ");
>>>>>                     if (start - hitsPerPage >= 0) {
>>>>>                         System.out.print("(p)revious page, ");
>>>>>                     }
>>>>>                     if (start + hitsPerPage < numTotalHits) {
>>>>>                         System.out.print("(n)ext page, ");
>>>>>                     }
>>>>>                     System.out.println("(q)uit or enter number to jump to a page.");
>>>>>                     String line = in.readLine();
>>>>>                     if (line.length() == 0 || line.charAt(0) == 'q') {
>>>>>                         quit = true;
>>>>>                         break;
>>>>>                     }
>>>>>                     if (line.charAt(0) == 'p') {
>>>>>                         start = Math.max(0, start - hitsPerPage);
>>>>>                         break;
>>>>>                     } else if (line.charAt(0) == 'n') {
>>>>>                         if (start + hitsPerPage < numTotalHits) {
>>>>>                             start += hitsPerPage;
>>>>>                         }
>>>>>                         break;
>>>>>                     } else {
>>>>>                         int page = Integer.parseInt(line);
>>>>>                         if ((page - 1) * hitsPerPage < numTotalHits) {
>>>>>                             start = (page - 1) * hitsPerPage;
>>>>>                             break;
>>>>>                         } else {
>>>>>                             System.out.println("No such page");
>>>>>                         }
>>>>>                     }
>>>>>                 }
>>>>>                 if (quit) break;
>>>>>                 end = Math.min(numTotalHits, start + hitsPerPage);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> /* CUSTOM ANALYZER CODE */
>>>>> package com.rancore.demo;
>>>>>
>>>>> import java.io.IOException;
>>>>> import java.io.Reader;
>>>>>
>>>>> import org.apache.lucene.analysis.TokenStream;
>>>>> import org.apache.lucene.analysis.core.StopAnalyzer;
>>>>> import org.apache.lucene.analysis.core.StopFilter;
>>>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>>>> import org.apache.lucene.analysis.standard.StandardTokenizer;
>>>>> import org.apache.lucene.analysis.util.CharArraySet;
>>>>> import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
>>>>> import org.apache.lucene.util.Version;
>>>>>
>>>>> public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase {
>>>>>
>>>>>     public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
>>>>>     private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
>>>>>     public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
>>>>>
>>>>>     public CustomAnalyzerForCaseSensitive(Version matchVersion, CharArraySet stopWords) {
>>>>>         super(matchVersion, stopWords);
>>>>>     }
>>>>>
>>>>>     public CustomAnalyzerForCaseSensitive(Version matchVersion) {
>>>>>         this(matchVersion, STOP_WORDS_SET);
>>>>>     }
>>>>>
>>>>>     public CustomAnalyzerForCaseSensitive(Version matchVersion, Reader stopwords) throws IOException {
>>>>>         this(matchVersion, loadStopwordSet(stopwords, matchVersion));
>>>>>     }
>>>>>
>>>>>     public void setMaxTokenLength(int length) {
>>>>>         maxTokenLength = length;
>>>>>     }
>>>>>
>>>>>     /** @see #setMaxTokenLength */
>>>>>     public int getMaxTokenLength() {
>>>>>         return maxTokenLength;
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>>>>>         final StandardTokenizer src = new StandardTokenizer(matchVersion, reader);
>>>>>         src.setMaxTokenLength(maxTokenLength);
>>>>>         TokenStream tok = new StandardFilter(matchVersion, src);
>>>>>         // tok = new LowerCaseFilter(matchVersion, tok); // commented out to keep case sensitivity
>>>>>         tok = new StopFilter(matchVersion, tok, stopwords);
>>>>>         return new TokenStreamComponents(src, tok) {
>>>>>             @Override
>>>>>             protected void setReader(final Reader reader) throws IOException {
>>>>>                 src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength);
>>>>>                 super.setReader(reader);
>>>>>             }
>>>>>         };
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> I hope I have given the complete code sample for people to work on.
>>>>>
>>>>> Please guide me. In case any further information is required, please
>>>>> let me know.
>>>>>
>>>>>
>>>>> On 8/14/2013 7:43 PM, Ian Lea wrote:
>>>>>
>>>>>> Well, you have supplied a bit more info - good - but I still can't
>>>>>> spot the problem. Unless someone else can I suggest you post a very
>>>>>> small self-contained program that demonstrates the problem.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian.
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka
>>>>>> <an...@rancoretech.com> wrote:
>>>>>>
>>>>>>
>>>>>>> Hello.
>>>>>>> The problem does not seem to be getting solved.
>>>>>>>
>>>>>>> As mentioned, I am indexing each line of each file.
>>>>>>> The sample text present inside LUKE is
>>>>>>>
>>>>>>> <am name="notification" value="10"/>\
>>>>>>> <type="DE">\
>>>>>>> java.lang.Thread.run(Thread.java:619)
>>>>>>>
>>>>>>>
>>>>>>>>> Size of list array::0\
>>>>>>>>>
>>>>>>>>>
>>>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>> org.com.dummy,INFO,<< Still figuring out how to run
>>>>>>>
>>>>>>>
>>>>>>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349
>>>>>>>>>
>>>>>>>>>
>>>>>>> INSIDE POST OF Listener\
>>>>>>>
>>>>>>> In Luke, I can see the text as "INSIDE POST OF Listener". This is
>>>>>>> present in many files.
>>>>>>>
>>>>>>> Query is: +contents:INSIDE contents:POST   (the field name is
>>>>>>> "contents"; the same analyzer is being used; this is a boolean query)
>>>>>>>
>>>>>>> To test, I indexed only 20 files; this line is present in 19 of them.
>>>>>>>
>>>>>>> The boolean query should give me a hit for those documents,
>>>>>>>
>>>>>>> BUT IT IS RETURNING NO HITS.
>>>>>>>
>>>>>>> If I index the same files without going line by line, I get proper
>>>>>>> hits.
>>>>>>>
>>>>>>> But for me it should also work on indexes created by line-by-line
>>>>>>> parsing.
>>>>>>>
>>>>>>> Please guide.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/13/2013 4:41 PM, Ian Lea wrote:
>>>>>>>
>>>>>>>
>>>>>>>> remedialaction != "remedial action"?
>>>>>>>>
>>>>>>>> Show us your query. Show a small self-contained sample program or
>>>>>>>> test case that demonstrates the problem. You need to give us
>>>>>>>> something more to go on.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ian.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka
>>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I am aware of that link and I have been through it many times.
>>>>>>>>>
>>>>>>>>> Problem I have is:
>>>>>>>>>
>>>>>>>>> 1. Each line is indexed, so an indexed line looks something like
>>>>>>>>> "<attribute name="remedial action" value="Checking"/>\"
>>>>>>>>> 2. I can easily fire a phrase query on this line. It suggests the
>>>>>>>>> possible values. No problem.
>>>>>>>>> 3. If I fire a boolean query with "remedialaction" and "Checking"
>>>>>>>>> as MUST/MUST, it does not return this document as a hit.
>>>>>>>>> 4. I am using StandardAnalyzer both at indexing and at search time.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/13/2013 2:31 PM, Ian Lea wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Should be straightforward enough. Work through the tips in
>>>>>>>>>> the FAQ
>>>>>>>>>> entry at
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>>>>>>>>>
>>>>>>>>>> and post back if that doesn't help, with details of how you are
>>>>>>>>>> analyzing the data and how you are searching.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ian.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 13, 2013 at 8:56 AM, Ankit Murarka
>>>>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hello All,
>>>>>>>>>>> I have 2 different usecases.
>>>>>>>>>>> I am trying to provide both boolean query and phrase search
>>>>>>>>>>> query
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>> application.
>>>>>>>>>>>
>>>>>>>>>>> In every line of the document which I am indexing I have
>>>>>>>>>>> content
>>>>>>>>>>> like
>>>>>>>>>>> :
>>>>>>>>>>>
>>>>>>>>>>> <attribute name="remedial action" value="Checking"/>\
>>>>>>>>>>>
>>>>>>>>>>> Due to the phrase search requirement, I am indexing each
>>>>>>>>>>> line of
>>>>>>>>>>> the
>>>>>>>>>>> file
>>>>>>>>>>> as
>>>>>>>>>>> a new document.
>>>>>>>>>>>
>>>>>>>>>>> Now when I am trying to do a phrase query (Did you Mean, Infix
>>>>>>>>>>> Analyzer
>>>>>>>>>>> etc,
>>>>>>>>>>> or phrase suggest) this seems to work fine and provide me with
>>>>>>>>>>> desired
>>>>>>>>>>> suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Problem is :
>>>>>>>>>>>
>>>>>>>>>>> How do I invoke boolean query for this. I mean when I
>>>>>>>>>>> verified the
>>>>>>>>>>> indexes
>>>>>>>>>>> in Luke, I saw the whole line as expected is indexed.
>>>>>>>>>>>
>>>>>>>>>>> So, if user wish to perform a boolean query say suppose
>>>>>>>>>>> containing
>>>>>>>>>>> "remedialaction" and "Checking" how do I get this document as a
>>>>>>>>>>> hit.
>>>>>>>>>>> I
>>>>>>>>>>> believe since I am indexing each line, this seems to be bit
>>>>>>>>>>> tricky.
>>>>>>>>>>>
>>>>>>>>>>> Please guide.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Ankit
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail:
>>>>>>>>>>> java-user-help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Ankit Murarka
>>>>>>>>>
>>>>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>>>>> compared
>>>>>>>>> with
>>>>>>>>> what lies within us"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello. The reference to CustomAnalyzer is what I had mentioned earlier.
I created a custom analyzer from the StandardAnalyzer code. The only
change I made was to comment out the LowerCaseFilter so that the
analyzer preserves case.
Yes, I am trying to put together the smallest possible self-contained
program. Give me some time.
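[Editor's note: one observation on the indexing code posted earlier in this thread. The per-line field is created with StringField, and in Lucene a StringField is indexed as a single untokenized term, so a TermQuery for INSIDE can never match a line that was indexed as the one term "INSIDE POST OF Listener\". Switching that field to TextField, which runs the analyzer, is the likely fix. The sketch below models the difference in plain Java; the FieldModel class and the whitespace tokenizer are illustrative stand-ins, not Lucene APIs.]

```java
import java.util.Arrays;
import java.util.List;

// Models why the posted BooleanQuery finds no hits: StringField indexes
// the entire value as ONE untokenized term, while TextField runs the
// analyzer and indexes each token separately.
public class FieldModel {
    // StringField-like behaviour: the whole line is a single term.
    static List<String> stringFieldTerms(String line) {
        return List.of(line);
    }

    // TextField-like behaviour: whitespace tokenization stands in for the analyzer.
    static List<String> textFieldTerms(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // A TermQuery matches only if the exact term exists in the index.
    static boolean termQueryHits(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String line = "INSIDE POST OF Listener\\";
        // With StringField, "INSIDE" is not an indexed term -> no hit.
        System.out.println(termQueryHits(stringFieldTerms(line), "INSIDE")); // false
        // With TextField, "INSIDE" is an indexed term -> hit.
        System.out.println(termQueryHits(textFieldTerms(line), "INSIDE")); // true
    }
}
```

[If this diagnosis is right, the one-line change in the posted CreateIndex code would be replacing `new StringField("contents", line, Field.Store.YES)` with `new TextField("contents", line, Field.Store.YES)`, keeping the same analyzer at index and query time.]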
On 8/14/2013 8:39 PM, Ian Lea wrote:
> If you're using StandardAnalyzer what's the reference to
> CustomAnalyzerForCaseSensitive all about? Someone else with more
> patience or better diagnostic skill may well spot your problem but I
> can't.
>
> My final suggestion is that you build and post the smallest possible
> self-contained program, using RAMDirectory and no external classes.
> If you are using a custom analyzer try it without - if that works
> you've got a clue as to where to look next.
>
> Good luck.
>
>
> On Wed, Aug 14, 2013 at 3:46 PM, Ankit Murarka
> <an...@rancoretech.com> wrote:
>
>> Hello. I gave the complete code sample so that anyone can try and let me
>> know. This is because this issue is really taking a toll on me.
>> I am so close yet so far.
>>
>> Yes, I am using analyzer to index the document. The Analyzer is
>> StandardAnalyzer but I have commented the LowerCaseFilter code from that.
>>
>> Yes In my trailing mail I have mentioned the same.
>>
>> This is what is present in my file:
>>
>> INSIDE POST OF Listener\
>>
>> This is what is present in the index:
>>
>> INSIDE POST OF Listener\
>>
>> The query which I gave to search:
>>
>>
>> Query is +contents:INSIDE contents:POST
>>
>>
>> STILL I AM GETTING NO HIT.. But If I index all the documents normally
>> (without indexing them line by line) I do get HITS..
>>
>> Still not able to figure out the problem.
>>
>>
>>
>> On 8/14/2013 8:07 PM, Ian Lea wrote:
>>
>>> I was rather hoping for something smaller!
>>>
>>> One suggestion from a glance is that you're using some analyzer
>>> somewhere but building a BooleanQuery out of a TermQuery or two. Are
>>> you sure (test it and prove it) that the strings you pass to the
>>> TermQuery are EXACTLY what has been indexed?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Wed, Aug 14, 2013 at 3:29 PM, Ankit Murarka
>>> <an...@rancoretech.com> wrote:
>>>
>>>
>>>> Hello. The problem is as follows:
>>>>
>>>> I have a document containing information in lines. So I am indexing all
>>>> files line by line.
>>>> So If I say in my document I have,
>>>> INSIDE POST OF SERVER\
>>>> and in my index file created I have,
>>>> INSIDE POST OF SERVER\
>>>>
>>>> and I fire a boolean query with INSIDE and POST with MUST/MUST, I am
>>>> getting
>>>> no HIT.
>>>>
>>>> I am providing the complete CODE I am using to create INDEX and TO
>>>> SEARCH..Both are drawn from sample code present online.
>>>>
>>>> /*INDEX CODE:
>>>> */
>>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>>
>>>> public class CreateIndex {
>>>> public static void main(String[] args) {
>>>> String indexPath = "D:\\INDEXFORQUERY"; //Place where indexes will
>>>> be
>>>> created
>>>> String docsPath="Indexed"; //Place where the files are kept.
>>>> boolean create=true;
>>>> final File docDir = new File(docsPath);
>>>> if (!docDir.exists() || !docDir.canRead()) {
>>>> System.exit(1);
>>>> }
>>>> try {
>>>> Directory dir = FSDirectory.open(new File(indexPath));
>>>> Analyzer analyzer=new
>>>> CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>>>> analyzer);
>>>> if (create) {
>>>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>> } else {
>>>> System.out.println("Trying to set IWC mode to UPDATE...NOT
>>>> DESIRED..");
>>>> }
>>>> IndexWriter writer = new IndexWriter(dir, iwc);
>>>> indexDocs(writer, docDir);
>>>> writer.close();
>>>> } catch (IOException e) {
>>>> System.out.println(" caught a " + e.getClass() +
>>>> "\n with message: " + e.getMessage());
>>>> }
>>>> }
>>>> static void indexDocs(IndexWriter writer, File file)
>>>> throws IOException {
>>>> if (file.canRead())
>>>> {
>>>> if (file.isDirectory()) {
>>>> String[] files = file.list();
>>>> if (files != null) {
>>>> for (int i = 0; i< files.length; i++) {
>>>> if(files[i]!=null)
>>>> indexDocs(writer, new File(file, files[i]));
>>>> }
>>>> }
>>>> } else {
>>>> try {
>>>> Document doc = new Document();
>>>> Field pathField = new StringField("path", file.getPath(),
>>>> Field.Store.YES);
>>>> doc.add(pathField);
>>>> doc.add(new LongField("modified", file.lastModified(),
>>>> Field.Store.NO));
>>>> LineNumberReader lnr=new LineNumberReader(new
>>>> FileReader(file));
>>>> String line=null;
>>>> while( null != (line = lnr.readLine()) ){
>>>> doc.add(new StringField("contents",line,Field.Store.YES));
>>>> }
>>>> if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
>>>> writer.addDocument(doc);
>>>> } else {
>>>> writer.updateDocument(new Term("path", file.getPath()),
>>>> doc);
>>>> }
>>>> } finally {
>>>> }
>>>> }
>>>> }
>>>> } }
>>>>
>>>> /*SEARCHING CODE:-*/
>>>>
>>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>>
>>>> public class SearchFORALLQUERIES {
>>>> public static void main(String[] args) throws Exception {
>>>>
>>>> String[] argument=new String[20];
>>>> argument[0]="-index";
>>>> argument[1]="D:\\INDEXFORQUERY";
>>>> argument[2]="-field";
>>>> argument[3]="contents"; //field value
>>>> argument[4]="-repeat";
>>>> argument[5]="2"; //repeat value
>>>> argument[6]="-raw";
>>>> argument[7]="-paging";
>>>> argument[8]="300"; //paging value
>>>>
>>>> String index = "index";
>>>> String field = "contents";
>>>> String queries = null;
>>>> int repeat = 0;
>>>> boolean raw = false;
>>>> String queryString = null;
>>>> int hitsPerPage = 10;
>>>>
>>>> for(int i = 0;i< argument.length;i++) {
>>>> if ("-index".equals(argument[i])) {
>>>> index = argument[i+1];
>>>> i++;
>>>> } else if ("-field".equals(argument[i])) {
>>>> field = argument[i+1];
>>>> i++;
>>>> } else if ("-queries".equals(argument[i])) {
>>>> queries = argument[i+1];
>>>> i++;
>>>> } else if ("-query".equals(argument[i])) {
>>>> queryString = argument[i+1];
>>>> i++;
>>>> } else if ("-repeat".equals(argument[i])) {
>>>> repeat = Integer.parseInt(argument[i+1]);
>>>> i++;
>>>> } else if ("-raw".equals(argument[i])) {
>>>> raw = true; //set it true to just display the count. If false
>>>> then
>>>> it also display file name.
>>>> } else if ("-paging".equals(argument[i])) {
>>>> hitsPerPage = Integer.parseInt(argument[i+1]);
>>>> if (hitsPerPage<= 0) {
>>>> System.err.println("There must be at least 1 hit per page.");
>>>> System.exit(1);
>>>> }
>>>> i++;
>>>> }
>>>> }
>>>> System.out.println("processing input");
>>>> IndexReader reader = DirectoryReader.open(FSDirectory.open(new
>>>> File(index))); //location where indexes are.
>>>> IndexSearcher searcher = new IndexSearcher(reader);
>>>> BufferedReader in = null;
>>>> if (queries != null) {
>>>> in = new BufferedReader(new InputStreamReader(new
>>>> FileInputStream(queries), "UTF-8")); //provide query as input
>>>> } else {
>>>> in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
>>>> }
>>>> while (true) {
>>>>        if (queries == null && queryString == null) {
>>>>          // prompt the user
>>>>          System.out.println("Enter query: "); // if query is not present, prompt the user to enter query.
>>>>        }
>>>> String line = queryString != null ? queryString : in.readLine();
>>>>
>>>> if (line == null || line.length() == -1) {
>>>> break;
>>>> }
>>>> line = line.trim();
>>>> if (line.length() == 0) {
>>>> break;
>>>> }
>>>> String[] str=line.split(" ");
>>>> System.out.println("queries are " + str[0] + " and is " + str[1]);
>>>> Query query1 = new TermQuery(new Term(field, str[0]));
>>>> Query query2=new TermQuery(new Term(field,str[1]));
>>>> BooleanQuery booleanQuery = new BooleanQuery();
>>>> booleanQuery.add(query1, BooleanClause.Occur.MUST);
>>>> booleanQuery.add(query2, BooleanClause.Occur.MUST);
>>>>         if (repeat > 0) { // repeat=2; repeat & time as benchmark
>>>> Date start = new Date();
>>>> for (int i = 0; i< repeat; i++) {
>>>> searcher.search(booleanQuery, null, 100);
>>>> }
>>>> Date end = new Date();
>>>> System.out.println("Time:
>>>> "+(end.getTime()-start.getTime())+"ms");
>>>> }
>>>> doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw,
>>>> queries
>>>> == null&& queryString == null);
>>>>
>>>> if (queryString != null) {
>>>> break;
>>>> }
>>>> }
>>>> reader.close();
>>>> }
>>>> public static void doPagingSearch(BufferedReader in, IndexSearcher
>>>> searcher, Query query,
>>>> int hitsPerPage, boolean raw,
>>>> boolean
>>>> interactive) throws IOException {
>>>> TopDocs results = searcher.search(query, 5 * hitsPerPage);
>>>> ScoreDoc[] hits = results.scoreDocs;
>>>> int numTotalHits = results.totalHits;
>>>> System.out.println(numTotalHits + " total matching documents");
>>>> int start = 0;
>>>> int end = Math.min(numTotalHits, hitsPerPage);
>>>> while (true) {
>>>> if (end> hits.length) {
>>>> System.out.println("Only results 1 - " + hits.length +" of " +
>>>> numTotalHits + " total matching documents collected.");
>>>> System.out.println("Collect more (y/n) ?");
>>>> String line = in.readLine();
>>>> if (line.length() == 0 || line.charAt(0) == 'n') {
>>>> break;
>>>> }
>>>> hits = searcher.search(query, numTotalHits).scoreDocs;
>>>> }
>>>> end = Math.min(hits.length, start + hitsPerPage); //3 and 5.
>>>> for (int i = start; i< end; i++) { //0 to 3.
>>>> if (raw) {
>>>>
>>>> System.out.println("doc="+hits[i].doc+"
>>>> score="+hits[i].score);
>>>> }
>>>> Document doc = searcher.doc(hits[i].doc);
>>>> List<IndexableField> filed=doc.getFields();
>>>> filed.size();
>>>> String path = doc.get("path");
>>>> if (path != null) {
>>>> System.out.println((i+1) + ". " + path);
>>>> String title = doc.get("title");
>>>> if (title != null) {
>>>> System.out.println(" Title: " + doc.get("title"));
>>>> }
>>>> } else {
>>>> System.out.println((i+1) + ". " + "No path for this
>>>> document");
>>>> }
>>>> }
>>>> if (!interactive || end == 0) {
>>>> break;
>>>> }
>>>> if (numTotalHits>= end) {
>>>> boolean quit = false;
>>>> while (true) {
>>>> System.out.print("Press ");
>>>> if (start - hitsPerPage>= 0) {
>>>> System.out.print("(p)revious page, ");
>>>> }
>>>> if (start + hitsPerPage< numTotalHits) {
>>>> System.out.print("(n)ext page, ");
>>>> }
>>>> System.out.println("(q)uit or enter number to jump to a
>>>> page.");
>>>> String line = in.readLine();
>>>> if (line.length() == 0 || line.charAt(0)=='q') {
>>>> quit = true;
>>>> break;
>>>> }
>>>> if (line.charAt(0) == 'p') {
>>>> start = Math.max(0, start - hitsPerPage);
>>>> break;
>>>> } else if (line.charAt(0) == 'n') {
>>>> if (start + hitsPerPage< numTotalHits) {
>>>> start+=hitsPerPage;
>>>> }
>>>> break;
>>>> } else {
>>>> int page = Integer.parseInt(line);
>>>> if ((page - 1) * hitsPerPage< numTotalHits) {
>>>> start = (page - 1) * hitsPerPage;
>>>> break;
>>>> } else {
>>>> System.out.println("No such page");
>>>> }
>>>> }
>>>> }
>>>> if (quit) break;
>>>> end = Math.min(numTotalHits, start + hitsPerPage);
>>>> }
>>>> }
>>>> }
>>>> }
>>>>
>>>> /*CUSTOM ANALYZER CODE:*/
>>>>
>>>> package com.rancore.demo;
>>>>
>>>> import java.io.IOException;
>>>> import java.io.Reader;
>>>>
>>>> import org.apache.lucene.analysis.TokenStream;
>>>> import org.apache.lucene.analysis.core.StopAnalyzer;
>>>> import org.apache.lucene.analysis.core.StopFilter;
>>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>>> import org.apache.lucene.analysis.standard.StandardTokenizer;
>>>> import org.apache.lucene.analysis.util.CharArraySet;
>>>> import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
>>>> import org.apache.lucene.util.Version;
>>>>
>>>> public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase
>>>> {
>>>>
>>>> public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
>>>> private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
>>>> public static final CharArraySet STOP_WORDS_SET =
>>>> StopAnalyzer.ENGLISH_STOP_WORDS_SET;
>>>> public CustomAnalyzerForCaseSensitive(Version matchVersion,
>>>> CharArraySet stopWords) {
>>>> super(matchVersion, stopWords);
>>>> }
>>>> public CustomAnalyzerForCaseSensitive(Version matchVersion) {
>>>> this(matchVersion, STOP_WORDS_SET);
>>>> }
>>>> public CustomAnalyzerForCaseSensitive(Version matchVersion, Reader
>>>> stopwords) throws IOException {
>>>> this(matchVersion, loadStopwordSet(stopwords,
>>>> matchVersion));
>>>> }
>>>> public void setMaxTokenLength(int length) {
>>>> maxTokenLength = length;
>>>> }
>>>> /**
>>>> * @see #setMaxTokenLength
>>>> */
>>>> public int getMaxTokenLength() {
>>>> return maxTokenLength;
>>>> }
>>>> @Override
>>>> protected TokenStreamComponents createComponents(final String
>>>> fieldName,
>>>> final Reader reader) {
>>>> final StandardTokenizer src = new
>>>> StandardTokenizer(matchVersion,
>>>> reader);
>>>> src.setMaxTokenLength(maxTokenLength);
>>>> TokenStream tok = new StandardFilter(matchVersion, src);
>>>> // tok = new LowerCaseFilter(matchVersion, tok);
>>>> tok = new StopFilter(matchVersion, tok, stopwords);
>>>> return new TokenStreamComponents(src, tok) {
>>>> @Override
>>>> protected void setReader(final Reader reader) throws
>>>> IOException {
>>>>
>>>>
>>>> src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength);
>>>> super.setReader(reader);
>>>> }
>>>> };
>>>> }
>>>> }
>>>>
>>>>
>>>>
>>>> I HOPE I HAVE GIVEN THE COMPLETE CODE SAMPLE FOR PEOPLE TO WORK ON..
>>>>
>>>> PLEASE GUIDE ME NOW: IN case any further information is required please
>>>> let
>>>> me know.
>>>>
>>>>
>>>> On 8/14/2013 7:43 PM, Ian Lea wrote:
>>>>
>>>>
>>>>> Well, you have supplied a bit more info - good - but I still can't
>>>>> spot the problem. Unless someone else can I suggest you post a very
>>>>> small self-contained program that demonstrates the problem.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka
>>>>> <an...@rancoretech.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hello.
>>>>>> The problem does not seem to be getting solved.
>>>>>>
>>>>>> As mentioned, I am indexing each line of each file.
>>>>>> The sample text present inside LUKE is
>>>>>>
>>>>>> <am name="notification" value="10"/>\
>>>>>> <type="DE">\
>>>>>> java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Size of list array::0\
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>> org.com.dummy,INFO,<< Still figuring out how to run
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> INSIDE POST OF Listener\
>>>>>>
>>>>>> In my Luke, I can see the text as "INSIDE POST OF Listener" .. This is
>>>>>> present in many files.
>>>>>>
>>>>>> Query is +contents:INSIDE contents:POST  -- the field name
>>>>>> is contents. The same analyzer is being used. This is a boolean query.
>>>>>>
>>>>>> To test, I indexed only 20 files. In 19 files, this is present.
>>>>>>
>>>>>> The boolean query should give me a hit for this document.
>>>>>>
>>>>>> BUT IT IS RETURNING ME NO HIT..
>>>>>>
>>>>>> If I index the same files WITHOUT line by line then, it gives me proper
>>>>>> hits..
>>>>>>
>>>>>> But for me it should work on Indexes created by Line by Line parsing
>>>>>> also.
>>>>>>
>>>>>> Please guide.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/13/2013 4:41 PM, Ian Lea wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> remedialaction != "remedial action"?
>>>>>>>
>>>>>>> Show us your query. Show a small self-contained sample program or
>>>>>>> test case that demonstrates the problem. You need to give us
>>>>>>> something more to go on.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ian.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka
>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I am aware of that link and I have been through that link
>>>>>>>> many
>>>>>>>> number of times.
>>>>>>>>
>>>>>>>> Problem I have is:
>>>>>>>>
>>>>>>>> 1. Each line is indexed. So indexed line looks something like
>>>>>>>> "<attribute
>>>>>>>> name="remedial action" value="Checking"/>\"
>>>>>>>> 2. I am easily firing a phrase query on this line. It suggest me the
>>>>>>>> possible values. No problem,.
>>>>>>>> 3. If I fire a Boolean Query with "remedialaction" and "Checking" as
>>>>>>>> a
>>>>>>>> must/must , then it is not providing me this document as a hit.
>>>>>>>> 4. I am using StandardAnalyzer both during the indexing and searching
>>>>>>>> time.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Ian Lea <ia...@gmail.com>.
If you're using StandardAnalyzer what's the reference to
CustomAnalyzerForCaseSensitive all about? Someone else with more
patience or better diagnostic skill may well spot your problem but I
can't.
My final suggestion is that you build and post the smallest possible
self-contained program, using RAMDirectory and no external classes.
If you are using a custom analyzer, try it without one; if that works,
you've got a clue as to where to look next.
Good luck.
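[Editor's note: one more thing worth fixing while reducing the program. The posted search loop hard-codes str[0] and str[1] after splitting the input, so any single-word query throws ArrayIndexOutOfBoundsException before the search even runs. A hedged stdlib sketch of tolerant parsing follows; the QueryTerms class name is an illustrative assumption, and in the real code each returned term would become one MUST TermQuery clause.]

```java
import java.util.ArrayList;
import java.util.List;

// Splits user input on runs of whitespace and returns however many terms
// were typed, instead of assuming exactly two as the posted code does.
public class QueryTerms {
    static List<String> parse(String line) {
        List<String> terms = new ArrayList<>();
        for (String t : line.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                // In the real code: booleanQuery.add(new TermQuery(new Term(field, t)),
                //                                    BooleanClause.Occur.MUST);
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(parse("INSIDE POST")); // [INSIDE, POST]
        System.out.println(parse("INSIDE"));      // [INSIDE] -- no exception
    }
}
```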
On Wed, Aug 14, 2013 at 3:46 PM, Ankit Murarka
<an...@rancoretech.com> wrote:
> Hello. I gave the complete code sample so that anyone can try and let me
> know. This is because this issue is really taking a toll on me.
> I am so close yet so far.
>
> Yes, I am using analyzer to index the document. The Analyzer is
> StandardAnalyzer but I have commented the LowerCaseFilter code from that.
>
> Yes In my trailing mail I have mentioned the same.
>
> This is what is present in my file:
>
> INSIDE POST OF Listener\
>
> This is what is present in the index:
>
> INSIDE POST OF Listener\
>
> The query which I gave to search:
>
>
> Query is +contents:INSIDE contents:POST
>
>
> STILL I AM GETTING NO HIT.. But If I index all the documents normally
> (without indexing them line by line) I do get HITS..
>
> Still not able to figure out the problem.
>
>
>
> On 8/14/2013 8:07 PM, Ian Lea wrote:
>>
>> I was rather hoping for something smaller!
>>
>> One suggestion from a glance is that you're using some analyzer
>> somewhere but building a BooleanQuery out of a TermQuery or two. Are
>> you sure (test it and prove it) that the strings you pass to the
>> TermQuery are EXACTLY what has been indexed?
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Aug 14, 2013 at 3:29 PM, Ankit Murarka
>> <an...@rancoretech.com> wrote:
>>
>>>
>>> Hello. The problem is as follows:
>>>
>>> I have a document containing information in lines. So I am indexing all
>>> files line by line.
>>> So If I say in my document I have,
>>> INSIDE POST OF SERVER\
>>> and in my index file created I have,
>>> INSIDE POST OF SERVER\
>>>
>>> and I fire a boolean query with INSIDE and POST with MUST/MUST, I am
>>> getting
>>> no HIT.
>>>
>>> I am providing the complete CODE I am using to create INDEX and TO
>>> SEARCH..Both are drawn from sample code present online.
>>>
>>> /*INDEX CODE:
>>> */
>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>
>>> public class CreateIndex {
>>> public static void main(String[] args) {
>>> String indexPath = "D:\\INDEXFORQUERY"; //Place where indexes will
>>> be
>>> created
>>> String docsPath="Indexed"; //Place where the files are kept.
>>> boolean create=true;
>>> final File docDir = new File(docsPath);
>>> if (!docDir.exists() || !docDir.canRead()) {
>>> System.exit(1);
>>> }
>>> try {
>>> Directory dir = FSDirectory.open(new File(indexPath));
>>> Analyzer analyzer=new
>>> CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>>> analyzer);
>>> if (create) {
>>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>> } else {
>>> System.out.println("Trying to set IWC mode to UPDATE...NOT
>>> DESIRED..");
>>> }
>>> IndexWriter writer = new IndexWriter(dir, iwc);
>>> indexDocs(writer, docDir);
>>> writer.close();
>>> } catch (IOException e) {
>>> System.out.println(" caught a " + e.getClass() +
>>> "\n with message: " + e.getMessage());
>>> }
>>> }
>>> static void indexDocs(IndexWriter writer, File file)
>>> throws IOException {
>>> if (file.canRead())
>>> {
>>> if (file.isDirectory()) {
>>> String[] files = file.list();
>>> if (files != null) {
>>> for (int i = 0; i< files.length; i++) {
>>> if(files[i]!=null)
>>> indexDocs(writer, new File(file, files[i]));
>>> }
>>> }
>>> } else {
>>> try {
>>> Document doc = new Document();
>>> Field pathField = new StringField("path", file.getPath(),
>>> Field.Store.YES);
>>> doc.add(pathField);
>>> doc.add(new LongField("modified", file.lastModified(),
>>> Field.Store.NO));
>>> LineNumberReader lnr=new LineNumberReader(new
>>> FileReader(file));
>>> String line=null;
>>> while( null != (line = lnr.readLine()) ){
>>> doc.add(new StringField("contents",line,Field.Store.YES));
>>> }
>>> if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
>>> writer.addDocument(doc);
>>> } else {
>>> writer.updateDocument(new Term("path", file.getPath()),
>>> doc);
>>> }
>>> } finally {
>>> }
>>> }
>>> }
>>> } }
>>>
>>> /*SEARCHING CODE:-*/
>>>
>>> package org.RunAllQueriesWithLineByLinePhrases;
>>>
>>> public class SearchFORALLQUERIES {
>>> public static void main(String[] args) throws Exception {
>>>
>>> String[] argument=new String[20];
>>> argument[0]="-index";
>>> argument[1]="D:\\INDEXFORQUERY";
>>> argument[2]="-field";
>>> argument[3]="contents"; //field value
>>> argument[4]="-repeat";
>>> argument[5]="2"; //repeat value
>>> argument[6]="-raw";
>>> argument[7]="-paging";
>>> argument[8]="300"; //paging value
>>>
>>> String index = "index";
>>> String field = "contents";
>>> String queries = null;
>>> int repeat = 0;
>>> boolean raw = false;
>>> String queryString = null;
>>> int hitsPerPage = 10;
>>>
>>> for(int i = 0;i< argument.length;i++) {
>>> if ("-index".equals(argument[i])) {
>>> index = argument[i+1];
>>> i++;
>>> } else if ("-field".equals(argument[i])) {
>>> field = argument[i+1];
>>> i++;
>>> } else if ("-queries".equals(argument[i])) {
>>> queries = argument[i+1];
>>> i++;
>>> } else if ("-query".equals(argument[i])) {
>>> queryString = argument[i+1];
>>> i++;
>>> } else if ("-repeat".equals(argument[i])) {
>>> repeat = Integer.parseInt(argument[i+1]);
>>> i++;
>>> } else if ("-raw".equals(argument[i])) {
>>> raw = true; //set it true to just display the count. If false
>>> then
>>> it also display file name.
>>> } else if ("-paging".equals(argument[i])) {
>>> hitsPerPage = Integer.parseInt(argument[i+1]);
>>> if (hitsPerPage<= 0) {
>>> System.err.println("There must be at least 1 hit per page.");
>>> System.exit(1);
>>> }
>>> i++;
>>> }
>>> }
>>> System.out.println("processing input");
>>> IndexReader reader = DirectoryReader.open(FSDirectory.open(new
>>> File(index))); //location where indexes are.
>>> IndexSearcher searcher = new IndexSearcher(reader);
>>> BufferedReader in = null;
>>> if (queries != null) {
>>> in = new BufferedReader(new InputStreamReader(new
>>> FileInputStream(queries), "UTF-8")); //provide query as input
>>> } else {
>>> in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
>>> }
>>> while (true) {
>>>        if (queries == null && queryString == null) {
>>>          // prompt the user
>>>          System.out.println("Enter query: "); // if query is not present, prompt the user to enter query.
>>>        }
>>> String line = queryString != null ? queryString : in.readLine();
>>>
>>> if (line == null || line.length() == -1) {
>>> break;
>>> }
>>> line = line.trim();
>>> if (line.length() == 0) {
>>> break;
>>> }
>>> String[] str=line.split(" ");
>>> System.out.println("queries are " + str[0] + " and is " + str[1]);
>>> Query query1 = new TermQuery(new Term(field, str[0]));
>>> Query query2=new TermQuery(new Term(field,str[1]));
>>> BooleanQuery booleanQuery = new BooleanQuery();
>>> booleanQuery.add(query1, BooleanClause.Occur.MUST);
>>> booleanQuery.add(query2, BooleanClause.Occur.MUST);
>>> if (repeat > 0) { // repeat=2: repeat & time as benchmark
>>> Date start = new Date();
>>> for (int i = 0; i< repeat; i++) {
>>> searcher.search(booleanQuery, null, 100);
>>> }
>>> Date end = new Date();
>>> System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
>>> }
>>> doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw,
>>> queries == null && queryString == null);
>>>
>>> if (queryString != null) {
>>> break;
>>> }
>>> }
>>> reader.close();
>>> }
>>> public static void doPagingSearch(BufferedReader in, IndexSearcher
>>> searcher, Query query,
>>> int hitsPerPage, boolean raw,
>>> boolean
>>> interactive) throws IOException {
>>> TopDocs results = searcher.search(query, 5 * hitsPerPage);
>>> ScoreDoc[] hits = results.scoreDocs;
>>> int numTotalHits = results.totalHits;
>>> System.out.println(numTotalHits + " total matching documents");
>>> int start = 0;
>>> int end = Math.min(numTotalHits, hitsPerPage);
>>> while (true) {
>>> if (end> hits.length) {
>>> System.out.println("Only results 1 - " + hits.length +" of " +
>>> numTotalHits + " total matching documents collected.");
>>> System.out.println("Collect more (y/n) ?");
>>> String line = in.readLine();
>>> if (line.length() == 0 || line.charAt(0) == 'n') {
>>> break;
>>> }
>>> hits = searcher.search(query, numTotalHits).scoreDocs;
>>> }
>>> end = Math.min(hits.length, start + hitsPerPage); //3 and 5.
>>> for (int i = start; i< end; i++) { //0 to 3.
>>> if (raw) {
>>> System.out.println("doc=" + hits[i].doc + " score=" + hits[i].score);
>>> }
>>> Document doc = searcher.doc(hits[i].doc);
>>> List<IndexableField> filed=doc.getFields();
>>> filed.size();
>>> String path = doc.get("path");
>>> if (path != null) {
>>> System.out.println((i+1) + ". " + path);
>>> String title = doc.get("title");
>>> if (title != null) {
>>> System.out.println(" Title: " + doc.get("title"));
>>> }
>>> } else {
>>> System.out.println((i+1) + ". " + "No path for this document");
>>> }
>>> }
>>> if (!interactive || end == 0) {
>>> break;
>>> }
>>> if (numTotalHits>= end) {
>>> boolean quit = false;
>>> while (true) {
>>> System.out.print("Press ");
>>> if (start - hitsPerPage>= 0) {
>>> System.out.print("(p)revious page, ");
>>> }
>>> if (start + hitsPerPage< numTotalHits) {
>>> System.out.print("(n)ext page, ");
>>> }
>>> System.out.println("(q)uit or enter number to jump to a page.");
>>> String line = in.readLine();
>>> if (line.length() == 0 || line.charAt(0)=='q') {
>>> quit = true;
>>> break;
>>> }
>>> if (line.charAt(0) == 'p') {
>>> start = Math.max(0, start - hitsPerPage);
>>> break;
>>> } else if (line.charAt(0) == 'n') {
>>> if (start + hitsPerPage< numTotalHits) {
>>> start+=hitsPerPage;
>>> }
>>> break;
>>> } else {
>>> int page = Integer.parseInt(line);
>>> if ((page - 1) * hitsPerPage< numTotalHits) {
>>> start = (page - 1) * hitsPerPage;
>>> break;
>>> } else {
>>> System.out.println("No such page");
>>> }
>>> }
>>> }
>>> if (quit) break;
>>> end = Math.min(numTotalHits, start + hitsPerPage);
>>> }
>>> }
>>> }
>>> }
>>>
>>> /*CUSTOM ANALYZER CODE:*/
>>>
>>> package com.rancore.demo;
>>>
>>> import java.io.IOException;
>>> import java.io.Reader;
>>>
>>> import org.apache.lucene.analysis.TokenStream;
>>> import org.apache.lucene.analysis.core.StopAnalyzer;
>>> import org.apache.lucene.analysis.core.StopFilter;
>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>> import org.apache.lucene.analysis.standard.StandardTokenizer;
>>> import org.apache.lucene.analysis.util.CharArraySet;
>>> import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
>>> import org.apache.lucene.util.Version;
>>>
>>> public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase
>>> {
>>>
>>> public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
>>> private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
>>> public static final CharArraySet STOP_WORDS_SET =
>>> StopAnalyzer.ENGLISH_STOP_WORDS_SET;
>>> public CustomAnalyzerForCaseSensitive(Version matchVersion,
>>> CharArraySet stopWords) {
>>> super(matchVersion, stopWords);
>>> }
>>> public CustomAnalyzerForCaseSensitive(Version matchVersion) {
>>> this(matchVersion, STOP_WORDS_SET);
>>> }
>>> public CustomAnalyzerForCaseSensitive(Version matchVersion, Reader
>>> stopwords) throws IOException {
>>> this(matchVersion, loadStopwordSet(stopwords, matchVersion));
>>> }
>>> public void setMaxTokenLength(int length) {
>>> maxTokenLength = length;
>>> }
>>> /**
>>> * @see #setMaxTokenLength
>>> */
>>> public int getMaxTokenLength() {
>>> return maxTokenLength;
>>> }
>>> @Override
>>> protected TokenStreamComponents createComponents(final String fieldName,
>>> final Reader reader) {
>>> final StandardTokenizer src = new StandardTokenizer(matchVersion, reader);
>>> src.setMaxTokenLength(maxTokenLength);
>>> TokenStream tok = new StandardFilter(matchVersion, src);
>>> // tok = new LowerCaseFilter(matchVersion, tok);
>>> tok = new StopFilter(matchVersion, tok, stopwords);
>>> return new TokenStreamComponents(src, tok) {
>>> @Override
>>> protected void setReader(final Reader reader) throws IOException {
>>> src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength);
>>> super.setReader(reader);
>>> }
>>> };
>>> }
>>> }
>>>
>>>
>>>
>>> I HOPE I HAVE GIVEN THE COMPLETE CODE SAMPLE FOR PEOPLE TO WORK ON..
>>>
>>> PLEASE GUIDE ME NOW: IN case any further information is required please
>>> let
>>> me know.
>>>
>>>
>>> On 8/14/2013 7:43 PM, Ian Lea wrote:
>>>
>>>>
>>>> Well, you have supplied a bit more info - good - but I still can't
>>>> spot the problem. Unless someone else can I suggest you post a very
>>>> small self-contained program that demonstrates the problem.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka
>>>> <an...@rancoretech.com> wrote:
>>>>
>>>>
>>>>>
>>>>> Hello.
>>>>> The problem does not seem to be getting solved.
>>>>>
>>>>> As mentioned, I am indexing each line of each file.
>>>>> The sample text present inside LUKE is
>>>>>
>>>>> <am name="notification" value="10"/>\
>>>>> <type="DE">\
>>>>> java.lang.Thread.run(Thread.java:619)
>>>>>
>>>>>
>>>>>>>
>>>>>>> Size of list array::0\
>>>>>>>
>>>>>>>
>>>>>
>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>> org.com.dummy,INFO,<< Still figuring out how to run
>>>>>
>>>>>
>>>>>>>
>>>>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349
>>>>>>>
>>>>>>>
>>>>>
>>>>> INSIDE POST OF Listener\
>>>>>
>>>>> In my Luke, I can see the text as "INSIDE POST OF Listener" .. This is
>>>>> present in many files.
>>>>>
>>>>> Query is: +contents:INSIDE contents:POST -- the field name
>>>>> is "contents". The same analyzer is being used. This is a boolean query.
>>>>>
>>>>> To test, I indexed only 20 files. In 19 files, this is present.
>>>>>
>>>>> The boolean query should give me a hit for this document.
>>>>>
>>>>> BUT IT IS RETURNING ME NO HIT..
>>>>>
>>>>> If I index the same files WITHOUT line by line then, it gives me proper
>>>>> hits..
>>>>>
>>>>> But for me it should work on Indexes created by Line by Line parsing
>>>>> also.
>>>>>
>>>>> Please guide.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 8/13/2013 4:41 PM, Ian Lea wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> remedialaction != "remedial action"?
>>>>>>
>>>>>> Show us your query. Show a small self-contained sample program or
>>>>>> test case that demonstrates the problem. You need to give us
>>>>>> something more to go on.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka
>>>>>> <an...@rancoretech.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>> I am aware of that link and I have been through that link
>>>>>>> many
>>>>>>> number of times.
>>>>>>>
>>>>>>> Problem I have is:
>>>>>>>
>>>>>>> 1. Each line is indexed. So indexed line looks something like
>>>>>>> "<attribute
>>>>>>> name="remedial action" value="Checking"/>\"
>>>>>>> 2. I am easily firing a phrase query on this line. It suggest me the
>>>>>>> possible values. No problem,.
>>>>>>> 3. If I fire a Boolean Query with "remedialaction" and "Checking" as
>>>>>>> a
>>>>>>> must/must , then it is not providing me this document as a hit.
>>>>>>> 4. I am using StandardAnalyzer both during the indexing and searching
>>>>>>> time.
>>>>>>>
>>>>>>>
>>>>>>> On 8/13/2013 2:31 PM, Ian Lea wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Should be straightforward enough. Work through the tips in the FAQ
>>>>>>>> entry at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>>>>>>> and post back if that doesn't help, with details of how you are
>>>>>>>> analyzing the data and how you are searching.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ian.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 13, 2013 at 8:56 AM, Ankit Murarka
>>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>> I have 2 different usecases.
>>>>>>>>> I am trying to provide both boolean query and phrase search query
>>>>>>>>> in
>>>>>>>>> the
>>>>>>>>> application.
>>>>>>>>>
>>>>>>>>> In every line of the document which I am indexing I have content
>>>>>>>>> like
>>>>>>>>> :
>>>>>>>>>
>>>>>>>>> <attribute name="remedial action" value="Checking"/>\
>>>>>>>>>
>>>>>>>>> Due to the phrase search requirement, I am indexing each line of
>>>>>>>>> the
>>>>>>>>> file
>>>>>>>>> as
>>>>>>>>> a new document.
>>>>>>>>>
>>>>>>>>> Now when I am trying to do a phrase query (Did you Mean, Infix
>>>>>>>>> Analyzer
>>>>>>>>> etc,
>>>>>>>>> or phrase suggest) this seems to work fine and provide me with
>>>>>>>>> desired
>>>>>>>>> suggestions.
>>>>>>>>>
>>>>>>>>> Problem is :
>>>>>>>>>
>>>>>>>>> How do I invoke boolean query for this. I mean when I verified the
>>>>>>>>> indexes
>>>>>>>>> in Luke, I saw the whole line as expected is indexed.
>>>>>>>>>
>>>>>>>>> So, if user wish to perform a boolean query say suppose containing
>>>>>>>>> "remedialaction" and "Checking" how do I get this document as a
>>>>>>>>> hit.
>>>>>>>>> I
>>>>>>>>> believe since I am indexing each line, this seems to be bit tricky.
>>>>>>>>>
>>>>>>>>> Please guide.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Ankit
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>>
>>>>>>> Ankit Murarka
>>>>>>>
>>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>>> compared
>>>>>>> with
>>>>>>> what lies within us"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>>
>>>>> Ankit Murarka
>>>>>
>>>>> "What lies behind us and what lies before us are tiny matters compared
>>>>> with
>>>>> what lies within us"
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "What lies behind us and what lies before us are tiny matters compared
>>> with
>>> what lies within us"
>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
> "What lies behind us and what lies before us are tiny matters compared with
> what lies within us"
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello. I gave the complete code sample so that anyone can try it and let me
know. This issue is really taking a toll on me.
I am so close yet so far.
Yes, I am using an analyzer to index the documents. The analyzer is
StandardAnalyzer, but with the LowerCaseFilter step commented out.
Yes, I mentioned the same in my trailing mail.
This is what is present in my file:
INSIDE POST OF Listener\
This is what is present in the index:
INSIDE POST OF Listener\
The query which I gave to search:
Query is +contents:INSIDE contents:POST
STILL I AM GETTING NO HIT.. But If I index all the documents normally
(without indexing them line by line) I do get HITS..
Still not able to figure out the problem.
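[Editor's note: a likely root cause worth checking, based on the indexing code posted later in this thread (not something confirmed on the list): the indexer adds each line with `new StringField("contents", line, ...)`. In Lucene, a `StringField` is indexed verbatim as a single token and never passes through the analyzer, so the whole line `INSIDE POST OF Listener\` becomes one term, and `TermQuery(new Term("contents", "INSIDE"))` can never match it. A `TextField` runs the value through the analyzer so each word becomes its own term. A minimal stdlib-only sketch of that distinction, with plain whitespace splitting standing in for StandardTokenizer (no Lucene required):]

```java
import java.util.Arrays;
import java.util.List;

public class FieldTypeDemo {
    public static void main(String[] args) {
        String line = "INSIDE POST OF Listener\\";

        // StringField behavior: the entire line is indexed as ONE
        // un-analyzed term. A TermQuery for "INSIDE" is compared against
        // this single term and never matches.
        List<String> stringFieldTerms = Arrays.asList(line);

        // TextField behavior: the analyzer tokenizes the line, so each
        // word becomes its own term and TermQuery("contents", "INSIDE")
        // can match. (Whitespace split is only a stand-in here.)
        List<String> textFieldTerms = Arrays.asList(line.split("\\s+"));

        System.out.println(stringFieldTerms.contains("INSIDE")); // false
        System.out.println(textFieldTerms.contains("INSIDE"));   // true
    }
}
```

[If the phrase-suggestion use case still needs the raw line, a common pattern is to index the line twice: once as a `TextField` for term/boolean queries and once as a stored field for display.]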
On 8/14/2013 8:07 PM, Ian Lea wrote:
> I was rather hoping for something smaller!
>
> One suggestion from a glance is that you're using some analyzer
> somewhere but building a BooleanQuery out of a TermQuery or two. Are
> you sure (test it and prove it) that the strings you pass to the
> TermQuery are EXACTLY what has been indexed?
>
>
> --
> Ian.
>
>
> On Wed, Aug 14, 2013 at 3:29 PM, Ankit Murarka
> <an...@rancoretech.com> wrote:
>
>> Hello. The problem is as follows:
>>
>> I have a document containing information in lines. So I am indexing all
>> files line by line.
>> So If I say in my document I have,
>> INSIDE POST OF SERVER\
>> and in my index file created I have,
>> INSIDE POST OF SERVER\
>>
>> and I fire a boolean query with INSIDE and POST with MUST/MUST, I am getting
>> no HIT.
>>
>> I am providing the complete CODE I am using to create INDEX and TO
>> SEARCH..Both are drawn from sample code present online.
>>
>> /*INDEX CODE:
>> */
>> package org.RunAllQueriesWithLineByLinePhrases;
>>
>> public class CreateIndex {
>> public static void main(String[] args) {
>> String indexPath = "D:\\INDEXFORQUERY"; // place where indexes will be created
>> String docsPath="Indexed"; //Place where the files are kept.
>> boolean create=true;
>> final File docDir = new File(docsPath);
>> if (!docDir.exists() || !docDir.canRead()) {
>> System.exit(1);
>> }
>> try {
>> Directory dir = FSDirectory.open(new File(indexPath));
>> Analyzer analyzer=new
>> CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>> analyzer);
>> if (create) {
>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>> } else {
>> System.out.println("Trying to set IWC mode to UPDATE... NOT DESIRED..");
>> }
>> IndexWriter writer = new IndexWriter(dir, iwc);
>> indexDocs(writer, docDir);
>> writer.close();
>> } catch (IOException e) {
>> System.out.println(" caught a " + e.getClass() +
>> "\n with message: " + e.getMessage());
>> }
>> }
>> static void indexDocs(IndexWriter writer, File file)
>> throws IOException {
>> if (file.canRead())
>> {
>> if (file.isDirectory()) {
>> String[] files = file.list();
>> if (files != null) {
>> for (int i = 0; i< files.length; i++) {
>> if(files[i]!=null)
>> indexDocs(writer, new File(file, files[i]));
>> }
>> }
>> } else {
>> try {
>> Document doc = new Document();
>> Field pathField = new StringField("path", file.getPath(),
>> Field.Store.YES);
>> doc.add(pathField);
>> doc.add(new LongField("modified", file.lastModified(),
>> Field.Store.NO));
>> LineNumberReader lnr=new LineNumberReader(new FileReader(file));
>> String line=null;
>> while( null != (line = lnr.readLine()) ){
>> doc.add(new StringField("contents",line,Field.Store.YES));
>> }
>> if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
>> writer.addDocument(doc);
>> } else {
>> writer.updateDocument(new Term("path", file.getPath()), doc);
>> }
>> } finally {
>> }
>> }
>> }
>> } }
>>
>> /*SEARCHING CODE:-*/
>>
>> package org.RunAllQueriesWithLineByLinePhrases;
>>
>> public class SearchFORALLQUERIES {
>> public static void main(String[] args) throws Exception {
>>
>> String[] argument=new String[20];
>> argument[0]="-index";
>> argument[1]="D:\\INDEXFORQUERY";
>> argument[2]="-field";
>> argument[3]="contents"; //field value
>> argument[4]="-repeat";
>> argument[5]="2"; //repeat value
>> argument[6]="-raw";
>> argument[7]="-paging";
>> argument[8]="300"; //paging value
>>
>> String index = "index";
>> String field = "contents";
>> String queries = null;
>> int repeat = 0;
>> boolean raw = false;
>> String queryString = null;
>> int hitsPerPage = 10;
>>
>> for(int i = 0;i< argument.length;i++) {
>> if ("-index".equals(argument[i])) {
>> index = argument[i+1];
>> i++;
>> } else if ("-field".equals(argument[i])) {
>> field = argument[i+1];
>> i++;
>> } else if ("-queries".equals(argument[i])) {
>> queries = argument[i+1];
>> i++;
>> } else if ("-query".equals(argument[i])) {
>> queryString = argument[i+1];
>> i++;
>> } else if ("-repeat".equals(argument[i])) {
>> repeat = Integer.parseInt(argument[i+1]);
>> i++;
>> } else if ("-raw".equals(argument[i])) {
>> raw = true; // set true to just display the count; if false it also displays the file name.
>> } else if ("-paging".equals(argument[i])) {
>> hitsPerPage = Integer.parseInt(argument[i+1]);
>> if (hitsPerPage<= 0) {
>> System.err.println("There must be at least 1 hit per page.");
>> System.exit(1);
>> }
>> i++;
>> }
>> }
>> System.out.println("processing input");
>> IndexReader reader = DirectoryReader.open(FSDirectory.open(new
>> File(index))); //location where indexes are.
>> IndexSearcher searcher = new IndexSearcher(reader);
>> BufferedReader in = null;
>> if (queries != null) {
>> in = new BufferedReader(new InputStreamReader(new
>> FileInputStream(queries), "UTF-8")); //provide query as input
>> } else {
>> in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
>> }
>> while (true) {
>> if (queries == null && queryString == null) {
>> System.out.println("Enter query: "); // if no query was supplied, prompt the user to enter one.
>> }
>> String line = queryString != null ? queryString : in.readLine();
>>
>> if (line == null || line.length() == -1) {
>> break;
>> }
>> line = line.trim();
>> if (line.length() == 0) {
>> break;
>> }
>> String[] str=line.split(" ");
>> System.out.println("queries are " + str[0] + " and is " + str[1]);
>> Query query1 = new TermQuery(new Term(field, str[0]));
>> Query query2=new TermQuery(new Term(field,str[1]));
>> BooleanQuery booleanQuery = new BooleanQuery();
>> booleanQuery.add(query1, BooleanClause.Occur.MUST);
>> booleanQuery.add(query2, BooleanClause.Occur.MUST);
>> if (repeat > 0) { // repeat=2: repeat & time as benchmark
>> Date start = new Date();
>> for (int i = 0; i< repeat; i++) {
>> searcher.search(booleanQuery, null, 100);
>> }
>> Date end = new Date();
>> System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
>> }
>> doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw,
>> queries == null && queryString == null);
>> if (queryString != null) {
>> break;
>> }
>> }
>> reader.close();
>> }
>> public static void doPagingSearch(BufferedReader in, IndexSearcher
>> searcher, Query query,
>> int hitsPerPage, boolean raw, boolean
>> interactive) throws IOException {
>> TopDocs results = searcher.search(query, 5 * hitsPerPage);
>> ScoreDoc[] hits = results.scoreDocs;
>> int numTotalHits = results.totalHits;
>> System.out.println(numTotalHits + " total matching documents");
>> int start = 0;
>> int end = Math.min(numTotalHits, hitsPerPage);
>> while (true) {
>> if (end> hits.length) {
>> System.out.println("Only results 1 - " + hits.length +" of " +
>> numTotalHits + " total matching documents collected.");
>> System.out.println("Collect more (y/n) ?");
>> String line = in.readLine();
>> if (line.length() == 0 || line.charAt(0) == 'n') {
>> break;
>> }
>> hits = searcher.search(query, numTotalHits).scoreDocs;
>> }
>> end = Math.min(hits.length, start + hitsPerPage); //3 and 5.
>> for (int i = start; i< end; i++) { //0 to 3.
>> if (raw) {
>>
>> System.out.println("doc="+hits[i].doc+" score="+hits[i].score);
>> }
>> Document doc = searcher.doc(hits[i].doc);
>> List<IndexableField> filed=doc.getFields();
>> filed.size();
>> String path = doc.get("path");
>> if (path != null) {
>> System.out.println((i+1) + ". " + path);
>> String title = doc.get("title");
>> if (title != null) {
>> System.out.println(" Title: " + doc.get("title"));
>> }
>> } else {
>> System.out.println((i+1) + ". " + "No path for this document");
>> }
>> }
>> if (!interactive || end == 0) {
>> break;
>> }
>> if (numTotalHits>= end) {
>> boolean quit = false;
>> while (true) {
>> System.out.print("Press ");
>> if (start - hitsPerPage>= 0) {
>> System.out.print("(p)revious page, ");
>> }
>> if (start + hitsPerPage< numTotalHits) {
>> System.out.print("(n)ext page, ");
>> }
>> System.out.println("(q)uit or enter number to jump to a page.");
>> String line = in.readLine();
>> if (line.length() == 0 || line.charAt(0)=='q') {
>> quit = true;
>> break;
>> }
>> if (line.charAt(0) == 'p') {
>> start = Math.max(0, start - hitsPerPage);
>> break;
>> } else if (line.charAt(0) == 'n') {
>> if (start + hitsPerPage< numTotalHits) {
>> start+=hitsPerPage;
>> }
>> break;
>> } else {
>> int page = Integer.parseInt(line);
>> if ((page - 1) * hitsPerPage< numTotalHits) {
>> start = (page - 1) * hitsPerPage;
>> break;
>> } else {
>> System.out.println("No such page");
>> }
>> }
>> }
>> if (quit) break;
>> end = Math.min(numTotalHits, start + hitsPerPage);
>> }
>> }
>> }
>> }
>>
>> /*CUSTOM ANALYZER CODE:*/
>>
>> package com.rancore.demo;
>>
>> import java.io.IOException;
>> import java.io.Reader;
>>
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.core.StopAnalyzer;
>> import org.apache.lucene.analysis.core.StopFilter;
>> import org.apache.lucene.analysis.standard.StandardFilter;
>> import org.apache.lucene.analysis.standard.StandardTokenizer;
>> import org.apache.lucene.analysis.util.CharArraySet;
>> import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
>> import org.apache.lucene.util.Version;
>>
>> public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase {
>>
>> public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
>> private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
>> public static final CharArraySet STOP_WORDS_SET =
>> StopAnalyzer.ENGLISH_STOP_WORDS_SET;
>> public CustomAnalyzerForCaseSensitive(Version matchVersion,
>> CharArraySet stopWords) {
>> super(matchVersion, stopWords);
>> }
>> public CustomAnalyzerForCaseSensitive(Version matchVersion) {
>> this(matchVersion, STOP_WORDS_SET);
>> }
>> public CustomAnalyzerForCaseSensitive(Version matchVersion, Reader
>> stopwords) throws IOException {
>> this(matchVersion, loadStopwordSet(stopwords, matchVersion));
>> }
>> public void setMaxTokenLength(int length) {
>> maxTokenLength = length;
>> }
>> /**
>> * @see #setMaxTokenLength
>> */
>> public int getMaxTokenLength() {
>> return maxTokenLength;
>> }
>> @Override
>> protected TokenStreamComponents createComponents(final String fieldName,
>> final Reader reader) {
>> final StandardTokenizer src = new StandardTokenizer(matchVersion,
>> reader);
>> src.setMaxTokenLength(maxTokenLength);
>> TokenStream tok = new StandardFilter(matchVersion, src);
>> // tok = new LowerCaseFilter(matchVersion, tok);
>> tok = new StopFilter(matchVersion, tok, stopwords);
>> return new TokenStreamComponents(src, tok) {
>> @Override
>> protected void setReader(final Reader reader) throws IOException {
>> src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength);
>> super.setReader(reader);
>> }
>> };
>> }
>> }
>>
>>
>>
>> I HOPE I HAVE GIVEN THE COMPLETE CODE SAMPLE FOR PEOPLE TO WORK ON..
>>
>> PLEASE GUIDE ME NOW: IN case any further information is required please let
>> me know.
>>
>>
>> On 8/14/2013 7:43 PM, Ian Lea wrote:
>>
>>> Well, you have supplied a bit more info - good - but I still can't
>>> spot the problem. Unless someone else can I suggest you post a very
>>> small self-contained program that demonstrates the problem.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka
>>> <an...@rancoretech.com> wrote:
>>>
>>>
>>>> Hello.
>>>> The problem does not seem to be getting solved.
>>>>
>>>> As mentioned, I am indexing each line of each file.
>>>> The sample text present inside LUKE is
>>>>
>>>> <am name="notification" value="10"/>\
>>>> <type="DE">\
>>>> java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>>>> Size of list array::0\
>>>>>>
>>>>>>
>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>> org.com.dummy,INFO,<< Still figuring out how to run
>>>>
>>>>
>>>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349
>>>>>>
>>>>>>
>>>> INSIDE POST OF Listener\
>>>>
>>>> In my Luke, I can see the text as "INSIDE POST OF Listener" .. This is
>>>> present in many files.
>>>>
>>>> Query is: +contents:INSIDE contents:POST -- the field name
>>>> is "contents". The same analyzer is being used. This is a boolean query.
>>>>
>>>> To test, I indexed only 20 files. In 19 files, this is present.
>>>>
>>>> The boolean query should give me a hit for this document.
>>>>
>>>> BUT IT IS RETURNING ME NO HIT..
>>>>
>>>> If I index the same files WITHOUT line by line then, it gives me proper
>>>> hits..
>>>>
>>>> But for me it should work on Indexes created by Line by Line parsing
>>>> also.
>>>>
>>>> Please guide.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 8/13/2013 4:41 PM, Ian Lea wrote:
>>>>
>>>>
>>>>> remedialaction != "remedial action"?
>>>>>
>>>>> Show us your query. Show a small self-contained sample program or
>>>>> test case that demonstrates the problem. You need to give us
>>>>> something more to go on.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka
>>>>> <an...@rancoretech.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hello,
>>>>>> I am aware of that link and I have been through that link
>>>>>> many
>>>>>> number of times.
>>>>>>
>>>>>> Problem I have is:
>>>>>>
>>>>>> 1. Each line is indexed. So indexed line looks something like
>>>>>> "<attribute
>>>>>> name="remedial action" value="Checking"/>\"
>>>>>> 2. I am easily firing a phrase query on this line. It suggest me the
>>>>>> possible values. No problem,.
>>>>>> 3. If I fire a Boolean Query with "remedialaction" and "Checking" as a
>>>>>> must/must , then it is not providing me this document as a hit.
>>>>>> 4. I am using StandardAnalyzer both during the indexing and searching
>>>>>> time.
>>>>>>
>>>>>>
>>>>>> On 8/13/2013 2:31 PM, Ian Lea wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Should be straightforward enough. Work through the tips in the FAQ
>>>>>>> entry at
>>>>>>>
>>>>>>>
>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>>>>>> and post back if that doesn't help, with details of how you are
>>>>>>> analyzing the data and how you are searching.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ian.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 13, 2013 at 8:56 AM, Ankit Murarka
>>>>>>> <an...@rancoretech.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>> I have 2 different usecases.
>>>>>>>> I am trying to provide both boolean query and phrase search query in
>>>>>>>> the
>>>>>>>> application.
>>>>>>>>
>>>>>>>> In every line of the document which I am indexing I have content like
>>>>>>>> :
>>>>>>>>
>>>>>>>> <attribute name="remedial action" value="Checking"/>\
>>>>>>>>
>>>>>>>> Due to the phrase search requirement, I am indexing each line of the
>>>>>>>> file
>>>>>>>> as
>>>>>>>> a new document.
>>>>>>>>
>>>>>>>> Now when I am trying to do a phrase query (Did you Mean, Infix
>>>>>>>> Analyzer
>>>>>>>> etc,
>>>>>>>> or phrase suggest) this seems to work fine and provide me with
>>>>>>>> desired
>>>>>>>> suggestions.
>>>>>>>>
>>>>>>>> Problem is :
>>>>>>>>
>>>>>>>> How do I invoke boolean query for this. I mean when I verified the
>>>>>>>> indexes
>>>>>>>> in Luke, I saw the whole line as expected is indexed.
>>>>>>>>
>>>>>>>> So, if user wish to perform a boolean query say suppose containing
>>>>>>>> "remedialaction" and "Checking" how do I get this document as a hit.
>>>>>>>> I
>>>>>>>> believe since I am indexing each line, this seems to be bit tricky.
>>>>>>>>
>>>>>>>> Please guide.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Ankit
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>>
>>>>>> Ankit Murarka
>>>>>>
>>>>>> "What lies behind us and what lies before us are tiny matters compared
>>>>>> with
>>>>>> what lies within us"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Ankit Murarka
>>>>
>>>> "What lies behind us and what lies before us are tiny matters compared
>>>> with
>>>> what lies within us"
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Regards
>>
>> Ankit Murarka
>>
>> "What lies behind us and what lies before us are tiny matters compared with
>> what lies within us"
>>
>>
>
>
>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Boolean Query when indexing each line as a document.
Posted by Ian Lea <ia...@gmail.com>.
I was rather hoping for something smaller!
One suggestion from a glance is that you're using some analyzer
somewhere but building a BooleanQuery out of a TermQuery or two. Are
you sure (test it and prove it) that the strings you pass to the
TermQuery are EXACTLY what has been indexed?
--
Ian.
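To make the "EXACTLY what has been indexed" point concrete, here is a stdlib-only sketch (no Lucene involved; the class and method names are made up for illustration): a term query matches only when the query string equals an indexed term byte for byte, so a field whose whole line was indexed as a single term can never match a one-word query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not real Lucene classes: a "term query" is an
// exact-match lookup against the set of terms the indexing step produced.
public class TermMatchSketch {

    // Whole line kept as one term (what an un-analyzed field produces).
    static Set<String> wholeLineTerms(String line) {
        return new HashSet<>(Arrays.asList(line));
    }

    // Line split into word terms (roughly what an analyzed field produces).
    static Set<String> tokenizedTerms(String line) {
        return new HashSet<>(Arrays.asList(line.split("\\W+")));
    }

    public static void main(String[] args) {
        String line = "INSIDE POST OF SERVER";
        // Exact-match lookup for the single term "INSIDE":
        System.out.println(wholeLineTerms(line).contains("INSIDE")); // false
        System.out.println(tokenizedTerms(line).contains("INSIDE")); // true
    }
}
```

If the two sides disagree like this, comparing the actual indexed terms (for example in Luke's per-field term view) against the exact strings passed to each TermQuery will expose the mismatch.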
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello. The problem is as follows:
My documents contain information in lines, so I am indexing every file
line by line.
If my document contains the line
INSIDE POST OF SERVER\
and the created index also shows
INSIDE POST OF SERVER\
then a boolean query for INSIDE and POST with MUST/MUST returns no hit.
Below is the complete code I use to create the index and to search; both
classes are adapted from sample code found online.
/*INDEX CODE:
*/
package org.RunAllQueriesWithLineByLinePhrases;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import com.rancore.demo.CustomAnalyzerForCaseSensitive;
public class CreateIndex {
public static void main(String[] args) {
String indexPath = "D:\\INDEXFORQUERY"; //Place where indexes will
be created
String docsPath="Indexed"; //Place where the files are kept.
boolean create=true;
final File docDir = new File(docsPath);
if (!docDir.exists() || !docDir.canRead()) {
System.exit(1);
}
try {
Directory dir = FSDirectory.open(new File(indexPath));
Analyzer analyzer=new
CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
analyzer);
if (create) {
iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
} else {
System.out.println("Trying to set IWC mode to UPDATE...NOT
DESIRED..");
}
IndexWriter writer = new IndexWriter(dir, iwc);
indexDocs(writer, docDir);
writer.close();
} catch (IOException e) {
System.out.println(" caught a " + e.getClass() +
"\n with message: " + e.getMessage());
}
}
static void indexDocs(IndexWriter writer, File file)
throws IOException {
if (file.canRead())
{
if (file.isDirectory()) {
String[] files = file.list();
if (files != null) {
for (int i = 0; i < files.length; i++) {
if(files[i]!=null)
indexDocs(writer, new File(file, files[i]));
}
}
} else {
         LineNumberReader lnr = new LineNumberReader(new FileReader(file));
         try {
           Document doc = new Document();
           Field pathField = new StringField("path", file.getPath(),
             Field.Store.YES);
           doc.add(pathField);
           doc.add(new LongField("modified", file.lastModified(),
             Field.Store.NO));
           String line = null;
           while (null != (line = lnr.readLine())) {
             doc.add(new StringField("contents", line, Field.Store.YES));
           }
           if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
             writer.addDocument(doc);
           } else {
             writer.updateDocument(new Term("path", file.getPath()), doc);
           }
         } finally {
           lnr.close(); // the finally block was empty and the reader leaked
         }
}
}
} }
/*SEARCHING CODE:-*/
package org.RunAllQueriesWithLineByLinePhrases;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
public class SearchFORALLQUERIES {
public static void main(String[] args) throws Exception {
String[] argument=new String[20];
argument[0]="-index";
argument[1]="D:\\INDEXFORQUERY";
argument[2]="-field";
argument[3]="contents"; //field value
argument[4]="-repeat";
argument[5]="2"; //repeat value
argument[6]="-raw";
argument[7]="-paging";
argument[8]="300"; //paging value
String index = "index";
String field = "contents";
String queries = null;
int repeat = 0;
boolean raw = false;
String queryString = null;
int hitsPerPage = 10;
for(int i = 0;i < argument.length;i++) {
if ("-index".equals(argument[i])) {
index = argument[i+1];
i++;
} else if ("-field".equals(argument[i])) {
field = argument[i+1];
i++;
} else if ("-queries".equals(argument[i])) {
queries = argument[i+1];
i++;
} else if ("-query".equals(argument[i])) {
queryString = argument[i+1];
i++;
} else if ("-repeat".equals(argument[i])) {
repeat = Integer.parseInt(argument[i+1]);
i++;
} else if ("-raw".equals(argument[i])) {
raw = true; //set it true to just display the count. If false
then it also display file name.
} else if ("-paging".equals(argument[i])) {
hitsPerPage = Integer.parseInt(argument[i+1]);
if (hitsPerPage <= 0) {
System.err.println("There must be at least 1 hit per page.");
System.exit(1);
}
i++;
}
}
System.out.println("processing input");
IndexReader reader = DirectoryReader.open(FSDirectory.open(new
File(index))); //location where indexes are.
IndexSearcher searcher = new IndexSearcher(reader);
BufferedReader in = null;
if (queries != null) {
in = new BufferedReader(new InputStreamReader(new
FileInputStream(queries), "UTF-8")); //provide query as input
} else {
in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
}
while (true) {
if (queries == null && queryString == null)
{ // prompt the user
System.out.println("Enter query: "); //if query is not
present, prompt the user to enter query.
}
String line = queryString != null ? queryString : in.readLine();
      if (line == null) { // length() can never be -1; null means end of input
        break;
      }
line = line.trim();
if (line.length() == 0) {
break;
}
      String[] str = line.split("\\s+");
      if (str.length < 2) {
        System.out.println("Please enter two terms separated by a space.");
        continue;
      }
      System.out.println("Query terms are " + str[0] + " and " + str[1]);
Query query1 = new TermQuery(new Term(field, str[0]));
Query query2=new TermQuery(new Term(field,str[1]));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(query1, BooleanClause.Occur.MUST);
booleanQuery.add(query2, BooleanClause.Occur.MUST);
if (repeat > 0) { //repeat=2 // repeat
& time as benchmark
Date start = new Date();
for (int i = 0; i < repeat; i++) {
searcher.search(booleanQuery, null, 100);
}
Date end = new Date();
System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
}
doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw,
queries == null && queryString == null);
if (queryString != null) {
break;
}
}
reader.close();
}
public static void doPagingSearch(BufferedReader in, IndexSearcher
searcher, Query query,
int hitsPerPage, boolean raw,
boolean interactive) throws IOException {
TopDocs results = searcher.search(query, 5 * hitsPerPage);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = results.totalHits;
System.out.println(numTotalHits + " total matching documents");
int start = 0;
int end = Math.min(numTotalHits, hitsPerPage);
while (true) {
if (end > hits.length) {
System.out.println("Only results 1 - " + hits.length +" of " +
numTotalHits + " total matching documents collected.");
System.out.println("Collect more (y/n) ?");
String line = in.readLine();
if (line.length() == 0 || line.charAt(0) == 'n') {
break;
}
hits = searcher.search(query, numTotalHits).scoreDocs;
}
end = Math.min(hits.length, start + hitsPerPage); //3 and 5.
for (int i = start; i < end; i++) { //0 to 3.
if (raw) {
System.out.println("doc="+hits[i].doc+" score="+hits[i].score);
}
Document doc = searcher.doc(hits[i].doc);
String path = doc.get("path");
if (path != null) {
System.out.println((i+1) + ". " + path);
String title = doc.get("title");
if (title != null) {
System.out.println(" Title: " + doc.get("title"));
}
} else {
System.out.println((i+1) + ". " + "No path for this document");
}
}
if (!interactive || end == 0) {
break;
}
if (numTotalHits >= end) {
boolean quit = false;
while (true) {
System.out.print("Press ");
if (start - hitsPerPage >= 0) {
System.out.print("(p)revious page, ");
}
if (start + hitsPerPage < numTotalHits) {
System.out.print("(n)ext page, ");
}
System.out.println("(q)uit or enter number to jump to a page.");
String line = in.readLine();
if (line.length() == 0 || line.charAt(0)=='q') {
quit = true;
break;
}
if (line.charAt(0) == 'p') {
start = Math.max(0, start - hitsPerPage);
break;
} else if (line.charAt(0) == 'n') {
if (start + hitsPerPage < numTotalHits) {
start+=hitsPerPage;
}
break;
} else {
int page = Integer.parseInt(line);
if ((page - 1) * hitsPerPage < numTotalHits) {
start = (page - 1) * hitsPerPage;
break;
} else {
System.out.println("No such page");
}
}
}
if (quit) break;
end = Math.min(numTotalHits, start + hitsPerPage);
}
}
}
}
/*CUSTOM ANALYZER CODE:*/
package com.rancore.demo;
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
import org.apache.lucene.util.Version;
public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase {
public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
public static final CharArraySet STOP_WORDS_SET =
StopAnalyzer.ENGLISH_STOP_WORDS_SET;
public CustomAnalyzerForCaseSensitive(Version matchVersion,
CharArraySet stopWords) {
super(matchVersion, stopWords);
}
public CustomAnalyzerForCaseSensitive(Version matchVersion) {
this(matchVersion, STOP_WORDS_SET);
}
public CustomAnalyzerForCaseSensitive(Version matchVersion,
Reader stopwords) throws IOException {
this(matchVersion, loadStopwordSet(stopwords, matchVersion));
}
public void setMaxTokenLength(int length) {
maxTokenLength = length;
}
/**
* @see #setMaxTokenLength
*/
public int getMaxTokenLength() {
return maxTokenLength;
}
@Override
protected TokenStreamComponents createComponents(final String
fieldName, final Reader reader) {
final StandardTokenizer src = new
StandardTokenizer(matchVersion, reader);
src.setMaxTokenLength(maxTokenLength);
TokenStream tok = new StandardFilter(matchVersion, src);
// tok = new LowerCaseFilter(matchVersion, tok);
tok = new StopFilter(matchVersion, tok, stopwords);
return new TokenStreamComponents(src, tok) {
@Override
protected void setReader(final Reader reader) throws
IOException {
src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength);
super.setReader(reader);
}
};
}
}
I hope I have given a complete code sample for people to work with.
Please guide me; in case any further information is required, please
let me know.
On 8/14/2013 7:43 PM, Ian Lea wrote:
> Well, you have supplied a bit more info - good - but I still can't
> spot the problem. Unless someone else can I suggest you post a very
> small self-contained program that demonstrates the problem.
>
>
> --
> Ian.
>
>
> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka
> <an...@rancoretech.com> wrote:
>
>> Hello.
>> The problem does not seem to be getting solved.
>>
>> As mentioned, I am indexing each line of each file.
>> The sample text present inside LUKE is
>>
>> <am name="notification" value="10"/>\
>> <type="DE">\
>> java.lang.Thread.run(Thread.java:619)
>>
>>>> Size of list array::0\
>>>>
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> org.com.dummy,INFO,<< Still figuring out how to run
>>
>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349
>>>>
>> INSIDE POST OF Listener\
>>
>> In my Luke, I can see the text as "INSIDE POST OF Listener" .. This is
>> present in many files.
>>
>> /*Query is +contents:INSIDE contents:POST */ --/The field name
>> is contents. Same analyzer is being used. This is a boolean query./
>>
>> To test, I indexed only 20 files. In 19 files, this is present.
>>
>> The boolean query should give me a hit for this document.
>>
>> BUT IT IS RETURNING ME NO HIT..
>>
>> If I index the same files WITHOUT line by line then, it gives me proper
>> hits..
>>
>> But for me it should work on Indexes created by Line by Line parsing also.
>>
>> Please guide.
>>
>>
>>
>>
>>
>> On 8/13/2013 4:41 PM, Ian Lea wrote:
>>
>>> remedialaction != "remedial action"?
>>>
>>> Show us your query. Show a small self-contained sample program or
>>> test case that demonstrates the problem. You need to give us
>>> something more to go on.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka
>>> <an...@rancoretech.com> wrote:
>>>
>>>
>>>> Hello,
>>>> I am aware of that link and I have been through that link many
>>>> number of times.
>>>>
>>>> Problem I have is:
>>>>
>>>> 1. Each line is indexed. So indexed line looks something like "<attribute
>>>> name="remedial action" value="Checking"/>\"
>>>> 2. I am easily firing a phrase query on this line. It suggest me the
>>>> possible values. No problem,.
>>>> 3. If I fire a Boolean Query with "remedialaction" and "Checking" as a
>>>> must/must , then it is not providing me this document as a hit.
>>>> 4. I am using StandardAnalyzer both during the indexing and searching
>>>> time.
>>>>
>>>>
>>>> On 8/13/2013 2:31 PM, Ian Lea wrote:
>>>>
>>>>
>>>>> Should be straightforward enough. Work through the tips in the FAQ
>>>>> entry at
>>>>>
>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>>>> and post back if that doesn't help, with details of how you are
>>>>> analyzing the data and how you are searching.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with what lies within us"
Re: Boolean Query when indexing each line as a document.
Posted by Ian Lea <ia...@gmail.com>.
Well, you have supplied a bit more info - good - but I still can't
spot the problem. Unless someone else can, I suggest you post a very
small, self-contained program that demonstrates the problem.
--
Ian.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello.
The problem is still not solved.

As mentioned, I am indexing each line of each file. The sample text,
as shown in Luke, is:
<am name="notification" value="10"/>\
<type="DE">\
java.lang.Thread.run(Thread.java:619)
>>Size of list array::0\
at java.lang.reflect.Method.invoke(Method.java:597)
org.com.dummy,INFO,<< Still figuring out how to run
>>,SERVER,100.100.100.100:8080,EXCEPTION,10613349
INSIDE POST OF Listener\
In Luke, I can see the text "INSIDE POST OF Listener". This line is
present in many files.

The query is: +contents:INSIDE contents:POST
The field name is "contents", the same analyzer is used at index and
search time, and this is a boolean query.

To test, I indexed only 20 files; the line is present in 19 of them.
The boolean query should return those documents as hits, BUT IT
RETURNS NO HITS.

If I index the same files without splitting them line by line, I get
proper hits. But it should also work on indexes created by
line-by-line parsing.
Please guide.
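For what it's worth, the symptoms can be reproduced without Lucene at all. The sketch below is plain Java that simulates StandardAnalyzer's behaviour (split on non-alphanumerics, then lowercase); `analyze` and `hitsAll` are invented helpers for illustration, not Lucene APIs. It shows two separate pitfalls: raw uppercase query terms such as INSIDE never match the lowercased indexed terms, and with one document per line a MUST/MUST query only hits when every term falls on the same line.

```java
import java.util.*;

public class LineIndexSketch {
    // Simulates StandardAnalyzer at index time: split on anything that is
    // not a letter or digit, then lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String tok : text.split("[^\\p{Alnum}]+")) {
            if (!tok.isEmpty()) terms.add(tok.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A MUST/MUST boolean query: every term must be present in the document.
    static boolean hitsAll(Set<String> docTerms, String... queryTerms) {
        for (String t : queryTerms) {
            if (!docTerms.contains(t)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Line-by-line indexing: each line is its own "document".
        String[] lines = { "INSIDE POST OF Listener\\", ">>Size of list array::0\\" };

        // 1. Raw uppercase terms (as in +contents:INSIDE contents:POST)
        //    never match, because the index holds "inside" and "post".
        System.out.println(hitsAll(analyze(lines[0]), "INSIDE", "POST")); // false
        System.out.println(hitsAll(analyze(lines[0]), "inside", "post")); // true

        // 2. Terms that occur on different lines match no single
        //    line-document, although indexing the whole file as one
        //    document would produce a hit.
        boolean anyLine = false;
        for (String line : lines) {
            anyLine |= hitsAll(analyze(line), "inside", "array");
        }
        System.out.println(anyLine); // false
        System.out.println(hitsAll(analyze(String.join(" ", lines)), "inside", "array")); // true
    }
}
```

If the real query is built from unanalyzed user input (e.g. raw TermQuery objects), point 1 alone explains zero hits.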
Re: Boolean Query when indexing each line as a document.
Posted by Ian Lea <ia...@gmail.com>.
remedialaction != "remedial action"?
Show us your query. Show a small self-contained sample program or
test case that demonstrates the problem. You need to give us
something more to go on.
--
Ian.
Re: Boolean Query when indexing each line as a document.
Posted by Ankit Murarka <an...@rancoretech.com>.
Hello,

I am aware of that link and have been through it many times.

The problem I have is:

1. Each line is indexed, so an indexed line looks something like
"<attribute name="remedial action" value="Checking"/>\"
2. I can fire a phrase query on this line without trouble; it suggests
the possible values. No problem.
3. If I fire a boolean query with "remedialaction" and "Checking" as
MUST/MUST clauses, it does not return this document as a hit.
4. I am using StandardAnalyzer both at indexing time and at search time.
Re: Boolean Query when indexing each line as a document.
Posted by Ian Lea <ia...@gmail.com>.
Should be straightforward enough. Work through the tips in the FAQ
entry at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
and post back if that doesn't help, with details of how you are
analyzing the data and how you are searching.
--
Ian.
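The first tip in that FAQ entry, applied to this thread: whatever analysis the documents went through at index time, the query terms must go through the same analysis at search time. A plain-Java sketch of the principle, assuming a StandardAnalyzer-style split-and-lowercase (the `analyze` helper is illustrative, not a Lucene API):

```java
import java.util.*;

public class SameAnalyzerSketch {
    // The single analysis function applied on BOTH the index side and the
    // query side: split on non-alphanumerics, lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String tok : text.split("[^\\p{Alnum}]+")) {
            if (!tok.isEmpty()) terms.add(tok.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> docTerms =
            analyze("<attribute name=\"remedial action\" value=\"Checking\"/>");

        // Query terms taken verbatim from user input: the case does not
        // match the indexed terms, so an AND over them fails.
        System.out.println(docTerms.containsAll(Arrays.asList("Remedial", "Checking"))); // false

        // The same user input passed through the same analyze() as the
        // documents: the AND now succeeds.
        System.out.println(docTerms.containsAll(analyze("Remedial Checking"))); // true
    }
}
```

In Lucene terms, building the query with a QueryParser configured with the same analyzer as the IndexWriter achieves this automatically, whereas hand-built TermQuery objects bypass analysis entirely.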
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org