Posted to java-user@lucene.apache.org by Ilya Zavorin <iz...@caci.com> on 2012/07/13 18:28:19 UTC

can't find queries when they are one per line in target file

Hi,

I am using Lucene 3.4.0 and just discovered a weird issue. I have a set of simple one-word English queries and two target files that I want to search. One has all these queries on a single line, i.e. something like this:

Query1 Query2 Query3 Query4

The other has them one per line, i.e.

Query1
Query2
Query3
Query4

The 1st test was to index only the 1st target file and then iterate over all queries. The 2nd test was the same but with the 2nd file. In the first test, I found all the queries, as expected. In the second one, I found none! More specifically, the block of code

String qStr = "Query1"; // or "Query2" or ...
QueryParser parser = ...;
IndexSearcher searcher = ...;
Query query = parser.parse(qStr);
TopDocs results = searcher.search(query, Integer.MAX_VALUE);
ScoreDoc[] hits = results.scoreDocs;

returned no hits for the 2nd test.

Do I have to index the file differently? Or handle the queries differently? Or something else?

Thanks much,

Ilya

RE: can't find queries when they are one per line in target file

Posted by Ilya Zavorin <iz...@caci.com>.
Ian,

Turns out you were very close to the truth. The problem was in how I was ingesting the original file into memory before indexing.

Thanks,


Mr. Ilya Zavorin
Applied Research and Consulting
CACI Advanced Knowledge Solutions Division
4831 Walden Lane, Lanham, MD 20706
ph: 1-301-306-2859
fx: 1-301-306-8201
izavorin@caci.com
www.caci.com
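
The thread does not say what exactly went wrong during ingestion. One common way to get this symptom (an assumption, not something the poster confirmed) is reading the file with readLine() and concatenating the lines without a separator, which fuses all the words into a single token. A minimal sketch, with a whitespace split standing in for the analyzer and the hypothetical names IngestDemo, joinLines and tokens:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class IngestDemo {
    // Reads the text line by line (line terminators are stripped, as with
    // readLine()) and joins the lines with the given separator.
    public static String joinLines(String fileText, String separator) {
        return new BufferedReader(new StringReader(fileText))
                .lines()
                .collect(Collectors.joining(separator));
    }

    // Crude stand-in for an analyzer: split on runs of whitespace.
    public static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String onePerLine = "Query1\nQuery2\nQuery3\nQuery4\n";

        // Lines joined with nothing in between: one fused token.
        System.out.println(tokens(joinLines(onePerLine, "")));
        // prints [Query1Query2Query3Query4]

        // Lines joined with a newline: four separate tokens.
        System.out.println(tokens(joinLines(onePerLine, "\n")));
        // prints [Query1, Query2, Query3, Query4]
    }
}
```

With the fused string, a search for "Query1" finds nothing, while the one-line file is unaffected because it contains no line breaks to lose.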



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: can't find queries when they are one per line in target file

Posted by Ian Lea <ia...@gmail.com>.
It's hard to tell from your description exactly what you are indexing
and searching for, but I'd hazard a guess that the problem is related
to your "content of entire target file" comment.  Maybe you need to
read the files line by line.


--
Ian.
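
One possible reading of "read the files line by line" is to split the file content into individual lines and handle each one separately, for example by adding each as its own document. A minimal sketch of the splitting step only, using the hypothetical helper LineSplitter.nonBlankLines; whether one document per line is what the application wants is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {
    // Returns the non-blank lines of the text, trimmed, in file order.
    // "\\R" matches any line break, including Windows-style \r\n.
    public static List<String> nonBlankLines(String fileText) {
        List<String> lines = new ArrayList<>();
        for (String line : fileText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String content = "Query1\r\nQuery2\n\nQuery3\n";
        // Each element could be indexed as a separate field value or document.
        System.out.println(nonBlankLines(content));
        // prints [Query1, Query2, Query3]
    }
}
```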




RE: can't find queries when they are one per line in target file

Posted by Ilya Zavorin <iz...@caci.com>.
Here are the details:

I ran 2 tests:

1. Index only the first target file (the one where all the queries are in one long line), then loop over all queries and search for each using the code block below.
2. Index only the second target file (the one where all the queries are listed one per line), then loop over all queries and search for each using the code block below.

	Query query = parser.parse(qStr);
	TopDocs results = searcher.search(query, Integer.MAX_VALUE); 
	ScoreDoc[] hits = results.scoreDocs;

1st test: all queries found, i.e. the code block above returned a hit for each query
2nd test: no queries found, i.e. the code block above returned no hits for any query

So the difference seems to be in the structure of the indexed target files. 

Here's the block where a target file gets added to the index:

...
Document doc = new Document();
String oc = ...;	// content of entire target file
doc.add(new Field("contents", 
	oc, 
	Field.Store.NO,
	Field.Index.ANALYZED, 
	Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.addDocument(doc);
...
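
Because the whole file goes into the index as one string, it can help to look at that string before it is added. The following is only a debugging sketch (ShowControlChars.visible is a hypothetical helper, not part of Lucene) that makes line breaks and other control characters visible, so a missing separator between words is easy to spot:

```java
public class ShowControlChars {
    // Escapes control characters so they show up in a log line.
    public static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // If the line breaks survived ingestion, they appear as \n here.
        System.out.println(visible("Query1\nQuery2\n"));
        // prints Query1\nQuery2\n
    }
}
```

Printing visible(oc) just before the field is created shows whether the second file arrives as "Query1\nQuery2..." or as a single run of characters.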

Let me know if more details are needed.

Thanks,

Ilya



RE: can't find queries when they are one per line in target file

Posted by Uwe Schindler <uw...@thetaphi.de>.
What do you mean by "files"? Without a complete description of what you are
doing, we cannot answer your request.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




RE: can't find queries when they are one per line in target file

Posted by Ilya Zavorin <iz...@caci.com>.

But why then does it find all the queries in the 1st file? I use exactly the same code.

IZ




RE: can't find queries when they are one per line in target file

Posted by Uwe Schindler <uw...@thetaphi.de>.
> String qStr = "Query1"; // or "Query2" or ...
> QueryParser parser = ...;
> IndexSearcher searcher = ...;
> Query query = parser.parse(qStr);
> TopDocs results = searcher.search(query, Integer.MAX_VALUE);
> ScoreDoc[] hits = results.scoreDocs;
> 
> returned no hits for the 2nd test.

Maybe because it runs out of memory? Passing Integer.MAX_VALUE is allocating
2 billion result slots...

Uwe
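
Whatever the root cause, the hit limit can be bounded by the number of documents in the index instead of Integer.MAX_VALUE. A small sketch of that clamping, using the hypothetical helper HitLimit.cap; the lower bound of 1 is an assumption, chosen because an empty index reports 0 documents and a zero limit may be rejected:

```java
public class HitLimit {
    // Clamps the requested number of hits to the range [1, maxDoc].
    public static int cap(int requested, int maxDoc) {
        return Math.max(1, Math.min(requested, maxDoc));
    }

    public static void main(String[] args) {
        System.out.println(cap(Integer.MAX_VALUE, 4)); // prints 4
        System.out.println(cap(10, 0));                // prints 1
    }
}
```

In the search code above this would look roughly like searcher.search(query, HitLimit.cap(Integer.MAX_VALUE, searcher.maxDoc())), assuming the 3.x IndexSearcher in use exposes maxDoc().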

