Posted to dev@lucene.apache.org by djd0383 <dd...@formos.com> on 2006/09/29 19:28:18 UTC

Upgrading 1.4 to 2.0 - Indexing Issue.

I am in the process of trying to upgrade to v2.0 from v1.4 and am having
trouble building my index.  For each of the various entries in the database,
I am more or less doing the following:

doc1.add(new Field("allText", searchText, Store.NO, Index.TOKENIZED));
indexWriter.addDocument(doc1);

This seems to build an incorrect index.  I know this is true because I can
see entries that are not indexed.  The searching and indexing code has been
only slightly edited from the v1.4 code and seems to be correct.

My query looks like the following, in case this helps too:

final Query query;
QueryParser qp = new QueryParser(fieldToSearch,analyzer);
query = qp.parse(search);

Thanks for your help.
-- 
View this message in context: http://www.nabble.com/Upgrading-1.4-to-2.0---Indexing-Issue.-tf2358182.html#a6569399
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by djd0383 <dd...@formos.com>.

You said to open the reader/searcher after closing the writer.  I am currently
creating an instance of each after creating the query, while a user is
searching.  Is this good enough?

Indexing and searching both use StandardAnalyzer().

'search' is the desired search string
'searchText' is the chosen indexing string
- both of these strings are entered correctly




Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by djd0383 <dd...@formos.com>.

Well I figured it out...

After getting Luke I was able to see that the values were all being indexed
correctly.  The problem came when taking the ids from the index and running them
against the database.  It was in fact searching the db on the Lucene indices
instead of the db index.  A simple parse and it works.

I would like to thank you for your help.
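
One possible reading of the fix described above, as a minimal sketch in plain Java (no Lucene calls): the ids were written to the index as strings (the 1.4 code in this thread used item.getId().toString()), so they have to be parsed back into database keys before the lookup. The class and method names are hypothetical, and the assumption that the ids are numeric is not confirmed anywhere in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of mapping search hits back to database keys; all names here are hypothetical. */
final class HitIdMapper {
    /**
     * Converts the id strings stored with each hit into numeric database keys.
     * The position of a hit in the result list is deliberately not used, because
     * it says nothing about which database row the hit came from.
     * Assumption: the ids are numeric.
     */
    static List<Long> toDatabaseIds(List<String> storedIds) {
        List<Long> ids = new ArrayList<>();
        for (String stored : storedIds) {
            ids.add(Long.parseLong(stored.trim()));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Values as they might come back from the stored id field of three hits.
        List<String> fromIndex = Arrays.asList("1042", "17", "305");
        System.out.println(toDatabaseIds(fromIndex));
    }
}
```

The resulting keys would then go into the database query in place of whatever number the search layer used to identify the hit.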





Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by Doron Cohen <DO...@il.ibm.com>.

> I have updated my doc.add() to use Store.YES...

So I understand this did not help.

> I am currently searching for "test" which makes 'search' = "test*".  Also I
> do not remember the exact string for 'searchText' but it did start with
> "test" in one occurrence.

> I can use the debugger (and step through the index process) and see that
> there is at least one occurrence where a 'searchText' is added which contains
> "test".  The problem though is that this one is not in the results when
> searching.  Meaning that I have found that these values are indexed (as far as
> I know - indexWriter adds it), but when searching against them, they are not
> coming up in the search.

I am not aware of differences with prefix queries that could cause this.
Perhaps others on the list (btw this seems more like a 'user list' issue
than a 'dev list' issue) have an idea.

... mmm ... just a thought - after changing to Store.YES, did you start the
"test" from scratch, or did you continue to use an existing index? Because
if your application has the logic of updating an index by searching for a
document, then modifying the document found at search (e.g. adding a
field), then adding the modified document to the index - this would not
work if that document was already added to the index with Store.NO. Just a
thought.

Otherwise I can think of 3 ways to proceed:

1) use Luke to examine the content of the index. See which tokens are
actually in the index: is the document in question listed for the tokens you
expect it to be listed under?

2) print the query before the search that fails to find that document: does
it contain the tokens you expect it to? Do they match what you saw
with Luke?

3) provide here a short and simple stand-alone program that demonstrates
the problem. My experience is that I often learn more on the problem (and
on Lucene) from just trying to reproduce the problem in an isolated manner.
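
Checks 1) and 2) above amount to comparing the terms in the index with the terms in the query. The following is a toy illustration in plain Java, not Lucene code: the term list stands in for what a tool like Luke would show, and the class and method names are made up for the example. It assumes the analyzer lowercased the terms at indexing time, which is why a query prefix with different casing finds nothing here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy check of a trailing-wildcard query against a list of indexed terms; not Lucene code. */
final class PrefixCheck {
    /** Returns the indexed terms that a query such as "test*" would match, in sorted order. */
    static List<String> matchingTerms(SortedSet<String> indexedTerms, String query) {
        List<String> matches = new ArrayList<>();
        if (!query.endsWith("*")) {
            // Plain term: it matches only if exactly that term is in the index.
            if (indexedTerms.contains(query)) {
                matches.add(query);
            }
            return matches;
        }
        String prefix = query.substring(0, query.length() - 1);
        // tailSet starts at the first term >= prefix, so any matches are contiguous from there.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed index contents: lowercased terms, as an analyzer that lowercases would produce.
        SortedSet<String> terms = new TreeSet<>(Arrays.asList("entry", "test", "testing", "text"));
        System.out.println(matchingTerms(terms, "test*")); // same casing as the indexed terms
        System.out.println(matchingTerms(terms, "Test*")); // different casing from the indexed terms
    }
}
```

If the printed query and the term list disagree in this way, the problem is on the analysis side; if they agree, as turned out to be the case in this thread, the problem is somewhere after the search.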




Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by djd0383 <dd...@formos.com>.

I have updated my doc.add() to use Store.YES...

>> 'search' is the desired search string
>> 'searchText' is the chosen indexing string
>> -both these strings are correctly entered

>I realized that. I was asking about an example of the actual strings that
>demonstrate the problem.

I am currently searching for "test" which makes 'search' = "test*".  Also I
do not remember the exact string for 'searchText' but it did start with
"test" in one occurrence.

>What is the wrong behavior you see? Is it that the same query text did not
>return the expected documents? Or do you see some results without, say, the
>correct summary? If this one is the case, most likely your application is
>counting on stored text so the change above might fix that.

I can use the debugger (and step through the index process) and see that
there is at least one occurrence where a 'searchText' is added which contains
"test".  The problem though is that this one is not in the results when
searching.  Meaning that I have found that these values are indexed (as far as
I know - indexWriter adds it), but when searching against them, they are not
coming up in the search.

>Yes and no. This should not cause the problem you now have. But this is not
>efficient, as opening a searcher takes time, and getting it up to speed
>(warming up 'system IO') takes longer. You would see faster search if you
>keep a single searcher instance to be used for queries (you can use the
>same searcher for concurrent searches) and re-open that searcher once in a
>while - when the index was updated. There were several discussions on this
>in the mailing list, and I think the FAQ also mentions this.

Thank you for the idea.  I think I will run this by the higher ups to see if
I can do this task next.


I hope this gives you a better idea of what I am trying to accomplish. 
Thank you for your help.






Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by Doron Cohen <DO...@il.ibm.com>.

djd0383 <dd...@formos.com> wrote on 29/09/2006 11:20:37:
> This all worked fine in v1.4 when using:
>   doc.add(Field.Text("allText", searchColumns));

The equivalent of the 1.4 call
  doc.add(Field.Text("allText", searchColumns));
would be, with 2.0:
  doc1.add(new Field("allText", searchColumns, Store.YES, Index.TOKENIZED));

So not storing the field content is one difference from your 1.4 code. I am
not sure this is the cause of the problem, since I cannot tell if your application
is using the stored content at all. But give it a try.
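
To make the stored/indexed distinction concrete, here is a toy model in plain Java. It is not Lucene code and not how Lucene is implemented; the class, its methods and the boolean flag are invented for illustration. The point it sketches is that a field can be searchable without its original text being retrievable, which is the difference between Store.NO and Store.YES discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "indexed" versus "stored" field content; this is not Lucene code. */
final class ToyFieldIndex {
    // term -> ids of documents whose field text contains that term (the "indexed" side)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original field text, kept only when the caller asks to store it
    private final Map<Integer, String> stored = new HashMap<>();
    private int nextDocId = 0;

    /** Adds one document's field text and returns the id assigned to it. */
    int add(String text, boolean store) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            stored.put(docId, text);
        }
        return docId;
    }

    /** Ids of documents containing the given term, in ascending order. */
    List<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<Integer>() : new ArrayList<>(hits);
    }

    /** Original text of a document, or null if it was indexed without being stored. */
    String storedText(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        int a = index.add("Test entry one", false); // analogous to Store.NO
        int b = index.add("Test entry two", true);  // analogous to Store.YES
        System.out.println(index.search("test"));   // both documents are searchable
        System.out.println(index.storedText(a));    // no original text to show for this hit
        System.out.println(index.storedText(b));    // original text is available
    }
}
```

In this model, switching from not storing to storing changes nothing about which documents a search finds; it only matters if the application reads the field text back from a hit.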

> This seems to build an incorrect index.  I know this is true because I
> can see entries that are not indexed.  The searching and indexing code has
> been only slightly edited from the v1.4 code and seems to be correct.

What is the wrong behavior you see? Is it that the same query text did not
return the expected documents? Or do you see some results without, say, the
correct summary? If this one is the case, most likely your application is
counting on stored text so the change above might fix that.

> You said to open the reader/searcher after closing the writer.  I am currently
> creating an instance of each after creating the query, while a user is
> searching.  Is this good enough?

Yes and no. This should not cause the problem you now have. But this is not
efficient, as opening a searcher takes time, and getting it up to speed
(warming up 'system IO') takes longer. You would see faster search if you
keep a single searcher instance to be used for queries (you can use the
same searcher for concurrent searches) and re-open that searcher once in a
while - when the index was updated. There were several discussions on this
in the mailing list, and I think the FAQ also mentions this.
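
The keep-one-searcher advice could be sketched roughly as below. This is plain Java with no Lucene calls: S stands in for whatever searcher type the application uses, and the two suppliers (one that opens a searcher, one that reports an index version number) are hypothetical hooks the application would have to provide. Closing the searcher that gets replaced is left out of the sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Keeps one shared searcher and reopens it only when the index version changes.
 * Both suppliers are hypothetical hooks, not Lucene API.
 */
final class SharedSearcher<S> {
    private final Supplier<S> opener;        // opens a fresh searcher (the expensive step)
    private final LongSupplier indexVersion; // returns a number that changes when the index is updated
    private S current;
    private long openedAtVersion;

    SharedSearcher(Supplier<S> opener, LongSupplier indexVersion) {
        this.opener = opener;
        this.indexVersion = indexVersion;
    }

    /** Returns the cached searcher, opening a new one if there is none or the index has changed. */
    synchronized S get() {
        long version = indexVersion.getAsLong();
        if (current == null || version != openedAtVersion) {
            current = opener.get();
            openedAtVersion = version;
        }
        return current;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);    // stand-in for "has the index been updated?"
        AtomicInteger opens = new AtomicInteger(); // counts how often a searcher is opened
        SharedSearcher<String> shared =
            new SharedSearcher<>(() -> "searcher#" + opens.incrementAndGet(), version::get);
        shared.get();              // first query opens a searcher
        shared.get();              // second query reuses it
        version.incrementAndGet(); // the index was updated
        shared.get();              // next query reopens
        System.out.println("opened " + opens.get() + " times");
    }
}
```

Every query would call get() instead of constructing its own searcher, so the cost of opening is paid once per index update rather than once per search.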

> 'search' is the desired search string
> 'searchText' is the chosen indexing string
> -both these strings are correctly entered

I realized that. I was asking about an example of the actual strings that
demonstrate the problem.




Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by djd0383 <dd...@formos.com>.

Another note:

This all worked fine in v1.4 when using:

doc.add(Field.Text("allText", searchColumns));
doc.add(Field.Keyword(LuceneSearchIndex.ID, item.getId().toString()));

and:

query = QueryParser.parse(search, fieldToSearch, analyzer);

Thanks.





Re: Upgrading 1.4 to 2.0 - Indexing Issue.

Posted by Doron Cohen <DO...@il.ibm.com>.

Two quick things I can think of:
- make sure that 'fieldToSearch' == "allText"
- make sure the writer is closed after all docs are added, and only then open
  the reader/searcher

Otherwise, can you provide more info:
- at least one example where it "doesn't work":
  - 'searchText' - the text of the field added to the document that should
    be returned by the query but is not.
  - 'search' - the query text for the query that should find that
    document, but does not.
- which analyzer is used at search
- which analyzer is used at indexing
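
The ordering in the second point (close the writer, then open the reader) can be pictured with a toy model in plain Java. It is not Lucene code and it simplifies: here nothing is visible until close(), whereas a real index may make some documents visible earlier. All class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of "close the writer, then open the reader"; this is not Lucene code. */
final class ToyDirectory {
    private List<String> committed = new ArrayList<>();

    /** A reader is a snapshot of whatever had been committed when it was opened. */
    List<String> openReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    ToyWriter openWriter() {
        return new ToyWriter();
    }

    final class ToyWriter {
        private final List<String> buffered = new ArrayList<>();

        void addDocument(String doc) {
            buffered.add(doc); // held in memory only, invisible to readers
        }

        /** Makes every buffered document visible to readers opened from now on. */
        void close() {
            List<String> next = new ArrayList<>(committed);
            next.addAll(buffered);
            committed = next;
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyWriter writer = dir.openWriter();
        writer.addDocument("doc1");
        List<String> early = dir.openReader(); // opened before close()
        writer.close();
        List<String> late = dir.openReader();  // opened after close()
        System.out.println(early.size() + " then " + late.size());
    }
}
```

A reader opened too early keeps returning its old snapshot, so documents can look as if they were never indexed even though the writer accepted them.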



