Posted to java-user@lucene.apache.org by Mihai Caraman <ca...@gmail.com> on 2011/11/24 11:59:39 UTC

Taxonomy indexer debug

Hello,

I'm having an issue using NRT together with the taxonomy index. After a
couple of days of running continuously, the TaxonomyReader no longer returns
results (although the taxonomy index contains them). How can I debug this?
Does the taxonomy index have log output like IndexWriter has? Would that be
sufficient or relevant?

Current workaround is to simply restart the application (after that, the
results come up).

Re: Taxonomy indexer debug

Posted by Doron Cohen <cd...@gmail.com>.

> > Could you minimize this to a small stand-alone program that does not work
> > as expected?
>
> This will be hard, because the bug only appears after a couple of days
> or more, and I'm starting to think that it is triggered by high data
> volumes. I'll try to minimize the code and feed more data to it.
>

OK that would be great.

If this is indeed related to high volumes, I wonder how many categories
are in the taxonomy index at the point when the taxonomy reader stops
detecting that more categories were committed.

Re: Taxonomy indexer debug

Posted by Mihai Caraman <ca...@gmail.com>.

> Could you minimize this to a small stand-alone program that does not work
> as expected?

This will be hard, because the bug only appears after a couple of days
or more, and I'm starting to think that it is triggered by high data
volumes. I'll try to minimize the code and feed more data to it.


> Any particular reason why not using the same version in all 3?
>
There was a concurrency bug at some point, and after it was fixed I picked
up a nightly build to use until the official 3.5 release.



Re: Taxonomy indexer debug

Posted by Doron Cohen <cd...@gmail.com>.

The sequence of operations seems logical; I don't immediately see why this
does not work.
Could you minimize this to a small stand-alone program that does not work
as expected? This will allow us to recreate the problem here and debug it.
It is interesting that facet 3.5 is used with core 3.4 and queries 3.4. Any
particular reason for not using the same version in all three?

Doron

On Mon, Nov 28, 2011 at 1:01 PM, Mihai Caraman <ca...@gmail.com> wrote:

> All packages used: core3.4, queries3.4, facet3.5.
> Once every 3 minutes I call *refreshTax*, and once per day I call
> *reopenEverything*.
>
> *InitWriters()*
> writer = new ThreadedIndexWriter
> taxWriter = new LuceneTaxonomyWriter
> // because the reader can't start if doesn't have a valid taxIndex
> directory
> taxWriter.commit();
>
> *InitReaders()*
> reader =     IndexReader.open(writer, false);
> taxReader =  new LuceneTaxonomyReader
>
> *RefreshTax()*
> taxWriter.commit();
> writer.commit();
> reader = Singleton.reader.reopen();
> taxReader.refresh();
>
> *reopenEverything*()
> reader.close();
> taxReader.close();
> taxWriter.close();
> writer.close();
> initWriters();
> initReaders();
>
> I don't think the infoStream from the taxWriter would do me any good,
> because the writer does its job: it keeps indexing, but the taxReader
> doesn't have access to the new entries.
>

Re: Taxonomy indexer debug

Posted by Mihai Caraman <ca...@gmail.com>.

Packages used: core 3.4, queries 3.4, facet 3.5.
Once every 3 minutes I call *refreshTax*, and once per day I call
*reopenEverything*.

*InitWriters()*
writer = new ThreadedIndexWriter
taxWriter = new LuceneTaxonomyWriter
// commit first, because the reader can't open without a valid taxIndex directory
taxWriter.commit();

*InitReaders()*
reader =     IndexReader.open(writer, false);
taxReader =  new LuceneTaxonomyReader

*RefreshTax()*
taxWriter.commit();
writer.commit();
reader = Singleton.reader.reopen();
taxReader.refresh();

*reopenEverything*()
reader.close();
taxReader.close();
taxWriter.close();
writer.close();
initWriters();
initReaders();

I don't think the infoStream from the taxWriter would do me any good,
because the writer does its job: it keeps indexing, but the taxReader
doesn't have access to the new entries.

Re: Taxonomy indexer debug

Posted by Doron Cohen <cd...@gmail.com>.

>
> However there are at least two issues with this:
> 1) the info would be in the lower level of the internal index writer, and
> not in that of the categories logic.
> 2) one cannot just call super.openIndexWriter(directory, openMode) and
> modify the result before returning it, because once IW is opened it already
> extracted its settings from IndexWriterConfig, and the infoStream for
> example is final.
>

I just found out that the above holds for trunk but is not true for 3x.
So, assuming it is not trunk being used, overriding this method would be
sufficient: you can call super.openIndexWriter() and, just before returning
the writer, set its info stream with IndexWriter.setInfoStream().
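
The shape of that override can be sketched as below. This is a
self-contained model, not Lucene code: SketchIndexWriter and
SketchTaxonomyWriter are hypothetical stand-ins for IndexWriter and the
taxonomy writer, and the no-argument openIndexWriter() is a simplification
of the real protected method. Only the pattern (call super, set the info
stream, return the writer) is taken from the description above.

```java
import java.io.PrintStream;

// Hypothetical stand-in for a 3.x-style IndexWriter: the info stream
// can still be set after the writer has been constructed.
class SketchIndexWriter {
    private PrintStream infoStream; // null means no logging

    void setInfoStream(PrintStream infoStream) {
        this.infoStream = infoStream;
    }

    void commit() {
        if (infoStream != null) {
            infoStream.println("IW: commit");
        }
    }
}

// Hypothetical stand-in for the taxonomy writer and its extension point.
class SketchTaxonomyWriter {
    private final SketchIndexWriter indexWriter;

    SketchTaxonomyWriter() {
        // The internal writer is created through the overridable method.
        this.indexWriter = openIndexWriter();
    }

    protected SketchIndexWriter openIndexWriter() {
        return new SketchIndexWriter();
    }

    void commit() {
        indexWriter.commit();
    }
}

// The override: reuse the default writer, enable logging, return it.
class LoggingTaxonomyWriter extends SketchTaxonomyWriter {
    // Static on purpose: openIndexWriter() runs inside the superclass
    // constructor, before instance fields of this class are initialized.
    static PrintStream log = System.err;

    @Override
    protected SketchIndexWriter openIndexWriter() {
        SketchIndexWriter writer = super.openIndexWriter();
        writer.setInfoStream(log);
        return writer;
    }
}

class InfoStreamSketch {
    public static void main(String[] args) {
        LoggingTaxonomyWriter.log = System.out;
        new LoggingTaxonomyWriter().commit(); // prints "IW: commit"
    }
}
```

One thing to watch for if the real method is also invoked from the
superclass constructor, as it is in this model: the subclass's own instance
fields are not yet set at that point, so the target stream has to come from
somewhere that is already initialized.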

Doron

Re: Taxonomy indexer debug

Posted by Doron Cohen <cd...@gmail.com>.

>
> I'm having an issue using NRT together with the taxonomy index. After a
> couple of days of running continuously, the TaxonomyReader no longer
> returns results (although the taxonomy index contains them).


Taxonomy Reader does not support NRT - see
https://issues.apache.org/jira/browse/LUCENE-3441 ("Add NRT support to
TaxonomyReader").

However, I assume you are aware of this, since you commented on that issue.
So perhaps I did not understand the exact problem you are having.
Do you mean you refreshed the taxonomy reader but it did not "see" the new
categories?
Note that at the moment, since it does not support NRT, you need to first
commit() the taxonomy writer.
Is this the case?
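
To make the required ordering concrete, here is a toy model of it. It is
not Lucene code: SketchTaxoIndex and SketchTaxoReader are hypothetical
stand-ins, and sizes here count only the categories added in the example.
The only behavior it illustrates is the one described above: without NRT
support, a refreshed reader sees only what the writer has committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a taxonomy index: categories become visible
// to readers only once commit() has been called.
class SketchTaxoIndex {
    private final List<String> pending = new ArrayList<>();
    private List<String> committed = new ArrayList<>();

    synchronized void addCategory(String path) {
        pending.add(path);
    }

    synchronized void commit() {
        // Publish a new list so snapshots handed out earlier never change.
        List<String> next = new ArrayList<>(committed);
        next.addAll(pending);
        pending.clear();
        committed = next;
    }

    synchronized List<String> committedSnapshot() {
        return committed;
    }
}

// Hypothetical stand-in for a non-NRT taxonomy reader: it works on the
// snapshot taken at open or at the last refresh().
class SketchTaxoReader {
    private final SketchTaxoIndex index;
    private List<String> view;

    SketchTaxoReader(SketchTaxoIndex index) {
        this.index = index;
        this.view = index.committedSnapshot();
    }

    void refresh() {
        view = index.committedSnapshot();
    }

    int getSize() {
        return view.size();
    }
}

class CommitThenRefreshSketch {
    public static void main(String[] args) {
        SketchTaxoIndex index = new SketchTaxoIndex();
        SketchTaxoReader reader = new SketchTaxoReader(index);

        index.addCategory("author/doron");
        reader.refresh();
        System.out.println(reader.getSize()); // 0: added but not committed

        index.commit();
        System.out.println(reader.getSize()); // 0: committed but not refreshed

        reader.refresh();
        System.out.println(reader.getSize()); // 1: commit, then refresh
    }
}
```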

If this does not explain the behavior you are seeing, a short code snippet
that demonstrates it would be good, or, at the least, a description of the
sequence of operations that take place.


> How can I debug this? Does the taxonomy index have log output like
> IndexWriter has? Would that be sufficient or relevant?
>

Not a convenient one, but there is a way, though it is far from perfect.
There is an extension point that allows you to control how the taxonomy
writer opens its internal index writer.
The method openIndexWriter(Directory directory, OpenMode openMode) is
protected.
So one can override it and open an index writer in a way that enables some
info logging.

However there are at least two issues with this:
1) the info would be in the lower level of the internal index writer, and
not in that of the categories logic.
2) one cannot just call super.openIndexWriter(directory, openMode) and
modify the result before returning it, because once IW is opened it already
extracted its settings from IndexWriterConfig, and the infoStream for
example is final.

To work around (2) above, you could take a look at the code of the current
openIndexWriter(Directory directory, OpenMode openMode) implementation,
copy it to your extending class, and modify just the IWC to set the info
stream.

I opened https://issues.apache.org/jira/browse/LUCENE-3596 to track this.

Doron