You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/04/09 05:54:15 UTC

[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager

    [ https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626190#comment-13626190 ] 

Shai Erera commented on LUCENE-3786:
------------------------------------

I actually started working on that a long time ago and had a patch which I never posted because it wasn't ready. Back then I was thinking of how can the search and taxonomy indexes be synced. I.e. in your patch, opening a pair on a Directory assumes that the pair is "valid", which may not be the case at all. For instance, when you commit such pair, you need to first commit IW, and only then TW. That way, TW always contains ordinals that are >= than what IW knows about, in which case they are "valid". So if the commit to TW succeeds and the commit to IW fails, you potentially don't end up in an inconsistent state. However, if an app makes a mistake and commits them in a different order, the pair may not be valid. I'm willing to live with a documentation that says "you should commit the pair like so...".

But, if the app's commit logic is fine, yet it opened IW and TW with OpenMode.CREATE, then the taxonomy will include ordinals that may be completely unrelated to what IW stores, right? For that reason, I created (in the un-posted patch) a taxonomy timestamp class (which we can treat as version or something similar) which is written to both TW's and IW's commitData, and the manager checks for that at initialization to ensure the two actually match.

The taxonomy writer records an index.epoch property on the internal IW commitData, which keeps track of how many times it has been recreated (opened w/ OpenMode.CREATE, or replaceTaxonomy). TaxoReader uses that on openIfChanged, returning a new instance if it is the case. I think I had issues recording that in the IW commitData as well, because you need to first commit IW, but the epoch on TW commitData is unknown until it is committed... and pulling it from DirTaxoWriter's member is dangerous because in between you might get a replaceTaxo call which increments the epoch.

It's not that simple to keep these two in sync, so I put it aside until I have time to get back to it. Thanks for re-initiating!

What do you think about this? This recreate thing is delicate. Since apps can call IW.deleteAll() as well as TW.replaceTaxonomy, I wish that we give a solution that works in all cases, rather than say this manager doesn't work if these methods were called.

I think that we also need an object that manages IW and TW pair, so that a faceted search app calls commit() on it, and it handles the delicate commit order + whatever metadata we need to commit to make these two in sync. I wonder if it should be this manager? Then it can always take IW and TW pair, and offer both the acquire/release logic as well as commit.

About the patch:

* Why does the test uses newFSDirectory? I didn't read through, but can't it work with RAMDirectory too?
* Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed?
* It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx?
                
> Create SearcherTaxoManager
> --------------------------
>
>                 Key: LUCENE-3786
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3786
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 5.0, 4.3
>
>         Attachments: LUCENE-3786.patch
>
>
> If an application wants to use an IndexSearcher and TaxonomyReader in a SearcherManager-like fashion, it cannot use a separate SearcherManager, and say a TaxonomyReaderManager, because the IndexSearcher and TaxoReader instances need to be in sync. That is, the IS-TR pair must match, or otherwise the category ordinals that are encoded in the search index might not match the ones in the taxonomy index.
> This can happen if someone reopens the IndexSearcher's IndexReader, but does not refresh the TaxonomyReader, and the category ordinals that exist in the reopened IndexReader are not yet visible to the TaxonomyReader instance.
> I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which manages an IndexSearcher and TaxonomyReader pair. Then an application will call:
> {code}
> SearcherTaxoPair pair = manager.acquire();
> try {
>   IndexSearcher searcher = pair.searcher;
>   TaxonomyReader taxoReader = pair.taxoReader;
>   // do something with them
> } finally {
>   manager.release(pair);
>   pair = null;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org