You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Doron Cohen (Created) (JIRA)" <ji...@apache.org> on 2011/11/14 15:24:52 UTC

[jira] [Created] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
------------------------------------------------------------------------------------------------

                 Key: LUCENE-3573
                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/facet
            Reporter: Doron Cohen
            Assignee: Doron Cohen
            Priority: Minor


When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3573:
--------------------------------

    Attachment: LUCENE-3573.patch

Attached patch for trunk adds two tests:
* one of them is opening a new TR and passes
* the other is refreshing the TR and fails.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151177#comment-13151177 ] 

Doron Cohen commented on LUCENE-3573:
-------------------------------------

Thanks for the review and comments Shai, and also thanks for taking care of DTW.rollback() in LUCENE-3579.

I fixed the typos and the == as suggested.

bq. name 'taxoCreateTimeToCommit' just 'taxoCreateTime'

Given your further comment renamed it to 'taxoIndexCreateTime'

bq. Perhaps DirTW.commit() can call commit(null)

I considered this when first coding, as it would have compacted the code. But felt uncomfortable (still do) relying on a non documented behavior of IW.commit(null).

bq. DirectoryTaxonomyReader.DIR_TAXONOMY_CREATE_TIME_PROP should be in DirTW + renaming to INDEX_CREATE_TIME

Done.

bq. add a message to InconsistentTaxonomyException

Added.

bq. But why not fix FIR to override getCommitData in this issue?

Done. Now it feels a bit wrong that this will not appear in lucene/CHANGES since this issue is in lucene/contrib. Guess this is not too bad...?

bq. TR.refresh() should return a boolean indicating anything was changed (issue). I prefer that we change the method signature once...

Good point. Added a test case in TestDirectoryTaxonomyReader for this.

                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-3573.
---------------------------------

    Resolution: Fixed

Committed:
- r1202754 - trunk
- r1203165 - 3x
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151237#comment-13151237 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

{quote}
For fixing this:

    added private doClose() which closes IW and nullifies it, and calls closeResources().
    rollback() calls doClose() instead of close().

Also, rollback() now made synchronized.
{quote}

Good catch. It is dangerous that rollback is not sync'ed when commit/prepare/close are.

+1 to commit !
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151188#comment-13151188 ] 

Doron Cohen commented on LUCENE-3573:
-------------------------------------

Hmm, now that there is a test for LTW.rollback() my changes fail LTW's testRollback() because LTW.close() now may call IW.commit(Map) (which it did not do before my changes).

For fixing this:
- added private doClose() which closes IW and nullifies it, and calls closeResources().
- rollback() calls doClose() instead of close().

Also, rollback() now made synchronized.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150279#comment-13150279 ] 

Doron Cohen commented on LUCENE-3573:
-------------------------------------

bq. So in fact, let's not call it openIfChanged, because may not be meaningful.

Yes this bothered me too. 

bq. so maybe refreshIfChanged?

... let's stick to refresh() (but...)

Current refresh impl is efficient in that (1) arrays only grow if needed (2) caches only cleaned from wrong 'invalid ordinals'. In that, it relies on that the taxonomy can only grow (unless it is recreated, hence this issue).

So I now think it would be best to modify slightly refresh() - in case it detects that the taxonomy was created, it will throw a new (checked) exception - telling the application that this TR cannot be refreshed but the app can open a new TR.

This way there is no 3-way logic - either the TR was refreshed or it was not.

And while we are at this, refresh() is void. I think it would be useful to return boolean, indicating whether any refresh took place.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149684#comment-13149684 ] 

Doron Cohen commented on LUCENE-3573:
-------------------------------------

I agree about keeping the same notions as IR. 

bq. returns null (no changes, or the taxonomy wasn't recreated) 

In fact I was thinking of a different "contract".

So we have two approaches here for the returned value:

* Option A:
## *new TR* - if the taxonomy was recreated.
## *null* - if the taxonomy was either not modified or just grew.

* Option B:
## *new TR* - if the taxonomy was modified (either recreated or just grew)
## *null* - if the taxonomy was not modified.

Option A is simpler to implement, but I think it has two drawbacks:
* it is confusingly different from that of IR
* the fact that the TR was refreshed is hidden from the caller.

Option B is a bit more involved to implement:
* would need to copy arrays' data from old TR to new one in case the taxonomy only grew

I started to implement option B but now rethinking this...

bq. Was there any reason to add it to TestTaxonomyCombined?

Good point, should probably move this to TestDirectoryTaxonomyReader.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3573:
--------------------------------

    Attachment: LUCENE-3573.patch

Patch, in principle ready to commit, though I plan to go through it once more.

In this patch:
* new tests moved to TestDirectoryTaxonomyReader
* an exception added: InconsistentTaxonomyException
* when the reader cannot refresh because the taxonomy was recreated since the last time open/refresh, that exception is thrown and the application should open a fresh taxonomy reader.

Bumped into 3 TODO's while working on this:
* FilterIndexReader does not implement getCommitUserData(). Once this is fixed can resolvethe TODO in TestIndexClose. I'll open an issue later.
* TR.refresh() should return a boolean indicating anything was changed (issue).
* DTW.rollback() seems wrong to me - it rollback the internal IW, which also closes it, but then it refreshes its internal TR, seems wrong...
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150229#comment-13150229 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

I thought about it more, and I really think that we should go with option A. The common case is that the taxonomy index is not recreated, however it may be updated very frequently. TR.refresh() denotes exactly that -- it refreshes the internal state of the TaxonomyReader. This method must be very efficient. I.e., with NRT, people rely on IR.openIfChanged to only open new segments, which is what DTR.refresh() does (calls IR.openIfChanged).

So putting the name aside, we should have the method X() which either returns null (and updates or not the internal state) or returns a new TR. Unfortunately refresh() is not a bad name (reload() sounds more drastic and less efficient), so maybe refreshIfChanged?
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3573:
--------------------------------

    Attachment: LUCENE-3573.patch

Final patch.

Also updated the user-guide about refresh() behavior.

Removed the changes entry - for facet this would go only into 3x.

Planning to commit this soon.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151089#comment-13151089 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

bq. DTW.rollback() seems wrong to me

I modified the method impl to call close() instead of refreshReader() and added an appropriate test to DirTWTest to assert that following rollback, no more actions are allowed on DirTW.

Separately, I think that we should add ensureOpen() to DirTW so that if you call its API after rollback()/close(), you get a proper exception rather than random exceptions (like NPE). I will open an issue for that (and cover the rollback changes there too).
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149687#comment-13149687 ] 

Doron Cohen commented on LUCENE-3573:
-------------------------------------

One more thing 
- In approach B, the fact that the taxonomy just grew simply allows an optimization (read only the new ordinals), and so it is not a part of the API logic, and the only logic is - was the taxonomy modified or not. 
- In approach A, this fact is part of the API logic. 
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150286#comment-13150286 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

bq. This way there is no 3-way logic - either the TR was refreshed or it was not.

+1.

bq. And while we are at this, refresh() is void. I think it would be useful to return boolean, indicating whether any refresh took place.

+1.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151057#comment-13151057 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

Patch looks good ! Few minor comments:

* DirTW: shoudl --> should
* Can we name 'taxoCreateTimeToCommit' just 'taxoCreateTime'?
* OpenMode.CREATE.equals(openMode) can be OpenMode.CREATE == openMode
* Perhaps DirTW.commit() can call commit(null) to reduce code duplication? IndexWriter anyway calls commit(null) in its commit(), so it's not an error to call iw.commit(null).
** Same for prepareCommit()
* I think that DirectoryTaxonomyReader.DIR_TAXONOMY_CREATE_TIME_PROP should be in DirTW since it is the one writing it. We also have this notion in other places in the code, i.e. TermInfosWriter declares the format constants, and other classes reference them.
** Also, how about renaming it to INDEX_CREATE_TIME, since technically it is the index that has been recreated. This is to avoid confusion when storing this time in e.g. the search index's commit data, which we do in order to synchronize the taxo and search indices.
* Maybe for debugging add a message to InconsistentTaxonomyException which will tell us that the taxo has been recreated + the recreate time?

About the other comments:

bq. FilterIndexReader does not implement getCommitUserData(). Once this is fixed can resolvethe TODO in TestIndexClose. I'll open an issue later.

We need a separate issue to write a test which ensures FilterIndexReader overrides all non-final methods of IndexReader. But why not fix FIR to override getCommitData in this issue? It's a simple fix, no need for the TODO (or any change to TestIndexClose) and it is technically related to the issue, as it exposes a bug in FIR.

bq. TR.refresh() should return a boolean indicating anything was changed (issue).

I prefer that we change the method signature once. Why not modify it to return a boolean? Isn't that a simple change?

bq. DTW.rollback() seems wrong to me

I added it when TW was changed to extend TwoPhaseCommit. I'll take a look. Thanks for raising that.
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch, LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149661#comment-13149661 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

+1. I think that we should nuke refresh() and adopt the IR approach, even though I don't like the 'maybe' and 'if', might as well make the API consistent. So instead of refresh() we'll have a static TR.openIfChanged that either returns null (no changes, or the taxonomy wasn't recreated) or a new instance in case it was recreated.

Note that unlike IndexReader, if the taxonomy index wasn't recreated, openIfChanged will modify the internal state of TR. That's ok since the taxonomy index was built for it: existing TR instances (that weren't refreshed) won't be affected as they won't know about the new categories (and taxonomy index doesn't support deletes) and the caller can use the same TR instance in that case.

Whatever we end up doing, we should remove refresh(). Even though we're not committed to back-compat yet (it's all experimental), I think it is dangerous if we'll simply modify refresh() behavior, because users may not be aware of the change. So a new method is a must.

Besides that, the test looks good. Was there any reason to add it to TestTaxonomyCombined?
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149737#comment-13149737 ] 

Shai Erera commented on LUCENE-3573:
------------------------------------

My thinking is that today refresh() modifies the internal state, and does so in an optimized manner (however I think we thought there might be concurrency issues with it). And people (whoever they are) got used to it. And the TaxonomyReader does not imply in any way that it is an index (i.e. we've been thinking about implementing the taxonomy as a serialized FST maybe).

So in fact, let's not call it openIfChanged, because may not be meaningful. We can call it reload() maybe (just to have a better name). Now the question is whether we want to trust javadocs to note that the returned value may not be equals to the given one (ala the old IndexReader.reopen()), or follow the same IR concept where reload() is static.

If we want to follow IR semantics fully, then option B is the right one. But can it be done in an optimized manner? I bet we won't have any concurrency issues with it ...
                
> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore.
> As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org