You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Doron Cohen (Created) (JIRA)" <ji...@apache.org> on 2011/11/25 19:15:39 UTC

[jira] [Created] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
--------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-3596
                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/facet
            Reporter: Doron Cohen
            Priority: Minor


Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157605#comment-13157605 ] 

Doron Cohen commented on LUCENE-3596:
-------------------------------------

bq. and getIWC (if you intend to add it).
Yes that's what I would like to add.
These docs are missing then anyhow, with or without getIWC(). 
This added extendability is useful although behavior regarding info-stream differs between trunk and 3x - i.e. that in 3x one can set that stream also with current extension point.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Priority: Minor
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3596:
--------------------------------

    Attachment: LUCENE-3596.patch

patch adds the method createIndexWriterConfig(OpenMode openMode) and javadocs for in-order segments merging.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157365#comment-13157365 ] 

Shai Erera commented on LUCENE-3596:
------------------------------------

We should be careful about this though. DirTW requires LogMergePolicy and won't work well with TieredMP, or any other out-of-order-merges MP. We should clearly note that in the javadocs of both openIndexWriter, and getIWC (if you intend to add it).

Other than that, I wonder why was infoStream moved to IWC and made final in IW. But I will discuss that on the dev@ list.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Priority: Minor
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157717#comment-13157717 ] 

Doron Cohen commented on LUCENE-3596:
-------------------------------------

Also, there seems to be a bug in current taxonomy writer test - TestIndexClose - where the IndexWriterConfig's merge policy might allow to merge segments out-of-order. That test calls LTC.newIndexWriterConfig() and it is just by luck that this test have not failed so far.

This is a bad type of failure for an application (is there ever a good type?;)), because by the time the bug is exposed it would show as a wrong facet returned in faceted search, and go figure that late that this is because at an earlier time an index writer was created which allowed out-of-order merging...

Therefore, it would have been useful if, in addition to the javadocs about requiring type of merge policy, we would also throw an exception (IllegalArgument or IO) if the IWC's merge policy allows merging out-of-order. This should be checked in two locations: 
- createIWC() returns
- openIndex() returns, by examining the IWC of the index

The second check is more involved as it is done after the index was already opened, so it must be closed prior to throwing that exception.

However, merge-policy does not have in its "contract" anything like Collector.acceptsDocsOutOfOrder(), so it is not possible to verify this at all.

Adding such a method to MergePolicy seems to me an over-kill, for this particular case, unless there is additional interest in such a declaration?

Otherwise, it is possible to require that the merge policy must be a descendant of LogMergePolicy. This on the other hand would not allow to test this class with other order-preserving policies, such as NoMerge.

So I am not sure what is the best  way to proceed in this regard.

I think there are two options actually:
# just javadoc that fact, and fix the test to always create an order preserving MP.
# add that declaration to MP.

Unless there are opinions favoring the second option I'll go with the first one.

In addition, (this is true for both options) I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode, as it seems wrong - API wise - that one extension point (createIWC) is invoked by another extension point (openIndex) - better have them both be invoked from the constructor, making it harder for someone to, by mistake, totally ignore in createIndex() the value returned by createIWC().
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157718#comment-13157718 ] 

Shai Erera commented on LUCENE-3596:
------------------------------------

bq. I will move the call to createIWC into the constructor and modify openIndex signature to accept an IWC instead of the open mode

+1. This makes sense.

bq. just javadoc that fact, and fix the test to always create an order preserving MP.

I agree. The options are not very appealing and we might work too hard to prevent something that will never happen. The root cause of this issue was the inability to turn on IW's infoStream (which IMO should be changed to a proper logging framework). Besides our tests, I don't think many will override createIWC / openIndexWriter for changing other attributes.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Shai Erera (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158172#comment-13158172 ] 

Shai Erera commented on LUCENE-3596:
------------------------------------

Patch looks good. +1 to commit.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch, LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3596:
--------------------------------

    Attachment: LUCENE-3596.patch

Patch taking approach (1) above, and moving createIWC() to constructor. 

In addition fixed some javadoc comments, and added an assert to the constructor, which, only when assertions are enabled, will verify that the IWC in effect is not an instance of TieredMergePolicy. Imperfect as this is, it at least exposed the problem in current test (fixed to use newLogMP()).

I think this is ready to commit.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch, LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-3596.
---------------------------------

    Resolution: Fixed

Thanks for reviewing Shai.

Committed:
- r1206996 - trunk
- r1207008 - 3x

CHANGES.txt entry is only in 3x because in trunk facet is under modules. I don't like this difference, but there it is.
                
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch, LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

Posted by "Doron Cohen (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-3596:
-----------------------------------

    Assignee: Doron Cohen
    
> DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3596.patch
>
>
> Current protected openIndexWriter(Directory directory, OpenMode openMode) does not provide access to the IWC it creates.
> So extensions must reimplement this method completely in order to set e.f. info stream for the internal index writer.
> This came up in [user question: Taxonomy indexer debug |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org