You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Simon Willnauer (Created) (JIRA)" <ji...@apache.org> on 2011/12/09 10:52:40 UTC

[jira] [Created] (LUCENE-3628) Cut Norms over to IndexDocValues

Cut Norms over to IndexDocValues
--------------------------------

                 Key: LUCENE-3628
                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/index, core/search
    Affects Versions: 4.0
            Reporter: Simon Willnauer
             Fix For: 4.0


since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3628) Cut Norms over to IndexDocValues

Posted by "Simon Willnauer (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-3628:
---------------------------------------

    Assignee: Simon Willnauer
    
> Cut Norms over to IndexDocValues
> --------------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180482#comment-13180482 ] 

Simon Willnauer commented on LUCENE-3628:
-----------------------------------------

I am going to commit this soonish if nobody objects
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

new patch with "svn diff --show-copies-as-adds"
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

here is another patch removing unused imports of Lucene3x codecs internal in Lucene4Codec & renamed IOUtils#deleteFiles into deleteFilesIgnoringExceptions(...)
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180499#comment-13180499 ] 

Robert Muir commented on LUCENE-3628:
-------------------------------------

I basically agree with Mike's comments, but we can do this on a followup issue. I think its nice as a step for now that you dont 
have to write merge() to make a DV impl.

In my opinion the default merge() impl should just form a MultiDocValues of subs and call add() just as flush would, and not 
have "hooks" for subclasses to do bulk merging.
Subclasses just override the entire merge() to do bulk-merging (or maybe other codec-specific optimizations they have)

                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

{quote} I think the problem is that MultiDocValues no longer has an array, so hasArray() returns false? If that's the case, I agree that we should leave it as it is for now and simply wait until norms() gets obsolete in a later issue.{quote}

this is the case! Yet, this patch implements getArray / hasArray in MultiDocValues which allow to move the norms(String) method up to IR (its final now). I moved MultiNorms to the test-framework since some tests still need that. We can remove it once norms() is gone. I also added tests for get/hasArray in TestDocValuesIndexing. All test pass, I think this is ready; I will add a changes entry.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Lucene Fields: New,Patch Available  (was: New)
          Summary: Cut Norms over to DocValues  (was: Cut Norms over to IndexDocValues)
    
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179533#comment-13179533 ] 

Robert Muir commented on LUCENE-3628:
-------------------------------------

thanks for working on this!

at a glance I really like it. I totally agree that the sim changes should be deferred to a separate issue: in that issue I think we should also nuke the byte[] norms(String) method so we only have the dv api (maybe we even do that first on the followup issue, before the more interesting changes).

I didnt look too much at the codec api or the default merging or whatever, but we really need this. If simpletext in this patch can have a norms-only DV api and merge it etc, then it must not be too far away from having a full DV impl.


                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179593#comment-13179593 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

Hi,
much more readable, thanks!

Just one question: The Lucene3xNormsConsumer could be theoretically moved to the PreflexRWCodec in test-framework, or am I missing something?
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

next iteration...

{quote}Though, why do
we need to add the 3 per-type add methods...? Can we somehow use the
existing add that takes a DocValue...? {quote}
those were pulled up during refactoring. I added them to enable the default merging. This patch marks them as protected so they are really an impl detail and not a public API part.

bq. The PerDocConsumer.pull seems a bit odd... 
renamed to getDocValuesToMerge

bq. This change is curious:
I catch the exception there since its only the assert that triggers this so it should not be part of the method signature. I wrapped it in a RuntimeEx in this patch.

This patch also reduces the DocValue.java interface to gettters. the setters are really part of the DocValuesField and are not needed inside the implementation. Nobody should modify a DocValue instance. I also added a fix for the exception mike reported. I can't reproduce it locally anymore.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

another patch delegating to normValues in IR#norms(field) as uwe suggested.

bq. Maybe this method impl should be in the top-level IndexReader class as final method that delegates to the abstract normValues?

I looked into this but subclasses override this and to tricky things like ParallelReader uses MultiNorms internally etc. I think since this method will go away anyway and consumers should call normValues(field) instead and then decide how they use it instead of forcing an array we should just leave this as it is for now. I can certainly move it up to IR but making it final is going to be tricky and likely not worth the trouble.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

next iteration moving the Lucene3xNormsConsumer into Preflex and throwing IAE if Lucene3x codec is used for writing. I also added some more javadoc comments and a missing licence header.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179541#comment-13179541 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

Hi I did not yet understand the patch completely, only two things recognized without close review:
- Lucene40Codec now imports 3x NormsFormat but it should not use it, I think thats obsolete but maybe there is some backwards compatibility invisible to me
- byte[] SegmentReader.norms() has less null checks than the new DocValues normsValues() method. Maybe the "old" norms() using byte should simply delegate to the new method, and return the inner byte[]:

{code:java}
public byte[] norms(String field) throws IOException {
  // ensureOpen() is called by normValues():
  final DocValues docValues = this.normValues(field);
  if (docValues != null) {
    Source source = docValues.getSource();
    assert source.hasArray(); // TODO cut over to source
    return (byte[])source.getArray();  
  }
  return null;
}
{code}

Maybe this method impl should be in the top-level IndexReader class as final method that delegates to the abstract normValues?

Otherwise initial patch looks fine!
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180253#comment-13180253 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

bq. I looked into this but subclasses override this and to tricky things like ParallelReader uses MultiNorms internally etc

This method could theoretically go (and must as final in IR). ParallelReader and SlowMultiReader will also delegate to normValues(), which internally uses MultiDocValues. This should behave identical and MultiNorms would be useless at all and could be deleted (that was the idea). *BUT:*

I think the problem is that MultiDocValues no longer has an array, so hasArray() returns false? If that's the case, I agree that we should leave it as it is for now and simply wait until norms() gets obsolete in a later issue.

Patch looks good!
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Karl Wettin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182705#comment-13182705 ] 

Karl Wettin commented on LUCENE-3628:
-------------------------------------

huzzah!
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179637#comment-13179637 ] 

Simon Willnauer commented on LUCENE-3628:
-----------------------------------------

bq. Just one question: The Lucene3xNormsConsumer could be theoretically moved to the PreflexRWCodec in test-framework, or am I missing something?

yeah good call... I will move this impl to preflex and throw an exception in the 3x impl.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179554#comment-13179554 ] 

Michael McCandless commented on LUCENE-3628:
--------------------------------------------

Simon are you using svn 1.7...?  If so, can you regen the patch with "svn diff --show-copies-as-adds"?  Thanks!
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180479#comment-13180479 ] 

Michael McCandless commented on LUCENE-3628:
--------------------------------------------

{quote}
bq.Though, why do we need to add the 3 per-type add methods...? Can we somehow use the existing add that takes a DocValue...?

those were pulled up during refactoring. I added them to enable the default merging. This patch marks them as protected so they are really an impl detail and not a public API part.
{quote}

OK... really, ideally (I think?) the base "merge" method should only
use the public API (ie the very same API indexing/flushing uses when
writing a new segment).  This is how our other base merge impls
(stored fields, term vectors, postings) work.  But let's open a
separate issue for this...

bq. I also added a fix for the exception mike reported. I can't reproduce it locally anymore.

Looks fixed -- I ran for 135 iterations and no fail (previously it
failed in first 10 or so...).


                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Simon Willnauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3628.
-------------------------------------

    Resolution: Fixed

Committed in revision 1227676

thanks guys

                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180301#comment-13180301 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

bq. I think we should fix the DirectoryReader test to not use MultiNorms, maybe its a specific test only for MultiNorms, so obsolete? I will look into it!

There are more tests, but this can be easily fixed later and MultiNorms nuked (extract from TestDuellingCodecs):

{code:java}
byte[] leftNorms = MultiNorms.norms(leftReader, field);
{code}

replace by:

{code:java}
byte[] leftNorms = new SlowMultiReaderWrapper(leftReader).norms(field);
{code}

*The current patch is in my opinion fine to commit, +1. Nice improvement!*
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179593#comment-13179593 ] 

Uwe Schindler edited comment on LUCENE-3628 at 1/4/12 4:14 PM:
---------------------------------------------------------------

Hi,
much more readable, thanks!

Just one question: The Lucene3xNormsConsumer could be theoretically moved to the PreflexRWCodec in test-framework, or am I missing something?

Means, its impl in Lucene3xNormsFormat should look like this from Lucene3xPostingsFormat:

{code:java}
@Override
public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
  throw new IllegalArgumentException("this codec can only be used for reading");
}
{code}
                
      was (Author: thetaphi):
    Hi,
much more readable, thanks!

Just one question: The Lucene3xNormsConsumer could be theoretically moved to the PreflexRWCodec in test-framework, or am I missing something?
                  
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179543#comment-13179543 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

BTW: I like IOUtils.deleteFiles *g*, should maybe renamed IOUtils.deleteFilesIgnoringExceptions()
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180294#comment-13180294 ] 

Uwe Schindler commented on LUCENE-3628:
---------------------------------------

I like the new code in MultiDocValues. The use of reflect.Array and getComponentType/newInstance is fine here, as its only executed when you actually request the merged array, which is expensive by default. If you access DocValues through the standard API, not direct via array, all is as usual. Merging the arrays is slow by default - so who cares? (<- this note is for [~rcmuir]).

I think we should fix the DirectoryReader test to not use MultiNorms, maybe its a specific test only for MultiNorms, so obsolete? I will look into it!

Additional note from discussion with [~simonw] on IRC: Maybe we should have the same DV cache in SlowMultiReaderWrapper for non-norm DocValues, too.
                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3628) Cut Norms over to DocValues

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180435#comment-13180435 ] 

Michael McCandless commented on LUCENE-3628:
--------------------------------------------

Patch looks great!  What an awesome step forward... once we later fix
sim to be able to do whatever it wants at indexing time (eg, use a
4-byte float), then apps are free to create arbitrary "norms" per doc!
Wonderful...

It's nice we now have a default merge base impl for DocValues, which
the 4.0 codec then overrides with it's low-RAM impls.  Though, why do
we need to add the 3 per-type add methods...?  Can we somehow use the
existing add that takes a DocValue...?  (We can also fix this up
separately after committing...).

The PerDocConsumer.pull seems a bit odd... we need this just because
we don't know if we are merging .normValues vs .docValues right?
Maybe rename pull to getDocValuesToMerge?

This change is curious:
{noformat}
+            try {
             assert dir.fileExists(IndexFileNames.segmentFileName(filename, "",
                 Writer.INDEX_EXTENSION));
+            } catch (IOException e) {
+            }
+            break;
{noformat}

... what motivated that?  Does MDW sometimes throw errant exceptions
in fileExists...?  Or some test was failing...?

Hmm I hit an exc in TestSearcherManager, but it doesn't always
reproduce:

{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestSearcherManager -Dtestmethod=testSearcherManager -Dtests.seed=9ddad32eae88e7f:4561a85cf046520e:1f76c8c586f3eb2d -Dargs="-Dfile.encoding=UTF-8"
ENOTE: test params are: codec=Lucene40: {extra29=PostingsFormat(name=MockSep), extra28=PostingsFormat(name=Lucene40WithOrds), body=MockVariableIntBlock(baseBlockSize=43), extra27=PostingsFormat(name=NestedPulsing), extra26=MockVariableIntBlock(baseBlockSize=43), extra25=MockFixedIntBlock(blockSize=1841), extra24=Lucene40(minBlockSize=42 maxBlockSize=86), extra23=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra22=PostingsFormat(name=MockRandom), packID=MockVariableIntBlock(baseBlockSize=43), date=PostingsFormat(name=MockRandom), docid=PostingsFormat(name=MockSep), title=PostingsFormat(name=Lucene40WithOrds), extra20=PostingsFormat(name=Lucene40WithOrds), extra21=PostingsFormat(name=MockSep), extra38=PostingsFormat(name=NestedPulsing), extra8=PostingsFormat(name=MockRandom), extra12=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra37=MockVariableIntBlock(baseBlockSize=43), extra11=PostingsFormat(name=MockRandom), extra9=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra14=MockFixedIntBlock(blockSize=1841), extra13=Lucene40(minBlockSize=42 maxBlockSize=86), extra39=PostingsFormat(name=Lucene40WithOrds), extra34=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra16=PostingsFormat(name=NestedPulsing), extra15=MockVariableIntBlock(baseBlockSize=43), extra33=PostingsFormat(name=MockRandom), extra36=MockFixedIntBlock(blockSize=1841), extra18=PostingsFormat(name=MockSep), extra35=Lucene40(minBlockSize=42 maxBlockSize=86), extra17=PostingsFormat(name=Lucene40WithOrds), extra0=PostingsFormat(name=MockRandom), thisCodeMakesAbsolutelyNoSenseCanWeDeleteIt=PostingsFormat(name=NestedPulsing), extra1=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra2=Lucene40(minBlockSize=42 maxBlockSize=86), extra3=MockFixedIntBlock(blockSize=1841), extra5=PostingsFormat(name=NestedPulsing), extra6=PostingsFormat(name=Lucene40WithOrds), extra7=PostingsFormat(name=MockSep), titleTokenized=Pulsing40(freqCutoff=13 minBlockSize=3 maxBlockSize=44), extra30=PostingsFormat(name=NestedPulsing), extra31=PostingsFormat(name=Lucene40WithOrds), extra32=PostingsFormat(name=MockSep), extra10=PostingsFormat(name=MockSep)}, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {extra29=BM25(k1=1.2,b=0.75), extra28=DFR I(ne)L2, body=DFR I(F)2, extra27=DFR I(F)B3(800.0), extra26=DFR I(n)LZ(0.3), extra25=DFR I(n)3(800.0), extra24=DFR GB2, extra23=DFR I(n)Z(0.3), extra22=DFR I(F)L2, packID=IB SPL-D2, date=DFR I(F)1, docid=DFR GL2, title=DFR I(n)2, extra20=DFR GLZ(0.3), extra21=IB SPL-L2, extra38=DefaultSimilarity, extra8=DFR I(ne)2, extra12=DefaultSimilarity, extra11=IB SPL-D2, extra37=IB SPL-D2, extra9=IB LL-L2, extra14=IB SPL-D1, extra13=DFR I(ne)B3(800.0), extra39=DFR I(ne)B3(800.0), extra16=IB SPL-LZ(0.3), extra34=IB SPL-L1, extra15=DFR I(n)L2, extra33=IB LL-L1, extra18=DFR I(ne)B2, extra36=DFR I(ne)L1, extra35=DFR I(ne)1, extra17=DFR I(n)L3(800.0), extra0=IB SPL-D2, extra1=DefaultSimilarity, extra2=DFR I(ne)B3(800.0), extra3=IB SPL-D1, extra5=IB SPL-LZ(0.3), extra6=DFR I(n)L3(800.0), extra7=DFR I(ne)B2, titleTokenized=DFR I(ne)B1, extra30=DFR I(ne)Z(0.3), extra31=DFR I(n)1, extra32=IB LL-D2, extra10=DFR GB3(800.0)}, locale=pt, timezone=VST
NOTE: all tests run in this JVM:
[TestShardSearching, TestComplexExplanations, TestTieredMergePolicy, TestLongPostings, TestFieldCacheRewriteMethod, TestSlowCollationMethods, TestDocsAndPositions, TestSearcherManager]

java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_6_nrm.cfs=1}
	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:545)
	at org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase.runTest(ThreadedIndexingAndSearchingTestCase.java:629)
	at org.apache.lucene.search.TestSearcherManager.testSearcherManager(TestSearcherManager.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
	at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:24)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
	at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98)
	at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53)
	at org.junit.runner.JUnitCore.main(JUnitCore.java:45)
Caused by: java.lang.RuntimeException: unclosed IndexOutput: _6_nrm.cfs
	at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:469)
	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:441)
	at org.apache.lucene.store.CompoundFileWriter.getOutput(CompoundFileWriter.java:124)
	at org.apache.lucene.store.CompoundFileWriter.createOutput(CompoundFileWriter.java:260)
	at org.apache.lucene.store.CompoundFileDirectory.createOutput(CompoundFileDirectory.java:290)
	at org.apache.lucene.codecs.lucene40.values.Bytes$BytesWriterBase.getOrCreateDataOut(Bytes.java:257)
	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$Writer.merge(FixedStraightBytesImpl.java:138)
	at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:90)
	at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:57)
	at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:391)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:127)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3630)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3258)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
{noformat}


                
> Cut Norms over to DocValues
> ---------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch, LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3628) Cut Norms over to IndexDocValues

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3628:
------------------------------------

    Attachment: LUCENE-3628.patch

Here is a first patch to cut over norms to DocValues. IndexReader now exposes a DocValues normValues(String) method in addition to the still present byte[] norms(String) method. On the codec level I moved all norm value buffering into the codec and remove the NormsWriter/Reader interface entirely in favor of PerDocProducer/Consumer. 

Lucene3x codec has now a private norms impl that reads and writes the old format exposed as DocValues. It basically works as it used to work except of the buffering which is now codec private.

Lucene4 & Sep codec now uses DocValues directly currently still bound to a single byte norm value. I didn't want to include changes to the similarity since they are kind of orthogonal to this issue and the patch is big enough already. 

SimpleText has its own simple norms impl also exposed via the DocValues API. To make this entire thing simpler I moved around some code in the values package to make merging work out of the box without writing any merge code if you don't want to do bulk copies etc. 

this is quite a big change... review welcome!
                
> Cut Norms over to IndexDocValues
> --------------------------------
>
>                 Key: LUCENE-3628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3628
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3628.patch
>
>
> since IR is now fully R/O and norms are inside codecs we can cut over to use a IDV impl for writing norms. LUCENE-3606 has some [ideas|https://issues.apache.org/jira/browse/LUCENE-3606?focusedCommentId=13160559&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13160559] about how this could be implemented

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org