Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2009/07/25 05:28:51 UTC

[jira] Created: (LUCENE-1761) low level Field metadata is never removed from index

low level Field metadata is never removed from index
----------------------------------------------------

                 Key: LUCENE-1761
                 URL: https://issues.apache.org/jira/browse/LUCENE-1761
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.4.1, 2.4, 2.3.2, 2.3.1, 2.3, 2.2
            Reporter: Hoss Man
            Priority: Minor
         Attachments: LUCENE-1761.patch

With heterogeneous docs, or an index whose fields evolve over time, field names that are no longer used (i.e., all docs that ever referenced them have been deleted) still show up when you use IndexReader.getFieldNames.

It seems logical that segment merging should only preserve metadata about fields that actually exist in the new segment, but even after deleting all documents from an index and optimizing, the old field names are still present.
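
The difference between the reported behavior and the expected one can be sketched with a toy model. This is not Lucene code: the Segment class, the two merge methods, and demoSegments are hypothetical names invented for illustration. It assumes a merge that simply unions the field names recorded in each input segment, and contrasts it with a merge that rebuilds the field names from the documents that survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: every class and method here is hypothetical, not Lucene's internals.
public class FieldMetadataMergeSketch {

    /** A segment: its live (non-deleted) docs plus the field names stored in its metadata. */
    public static final class Segment {
        public final List<Map<String, String>> liveDocs;
        public final Set<String> fieldNames;

        public Segment(List<Map<String, String>> liveDocs, Set<String> fieldNames) {
            this.liveDocs = liveDocs;
            this.fieldNames = fieldNames;
        }
    }

    /** Models the reported behavior: merged metadata is the union of the inputs' metadata. */
    public static Segment mergeKeepingAllFields(List<Segment> segments) {
        List<Map<String, String>> docs = new ArrayList<>();
        Set<String> names = new TreeSet<>();
        for (Segment s : segments) {
            docs.addAll(s.liveDocs);
            names.addAll(s.fieldNames); // dead field names are carried along
        }
        return new Segment(docs, names);
    }

    /** Models the expected behavior: keep only fields that occur in a surviving doc. */
    public static Segment mergePruningDeadFields(List<Segment> segments) {
        List<Map<String, String>> docs = new ArrayList<>();
        Set<String> names = new TreeSet<>();
        for (Segment s : segments) {
            for (Map<String, String> doc : s.liveDocs) {
                docs.add(doc);
                names.addAll(doc.keySet()); // rebuilt from live docs only
            }
        }
        return new Segment(docs, names);
    }

    /** Two input segments in which the field "legacy" is used only by deleted docs. */
    public static List<Segment> demoSegments() {
        Map<String, String> live = new HashMap<>();
        live.put("id", "1");
        live.put("title", "still here");
        Segment a = new Segment(
                Collections.singletonList(live),
                new TreeSet<>(Arrays.asList("id", "title", "legacy")));
        // Every doc in this segment has been deleted, but its metadata remains.
        Segment b = new Segment(
                Collections.<Map<String, String>>emptyList(),
                new TreeSet<>(Arrays.asList("id", "legacy")));
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        System.out.println("union merge:   " + mergeKeepingAllFields(demoSegments()).fieldNames);
        System.out.println("pruning merge: " + mergePruningDeadFields(demoSegments()).fieldNames);
    }
}
```

In this model the union merge still reports "legacy" after the merge, while the pruning merge does not. Whether the real merge code can cheaply tell which fields survive is a separate question that the sketch does not address.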

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1761) low level Field metadata is never removed from index

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834567#action_12834567 ] 

Lance Norskog commented on LUCENE-1761:
---------------------------------------

Does this cause any performance cost? 

For example, if the "dead" field had norm vectors, will the norm vector byte-per-document array still be created and maintained?



[jira] Updated: (LUCENE-1761) low level Field metadata is never removed from index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1761:
-----------------------------

    Attachment: LUCENE-1761.patch

Test case demonstrating the bug.



[jira] Commented: (LUCENE-1761) low level Field metadata is never removed from index

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834730#action_12834730 ] 

Michael McCandless commented on LUCENE-1761:
--------------------------------------------

I don't think there will be much perf loss.  Each dead field will cause a FieldInfo instance to be created (which is very small).

Norms won't be loaded unless something explicitly asks for them.  E.g., if you do a search against the dead field, that will create the 1-byte-per-doc array.  If you do a sort against the dead field, FieldCache will be populated (which is silly, since the values will all be null/0).  But if no searching is done against the dead fields, I believe there's very little cost.

But we really should fix merging to purge fields that don't occur anymore...
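
The lazy-loading point above can be illustrated with a small sketch. It is a toy model under stated assumptions, not Lucene's norms implementation: the class LazyNormsSketch and its methods norms and bytesHeld are hypothetical. It assumes only what the comment states, namely that the one-byte-per-document array for a field is created the first time something asks for that field's norms.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical class, not Lucene's norms code.
public class LazyNormsSketch {
    private final int maxDoc;
    private final Map<String, byte[]> loaded = new HashMap<>();

    public LazyNormsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Allocates the one-byte-per-document array on first request, then reuses it. */
    public byte[] norms(String field) {
        return loaded.computeIfAbsent(field, f -> new byte[maxDoc]);
    }

    /** Bytes currently held for norms across all fields that were requested. */
    public long bytesHeld() {
        return (long) loaded.size() * maxDoc;
    }

    public static void main(String[] args) {
        LazyNormsSketch reader = new LazyNormsSketch(1_000_000);
        System.out.println("before any search: " + reader.bytesHeld() + " bytes");
        reader.norms("deadField"); // e.g. a search against the dead field
        System.out.println("after one search:  " + reader.bytesHeld() + " bytes");
    }
}
```

In this model a dead field costs nothing until it is queried, and then costs one byte per document in the index, which matches the cost described in the comment.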
