You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2011/01/03 20:31:45 UTC

[jira] Created: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

omitTF is viral, but omitNorms is anti-viral.
---------------------------------------------

                 Key: LUCENE-2846
                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir


omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.

but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
now you suddenly have a million 'real norms'.

I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
but another option is to make omitTF anti-viral, which is more "schemaless" i guess.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980171#action_12980171 ] 

Michael McCandless commented on LUCENE-2846:
--------------------------------------------

bq. Couldn't we investigate a sparse data structure to be used instead?

This would be very interesting to explore.  The fact that norms are dense, on disk and in memory, can cause horrific problems like your index taking much much more disk space and RAM on optimize/big merge finishing.

But I think that's orthogonal to the improvements here?  Ie this issue removes fake norms, invalid uses of default Sim, etc.

Also, I would worry about the lookup cost of sparse vectors in RAM -- looking up the norm per doc is a severe hotspot on Lucene.

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980164#action_12980164 ] 

Robert Muir commented on LUCENE-2846:
-------------------------------------

bq. I get the comparison to omitTF, but this functionality has been around a long time.

Can you explain what functionality you are losing with this patch?

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Attachment: LUCENE-2846.patch

ok here's the final patch: with a reasonably good test and docs.

I think this one is ready to commit: gets rid of fake norms and use of the static Similarity.getDefault()


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976947#action_12976947 ] 

Michael McCandless commented on LUCENE-2846:
--------------------------------------------

+1 for omitNorms to be viral.

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Attachment: LUCENE-2846.patch

sorry i had a piece of backwards logic in MultiNorms.

of course all tests pass either way, which is why we need a good mixed-schema test (with RIW) 
for this issue before it can go in (no matter what we do)

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979331#action_12979331 ] 

Robert Muir commented on LUCENE-2846:
-------------------------------------

an alternative to totally clear up the faking here that mike thought of:

If we can somehow differentiate between omitNorms (null), and 'doesnt have field' (say, exception),
we wouldn't need to fake. In multinorms we could then safely return null if any reader returns null,
but throw an exception if all readers throw an exception.


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Attachment: LUCENE-2846.patch

here's an updated patch:
* The IR.setNorm(float) is also removed, forcing the user to use the correct similarity versus us using the wrong one (the static)
* MultiNorms doesn't fake norms anymore, instead it handles the case of non-existent field versus omitted norms.
* When a document doesnt have a field, its (undefined) norms are written as zero bytes instead of Similarity.getDefault().encodeNorm(1f). 
* All uses of Similarity.get/setDefault are now gone in lucene core, except for in IndexSearcher and IndexWriterConfig.


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2846.
---------------------------------

    Resolution: Fixed

Committed revision 1058367.

I deprecated the dangerous setNorm(float) method in 3.x in revision 1058370,
instead pointing at setNorm(byte) and using Similarity.encodeNormValue(),
so you can ensure your Similarity is always used (not Similarity.getDefault)


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980147#action_12980147 ] 

Robert Muir commented on LUCENE-2846:
-------------------------------------

For branch 3.x, i would like to deprecate the IndexReader.setNorm(float) based method, so its no surprise when its removed here.

here was the changes entry (so i would insert some text like this in the deprecation): 
{noformat}
LUCENE-2846: Remove the deprecated IndexReader.setNorm(int, String, float).
  This method was only syntactic sugar for setNorm(int, String, byte), but
  using the global Similarity.getDefault().encodeNormValue.  Use the byte-based
  method instead to ensure that the norm is encoded with your Similarity.
{noformat}


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980241#action_12980241 ] 

Robert Muir commented on LUCENE-2846:
-------------------------------------

bq. But I think that's orthogonal to the improvements here? Ie this issue removes fake norms, invalid uses of default Sim, etc.

Right, after talking to Grant, I think his comments are on the larger issues of norms... I am totally not trying to address
this here. This one is just about cleaning up things that are "buggy", especially using Similarity.getDefault when we shouldn't.

The scope of this one (even though its just cleanup) is large enough I think... I'd like to proceed with this unless there are
any objections. We can address real improvements to norms somewhere else, and hopefully this will help make that easier.


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Attachment: LUCENE-2846.patch

here's an initial patch hacked up by mike and I... also removed the "multireader norms" method that 
takes a byte[]+offset from IndexReader.

one oddity is that MultiNorms.norms() always returns a filled byte[] here for non-atomic readers (never null).
But i think this is ok for MultiNorms, its not used in searching (only for SlowMultiReaderWrapper etc)

i think somehow it would be good to have more tests that test "doesnt have field" versus "omits norms",
and also (likely not in this is issue) we should think about IR's norm-setting methods.

I don't like that these use Similarity.getDefault(): it seems we could require you to pass in the Sim for the float case.
I also don't like that we expose a public setNorm that takes a byte value either!

Long-term we should look at pulling this norm-encoding stuff out of Sim... the Sim should just be dealing with floats,
this encoding stuff belongs somewhere else.


> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980156#action_12980156 ] 

Grant Ingersoll commented on LUCENE-2846:
-----------------------------------------

I get the comparison to omitTF, but this functionality has been around a long time.  Why does it have to be all or nothing?  Couldn't we investigate a sparse data structure to be used instead?  We use the current dense approach when a high percentage contain norms and the sparse when less have that amount?  I'm not sure what that data structure is just yet, but over in Mahout we have sparse and dense vectors and we have primitive collections that could be useful.

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch, LUCENE-2846.patch
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Fix Version/s: 4.0

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>
> omitTF is viral. if you add document 1 with field "foo" as omitTF, then document 2 has field "foo" without omitTF, they are both treated as omitTF.
> but omitNorms is the opposite. if you have a million documents with field "foo" with omitNorms, then you add just one document without omitting norms, 
> now you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s.
> but another option is to make omitTF anti-viral, which is more "schemaless" i guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org