Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/07/18 11:08:49 UTC

[jira] Created: (LUCENE-2548) Remove all interning of field names from flex API

Remove all interning of field names from flex API
-------------------------------------------------

                 Key: LUCENE-2548
                 URL: https://issues.apache.org/jira/browse/LUCENE-2548
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Uwe Schindler
             Fix For: 4.0


In previous versions of Lucene, interning of field names was important to minimize string comparison cost when iterating TermEnums, where each term's field had to be checked to detect a change of field. Since flex separates field names from terms, no query compares field names anymore, so the interning, which is itself a performance problem, can be removed entirely. I will start doing this, but we need to carefully review some places, e.g. in the preflex codec.
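
For readers unfamiliar with the trick described above, here is a minimal sketch of why interning made the field-change check cheap. It is plain Java with no Lucene dependency; the class and method names (InternedFieldScan, countFieldRuns) are made up for illustration and are not part of any Lucene API.

```java
class InternedFieldScan {
    // Counts runs of identical field names in scan order using reference
    // comparison (==). This is only correct if every name was interned,
    // because then equal contents imply the same String object.
    static int countFieldRuns(String[] fieldsInScanOrder) {
        int runs = 0;
        String current = null;
        for (String field : fieldsInScanOrder) {
            if (field != current) { // identity check, no character comparison
                runs++;
                current = field;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // new String(...) creates distinct objects; intern() maps them back
        // to one canonical instance per distinct content.
        String[] fields = {
            new String("body").intern(),
            new String("body").intern(),
            new String("title").intern()
        };
        System.out.println(countFieldRuns(fields)); // prints 2
    }
}
```

The identity check is what interning bought; the price is a call to intern() for every field name that gets created.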

Maybe before this issue we should remove the Term class completely. :-) Robert?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889612#action_12889612 ] 

Robert Muir commented on LUCENE-2548:
-------------------------------------

Uwe, but removing intern() from Term is almost as bad as removing Term itself, because we at least have to review all uses (e.g. Solr) and see whether it would cause incorrect code (e.g. an == comparison that is suddenly wrong) or performance problems in containers that sort terms, or anything of the like.

Again, I don't personally have an opinion either way; I just mentioned why I didn't remove it. It's like Token: still lots of code using it :)
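
A small sketch of the kind of breakage meant by an == comparison that is suddenly wrong. SimpleTerm below is a hypothetical stand-in, not Lucene's Term class, and the internField flag exists only to simulate the behaviour before and after the proposed change.

```java
class FieldIdentityPitfall {
    // Hypothetical stand-in for a term holder; NOT Lucene's Term class.
    static final class SimpleTerm {
        final String field;
        final String text;

        SimpleTerm(String field, String text, boolean internField) {
            // internField=true simulates the old behaviour, false the new one.
            this.field = internField ? field.intern() : field;
            this.text = text;
        }
    }

    // Fragile: correct only while both field names are interned.
    static boolean sameFieldByIdentity(SimpleTerm a, SimpleTerm b) {
        return a.field == b.field;
    }

    // Robust: correct whether or not the field names are interned.
    static boolean sameFieldByEquals(SimpleTerm a, SimpleTerm b) {
        return a.field.equals(b.field);
    }

    public static void main(String[] args) {
        // new String(...) forces distinct String objects with equal contents.
        SimpleTerm a = new SimpleTerm(new String("title"), "lucene", false);
        SimpleTerm b = new SimpleTerm(new String("title"), "flex", false);
        System.out.println(sameFieldByIdentity(a, b)); // prints false
        System.out.println(sameFieldByEquals(a, b));   // prints true
    }
}
```

Code of the first kind would need to be found and switched to equals() during such a review.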





[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889755#action_12889755 ] 

Shai Erera commented on LUCENE-2548:
------------------------------------

Ohh, I see. I don't remember if I ever relied on interning for other purposes, but if that's the only reason, then I agree there's no point in interning anymore. But perhaps we should allow that through another API, in case someone relies on it elsewhere?




[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889610#action_12889610 ] 

Robert Muir commented on LUCENE-2548:
-------------------------------------

bq. I think it's a nice API, and for most cases, a term will still be a Term and not a BytesRef + Field

Even if a term is a Term, a Term now is always a BytesRef + field behind the scenes anyway.
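
As a rough picture of the "BytesRef + field" shape, the sketch below pairs a field name with the UTF-8 bytes of the term text. FieldBytesTerm is a hypothetical class written for this illustration; it is not Lucene's Term or BytesRef, and the assumption that the bytes are UTF-8 text is made only for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; NOT Lucene's Term or BytesRef.
class FieldBytesTerm {
    private final String field;
    private final byte[] bytes; // term text, assumed UTF-8 in this sketch

    FieldBytesTerm(String field, String text) {
        this.field = field;
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    String field() {
        return field;
    }

    // Convenience accessor: decodes the bytes back to the original string.
    String text() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Equality by content, so nothing here depends on interned field names.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldBytesTerm)) {
            return false;
        }
        FieldBytesTerm other = (FieldBytesTerm) o;
        return field.equals(other.field) && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + Arrays.hashCode(bytes);
    }
}
```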

bq. Isn't it a convenient class?

Basically, this is why I didn't go the route of removing it (and instead modified the Term class to work with BytesRef).
The problem I saw was: if we have to modify tons of code to get rid of it, so would users when upgrading.





[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889611#action_12889611 ] 

Uwe Schindler commented on LUCENE-2548:
---------------------------------------

I think the discussion about Term removal is not really related to this issue. The only big advantage of removing Term would be that we don't silently change Term to stop calling intern() on the field name, which might break code outside Lucene that uses Term and relies on the field name being interned. The removal of intern() must then be clearly noted in the migration notes.




[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889637#action_12889637 ] 

Shai Erera commented on LUCENE-2548:
------------------------------------

I agree. Term is frequently used (at least in our apps), and the wrapper around BytesRef is nice too. One can still call text() or the like and get its string representation, which in most cases is what you put there in the first place.

And I also agree about the concern over suddenly no longer interning the field. What is the reason to stop doing that?




[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889607#action_12889607 ] 

Robert Muir commented on LUCENE-2548:
-------------------------------------

bq. Maybe before this issue we should remove the Term class completely.

Sounds great... but there is a lot of code (e.g. in contrib, Solr) to fix if you want to do this.
I guess when I considered this option, I thought it was gonna be a ton of work.




[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889609#action_12889609 ] 

Shai Erera commented on LUCENE-2548:
------------------------------------

Why remove Term? I think it's a nice API, and for most cases, a term will still be a Term and not a BytesRef + Field. Isn't it a convenient class? Is there an alternative one?




[jira] Commented: (LUCENE-2548) Remove all interning of field names from flex API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889642#action_12889642 ] 

Uwe Schindler commented on LUCENE-2548:
---------------------------------------

bq. And I also agree about the concern over suddenly no longer interning the field. What is the reason to stop doing that?

I don't understand the question.

The reason for removing interning is to get rid of a cost that trunk pays without any need. The interning was done solely to speed up typical TermEnum iteration, where each term's field needs to be compared to detect a change of field. As fields are now no longer coupled to terms, and Term*s*Enums (TermEnum was removed) only iterate over one field, interning is useless and the cost it adds when creating terms does not justify keeping it.
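
To make the per-field argument concrete, here is a minimal sketch of enumeration organized by field. PerFieldTerms is a made-up class built on java.util collections; it is not the flex API and only mirrors the idea that the field is resolved once and the loop over its terms never looks at a field name again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical per-field term dictionary; NOT the Lucene flex API.
class PerFieldTerms {
    private final Map<String, SortedSet<String>> termsByField = new TreeMap<>();

    void add(String field, String text) {
        termsByField.computeIfAbsent(field, f -> new TreeSet<>()).add(text);
    }

    // Resolves the field once (by content, not identity), then returns only
    // that field's terms in sorted order. No per-term field comparison
    // happens, so interned field names bring no benefit here.
    List<String> termsFor(String field) {
        SortedSet<String> terms = termsByField.get(field);
        return terms == null ? new ArrayList<>() : new ArrayList<>(terms);
    }
}
```

Contrast this with a single stream of (field, text) pairs, where every step has to check whether the field changed.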

