You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2008/08/30 15:13:44 UTC

[jira] Created: (SOLR-743) Update the bitset usage in Schema to enums

Update the bitset usage in Schema to enums
------------------------------------------

                 Key: SOLR-743
                 URL: https://issues.apache.org/jira/browse/SOLR-743
             Project: Solr
          Issue Type: Improvement
            Reporter: Mark Miller
            Priority: Minor
             Fix For: 1.4


For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627503#action_12627503 ] 

Mark Miller commented on SOLR-743:
----------------------------------

Okay, I'm surprised at the results, but I suppose they are what they  
are. The onus should really be on me to prove the patch rather than  
you to disprove it, but thanks for saving me the time :)

Still, I'm looking further at how I am doing it, it seems odd it would  
be that much slower.  Method calls and a long shouldn't cost so much.  
If it is really that much slower, I switch my claim that I buy joshs  
argument (which is not laid out as general advice). If it's that much  
slower, he is wrong.

- Mark


On Sep 1, 2008, at 11:04 AM, "Yonik Seeley (JIRA)" <ji...@apache.org>  



> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627253#action_12627253 ] 

Mark Miller commented on SOLR-743:
----------------------------------

enums are shorter, clearer, and safer. if under 64 options, they are represented by a long and set operations are done with bitwise arithmetic, so while its going to be a bit slower, if its measurable I'd be pretty surprised (though I guess I'll be happy to measure). Some programmers still cling to bitsets, but I buy Josh Blochs arguments that the bitset is better left behind in favor of the enum - its just better code. I'll go with whatever the community wants though - this patch just presents an option.

- Mark

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627684#action_12627684 ] 

Yonik Seeley commented on SOLR-743:
-----------------------------------

bq. Any chance on sharing the test code used for that ? 

I checked it in for future reference.... it's commented out in BasicFunctionalityTest.

{code}
  public void testFieldPerf() {
    IndexSchema schema = h.getCore().getSchema();
    SchemaField[] fields = schema.getDynamicFieldPrototypes();
    boolean createNew = false;

    long start = System.currentTimeMillis();
    int ret = 0;
    for (int i=0; i<10000000; i++) {
      for (SchemaField f : fields) {
        if (createNew) f = new SchemaField(f, "fakename");
        if (f.indexed()) ret += 1;
        if (f.isCompressed()) ret += 2;
        if (f.isRequired()) ret += 3;
        if (f.multiValued()) ret += 4;
        if (f.omitNorms()) ret += 5;
        if (f.sortMissingFirst()) ret += 6;
        if (f.sortMissingLast())ret += 7;
        if (f.stored()) ret += 8;
        if (f.storeTermOffsets()) ret += 9;
        if (f.storeTermPositions()) ret += 10;
        if (f.storeTermVector()) ret += 11;
      }
    }
    long end = System.currentTimeMillis();
    System.out.println("ret=" + ret + " time="+ (end-start));
{code}

A native bitwise-and + check for non-zero is a single cycle instruction (TEST in x86).  Consider everything else that needs to be done for EnumSet.contains()
- possible null check on the EnumSet instance, method call, then null check on the param
- getClass() and compare to check to see that the enum being checked for is of the right class (COLOR and not SUIT, etc)
- cast to Enum type, get the ordinal value, do a shift by that value, then do the bitwise-and.



> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627468#action_12627468 ] 

Yonik Seeley commented on SOLR-743:
-----------------------------------

Neither patch will apply cleanly for some reason.... anyone else?

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-743:
-----------------------------

    Attachment: SOLR-743.patch

Had a bit of debug in one of those classes. Also went through to make sure everything was still being done with bit twiddling. I can't imagine any speed loss with this  wouldn't be worth the clearer more maintainable code (though i should still try and work around calling ordinal() in there).

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved SOLR-743.
------------------------------

    Resolution: Invalid

I still have interest in going over this myself, but as far as solr is concerned, I suppose I'll trust the words of the Father.

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627551#action_12627551 ] 

Mark Miller commented on SOLR-743:
----------------------------------

Any chance on sharing the test code used for that ?

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627476#action_12627476 ] 

Yonik Seeley commented on SOLR-743:
-----------------------------------

bq. they are represented by a long and set operations are done with bitwise arithmetic

Internally, it's done with bitwise arithmetic...  but there is a lot of surrounding/supporting code that will take much longer to execute.

bq. I buy Josh Blochs arguments that the bitset is better left behind in favor of the enum

As a general rule, perhaps, but advice like this has to be taken in the context of the broad java dev world.  The majority of Java developers out there are building specific biz apps that one can either say meet or don't meet performance targets.  If those targets aren't met, then they can profile and find the problem spot.  Library and framework developers are a bit different IMO and the complexity / performance tradeoff shifts.  I bet you won't see any of the internal native bitvectors in the Java class libraries switching to EnumSet.


> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627220#action_12627220 ] 

Yonik Seeley commented on SOLR-743:
-----------------------------------

I worked hard to keep some of this code fast... I'd hate to see it slowed down in the name of type-safety (esp in code that Solr users don't have to interface with).

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-743:
-----------------------------

    Attachment: SOLR-743.patch

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627552#action_12627552 ] 

Mark Miller commented on SOLR-743:
----------------------------------

bq. As a general rule, perhaps, but advice like this has to be taken in the context of the broad java dev world. The majority of Java developers out there are building specific biz apps that one can either say meet or don't meet performance targets. If those targets aren't met, then they can profile and find the problem spot. Library and framework developers are a bit different IMO and the complexity / performance tradeoff shifts. I bet you won't see any of the internal native bitvectors in the Java class libraries switching to EnumSet.

His advice, from my read, is that the performance penalty is small enough so that you can clearly say, there is no reason to use the bitset. Thats how I read it anyway. The performance measurements you gave refute that - if those numbers are accurate, at most, you can say, in general use enum, for performance spots you still should be looking at bitsets. Bloch says, drop the bitset, the performance loss is easily outweighed unqualified. With those losses, I easily agree with you that the performance loss is not outweighed (not even close?) - in addition, we are not doing something special - we are doing a classic case, with classic code. Thats why I still suspect maybe i have missed somewhere that is not doing bits, though I was pretty thorough in my check...

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629160#action_12629160 ] 

Mark Miller commented on SOLR-743:
----------------------------------

bq. Although it is slower (which doesn't surprise me), does it really matter?

Right...of course its going to be slower, but my guess was that it was slower in a way that compiler/hotspot optimizations, nature of the code, etc, would make the difference not really matter (its what effective java seemed to indicate, and since he wrote the code..). 

But the results yonik posted...if thats the slowdown, I don't see moving as an option...and if we were using enums I'd vote for a switch to bitset. Seems like quite a difference to me (though I still havn't had time to educate myself with the test code used).



> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627477#action_12627477 ] 

Yonik Seeley commented on SOLR-743:
-----------------------------------

OK, some performance results:
Setup: P4, WinXP, Java6 -server

Test: loop over all dynamic fields in the test schema, creating a new SchemaField from a prototype (as is done for each dynamic field accessed) and then test all it's properties (done for each field indexed).

solr trunk: 10391ms
this patch: 110891ms

Test: loop over all dynamic fields in the test schema and test all it's properties.

solr trunk: 6656ms
this patch: 17531ms


> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-743) Update the bitset usage in Schema to enums

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629147#action_12629147 ] 

David Smiley commented on SOLR-743:
-----------------------------------

Although it is slower (which doesn't surprise me), does it really matter?
I think it's a shame to still use the ancient techniques of an int bitset.  I've been cringing as I look over this code lately.  Since the code is already written and it is admittedly faster... there isn't a strong motivation to switch it.  But if it hadn't been written yet, I would be hard put to use an int bitset unless I knew an EnumSet would be a bottleneck (and I wouldn't).

> Update the bitset usage in Schema to enums
> ------------------------------------------
>
>                 Key: SOLR-743
>                 URL: https://issues.apache.org/jira/browse/SOLR-743
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-743.patch, SOLR-743.patch
>
>
> For all of the reasons given by smarter people than I (specifically, Effective Java), and because its just easier to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.