You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jeff Hodges (JIRA)" <ji...@apache.org> on 2010/01/23 04:53:21 UTC

[jira] Created: (CASSANDRA-734) Table.open has a broken lock in it

Table.open has a broken lock in it
----------------------------------

                 Key: CASSANDRA-734
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jeff Hodges
            Priority: Minor


Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804340#action_12804340 ] 

Jonathan Ellis commented on CASSANDRA-734:
------------------------------------------

> How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null?

It doesn't: it solves the problem of doing get() on a thread-unsafe object while remaining high performance.  I'm saying, we can use Table.open in close to its current form by replacing the current HashMap w/ a NBHM.

> Not if we do the initTable work, and then later turn it in a NBHM.

True enough, but is that then really simpler than just fixing Table.open?

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804294#action_12804294 ] 

Jeff Hodges edited comment on CASSANDRA-734 at 1/24/10 8:19 PM:
----------------------------------------------------------------

bq. NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper). 

How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null? I haven't seen any docs on get with conditional set and conditional instantiation.


bq. wouldn't we just have to undo that for CASSANDRA-44?

Not if we do the initTable work, and then later turn it in a NBHM. With initTable in place, and we go to update a Table, we would only have to do a put without the conditional instantiation.

      was (Author: jmhodges):
    .bq NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper). 

How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null? I haven't seen any docs on get with conditional set and conditional instantiation.


.bq wouldn't we just have to undo that for CASSANDRA-44?

Not if we do the initTable work, and then later turn it in a NBHM. With initTable in place, and we go to update a Table, we would only have to do a put without the conditional instantiation.
  
> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-734:
-------------------------------------

          Component/s: Core
        Fix Version/s:     (was: 0.6)
                       0.5
             Assignee: Jonathan Ellis  (was: Jeff Hodges)
    Affects Version/s:     (was: 0.5)

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeff Hodges
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 734-nbhm.txt, broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-734:
-------------------------------------

    Attachment: 734-nbhm.txt

that's why you have to do the second check once you synchronize.  it's a double-checked locking variant, using NBHM to provide thread safety on the initial get() [like you would with volatile, in standard non-broken DCL]

patch attached since i'm clearly not explaining this very well :)

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: 734-nbhm.txt, broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hodges updated CASSANDRA-734:
----------------------------------

    Attachment: broken_lock_in_table_open.patch

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Hodges
>            Priority: Minor
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804225#action_12804225 ] 

Jeff Hodges edited comment on CASSANDRA-734 at 1/24/10 10:59 AM:
-----------------------------------------------------------------

To start with, synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower. http://www.ibm.com/developerworks/java/library/j-jtp04223.html

Second, any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower. There seems to be nothing that compiles down to just "lock around this get and, if that's null, create this other thing and then put it in there" in Java bytecode. The only other way to make this work is load all the tables at boot time. Which, of course, is a non-starter. However, synchronized does say "all this work has to be done together" which fixes our bug and has years of JVM hackers behind it making it as fast as possible. 

Third, we should be aiming as much for correctness as we can. A Cassandra node is eventually consistent, but its codebase is not. Fixing a bug that will, eventually, kick Twitter and Digg and Rackspace's ass now is better than holding off until a "faster" way can be found in some possible future where unicorns live and candy mountains are not just scary things in creepy guys basements.

synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways.

After this patch goes in, we should be re-evaluate all of these calls to Table.open(), though. I'm going to bet that in most cases it would make more sense for the client object to hold on to a reference to the Table if they need it and not let go every time like they do currently after the Table.open() call goes out of scope.

Edited for a friggin' "it's" grammar problem.

      was (Author: jmhodges):
    To start with, synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower. http://www.ibm.com/developerworks/java/library/j-jtp04223.html

Second, any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower. There seems to be nothing that compiles down to just "lock around this get and, if that's null, create this other thing and then put it in there" in Java bytecode. The only other way to make this work is load all the tables at boot time. Which, of course, is a non-starter. However, synchronized does say "all this work has to be done together" which fixes our bug and has years of JVM hackers behind it making it as fast as possible. 

Third, we should be aiming as much for correctness as we can. A Cassandra node is eventually consistent, but it's codebase is not. Fixing a bug that will, eventually, kick Twitter and Digg and Rackspace's ass now is better than holding off until a "faster" way can be found in some possible future where unicorns live and candy mountains are not just scary things in creepy guys basements.

synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways.

After this patch goes in, we should be re-evaluate all of these calls to Table.open(), though. I'm going to bet that in most cases it would make more sense for the client object to hold on to a reference to the Table if they need it and not let go every time like they do currently after the Table.open() call goes out of scope. 
  
> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804351#action_12804351 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

bq. It doesn't: it solves the problem of doing get() on a thread-unsafe object while remaining high performance. I'm saying, we can use Table.open in close to its current form by replacing the current HashMap w/ a NBHM, and continuing to use a synchronized block for if the get() is null. 

You've forgotten about instantiating the Table twice. One thread notices that the get is null and in another thread the same happens before the first thread manages to do a put.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804376#action_12804376 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

Works for me. Must have missed the last of your sentence.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: 734-nbhm.txt, broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804294#action_12804294 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

.bq NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper). 

How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null? I haven't seen any docs on get with conditional set and conditional instantiation.


.bq wouldn't we just have to undo that for CASSANDRA-44?

Not if we do the initTable work, and then later turn it in a NBHM. With initTable in place, and we go to update a Table, we would only have to do a put without the conditional instantiation.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804340#action_12804340 ] 

Jonathan Ellis edited comment on CASSANDRA-734 at 1/25/10 1:16 AM:
-------------------------------------------------------------------

> How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null?

It doesn't: it solves the problem of doing get() on a thread-unsafe object while remaining high performance.  I'm saying, we can use Table.open in close to its current form by replacing the current HashMap w/ a NBHM, and continuing to use a synchronized block for if the get() is null.

> Not if we do the initTable work, and then later turn it in a NBHM.

True enough, but is that then really simpler than just fixing Table.open?

      was (Author: jbellis):
    > How does a NBHM solve the problem get-and-then-put-but-only-instantiate-the-object-at-all-if-get-is-null?

It doesn't: it solves the problem of doing get() on a thread-unsafe object while remaining high performance.  I'm saying, we can use Table.open in close to its current form by replacing the current HashMap w/ a NBHM.

> Not if we do the initTable work, and then later turn it in a NBHM.

True enough, but is that then really simpler than just fixing Table.open?
  
> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804010#action_12804010 ] 

Jonathan Ellis commented on CASSANDRA-734:
------------------------------------------

I don't think introducing full synchronized lock into a method called for basically every operation is a good idea.  We could use nonblockinghashmap for the container instead, which would take care of the get problem.  (we could just use NBHM with putIfAbsent if Table creation weren't something we want to avoid doing twice for the same keyspace in case of a race.)  what do Real Java Programmers do for "singleton cache?"

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804238#action_12804238 ] 

Gary Dusbabek commented on CASSANDRA-734:
-----------------------------------------

> The only other way to make this work is load all the tables at boot time.
See CassandraDaemon.setup().

Why not this:
1.  Create a synchronized initTables() method that is called from CD, that sets an initDone flag and blows up it is ever called again.
2.  Take the locking  and synchronicity out of Table.open() (there really is no point as long as the backing collection is unmodifiable), and turn it into a purely 'getter'-type method.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804296#action_12804296 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

bq. Not if we do the initTable work, and then later turn it in a NBHM. With initTable in place, and we go to update a Table, we would only have to do a put without the conditional instantiation.

(And it seems to me that if it's possible to construct a Table in more than one thread in our solution to CASSANDRA-44, we're very likely solving CASSANDRA-44 wrong.)

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804281#action_12804281 ] 

Jonathan Ellis edited comment on CASSANDRA-734 at 1/24/10 7:12 PM:
-------------------------------------------------------------------

> synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower

That's true for _uncontested_ syncs but that is not what we have here.  The JVM isn't going to be able to optimize those away, and it's going to be several orders of magnitude slower than w/o the sync.

> any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower

NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper).

> we should be aiming as much for correctness as we can

I never said otherwise.

> synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways

wrong, see above.

      was (Author: jbellis):
    > synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower

That's true for _uncontested_ syncs but that is not what we have here.  The JVM isn't going to be able to optimize those away, and it's going to be several orders of magnitude slower than w/o the sync.

> any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower

NBHM uses lower-level CAS which is much cheaper.

> we should be aiming as much for correctness as we can

I never said otherwise.

> synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways

wrong, see above.
  
> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804281#action_12804281 ] 

Jonathan Ellis commented on CASSANDRA-734:
------------------------------------------

> synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower

That's true for _uncontested_ syncs but that is not what we have here.  The JVM isn't going to be able to optimize those away, and it's going to be several orders of magnitude slower than w/o the sync.

> any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower

NBHM uses lower-level CAS which is much cheaper.

> we should be aiming as much for correctness as we can

I never said otherwise.

> synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways

wrong, see above.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804225#action_12804225 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

To start with, synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower. http://www.ibm.com/developerworks/java/library/j-jtp04223.html

Second, any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower. There seems to be nothing that compiles down to just "lock around this get and, if that's null, create this other thing and then put it in there" in Java bytecode. The only other way to make this work is load all the tables at boot time. Which, of course, is a non-starter. However, synchronized does say "all this work has to be done together" which fixes our bug and has years of JVM hackers behind it making it as fast as possible. 

Third, we should be aiming as much for correctness as we can. A Cassandra node is eventually consistent, but it's codebase is not. Fixing a bug that will, eventually, kick Twitter and Digg and Rackspace's ass now is better than holding off until a "faster" way can be found in some possible future where unicorns live and candy mountains are not just scary things in creepy guys basements.

synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways.

After this patch goes in, we should be re-evaluate all of these calls to Table.open(), though. I'm going to bet that in most cases it would make more sense for the client object to hold on to a reference to the Table if they need it and not let go every time like they do currently after the Table.open() call goes out of scope. 

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804279#action_12804279 ] 

Jeff Hodges commented on CASSANDRA-734:
---------------------------------------

Sweet. I hadn't seen that call to  Table.getAllTableNames() in CD.setup(). If the laziness constraint can be relaxed, that's fine by me! I'll write up a patch.

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804281#action_12804281 ] 

Jonathan Ellis edited comment on CASSANDRA-734 at 1/24/10 7:21 PM:
-------------------------------------------------------------------

> synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower

That's true for _uncontested_ syncs but that is not what we have here.  The JVM isn't going to be able to optimize those away, and it's going to be several orders of magnitude slower than w/o the sync.

> any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower

NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper).

> we should be aiming as much for correctness as we can

I never said otherwise.  But let's do it without causing unnecessary performance regressions.


      was (Author: jbellis):
    > synchronization being slow is mostly a scary story left around from the bad old days of Java 1.3 and lower

That's true for _uncontested_ syncs but that is not what we have here.  The JVM isn't going to be able to optimize those away, and it's going to be several orders of magnitude slower than w/o the sync.

> any thing we build instead of using synchronized will be nearly exactly duplicating synchronized's behavior except broken and slower

NBHM is lock-free (which actually means it uses lower-level CAS which is much cheaper).

> we should be aiming as much for correctness as we can

I never said otherwise.

> synchronized does a bang up job of fixing this bug now and doing so in a way that is more performant than other "correct" ways

wrong, see above.
  
> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-734) Table.open has a broken lock in it

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804282#action_12804282 ] 

Jonathan Ellis commented on CASSANDRA-734:
------------------------------------------

> Take the locking and synchronicity out of Table.open() 

wouldn't we just have to undo that for CASSANDRA-44?

> Table.open has a broken lock in it
> ----------------------------------
>
>                 Key: CASSANDRA-734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-734
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Jeff Hodges
>            Assignee: Jeff Hodges
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: broken_lock_in_table_open.patch
>
>
> Table.open's lock is used around the Map#put method call but not the #get. This makes it a source of spurious bugs. The attached patch synchronizes the entire Table.open method and removes the unused createLock static.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.