Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2011/08/09 01:54:27 UTC

[jira] [Created] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Use of Random.nextLong() in HRegionServer.addScanner(...)
---------------------------------------------------------

                 Key: HBASE-4178
                 URL: https://issues.apache.org/jira/browse/HBASE-4178
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.3
            Reporter: Lars Hofhansl
            Priority: Minor


Scanner ids are currently assigned by getting a random long. While it would be a rare occurrence for two scanners on the same region server to receive the same id, the results would seem to be... bad.
A client scanner would get results from a different server scanner, and maybe only from some of the region servers.

A safer approach would be using an AtomicLong. We do not have to worry about running out of numbers: if we got 10000 scanners per second it'd take > 2.9m years to reach 2^63.

Then again, the same reasoning would imply that these collisions would happen too rarely to be of concern (assuming a good random number generator). So maybe this is a non-issue.

AtomicLong would also imply a minor performance hit on multi-core machines, as it would force a memory barrier.
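
For illustration only, the AtomicLong approach proposed above might look roughly like the sketch below. The class and method names are hypothetical and this is not the actual HRegionServer code; it only shows a counter that hands out each id once per process lifetime.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of counter-based scanner ids; not HBase's real code. */
final class ScannerIdGenerator {
    // Starts at zero each time the process (i.e. the region server) starts.
    private final AtomicLong nextId = new AtomicLong(0);

    /** Returns an id that no other caller of this generator will receive. */
    long newScannerId() {
        // getAndIncrement is atomic, so concurrent callers never share an id.
        return nextId.getAndIncrement();
    }

    public static void main(String[] args) {
        ScannerIdGenerator gen = new ScannerIdGenerator();
        System.out.println(gen.newScannerId());
        System.out.println(gen.newScannerId());
    }
}
```

Uniqueness here holds only within one run of the server, since the counter restarts from zero after a restart.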

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081935#comment-13081935 ] 

stack commented on HBASE-4178:
------------------------------

We have enough open ones already.

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081934#comment-13081934 ] 

stack commented on HBASE-4178:
------------------------------

Closing is fine, Lars.

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081323#comment-13081323 ] 

Gary Helmling commented on HBASE-4178:
--------------------------------------

What happens if a region server restarts?  Would we be resetting and starting the AtomicLong over again with numbers that were previously handed out?  If so, it's possible this change would vastly reduce the assignable space effectively used and increase the probability of collisions.

I'm not sure if there's behavior in the client code that would effectively invalidate the existing scanner in the case of a server restart -- would have to check.  We could also ensure uniqueness (and side-step the counter resetting) by checking the server start code either on the client side or changing scanner id from long to byte[] and prepending the start code with a separator.

The current random assignment does nicely avoid this potential issue, though.  Yes there _is_ a possibility of collisions.  But is this really an issue that needs fixing?  Personally, I'm open to arguments either way.
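
As a rough sketch of the start-code idea mentioned above (a byte[] id made of the server start code, a separator, and a counter), something like the following could work. Everything here is an assumption for illustration: the class name, the ":" separator, the UTF-8 text encoding, and the start code being passed in by the caller. How HBase itself obtains or encodes the start code is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: scanner ids of the form "<startCode>:<counter>". */
final class StartCodeScannerId {
    private final long startCode;                       // assumed unique per server start
    private final AtomicLong counter = new AtomicLong(0);

    StartCodeScannerId(long startCode) {
        this.startCode = startCode;
    }

    /** Builds the next id; ids from different start codes never coincide. */
    byte[] newScannerId() {
        String id = startCode + ":" + counter.getAndIncrement();
        return id.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StartCodeScannerId ids = new StartCodeScannerId(System.currentTimeMillis());
        System.out.println(new String(ids.newScannerId(), StandardCharsets.UTF_8));
    }
}
```

With this shape the counter can safely restart at zero after a restart, because the prefix changes, at the cost of a wider id than a single long.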

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081448#comment-13081448 ] 

stack commented on HBASE-4178:
------------------------------

IIUC, if a regionserver restarted, then it'd start scanner ids over at zero again.  Any scanners that had been running against the server when it died would have to go get new ids for the regions they had been scanning, in those regions' new locations. (Scanner ids are scoped to a region scan; the client sets up a new scanner id every time it crosses into a new region.)

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081338#comment-13081338 ] 

Andrew Purtell commented on HBASE-4178:
---------------------------------------

If we really want to be bulletproof, scanner IDs could be UUIDs with MAC and time components. This issue seems to be about what would be a really rare event, though. (I guess an experiment to confirm?)
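
Short of an experiment, a back-of-the-envelope estimate of how rare such a collision is can be sketched with the standard birthday approximation. This assumes ids are drawn uniformly from all 2^64 long values, which is an idealization and not a statement about any particular generator; the class and method names are hypothetical.

```java
/** Hypothetical sketch: birthday-bound estimate for random 64-bit ids. */
final class ScannerIdCollisionEstimate {
    /**
     * Approximate probability that at least two of n uniformly random
     * 64-bit ids coincide. Uses (number of pairs) / 2^64, which is a good
     * approximation only while the result is much smaller than 1.
     */
    static double collisionProbability(long n) {
        double pairs = (double) n * (n - 1) / 2.0;
        return pairs / Math.pow(2, 64);
    }

    public static void main(String[] args) {
        // One million simultaneously live scanner ids on one server.
        System.out.println(collisionProbability(1_000_000L));
    }
}
```

Under that idealized assumption, even one million ids alive at the same time give a collision probability of roughly 2.7e-8.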

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081306#comment-13081306 ] 

Lars Hofhansl commented on HBASE-4178:
--------------------------------------

Correction: it would take > 29m years, not 2.9m...
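
The figure can be checked with a few lines of arithmetic. The sketch below assumes a year of 365.25 days; the class and method names are made up for the example.

```java
/** Hypothetical sketch: how long until a counter of longs reaches 2^63. */
final class ScannerIdExhaustion {
    /** Years needed to hand out 2^63 ids at the given rate (ids per second). */
    static double yearsToExhaust(double idsPerSecond) {
        double totalIds = Math.pow(2, 63);            // non-negative range of a Java long
        double secondsPerYear = 365.25 * 24 * 3600;   // assumption: Julian year
        return totalIds / idsPerSecond / secondsPerYear;
    }

    public static void main(String[] args) {
        // 10000 scanners per second, as in the issue description.
        System.out.printf("%.1f million years%n", yearsToExhaust(10_000) / 1e6);
    }
}
```

At 10000 ids per second this works out to about 29.2 million years.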

[jira] [Updated] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4178:
---------------------------------

    Assignee: Lars Hofhansl


[jira] [Resolved] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-4178.
----------------------------------

    Resolution: Won't Fix

[jira] [Commented] (HBASE-4178) Use of Random.nextLong() in HRegionServer.addScanner(...)

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081908#comment-13081908 ] 

Lars Hofhansl commented on HBASE-4178:
--------------------------------------

So, in summary, an AtomicLong that resets (naturally) when the region server restarts should work, but it is not clear that this is actually a problem worth fixing.

I am happy to do the trivial AtomicLong fix, or to just close the issue... Leaning towards the latter.

