Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/05/12 23:53:04 UTC

[GitHub] [accumulo] keith-turner opened a new issue, #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

keith-turner opened a new issue, #2700:
URL: https://github.com/apache/accumulo/issues/2700

   **Describe the bug**
   
   While testing the scan server feature #2665, I was running a query client I wrote that was doing 512 concurrent small random lookups. These lookups were going to scan servers, which were not behaving as expected. I ran Java Flight Recorder and saw contention on three locks. Two of the locks were scan-server specific, and one was related to the sha512 computation. I also saw that, outside of lock contention, the sha512 computation was the top method in the profile. Even though I saw these issues in the scan server, I suspect the tserver would have the same problem because it uses the same code.
   
   **To Reproduce**
   Run a client doing 512 concurrent small random lookups against scan servers while profiling with Java Flight Recorder (see the description above).
   
   **Expected behavior**
   It would be good to avoid the lock contention and to research the performance of the sha512 computation relative to small scans.
   
   **Screenshots**
   
   The following is the JFR data that shows lock contention
   
   ![Screenshot 2022-05-12 192749](https://user-images.githubusercontent.com/1268739/168185291-8fa7e50b-69b3-4619-a756-90b2b0762352.png)
   
   The following is the JFR data that shows hot methods.
   
   ![Screenshot 2022-05-12 193928](https://user-images.githubusercontent.com/1268739/168185353-963b506f-03f1-471e-b280-e57a5e586203.png)
   
   The JFR data came from profiling the scan server for 2 minutes while my test code was running doing 512 concurrent small lookups.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner closed issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
keith-turner closed issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication
URL: https://github.com/apache/accumulo/issues/2700




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1127632917

   I'm wondering if there is any danger in caching the `password` and `zkData` input parameters passed to `ZKSecurityTool.checkCryptPass` when the return value of the method is true. You could even cache it within `ZKSecurityTool` so that it's encapsulated. I think I'll put up a PR for this idea and we can discuss it.
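   The caching idea above could be sketched roughly as follows. This is an illustrative sketch only, not the actual `ZKSecurityTool` API: the class name, the fast SHA-256 fingerprint, and the `BiPredicate` standing in for the expensive `checkCryptPass` call are all assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember a fast digest of passwords that already
// passed the expensive crypt check for a given stored hash, so repeated
// authentications with the same credentials can skip the sha512crypt work.
public class CredentialCache {
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  private static byte[] fingerprint(byte[] password) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(password);
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError(e); // SHA-256 ships with every JDK
    }
  }

  // expensiveCheck stands in for the real crypt-based password check.
  public boolean check(byte[] password, String storedHash,
      java.util.function.BiPredicate<byte[], String> expensiveCheck) {
    byte[] cached = cache.get(storedHash);
    if (cached != null && MessageDigest.isEqual(cached, fingerprint(password))) {
      return true; // this (password, storedHash) pair was already verified
    }
    boolean ok = expensiveCheck.test(password, storedHash);
    if (ok) {
      cache.put(storedHash, fingerprint(password));
    }
    return ok;
  }
}
```

   A real implementation would also need to think about cache invalidation when a password changes, which is part of what would be discussed on such a PR.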




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1131600304

   Looking at the stack trace in the first image above, the issue is that the Apache Commons Codec [Sha2Crypt.sha2Crypt](https://github.com/apache/commons-codec/blob/7decf21b6a137b1b9028d3903004fe53dc5ec849/src/main/java/org/apache/commons/codec/digest/Sha2Crypt.java#L167) method does not cache and `reset` the `MessageDigest` objects that it uses. The `sha2Crypt` method creates many new `MessageDigest` objects per method call, and at the top of the stack trace in the image, [ProviderConfig.getProvider](https://github.com/openjdk/jdk/blob/da75f3c4ad5bdf25167a3ed80e51f567ab3dbd01/src/java.base/share/classes/sun/security/jca/ProviderConfig.java#L164) is synchronized.
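   The cache-and-`reset` pattern being described could look something like this. This is not the actual commons-codec code, just an illustration of the pattern: one `MessageDigest` per thread, reset between uses, instead of calling `MessageDigest.getInstance(...)` repeatedly and hitting the synchronized provider lookup.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: cache one SHA-512 MessageDigest per thread and reset() it between
// uses, so hot code paths avoid repeated MessageDigest.getInstance(...) calls
// (which can contend on the synchronized ProviderConfig.getProvider).
public final class CachedSha512 {
  private static final ThreadLocal<MessageDigest> DIGEST =
      ThreadLocal.withInitial(() -> {
        try {
          return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
          throw new AssertionError(e); // SHA-512 ships with every JDK
        }
      });

  public static byte[] sha512(byte[] input) {
    MessageDigest md = DIGEST.get();
    md.reset(); // clear any state left over from the previous call
    return md.digest(input);
  }
}
```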




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1134855111

   Looking at [NIST SP800-63B](https://pages.nist.gov/800-63-3/sp800-63b.html#sec5) Section 5.1.1.2, it says
   
   > Verifiers SHALL store memorized secrets in a form that is resistant to offline attacks. Memorized secrets SHALL be salted and hashed using a suitable one-way key derivation function. Key derivation functions take a password, a salt, and a cost factor as inputs then generate a password hash. Their purpose is to make each password guessing trial by an attacker who has obtained a password hash file expensive and therefore the cost of a guessing attack high or prohibitive. Examples of suitable key derivation functions include Password-based Key Derivation Function 2 (PBKDF2) [[SP 800-132]](https://pages.nist.gov/800-63-3/sp800-63b.html#SP800-132) and Balloon [[BALLOON]](https://pages.nist.gov/800-63-3/sp800-63b.html#balloon). A memory-hard function SHOULD be used because it increases the cost of an attack. The key derivation function SHALL use an approved one-way function such as Keyed Hash Message Authentication Code (HMAC) [[FIPS 198-1]](https://pages.nist.gov/800-63-3/sp800-63b.html#FIPS198-1), any approved hash function in [SP 800-107](https://pages.nist.gov/800-63-3/sp800-63b.html#SP800-107), Secure Hash Algorithm 3 (SHA-3) [[FIPS 202]](https://pages.nist.gov/800-63-3/sp800-63b.html#FIPS202), CMAC [[SP 800-38B]](https://pages.nist.gov/800-63-3/sp800-63b.html#SP800-38B) or Keccak Message Authentication Code (KMAC), Customizable SHAKE (cSHAKE), or ParallelHash [[SP 800-185]](https://pages.nist.gov/800-63-3/sp800-63b.html#SP800-185). The chosen output length of the key derivation function SHOULD be the same as the length of the underlying one-way function output.
   
   This section doesn't deal strictly with passwords, but I wrote a [test](https://github.com/dlmarion/accumulo/blob/password_test/server/base/src/test/java/org/apache/accumulo/server/security/handler/PasswordHashTest.java) to compare the performance difference between the Commons-Codec Crypt approach that is being used currently vs `PBKDF2WithHmacSHA512` as suggested in NIST SP800-63B. It looks to be roughly 3x faster.
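   For readers following along, the `PBKDF2WithHmacSHA512` approach being benchmarked uses the standard JCA API and could be sketched like this. The linked test class is the authoritative comparison; the iteration count, salt size, and class name here are illustrative assumptions.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.SecureRandom;
import java.util.Base64;

// Sketch of hashing a password with the JDK's built-in PBKDF2WithHmacSHA512.
public final class Pbkdf2Example {
  public static byte[] hash(char[] password, byte[] salt, int iterations)
      throws Exception {
    // 512-bit output to match the underlying SHA-512 output length
    PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 512);
    return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512")
        .generateSecret(spec).getEncoded();
  }

  public static void main(String[] args) throws Exception {
    byte[] salt = new byte[16];
    new SecureRandom().nextBytes(salt);
    byte[] h = hash("secret".toCharArray(), salt, 10_000);
    System.out.println(Base64.getEncoder().encodeToString(h));
  }
}
```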




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1132120248

   > Is there an upstream issue or patch available for commons-codec?
   
   Not that I could find. 




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1136438504

   > The default is 5000, not 1000.
   
   You're right, I conflated that with `ROUNDS_MIN`.
   
   > But the question remains: do we want fast and efficient checks, or do we want slow checks/large iterations?
   
   I get the rationale behind the slow checks / large iterations. I don't know that the lock contention is intentional as a throttling mechanism. I have no issue with it happening the first time, to make sure that the password passed in is in fact the correct one. I think we need to do something here based on @keith-turner's comment in #2707 (regarding the difference between caching and not caching successful password checks under load):
   
   ```
   With these changes and 500 concurrent queries the average time is 400ms and the max times are around 1000ms (ignoring the first few lines because JIT probably has not kicked in scan servers). Without these changes the average times are around 1600ms to 1700ms and the max times 15,000ms to 20,000ms
   ```
   
   I can't imagine that this is the desired behavior. Was it this bad before the change to commons-codec?




[GitHub] [accumulo] ctubbsii commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1135675967

   > This section doesn't deal strictly with passwords, but I wrote a [test](https://github.com/dlmarion/accumulo/blob/password_test/server/base/src/test/java/org/apache/accumulo/server/security/handler/PasswordHashTest.java) to compare the performance difference between the Commons-Codec Crypt approach that is being used currently vs `PBKDF2WithHmacSHA512` as suggested in NIST SP800-63B. It looks to be roughly 3x faster.
   
   Commons-codec Crypt produces a Linux-standard hash suitable for placing in /etc/shadow for a user password (see `man 5 crypt`). I don't want to do anything custom. If the faster approach can be incorporated upstream into commons-codec, I'd strongly prefer that route over any other alternative. Modern Linux defaults use `yescrypt`, `$y$` hashes, whereas we're using `sha512crypt`, `$6$`. I'd be fine with defaulting to a more efficient algorithm, if it is supported by commons-codec and is widely considered at least as secure as `sha512crypt`.
   
   Something else to keep in mind: faster hashing makes brute force attacks more efficient. The fact that `sha512crypt` is slower isn't necessarily a bad thing. It would be nice if it didn't use as much CPU, though.




[GitHub] [accumulo] ctubbsii commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1136501004

   > I can't imagine that this is the desired behavior. Was it this bad before the change to commons-codec in #1798?
   
   I don't know. We didn't use commons-codec before, but we did use a SHA-2 algorithm, SHA-256. I think those changes are worth any side-effects, though, especially if it's being mitigated by #2707 or another means.




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1135821742

   > I don't want to do anything custom.
   
   To be clear, I wasn't doing anything custom; `PBKDF2WithHmacSHA512` is a JCA-supported algorithm. Looking at [CODEC-133](https://issues.apache.org/jira/browse/CODEC-133), the commons-codec classes implement the Linux crypt algorithm, but I'm not quite sure of its lineage. There are several things mentioned in that ticket that seem to be merged together.
   
   > Something else to keep in mind: faster hashing makes brute force attacks more efficient. The fact that sha512crypt is slower isn't necessarily a bad thing. It would be nice if it didn't use as much CPU, though.
   
   Agreed. My test performed 1000 iterations to match what the commons-codec code is doing by default. I don't think we are providing commons-codec with any number of iterations. NIST SP800-63B suggests doing at least 10,000 iterations; it says:
   
   > For PBKDF2, the cost factor is an iteration count: the more times the PBKDF2 function is iterated, the longer it takes to compute the password hash. Therefore, the iteration count SHOULD be as large as verification server performance will allow, typically at least 10,000 iterations.
   
   So, my test is faster doing an apples-to-apples comparison, but would be slower when performing 10,000 iterations. We would still need the caching, IMO, since we are currently checking that the passwords match on each API call.




[GitHub] [accumulo] keith-turner commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1142667416

   #2707 resolves this problem




[GitHub] [accumulo] ctubbsii commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1132119643

   > Looking at the stack trace in the first image above, the issue is that the Apache Commons Codec [Sha2Crypt.sha2Crypt](https://github.com/apache/commons-codec/blob/7decf21b6a137b1b9028d3903004fe53dc5ec849/src/main/java/org/apache/commons/codec/digest/Sha2Crypt.java#L167) method does not cache and `reset` the `MessageDigest` objects that it uses. The `sha2Crypt` method creates many new `MessageDigest` objects per method call, and at the top of the stack trace in the image, [ProviderConfig.getProvider](https://github.com/openjdk/jdk/blob/da75f3c4ad5bdf25167a3ed80e51f567ab3dbd01/src/java.base/share/classes/sun/security/jca/ProviderConfig.java#L164) is synchronized.
   
   Is there an upstream issue or patch available for commons-codec?




[GitHub] [accumulo] dlmarion commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1125990995

   Which version of the JVM are you using? I found [this](https://bugs.openjdk.java.net/browse/JDK-8256523), which was recently backported to `11.0.12`. I also found [this](https://bugs.openjdk.java.net/browse/JDK-8259498) and [this](https://github.com/openjdk/jdk/pull/1933). I would be curious to see whether the performance is better with Java 17 if you are using Java 11. https://github.com/apache/accumulo/pull/2374 also popped up in my searching...




[GitHub] [accumulo] ctubbsii commented on issue #2700: Seeing lots of lock contention and CPU around sha512 computation during authentication

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2700:
URL: https://github.com/apache/accumulo/issues/2700#issuecomment-1136386342

   > > I don't want to do anything custom.
   > 
   > To be clear I wasn't doing anything custom, `PBKDF2WithHmacSHA512` is a JCA supported algorithm.
   
   Right, so, there's the algorithm, but then there's the serialization format. Crypt emits them in a well known serialization format. We'd have to do something custom for the serialization, even if we're using a built-in algorithm. I'd prefer to stick to the crypt standard for serialization, and limit ourselves to the algorithms it supports, rather than use a different algorithm and customize the serialization.
   
   > Looking at [CODEC-133](https://issues.apache.org/jira/browse/CODEC-133), the commons-codec classes implement the Linux crypt algorithm, but I'm not quite sure of it's lineage. There are several things mentioned in that ticket that seem to be merged together.
   
   Linux's crypt evolved out of the need to use something stronger than the DES algorithm, but it needed to be serialized in a backwards-compatible, human-readable way, since it was going into a text file. They came up with the well-known prefix `$<algorithm_id>$`. The format of everything after that prefix depends on the algorithm, but ultimately the hash is encoded using printable ASCII characters. The SHA-2 algorithms support a rounds parameter from 1000 to 999,999,999 (also known as the "CPU time cost parameter" in the man page; it defaults to 5000) and a 6- to 96-bit salt that is base-64 encoded.
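   The serialization format just described can be picked apart mechanically. A minimal sketch (the class name is made up, and the example hashes in the test are placeholders, not real password hashes):

```java
// Illustration of the crypt(5) serialization: a "$<algorithm_id>$" prefix,
// an optional "rounds=" field for the SHA-2 algorithms, then the base-64
// encoded salt and hash, all in printable ASCII.
public final class CryptFormat {
  public static String algorithmId(String cryptHash) {
    String[] parts = cryptHash.split("\\$");
    // parts[0] is empty because the string starts with '$'
    return parts.length > 1 ? parts[1] : "";
  }

  public static int rounds(String cryptHash, int defaultRounds) {
    for (String part : cryptHash.split("\\$")) {
      if (part.startsWith("rounds=")) {
        return Integer.parseInt(part.substring("rounds=".length()));
      }
    }
    return defaultRounds; // no explicit rounds= field present
  }
}
```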
   
   If there are several things mentioned in the ticket, it's probably because there are multiple algorithms that are supported.
   
   > > Something else to keep in mind: faster hashing makes brute force attacks more efficient. The fact that sha512crypt is slower isn't necessarily a bad thing. It would be nice if it didn't use as much CPU, though.
   > 
   > Agreed, my test performed 1000 iterations to match what the commons codec code is doing by default. I don't think we are providing commons-codec with any number of iterations. NIST SP800-63B suggest doing 10,000 iterations, it says:
   
   Even though the javadoc for Crypt incorrectly says the salt is specified without the rounds= parameter, you *can* specify that parameter to override the default of 5000. However, the Crypt API doesn't make this easy for us... we would have to randomly generate our own salt, base-64 encode it, and pass it in, rather than rely on the library generating a random salt for us. And their utility code to generate the salt and efficiently base-64 encode it is not public, so we'd have to rewrite it or copy it.
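   The salt string a caller would have to hand-build could be sketched as follows. This is an illustrative sketch only: the class name is made up, and the 16-character salt length is an assumption based on crypt(5)'s maximum for the SHA-2 algorithms.

```java
import java.security.SecureRandom;

// Sketch: build a "$6$rounds=N$<salt>" specifier by hand, using crypt's own
// base-64 alphabet, since the library's salt-generation helpers are not public.
public final class SaltBuilder {
  private static final String CRYPT_B64 =
      "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  public static String sha512Salt(int rounds) {
    SecureRandom rng = new SecureRandom();
    StringBuilder salt = new StringBuilder("$6$rounds=").append(rounds).append('$');
    for (int i = 0; i < 16; i++) {
      salt.append(CRYPT_B64.charAt(rng.nextInt(CRYPT_B64.length())));
    }
    // The result could then be passed as the salt argument to Crypt.crypt(...)
    return salt.toString();
  }
}
```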
   
   It would be nice to have an upstream change that makes it easier to specify the rounds (and also to support newer algorithms, like yescrypt). commons-codec is a bit out of date in this regard.
   
   > 
   > > For PBKDF2, the cost factor is an iteration count: the more times the PBKDF2 function is iterated, the longer it takes to compute the password hash. Therefore, the iteration count SHOULD be as large as verification server performance will allow, typically at least 10,000 iterations.
   > 
   > So, my test is faster doing an apples-to-apples comparison, but would be slower when performing 10,000 iterations. We would still need the caching, IMO, since we are currently checking that the passwords match on each API call.
   
   Yours is not faster. The default is 5000, not 1000. Your test code uses 1000. When I ran your test with 5000, Crypt was faster. When I pre-generate the salts for Crypt, like you did with the PBKDF2, then it's slightly faster still.
   
   But the question remains: do we want fast and efficient checks, or do we want slow checks/large iterations? The original issue was about performance... but if the performance is intentionally slower here, and the lock contention is an intentional throttling mechanism to protect against brute force attacks, perhaps there's no bug here to fix? Can we have both: throttled, but with less CPU usage?

