Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2018/03/28 21:20:27 UTC
[GitHub] keith-turner opened a new pull request #410: Fixed inefficient auths check
URL: https://github.com/apache/accumulo/pull/410
[On the mailing list](https://lists.apache.org/thread.html/31ff119654efe1d2c7c95c544a1634f6cb3c0721108f169986de02dc@%3Cuser.accumulo.apache.org%3E) a performance problem with authorizations was identified. When a user has a large number of authorizations, scans can be really slow. For example, if a user has 100,000 auths and does a scan with 90,000 auths, it's very slow. This is caused by a subset check on the server side that uses lists. This PR changes the check to use an existing hashset.
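To illustrate why the list-based check scales badly, here is a minimal sketch of the two approaches. It is not the Accumulo server code: the class name `AuthSubsetSketch`, the method names, and the use of `ByteBuffer` as the element type are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from the Accumulo code base.
public class AuthSubsetSketch {

  // List-based subset check: List.contains scans the user's list for every
  // requested auth, so the cost grows with userAuths.size() * requested.size().
  public static boolean isSubsetUsingList(List<ByteBuffer> userAuths, List<ByteBuffer> requested) {
    for (ByteBuffer auth : requested) {
      if (!userAuths.contains(auth)) {
        return false;
      }
    }
    return true;
  }

  // Set-based subset check: each HashSet lookup takes expected constant time,
  // so the cost grows with requested.size() only.
  public static boolean isSubsetUsingSet(Set<ByteBuffer> userAuths, List<ByteBuffer> requested) {
    for (ByteBuffer auth : requested) {
      if (!userAuths.contains(auth)) {
        return false;
      }
    }
    return true;
  }

  // Generates auths the same way as genAuths in the PR's performance test,
  // wrapped in ByteBuffer so that equals and hashCode compare content.
  public static List<ByteBuffer> genAuths(int max, int step) {
    List<ByteBuffer> auths = new ArrayList<>();
    for (int i = 0; i < max; i += step) {
      auths.add(ByteBuffer.wrap(String.format("%06x", i).getBytes(StandardCharsets.UTF_8)));
    }
    return auths;
  }

  public static void main(String[] args) {
    // Sizes are kept small so the list-based check finishes quickly.
    List<ByteBuffer> userAuths = genAuths(10_000, 1);
    List<ByteBuffer> requested = genAuths(9_000, 1);

    long t1 = System.nanoTime();
    boolean viaList = isSubsetUsingList(userAuths, requested);
    long t2 = System.nanoTime();
    // This timing includes building the set; the PR reuses an existing one.
    boolean viaSet = isSubsetUsingSet(new HashSet<>(userAuths), requested);
    long t3 = System.nanoTime();

    System.out.printf("list: %b in %d ms%n", viaList, (t2 - t1) / 1_000_000);
    System.out.printf("set:  %b in %d ms%n", viaSet, (t3 - t2) / 1_000_000);
  }
}
```

Both methods return the same answer; only the cost of each membership lookup differs.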
The following is a performance test that was written to explore this problem. This code creates a user with 100,000 auths and then times scans with 0, 10, 100, 1000, 10000, and 100000 auths.
```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.security.TablePermission;
// Assumption: SummaryStatistics comes from commons-math3.
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

// Wrapper class added so the snippet compiles; the name is an assumption.
public class ManyAuthsPerfTest {

  public void testManyAuths(Connector conn) throws Exception {
    conn.securityOperations().createLocalUser("bob", new PasswordToken("bob"));

    // Give bob 100,000 auths.
    Authorizations auths = genAuths(100_000, 1);
    conn.securityOperations().changeUserAuthorizations("bob", auths);

    conn.tableOperations().create("bobsSpecialTable");
    conn.securityOperations().grantTablePermission("bob", "bobsSpecialTable", TablePermission.READ);

    // Run the scans as bob.
    conn = conn.getInstance().getConnector("bob", "bob");

    runTest(conn, Authorizations.EMPTY); // 0 auths
    runTest(conn, genAuths(100_000, 10_000)); // 10 auths
    runTest(conn, genAuths(100_000, 1_000)); // 100 auths
    runTest(conn, genAuths(100_000, 100)); // 1,000 auths
    runTest(conn, genAuths(100_000, 10)); // 10,000 auths
    runTest(conn, auths); // 100,000 auths
  }

  void runTest(Connector conn, Authorizations auths) throws Exception {
    try (Scanner scanner = conn.createScanner("bobsSpecialTable", auths)) {
      long start = System.currentTimeMillis();

      // Do a few warm up scans; start no new scan after 60 seconds.
      for (int i = 0; i < 50 && System.currentTimeMillis() - start < 60000; i++) {
        for (Entry<Key,Value> entry : scanner) {
        }
      }

      start = System.currentTimeMillis();
      SummaryStatistics stats = new SummaryStatistics();

      // Time up to 100 scans; start no new scan after 120 seconds.
      for (int i = 0; i < 100 && System.currentTimeMillis() - start < 120000; i++) {
        long t1 = System.currentTimeMillis();
        for (Entry<Key,Value> entry : scanner) {
        }
        long t2 = System.currentTimeMillis();
        stats.addValue(t2 - t1);
      }

      // Times are in milliseconds. Note that "mean" is the geometric mean.
      System.out.printf("auths.size:%,7d mean:%.2f stddev:%.2f min:%.2f max:%.2f samples:%d\n",
          auths.size(), stats.getGeometricMean(), stats.getStandardDeviation(), stats.getMin(),
          stats.getMax(), stats.getN());
    }
  }

  // Generates auths 000000, 000000 + step, ... (hex) for every value below max.
  Authorizations genAuths(int max, int step) {
    List<byte[]> al = new ArrayList<>();
    for (int i = 0; i < max; i += step) {
      al.add(String.format("%06x", i).getBytes(StandardCharsets.UTF_8));
    }
    return new Authorizations(al);
  }
}
```
Before this PR, the test output was:
```
auths.size:      0 mean:134.39 stddev:29.88 min:98.00 max:289.00 samples:100
auths.size:     10 mean:147.82 stddev:33.17 min:104.00 max:242.00 samples:100
auths.size:    100 mean:219.07 stddev:44.38 min:162.00 max:351.00 samples:100
auths.size:  1,000 mean:475.67 stddev:44.17 min:419.00 max:620.00 samples:100
auths.size: 10,000 mean:3516.51 stddev:105.13 min:3319.00 max:3785.00 samples:35
auths.size:100,000 mean:34615.27 stddev:400.12 min:34167.00 max:35109.00 samples:4
```
This shows that a scan with 0 auths took 134 milliseconds on average, while a scan with 100,000 auths took about 34.6 seconds on average. After this PR, the test output is:
```
auths.size:      0 mean:1.52 stddev:0.73 min:1.00 max:5.00 samples:100
auths.size:     10 mean:137.23 stddev:26.14 min:102.00 max:228.00 samples:100
auths.size:    100 mean:136.70 stddev:28.12 min:104.00 max:247.00 samples:100
auths.size:  1,000 mean:137.59 stddev:26.18 min:101.00 max:243.00 samples:100
auths.size: 10,000 mean:154.95 stddev:24.81 min:115.00 max:232.00 samples:100
auths.size:100,000 mean:347.62 stddev:56.12 min:250.00 max:549.00 samples:100
```
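For a rough sense of the improvement, the reported means from the two runs can be divided row by row. The sketch below does only that arithmetic; the class name `ScanSpeedup` and its helper are illustrative, and the numbers are copied from the test output above.

```java
// Hypothetical helper for comparing the reported means; not part of the PR.
public class ScanSpeedup {

  // Ratio of the mean scan time before the PR to the mean scan time after it.
  public static double speedup(double beforeMs, double afterMs) {
    return beforeMs / afterMs;
  }

  public static void main(String[] args) {
    int[] sizes = {0, 10, 100, 1_000, 10_000, 100_000};
    // Means in milliseconds, copied from the "before" and "after" outputs.
    double[] before = {134.39, 147.82, 219.07, 475.67, 3516.51, 34615.27};
    double[] after = {1.52, 137.23, 136.70, 137.59, 154.95, 347.62};

    for (int i = 0; i < sizes.length; i++) {
      System.out.printf("auths.size:%,7d speedup:%.1fx%n", sizes[i],
          speedup(before[i], after[i]));
    }
  }
}
```

With these means, the 100,000-auth scan comes out roughly 100 times faster after the change, and the 10,000-auth scan a little over 22 times faster.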