You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/10 23:45:15 UTC

[GitHub] [accumulo] keith-turner edited a comment on issue #2411: External Isolated Scan Server

keith-turner edited a comment on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1009461640

Thinking about this I had a few random thoughts that I corralled into sections below.

## User experience

Thinking about one possible user experience is that we have a single API addition that allows scanners to specify their required consistency level. The default would be immediate consistency, which is the current behavior. This would make eventually consistent reads something someone would have to intentionally opt into with code changes. No existing code written against Accumulo would start doing eventually consistent reads all of a sudden.

Maybe something like the following in ScannerBase, so that the Scanner and BatchScanner will inherit the method.

```java
class ScannerBase {
enum ConsistencyLevel {
IMMEDIATE,
EVENTUAL
}

/**
* Allows setting a scanners required consistency level. If not set, the default it IMMEDIATE, which means the scanner will always see data written before the scans starts. When set to EVENTUAL the system may be able to serve scans faster while not guaranteeing data written prior to the scan start is seen.
*/
void setConsistency(ConsistencyLevel cl);

}
```

Maybe there does not need to be anything in the API other than this. How eventually consistent reads are handled at run time could possibly all be directed completely by run time configuration. Thinking that a developer will opt into eventually consistent reads (using the API) and an administrator will tweak how those are executed (through config changes at run time) in order to optimize the scans for their runtime environment.

## Runtime implementation of eventually consistent reads

Thinking we could possibly have the following high-level concepts for the runtime implementation.

* **Scan server group** : Zero or more scan servers that serve scans for a named group. The group could run as an autoscaled pod in kubernetes.
* **Scan server** : When a scan server is started, its group name can be provided as an argument similar to how compactors are started with a queue name. This scan server will serve eventually consistent reads for tablets.
* **Table to scan group mapping** : configuration that maps Accumulo tables to a scan server group. This allows tables to possibly have dedicated scan servers running in a kubernets pod.

One thing I am pondering is how should client chose a scan server. The simplest implementation would be for eventually consistent scans to do the following in the accumulo client.

* Determine what scan group the table is in
* Get the list of scan servers in that group
* Randomly chose one of them

I think this would work fairly well. Scan servers could respond w/ too busy response if they are overloaded and the client could randomly select another. It could also remember a server that reported as busy and not contact it for a while (maybe w/ exponential backoff).

One thing I would like, but I am not sure how to achieve that the moment is stickiness for tablets in order to achieve better cache utilization. For example, if I have 1000 scan servers maybe it would be nice if each tablet was only served by 10 of those scan servers in order to increase the chance of getting cache hits on the scan server for a given tablet. Could probably achieve this with hashing and modulus to map a tablet to a subset of server deterministically across clients. However, this simplistic limiting tablets to a fixed number of scan servers would not allow scaling in the case where a single tablet all of a sudden has an overwhelming number of scans. Maybe if we scan servers used a cluster wide caching mechanism, then stickiness would not be a concern. However local cache in RAM is hard to beat if you can use it.

## Prerequisites.

Thinking that inorder to implement this it may be best to first pay down some tech debt in the tserver and work on organizing the existing tablet scan code so that its more modular and self-contained (separated from other tablet code for writes and bookeeping). This could lead to making it easier to reuse code in the scan server.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org