Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/07 17:36:23 UTC

[GitHub] [accumulo] dlmarion opened a new issue #2411: External Isolated Scan Server

dlmarion opened a new issue #2411:
URL: https://github.com/apache/accumulo/issues/2411


   **Is your feature request related to a problem? Please describe.**
   Scan threads running within the Tablet Server can render the Tablet Server unusable, either because a GC pause causes the Tablet Server to lose its lock or because the JVM crashes with an OutOfMemoryError (OOME). When this occurs, the tablets are re-hosted and the scan on a Tablet is restarted. If the scan itself is the underlying cause, the same failure may occur again.
   
   **Describe the solution you'd like**
   I'm envisioning a new server process that exists to run a single scan on a single Tablet. I'm imagining that this new process would have a Thrift API that is a subset of the Tablet Server's API, enough to support scans. Upon receiving a request to start a new scan, this new server process would create a Tablet object based on the information in the metadata table at that point in time.
   
   On the client side there would be a new type of locator, one that locates these new server processes that are currently not busy. A new type of Scanner (or a new Scanner option) would use this locator to run client scans on these new server processes. The client would communicate with one or more of these new server processes to complete the scan.
   
   **Describe alternatives you've considered**
   I believe that many alternatives have been attempted over the years, but this problem still exists.
   
   **Additional context**
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1009461640


   Thinking about this I had a few random thoughts that I corralled into sections below.
   
   ## User experience
   
   One possible user experience is a single API addition that allows scanners to specify their required consistency level.  The default would be immediate consistency, which is the current behavior.  This would make eventually consistent reads something someone would have to intentionally opt into with code changes.  No existing code written against Accumulo would start doing eventually consistent reads all of a sudden.
   
   Maybe something like the following in ScannerBase, so that the Scanner and BatchScanner will inherit the method.
   
   ```java
   interface ScannerBase {
       enum ConsistencyLevel {
           IMMEDIATE,
           EVENTUAL
       }
   
       /**
        * Sets the scanner's required consistency level. If not set, the default is IMMEDIATE, which means the scanner will always see data written before the scan starts. When set to EVENTUAL, the system may be able to serve scans faster, but does not guarantee that data written prior to the scan start is seen.
        */
       void setConsistency(ConsistencyLevel cl);
   
   }
   ```
   
   Maybe there does not need to be anything in the API other than this.  How eventually consistent reads are handled at run time could possibly be directed entirely by run time configuration.  A developer would opt into eventually consistent reads (using the API) and an administrator would tweak how those are executed (through config changes at run time) in order to optimize the scans for their environment.
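   
   A minimal sketch of what opting in might look like from client code, assuming the proposed `setConsistency` method.  The `getConsistency` accessor and the `DemoScanner` class are hypothetical stand-ins added only so the example runs; none of this is the real Accumulo API.
   
   ```java
   import java.util.Objects;
   
   // Hypothetical restatement of the proposal; not the real Accumulo interface.
   interface ScannerBase {
       enum ConsistencyLevel { IMMEDIATE, EVENTUAL }
   
       void setConsistency(ConsistencyLevel cl);
   
       // Assumed accessor, added here only so the demo can observe the setting.
       ConsistencyLevel getConsistency();
   }
   
   // Stand-in scanner used only to illustrate the default and the opt-in.
   class DemoScanner implements ScannerBase {
       // IMMEDIATE is the default, so existing code keeps its current behavior.
       private ConsistencyLevel level = ConsistencyLevel.IMMEDIATE;
   
       @Override
       public void setConsistency(ConsistencyLevel cl) {
           this.level = Objects.requireNonNull(cl);
       }
   
       @Override
       public ConsistencyLevel getConsistency() {
           return level;
       }
   }
   
   class ConsistencyDemo {
       public static void main(String[] args) {
           DemoScanner scanner = new DemoScanner();
           System.out.println(scanner.getConsistency()); // default, no code change needed
           scanner.setConsistency(ScannerBase.ConsistencyLevel.EVENTUAL); // explicit opt-in
           System.out.println(scanner.getConsistency());
       }
   }
   ```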
   
   ## Runtime implementation of eventually consistent reads
   
   Thinking we could possibly have the following high-level concepts for the runtime implementation.
   
    * **Scan server group** : Zero or more scan servers that serve scans for a named group.  The group could run as an autoscaled pod in Kubernetes.  
    * **Scan server** : When a scan server is started, its group name can be provided as an argument, similar to how compactors are started with a queue name.  This scan server will serve eventually consistent reads for tablets.
    * **Table to scan group mapping** : Configuration that maps Accumulo tables to a scan server group.  This allows tables to possibly have dedicated scan servers running in a Kubernetes pod.
   
   One thing I am pondering is how a client should choose a scan server.  The simplest implementation would be for eventually consistent scans to do the following in the Accumulo client.
   
     * Determine what scan group the table is in
     * Get the list of scan servers in that group
     * Randomly choose one of them
   
   I think this would work fairly well.  Scan servers could respond with a too-busy response if they are overloaded, and the client could randomly select another.  The client could also remember a server that reported as busy and not contact it for a while (maybe with exponential backoff).  
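   
   The random choice with a per-server busy memory could be sketched roughly as follows.  The class name, the method names, and the 100 ms initial backoff are all assumptions made for illustration; the caller passes in the current time so the behavior is easy to test.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Random;
   
   // Hypothetical client-side helper; names and constants are illustrative only.
   class RandomScanServerChooser {
       private static final long INITIAL_BACKOFF_MS = 100;
   
       private final Random random;
       // server address -> time (ms) before which the server should be skipped
       private final Map<String, Long> skipUntil = new HashMap<>();
       // server address -> current backoff (ms), doubled on each busy report
       private final Map<String, Long> backoff = new HashMap<>();
   
       RandomScanServerChooser(Random random) {
           this.random = random;
       }
   
       // Record a busy response so the server is avoided for a while.
       void reportBusy(String server, long nowMs) {
           long delay = backoff.merge(server, INITIAL_BACKOFF_MS, (old, init) -> old * 2);
           skipUntil.put(server, nowMs + delay);
       }
   
       // Record a successful scan so the server's backoff is reset.
       void reportOk(String server) {
           backoff.remove(server);
           skipUntil.remove(server);
       }
   
       // Pick a random server that is not being avoided; if every server is
       // being avoided, fall back to picking from the whole group.
       String choose(List<String> groupServers, long nowMs) {
           List<String> candidates = new ArrayList<>();
           for (String s : groupServers) {
               if (skipUntil.getOrDefault(s, 0L) <= nowMs) {
                   candidates.add(s);
               }
           }
           List<String> pool = candidates.isEmpty() ? groupServers : candidates;
           return pool.get(random.nextInt(pool.size()));
       }
   }
   ```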
   
   One thing I would like, but am not sure how to achieve at the moment, is stickiness for tablets in order to get better cache utilization.  For example, if I have 1000 scan servers, maybe it would be nice if each tablet was only served by 10 of those scan servers in order to increase the chance of getting cache hits on the scan server for a given tablet.  Could probably achieve this with hashing and modulus to map a tablet to a subset of servers deterministically across clients.  However, this simplistic approach of limiting tablets to a fixed number of scan servers would not allow scaling in the case where a single tablet all of a sudden has an overwhelming number of scans.   Maybe if scan servers used a cluster wide caching mechanism, then stickiness would not be a concern.  However local cache in RAM is hard to beat if you can use it.
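   
   The fixed-subset idea (hashing plus modulus) might look something like this sketch.  It assumes the tablet extent can be represented as a `String` and that every client holds the same sorted server list; `FixedSubsetMapper` and `serversFor` are made-up names.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical helper: maps a tablet to a fixed-size window of scan servers.
   class FixedSubsetMapper {
       // Returns the servers a tablet maps to.  The result is the same on every
       // client that has the same sorted server list, because String.hashCode()
       // is specified by the Java language and does not vary between JVMs.
       static List<String> serversFor(String tabletExtent, List<String> sortedServers, int subsetSize) {
           int n = sortedServers.size();
           if (n == 0) {
               throw new IllegalArgumentException("no scan servers available");
           }
           // floorMod keeps the index non-negative even for negative hash codes.
           int start = Math.floorMod(tabletExtent.hashCode(), n);
           int count = Math.min(subsetSize, n);
           List<String> subset = new ArrayList<>(count);
           for (int i = 0; i < count; i++) {
               subset.add(sortedServers.get((start + i) % n));
           }
           return subset;
       }
   }
   ```
   
   With 1000 servers and `subsetSize` of 10, each tablet always lands on the same 10 servers, which is exactly the limitation described above: the window cannot grow when one tablet gets hot.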
   
   ## Prerequisites
   
   Thinking that in order to implement this it may be best to first pay down some tech debt in the tserver and work on organizing the existing tablet scan code so that it's more modular and self-contained (separated from other tablet code for writes and bookkeeping).  This could make it easier to reuse code in the scan server.
   





[GitHub] [accumulo] dlmarion commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1007769803


   I pushed some changes to my fork to show others what I'm thinking about. The code in the branch builds, but the ScanServer would not work as is (it would exit immediately). But it's the start of a potential implementation.
   
   https://github.com/dlmarion/accumulo/commit/7df6929cffcc7aa6340c45c9be503347d5d3a226
   





[GitHub] [accumulo] dlmarion commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1007845145


   > Is this basically creating servers that act as "clients" under today's design, but doing client-side iterators?
   
   I'm not quite sure what you mean by this, but I think the answer is no. I'm not sure if you looked at the referenced commit, but if you look at the [ScanServer](https://github.com/dlmarion/accumulo/commit/7df6929cffcc7aa6340c45c9be503347d5d3a226#diff-e3f63597c5c63a83f84e46e17cf0740d4178372810b2a0e5fb8df7a37ee07d15) class in that commit you will see that it uses the same server side APIs that exist today to perform scans. It would be driven by a Scanner or BatchScanner on the client side.
   
   > What's stopping users from implementing something like that on top of Accumulo today, without baking the complexity directly into Accumulo?
   
   Nothing, except that it would be brittle and they would have to come up with their own client and server-side implementations. I don't think that they would be able to easily extend TabletServer like I have done, and they certainly would not be able to change the internals of the Accumulo client to use the new server.





[GitHub] [accumulo] keith-turner edited a comment on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1011212339


   Thinking a bit more about making tablets sticky, was thinking an algorithm like the following may be sticky and resilient to high scan load on a few tablets.  The algorithm starts off always sending a tablet to a few scan servers and when those few get busy it will exponentially expand the number of servers used.
   
   * serverAddresses[] = a sorted array of scan server addresses that is usually the same on each client.  It's eventually consistent on the client.  For brief periods when it differs, clients will choose different scan servers for the same tablets.
   * NSS = number of scan servers, which would be serverAddresses.length
   * ISS = initial number of scan servers to randomly select from when querying a tablet; could start off small, like 2 or 4
   
   The variables above are the inputs; the algorithm is below.
   
   1. serversToUse = ISS
   2. serverIndex = ( hash(tablet extent) % NSS + random.nextInt(serversToUse) ) % NSS
   3. serverAddress = serverAddresses[serverIndex]
   4. Send scan request to server address
   5. If get results then done
   6. If server reports as busy then set serversToUse=2*serversToUse and goto 2
   
   In step 6 of the algorithm, we may also want to sleep.  There are two cases to consider for this: many tablets are getting many queries, and a few tablets are getting many queries. Trying to think through if sleeping w/ exponential backoff in both cases makes sense.
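   
   For illustration, the six steps could be sketched in Java roughly as below.  This is only a sketch under assumptions: the tablet extent is treated as a `String`, the scan request is replaced by an `isBusy` predicate, and the `maxAttempts` limit and the cap of `serversToUse` at NSS are additions that are not part of the steps above.  No sleeping is done.
   
   ```java
   import java.util.List;
   import java.util.Optional;
   import java.util.Random;
   import java.util.function.Predicate;
   
   // Hypothetical sketch of the sticky selection steps; not Accumulo code.
   class StickyScanServerSelector {
       static Optional<String> select(String tabletExtent, List<String> serverAddresses, int iss,
               Random random, Predicate<String> isBusy, int maxAttempts) {
           int nss = serverAddresses.size();
           if (nss == 0 || iss < 1) {
               throw new IllegalArgumentException("need at least one server and ISS >= 1");
           }
           // hash(tablet extent) % NSS, kept non-negative with floorMod
           int base = Math.floorMod(tabletExtent.hashCode(), nss);
           // Step 1: start with a small window of servers
           int serversToUse = Math.min(iss, nss);
           for (int attempt = 0; attempt < maxAttempts; attempt++) {
               // Steps 2 and 3: pick a random server inside the current window
               int serverIndex = (base + random.nextInt(serversToUse)) % nss;
               String serverAddress = serverAddresses.get(serverIndex);
               // Steps 4 and 5: "send" the scan; done if the server is not busy
               if (!isBusy.test(serverAddress)) {
                   return Optional.of(serverAddress);
               }
               // Step 6: double the window (capped at NSS, an assumption) and retry
               serversToUse = Math.min(2 * serversToUse, nss);
           }
           return Optional.empty();
       }
   }
   ```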
   
   
   
   
   





[GitHub] [accumulo] dlmarion commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1010172520


   FWIW, it looks like HBase has a similar concept - https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hbase_read_replicas.html





[GitHub] [accumulo] ivakegg commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
ivakegg commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1007723004


   This is essentially a "read only" tserver except that this process would not be tied to specific tablets.  I love the idea of separating the scans running iterators that are on the more expensive side into separate processes to avoid churn within the accumulo tservers.  The concept is relatively simple.  The trick is doing load balancing appropriately.
   
   One thought is to allow something like Kubernetes to manage instances of these and push the load balancing to an ingress controller.  Something simple baked into Accumulo would be great, but appropriate hooks to determine the health and load on one of these processes could be useful.
   
   An additional thought to keep in mind is the ability to embed this server process in something else (e.g. a webserver or perhaps a separate microservice).  Being able to run standalone or embedded would be desirable.





[GitHub] [accumulo] keith-turner commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1015741550


   > @keith-turner - you mentioned that there was work to do in the tablet server as a prerequisite. Do you have any comments on what that might be?
   
   No, I don't have anything specific in mind ATM but I will take a look at the code and get back to you if I see anything specific. I just suspect there are things that could be done.





[GitHub] [accumulo] dlmarion commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1015740771


   @keith-turner  - you mentioned that there was work to do in the tablet server as a prerequisite. Do you have any comments on what that might be?





[GitHub] [accumulo] dlmarion commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1010008666


   @keith-turner - Thanks for the comment. Regarding the scan groups, I was thinking of a default naive approach with no groups, while allowing the user to plug their own scan server locator into the client. Maybe over time we build in some smarter options.





[GitHub] [accumulo] keith-turner commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1015737106


   > Trying to think through if sleeping w/ exponential backoff in both cases makes sense.
   
   I was thinking about this and started thinking the following may be good goals.
   
    1. Only do exponential backoff sleep when serversToUse has expanded to include all possible scan servers.  Do not sleep when serversToUse is less than the possible number of scan servers.
    2. Make the expansion factor of serversToUse such that we always get to all servers in a fixed number of retries.  For example if we keep getting busy signals, then we should always expand to all servers by the Nth retry no matter how many servers.
   
   The reason for #1 is that we do not want to delay retrying with more servers in the case where only a single tablet is busy.  The reason for #2 is that we want to get to all of the servers quickly and start sleeping in the case where many servers are busy.  To accomplish #2 we can calculate the scale factor as a function of the number of servers.  For example, if we had 1000 servers, started off with 3 servers, and wanted to get to all servers in 3 retries, we could solve `3x^3=1000` to get the desired scale factor, which is `x=(1000/3)^(1/3) =~ 6.934`.   In my previous post the scale factor was arbitrarily set to 2, so it would take more retries to get to all servers as the total number of servers increases.
   
   Below is an example of how this might work with 1000 scan servers where lots are busy.  Since there are 1000 servers we use the scale factor of 6.934 calculated above.
   
     1. We choose randomly from 3 servers based on the tablet hash.
     2. We contact the server and it reports busy. We do not sleep and retry.
     3. We expand the number of scan servers to choose from to 3*6.934 = 20.802.  We randomly choose from 21 servers based on the tablet hash.
     4. We contact the server and it reports busy. We do not sleep and retry.
     5. We expand the number of scan servers to choose from to 20.802*6.934 = 144.241.  We randomly choose from 144 servers based on the tablet hash.
     6. We contact the server and it reports busy. We do not sleep and retry.
     7. We expand the number of scan servers to choose from to 144.241*6.934 = 1000.167.  We randomly choose from 1000 servers based on the tablet hash.
     8. We contact the server and it reports busy. We start sleeping with exponential backoff and retrying.  Now that we are picking from all servers, there is no need to expand by 6.934 anymore.
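   
   The scale factor and the resulting window sizes from the example above can be computed with a short sketch like this one.  `ExpansionSchedule` and its method names are hypothetical; rounding to the nearest integer and pinning the last attempt to the full server count are assumptions made to match the numbers in the walk-through.
   
   ```java
   // Hypothetical helper for computing the expansion schedule; not Accumulo code.
   class ExpansionSchedule {
       // Solves initial * x^retries = total for x.
       static double scaleFactor(int totalServers, int initialServers, int retries) {
           return Math.pow((double) totalServers / initialServers, 1.0 / retries);
       }
   
       // Window size for each attempt: index 0 is the first try with the initial
       // window, and index `retries` covers all servers.
       static int[] serversPerAttempt(int totalServers, int initialServers, int retries) {
           double x = scaleFactor(totalServers, initialServers, retries);
           int[] sizes = new int[retries + 1];
           double size = initialServers;
           for (int i = 0; i < retries; i++) {
               sizes[i] = (int) Math.min(totalServers, Math.round(size));
               size *= x;
           }
           // Pin the final attempt to every server to avoid floating point drift.
           sizes[retries] = totalServers;
           return sizes;
       }
   
       public static void main(String[] args) {
           // 1000 servers, start with 3, reach all of them by the 3rd retry
           System.out.println(scaleFactor(1000, 3, 3));
           System.out.println(java.util.Arrays.toString(serversPerAttempt(1000, 3, 3)));
           // second line prints [3, 21, 144, 1000]
       }
   }
   ```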
     





[GitHub] [accumulo] ctubbsii commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1007790437


   Is this basically creating servers that act as "clients" under today's design, but doing client-side iterators? What's stopping users from implementing something like that on top of Accumulo today, without baking the complexity directly into Accumulo?





[GitHub] [accumulo] keith-turner edited a comment on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1009461640


   Thinking about this I had a few random thoughts that I corralled into sections below.
   
   ## User experience
   
   Thinking about one possible user experience is that we have a single API addition that allows scanners to specify their required consistency level.  The default would be immediate consistency, which is the current behavior.  This would make eventually consistent reads something someone would have to intentionally opt into with code changes.  No existing code written against Accumulo would start doing eventually consistent reads all of a sudden.
   
   Maybe something like the following in ScannerBase, so that the Scanner and BatchScanner will inherit the method.
   
   ```java
   class ScannerBase {
       enum ConsistencyLevel {
           IMMEDIATE,
           EVENTUAL
       }
   
       /**
        *  Allows setting a scanners required consistency level. If not set, the default it IMMEDIATE, which means the scanner will always see data written before the scans starts.  When set to EVENTUAL the system may be able to serve scans faster while not guaranteeing data written prior to the scan start is seen. 
        */
       void setConsistency(ConsistencyLevel cl);
      
   }
   ```
   
   Maybe there does not need to be anything in the API other than this.  How eventually consistent reads are handled at run time could possibly all be directed completely by run time configuration.   Thinking that a developer will opt into eventually consistent reads (using the API) and an administrator will tweak how those are executed (through config changes at run time) in order to optimize the scans for their runtime environment.
   
   ## Runtime implementation of eventually consistent reads
   
   Thinking we could possibly have the following high-level concepts for the runtime implementation.
   
    * **Scan server group** : Zero or more scan servers that serve scans for a named group.  The group could run as an autoscaled pod in Kubernetes.
    * **Scan server** : When a scan server is started, its group name can be provided as an argument, similar to how compactors are started with a queue name.  This scan server will serve eventually consistent reads for tablets.
    * **Table to scan group mapping** : Configuration that maps Accumulo tables to a scan server group.  This allows tables to possibly have dedicated scan servers running in a Kubernetes pod.
   
   One thing I am pondering is how a client should choose a scan server.  The simplest implementation would be for eventually consistent scans to do the following in the Accumulo client.
   
     * Determine what scan group the table is in
     * Get the list of scan servers in that group
     * Randomly choose one of them
   
   I think this would work fairly well.  Scan servers could respond with a "too busy" response if they are overloaded, and the client could randomly select another.  The client could also remember a server that reported as busy and not contact it for a while (maybe with exponential backoff).
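   
   A minimal sketch of that client-side selection, assuming scan servers are identified by address strings and that the caller already resolved the table's group to a non-empty list.  The class name, method names and backoff constants are all made up for illustration.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Random;
   
   // Sketch only: names and constants are hypothetical, not Accumulo APIs.
   class RandomScanServerChooser {
       private static final long BASE_BACKOFF_MS = 100;    // assumed first penalty
       private static final long MAX_BACKOFF_MS = 60_000;  // assumed cap
   
       private final Random random;
       private final Map<String, Long> retryAfter = new HashMap<>();    // server -> earliest retry time
       private final Map<String, Integer> busyCount = new HashMap<>();  // server -> consecutive busy replies
   
       RandomScanServerChooser(Random random) {
           this.random = random;
       }
   
       /** Picks a random server from a non-empty group, skipping servers still in backoff. */
       String choose(List<String> group, long nowMs) {
           List<String> candidates = new ArrayList<>();
           for (String server : group) {
               if (retryAfter.getOrDefault(server, 0L) <= nowMs) {
                   candidates.add(server);
               }
           }
           // If every server is backing off, fall back to the whole group instead of failing.
           List<String> pool = candidates.isEmpty() ? group : candidates;
           return pool.get(random.nextInt(pool.size()));
       }
   
       /** Records a "too busy" reply and doubles that server's backoff, up to the cap. */
       void reportBusy(String server, long nowMs) {
           int n = busyCount.merge(server, 1, Integer::sum);
           long backoff = Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS << Math.min(n - 1, 20));
           retryAfter.put(server, nowMs + backoff);
       }
   
       /** Clears the penalty once a server answers a scan again. */
       void reportSuccess(String server) {
           busyCount.remove(server);
           retryAfter.remove(server);
       }
   }
   ```
   
   The busy state here is per client, so different clients may briefly disagree about which servers to avoid.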
   
   One thing I would like, but am not sure how to achieve at the moment, is stickiness for tablets in order to get better cache utilization.  For example, if I have 1000 scan servers, maybe it would be nice if each tablet was only served by 10 of those scan servers, to increase the chance of getting cache hits on the scan server for a given tablet.  Could probably achieve this with hashing and modulus to map a tablet to a subset of servers deterministically across clients.  However, simply limiting tablets to a fixed number of scan servers would not allow scaling in the case where a single tablet all of a sudden has an overwhelming number of scans.  Maybe if scan servers used a cluster-wide caching mechanism, then stickiness would not be a concern.  However, local cache in RAM is hard to beat if you can use it.
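   
   The hashing and modulus idea could look roughly like the sketch below.  It assumes the tablet is identified by a string, uses `String.hashCode()` as the hash (it is specified by the language, so every client computes the same value), and assumes every client sees the same sorted, non-empty server list.  `StickySubset` and `serversFor` are made-up names.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Sketch only: maps a tablet to a fixed window of servers in a sorted list.
   class StickySubset {
       /** Returns the servers a tablet maps to: a window of subsetSize entries starting at hash % n. */
       static List<String> serversFor(String tabletExtent, List<String> sortedServers, int subsetSize) {
           int n = sortedServers.size();
           // floorMod keeps the index non-negative even when hashCode() is negative.
           int start = Math.floorMod(tabletExtent.hashCode(), n);
           int count = Math.min(subsetSize, n);
           List<String> subset = new ArrayList<>(count);
           for (int i = 0; i < count; i++) {
               subset.add(sortedServers.get((start + i) % n));  // wrap around the end of the list
           }
           return subset;
       }
   }
   ```
   
   With 1000 servers and a subset size of 10, every client would send scans for a given tablet to the same 10 servers, which is exactly the fixed limit that cannot absorb a sudden burst on one tablet.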
   
   ## Prerequisites
   
   In order to implement this, it may be best to first pay down some tech debt in the tserver and work on organizing the existing tablet scan code so that it's more modular and self-contained (separated from other tablet code for writes and bookkeeping).  This could make it easier to reuse code in the scan server.
   





[GitHub] [accumulo] keith-turner edited a comment on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1011212339


   Thinking a bit more about making tablets sticky, an algorithm like the following may be both sticky and resilient to high scan load on a few tablets.  The algorithm starts off by always sending a tablet to a few scan servers, and when those few get busy it exponentially expands the number of servers used.
   
   * serverAddresses[] = a sorted array of scan server addresses that is usually the same on each client.  It's eventually consistent on the client.  For brief periods when it differs, clients will choose different scan servers for the same tablets.
   * NSS = number of scan servers, which would be serverAddresses.length
   * ISS = initial number of scan servers to randomly select from when querying a tablet; could start off small, like 2 or 4
   
   Using the variables above, the algorithm is:
   
   1. serversToUse = ISS
   2. serverIndex = ( hash(tablet extent) % NSS + random.nextInt(serversToUse) ) % NSS
   3. serverAddress = serverAddresses[serverIndex]
   4. Send scan request to serverAddress
   5. If results are returned, then done
   6. If the server reports as busy, then set serversToUse = 2 * serversToUse and go to step 2
   
   In step 6 of the algorithm, we may also want to sleep.  There are two cases to consider for this: many tablets are getting many queries, and a few tablets are getting many queries.  Trying to think through whether sleeping with exponential backoff makes sense in both cases.
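   
   The steps above can be sketched in Java as follows.  This is only an illustration under a few assumptions: the tablet extent is a string hashed with `String.hashCode()`, the server list is non-empty, ISS is at least 1, and sending the scan is represented by a `tryScan` callback that returns false when the server reports busy.  The cap on serversToUse and the `maxAttempts` limit are additions not in the steps above, and the possible sleep in step 6 is left out.
   
   ```java
   import java.util.List;
   import java.util.Random;
   import java.util.function.Predicate;
   
   // Sketch of steps 1-6; class, method and parameter names are hypothetical.
   class ExpandingScanServerSelector {
       /**
        * @param tryScan stands in for steps 4-5: true if the server returned results,
        *                false if it reported busy
        * @return the address that served the scan, or null after maxAttempts busy replies
        */
       static String scan(String tabletExtent, List<String> serverAddresses, int iss,
                          Random random, Predicate<String> tryScan, int maxAttempts) {
           int nss = serverAddresses.size();
           int serversToUse = Math.min(iss, nss);                               // step 1
           int base = Math.floorMod(tabletExtent.hashCode(), nss);              // hash(tablet extent) % NSS
           for (int attempt = 0; attempt < maxAttempts; attempt++) {
               int serverIndex = (base + random.nextInt(serversToUse)) % nss;   // step 2
               String serverAddress = serverAddresses.get(serverIndex);         // step 3
               if (tryScan.test(serverAddress)) {                               // steps 4-5
                   return serverAddress;
               }
               // Step 6: double the window.  Capping at NSS is an addition; a larger
               // window would only wrap around onto the same servers.
               serversToUse = Math.min(2 * serversToUse, nss);
           }
           return null;
       }
   }
   ```
   
   While no server is busy, a tablet's scans only ever land on the first ISS servers after its hash position, which gives the stickiness; each busy reply doubles the window, which gives the scaling.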
   
   
   
   
   





[GitHub] [accumulo] ivakegg commented on issue #2411: External Isolated Scan Server

Posted by GitBox <gi...@apache.org>.
ivakegg commented on issue #2411:
URL: https://github.com/apache/accumulo/issues/2411#issuecomment-1007723004


   This is essentially a "read only" tserver except that this process would not be tied to specific tablets.  I love the idea of separating the scans running iterators that are on the more expensive side into separate processes to avoid churn within the accumulo tservers.  The concept is relatively simple.  The trick is doing load balancing appropriately.
   
   One thought is to let something like Kubernetes manage instances of these and push the load balancing to an ingress controller.  Something simple baked into Accumulo would be great, but appropriate hooks to determine the health and load of one of these processes could be useful.
   
   An additional thought to keep in mind is the ability to embed this server process in something else (e.g. a webserver or perhaps a separate microservice).  Being able to run standalone or embedded would be desirable.

