You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/02 10:09:39 UTC

[GitHub] [incubator-doris] marising opened a new pull request #4005: LRU cache for sql/partition cache #2581

marising opened a new pull request #4005:
URL: https://github.com/apache/incubator-doris/pull/4005

## Features
1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether to hit Cache by LastVersion and LastVersionTime
2. Refers to the classic cache algorithm LRU, which is the least recently used algorithm, using a three-layer data structure to achieve
3. The Cache elimination algorithm is implemented by ensuring the range of the partition as much as possible, to avoid the situation of partition discontinuity, which will reduce the hit rate of the Cache partition,
4. Use the two thresholds of maximum memory and elastic memory to control to avoid frequent elimination of data

## Cache fetch
1. HashMap guarantees to quickly find Cache nodes
2. Doubly linked list, put the most recently visited node at the bottom and the least visited at the top
3. The partition data under the Node node is stored in order according to the linked list, and sorted according to the partition key. Considering that the number of requested partitions will not be very large, the two ordered data are combined and the loop is used to find the partition.
4. Every access, will update the access time of the partition

## Cache update
1. Considering that the amount of updates will be relatively small, the Hash table is used here to find the corresponding partition.
2. Determine whether the updated version is higher than the existing version. If it is higher, it will be updated, otherwise it will not be updated.

## Cache pruning
1. The number below Part in the figure below represents the timestamp, and the timestamp of the most recent visit is saved
2. The entire algorithm uses a doubly linked list to find the nodes that have not been accessed recently, eliminate them, and then check them back and forth, and so on, until the memory reaches the standard
3. As shown in the figure below, find Node1 at the top, then find Part1, and eliminate, then find Part1 of Node2, and eliminate, thus eliminating the partition with timestamp 1-3
4. Node1 is cleaned up because none of the following parts

![image](https://user-images.githubusercontent.com/8611398/86345693-730f9380-bc8e-11ea-909c-5cf8702f75bc.png)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org