Posted to issues@hbase.apache.org by "tianhang tang (Jira)" <ji...@apache.org> on 2023/02/14 03:02:00 UTC

[jira] [Commented] (HBASE-26850) Optimize the implementation of LRUCache in LRUDictionary

    [ https://issues.apache.org/jira/browse/HBASE-26850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688248#comment-17688248 ] 

tianhang tang commented on HBASE-26850:
---------------------------------------

After the discussion in HBASE-27621, I think this issue still has merit, since we still use the LRUDictionary.

In short, it's a bug in the LRUMap implementation: existing nodes should be reused in addEntry.

In fact, the modified version is also the most classic implementation of an LRU map.
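For reference, the "classic" implementation I mean can be sketched on top of java.util.LinkedHashMap (a hypothetical illustration, not the HBase code) -- the key property is that inserting an existing key reuses its node and just moves it to the most-recently-used position:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the textbook LRU map, built on LinkedHashMap
// with accessOrder = true, so get()/put() move an entry to the tail
// and the eldest (least-recently-used) entry is evicted on overflow.
public class ClassicLruMap<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  public ClassicLruMap(int capacity) {
    super(16, 0.75f, true); // true => iterate in access order
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Evict exactly when the map grows past its capacity.
    return size() > capacity;
  }
}
```

With capacity 2, putting a, b, touching a, then putting c evicts b, not a -- and a duplicate put never creates a second node, which is the behavior missing from addEntry here.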

> Optimize the implementation of LRUCache in LRUDictionary
> --------------------------------------------------------
>
>                 Key: HBASE-26850
>                 URL: https://issues.apache.org/jira/browse/HBASE-26850
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>    Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>            Reporter: tianhang tang
>            Assignee: tianhang tang
>            Priority: Major
>         Attachments: image-2022-03-16-17-13-00-871.png
>
>
> *This issue is to unify the behavior of the read and write paths.*
>  
> During my research on HBASE-26849, I found that there seems to be something wrong with the implementation of LRUDictionary.
> It uses an array to store the data, and a HashMap to map node contents to array indexes.
> On the write path:
> {code:java}
> CompressedKvEncoder#write
> ->
> LRUDictionary#findEntry
> ->
> LRUDictionary#addEntry
> {code}
> And on the read path:
> {code:java}
> CompressedKVDecoder#readIntoArray
> ->
> LRUDictionary#addEntry
> {code}
> Now look at the logic in {_}findEntry{_}:
> {code:java}
>   @Override
>   public short findEntry(byte[] data, int offset, int length) {
>     short ret = backingStore.findIdx(data, offset, length);
>     if (ret == NOT_IN_DICTIONARY) {
>       addEntry(data, offset, length);
>     }
>     return ret;
>   }
>     private short findIdx(byte[] array, int offset, int length) {
>       Short s;
>       final Node comparisonNode = new Node();
>       comparisonNode.setContents(array, offset, length);
>       if ((s = nodeToIndex.get(comparisonNode)) != null) {
>         moveToHead(indexToNode[s]);
>         return s;
>       } else {
>         return -1;
>       }
>     }
> {code}
> First it checks whether an identical node already exists in the cache; if so, that node is reused directly.
> The problem is, *if we have several identical nodes, the behavior of the read and write paths is inconsistent:*
> on the write path only one node is reused, but on the read path multiple copies are stored in the array,
> while the map holds only one mapping from the node to an array index.
> So, on the read path, if the first 'same' node is evicted, we remove it from the map; then when the second 'same' node is evicted we cannot find it in the map, and we may hit an NPE.
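The eviction scenario above can be boiled down to a toy model (hypothetical names and structure, not the actual HBase code): a read-path-style addEntry that appends without a duplicate check, while the map keeps only one index per distinct value. Evicting the second copy of a duplicate then unboxes a null Short:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the read-path inconsistency.
public class DuplicateEvictionSketch {
  static final int CAPACITY = 2;
  static final String[] indexToNode = new String[CAPACITY];
  static final Map<String, Short> nodeToIndex = new HashMap<>();
  static short next = 0;

  // Read-path-style add: no duplicate check; when the cursor wraps
  // around, the occupant of the slot is evicted.
  static void addEntry(String node) {
    String evicted = indexToNode[next];
    if (evicted != null) {
      // If `evicted` is the second copy of a duplicate, its single
      // map entry was already removed by the earlier eviction, so
      // remove() returns null and auto-unboxing throws an NPE.
      short staleIdx = nodeToIndex.remove(evicted);
    }
    indexToNode[next] = node;
    nodeToIndex.put(node, next); // one mapping per distinct value
    next = (short) ((next + 1) % CAPACITY);
  }
}
```

Adding "x", "x", "y", "z" in that order fills both slots with "x" (one map entry), evicts the first copy normally, and then blows up with a NullPointerException when the second copy is evicted.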



--
This message was sent by Atlassian Jira
(v8.20.10#820010)