You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Semyon Danilov (Jira)" <ji...@apache.org> on 2022/09/22 09:56:00 UTC
[jira] [Commented] (IGNITE-17320) Implement B+Tree based sorted index storage
[ https://issues.apache.org/jira/browse/IGNITE-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608190#comment-17608190 ]
Semyon Danilov commented on IGNITE-17320:
-----------------------------------------
The patch looks good to me
> Implement B+Tree based sorted index storage
> -------------------------------------------
>
> Key: IGNITE-17320
> URL: https://issues.apache.org/jira/browse/IGNITE-17320
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-alpha6
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Like in IGNITE-17318, binary tuple format is a main goal here, because that's what we're gonna pass into API.
> Implementing a tree for indexes isn't hard by itself. I expect it to be stored in same partition files as raw storage, everything should be colocated in a single file.
> What's complicated is the inlining of data into trees pages. I see it as the following:
> * If tuple has no offset table, we can always store the entire payload in the tree. This is the best case scenario, because the size is fixed and known a priori, we don't even need to store it before the payload.
> * If tuple has an offset table, the inline size will immediately get bigger. In my view, we will have to:
> ** store the size of inlined payload, that's 4 bytes
> ** store null table if it's there, that's a known amount of bytes
> ** store header and offset table:
> *** if there are non-fixed-length columns, then a single entry in offset table can be up to 4 bytes
> *** if there are only fixed-length columns, like ints, floats or even bitsets, the amount of bytes per single entry can be accurately estimated with the upper bound
> ** then store a good amount of the actual columns data. How much? I'd be generous, but then we would probably have too much space, so all of this is debatable:
> *** for columns with fixed size, allocate room for entire value
> *** for strings and numbers (is there something else?) we have to pre-allocate a reasonable amount of bytes. Like, 8, for example. I don't know, there are defaults somewhere in the code of Ignite 2.x, we can use them.
> So, my point is, there's no new data format for inlined section of the tuple, we should reuse it and thus avoid many possible errors. And, if record fits into a tree page, there's no need in inserting it into a free list. Good!
> And yes, of course, there has to be a room for row id in inlined section as well.
> Last, meta tree is still a thing. But, we shouldn't identify indexes by their names, cause there's UUID id or even integer id (see IGNITE-17318).
> h3. Other ideas
> I don't like how durable background tasks work in Ignite 2.x, there are always some issues. I would prefer having a general-purposed "recycle bin" in partition and a background cleaner process that would clean it. Maybe this queue should contain other entities in the future.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)