You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Semyon Danilov (Jira)" <ji...@apache.org> on 2022/09/22 09:56:00 UTC
[jira] [Commented] (IGNITE-17320) Implement B+Tree based sorted index storage

    [ https://issues.apache.org/jira/browse/IGNITE-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608190#comment-17608190 ] 

Semyon Danilov commented on IGNITE-17320:
-----------------------------------------

The patch looks good to me

> Implement B+Tree based sorted index storage
> -------------------------------------------
>
>                 Key: IGNITE-17320
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17320
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-alpha6
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Like in IGNITE-17318, binary tuple format is a main goal here, because that's what we're gonna pass into API.
> Implementing a tree for indexes isn't hard by itself. I expect it to be stored in same partition files as raw storage, everything should be colocated in a single file.
> What's complicated is the inlining of data into trees pages. I see it as the following:
>  * If tuple has no offset table, we can always store the entire payload in the tree. This is the best case scenario, because the size is fixed and known a priori, we don't even need to store it before the payload.
>  * If tuple has an offset table, the inline size will immediately get bigger. In my view, we will have to:
>  ** store the size of inlined payload, that's 4 bytes
>  ** store null table if it's there, that's a known amount of bytes
>  ** store header and offset table:
>  *** if there are non-fixed-length columns, then a single entry in offset table can be up to 4 bytes
>  *** if there are only fixed-length columns, like ints, floats or even bitsets, the amount of bytes per single entry can be accurately estimated with the upper bound
>  ** then store a good amount of the actual columns data. How much? I'd be generous, but then we would probably have too much space, so all of this is debatable:
>  *** for columns with fixed size, allocate room for entire value
>  *** for strings and numbers (is there something else?) we have to pre-allocate a reasonable amount of bytes. Like, 8, for example. I don't know, there are defaults somewhere in the code of Ignite 2.x, we can use them.
> So, my point is, there's no new data format for inlined section of the tuple, we should reuse it and thus avoid many possible errors. And, if record fits into a tree page, there's no need in inserting it into a free list. Good!
> And yes, of course, there has to be a room for row id in inlined section as well.
> Last, meta tree is still a thing. But, we shouldn't identify indexes by their names, cause there's UUID id or even integer id (see IGNITE-17318).
> h3. Other ideas
> I don't like how durable background tasks work in Ignite 2.x, there are always some issues. I would prefer having a general-purposed "recycle bin" in partition and a background cleaner process that would clean it. Maybe this queue should contain other entities in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)