Posted to dev@impala.apache.org by 许益铭 <x1...@gmail.com> on 2021/02/08 09:11:59 UTC

Why does tuple memory need to be sorted by slot size?

Why does tuple memory need to be sorted by slot size? Does it provide any optimization?

Re: Why does tuple memory need to be sorted by slot size?

Posted by Tim Armstrong <ti...@gmail.com>.
I think it made more sense when we tried to pad the tuple so that slots
were aligned, since sorting reduced the amount of padding. But now that we
don't add any padding, the order of slots doesn't affect the tuple size. On
amd64 I don't think we've tended to see any real penalty from unaligned
accesses; a penalty is possible if a field straddles two cache lines.
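A minimal Python sketch of the two effects described above, under stated assumptions: `layout_size` and `crosses_cache_line` are hypothetical helpers (not Impala code), the slot sizes are made-up powers of two, and a cache line is assumed to be 64 bytes. With per-slot alignment padding, descending size order removes the gaps; with no padding, the order does not change the tuple size.

```python
def layout_size(sizes, pad):
    """Size of a tuple whose slots are laid out in the given order.

    pad=True: each slot starts at a multiple of its own size (natural alignment,
    assuming power-of-two sizes), as when padding was still added.
    pad=False: slots are packed back to back. In both cases no padding is
    added after the last slot."""
    offset = 0
    for size in sizes:
        if pad and offset % size:
            offset += size - offset % size  # padding bytes before this slot
        offset += size
    return offset


def crosses_cache_line(offset, size, line=64):
    """True if a field starting at absolute byte `offset` spans two cache lines."""
    return offset // line != (offset + size - 1) // line


slots = [1, 8, 4, 1, 8, 2]  # hypothetical slot sizes in bytes
for order in (slots, sorted(slots, reverse=True)):
    print(order, layout_size(order, pad=True), layout_size(order, pad=False))
# [1, 8, 4, 1, 8, 2] 34 24
# [8, 8, 4, 2, 1, 1] 24 24
print(crosses_cache_line(60, 8), crosses_cache_line(56, 8))  # True False
```

Only the padded variant depends on slot order, which is why sorting mattered more when padding was still added.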

I agree that you would probably want to optimize the layout to cluster
frequently accessed slots together, at least if the tuples are large enough
to span multiple cache lines (i.e. > 64 bytes).

On Mon, 8 Feb 2021 at 08:43, Csaba Ringhofer <cs...@cloudera.com>
wrote:

> I think that alignment is not the goal here, because the tuples themselves
> are not aligned, as there is no padding at their end. For example, if a
> tuple's size is 17 bytes, the first tuple will start at offset 0, the next
> at offset 17, and so on.
> A comment about the lack of padding:
>
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java#L68
>
> I remember a different reason, but I couldn't find the comment
> that mentioned it:
> The goal of sorting is related to big tuples that span more than one cache
> line. By sorting, all small slots move to the end (just before the
> null indicator bytes), so we get "dense" cache lines that are used by
> many slots, and "sparse" ones used by only a few big slots. If
> all slots are accessed with the same probability during expression
> evaluation, this layout increases the probability that some "sparse" lines
> won't be accessed at all, which reduces cache pressure.
>
> Note that this is a far from optimal strategy IMO: it could be improved
> by considering when a slot will be used, e.g. slots used in predicates
> could be moved near the null indicator bytes. If the predicate fails, the
> rest of the slots (and the cache lines that contain them) are not accessed at all.

Re: Why does tuple memory need to be sorted by slot size?

Posted by Csaba Ringhofer <cs...@cloudera.com>.
I think that alignment is not the goal here, because the tuples themselves
are not aligned, as there is no padding at their end. For example, if a
tuple's size is 17 bytes, the first tuple will start at offset 0, the next
at offset 17, and so on.
A comment about the lack of padding:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java#L68
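To make the 17-byte example concrete, here is a small sketch (hypothetical helper names; it assumes the buffer itself starts 8-byte aligned) that computes where consecutive unpadded tuples begin and whether each start offset is 8-byte aligned.

```python
def tuple_offsets(tuple_size, count):
    """Start offsets of `count` tuples stored back to back with no trailing padding."""
    return [i * tuple_size for i in range(count)]


def is_aligned(offset, alignment=8):
    """True if `offset` is a multiple of `alignment` (buffer start assumed aligned)."""
    return offset % alignment == 0


offsets = tuple_offsets(17, 4)
print(offsets)                            # [0, 17, 34, 51]
print([is_aligned(o) for o in offsets])   # [True, False, False, False]
# Since 17 % 8 == 1, tuple i starts 8-byte aligned only when i % 8 == 0, so an
# 8-byte slot at the front of each tuple is misaligned for 7 tuples out of 8,
# regardless of how the slots inside the tuple are ordered.
```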

I remember a different reason, but I couldn't find the comment
that mentioned it:
The goal of sorting is related to big tuples that span more than one cache
line. By sorting, all small slots move to the end (just before the
null indicator bytes), so we get "dense" cache lines that are used by
many slots, and "sparse" ones used by only a few big slots. If
all slots are accessed with the same probability during expression
evaluation, this layout increases the probability that some "sparse" lines
won't be accessed at all, which reduces cache pressure.
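The dense/sparse argument can be checked with a rough model. This sketch is an assumption-laden illustration, not Impala's layout code: the slot names and sizes are hypothetical, a cache line is taken to be 64 bytes, the tuple is assumed to start on a line boundary, null-indicator bytes are ignored, and every set of k accessed slots is equally likely, as in the premise above. It compares a size-sorted layout with an interleaved one by the expected number of cache lines touched.

```python
from math import comb

LINE = 64  # assumed cache-line size in bytes


def line_slot_counts(slots):
    """Pack (name, size) slots back to back in the given order and return
    {cache-line index: number of slots with at least one byte in that line}.
    Simplifications: the tuple starts on a cache-line boundary (real tuples
    are not aligned) and the null-indicator bytes are ignored."""
    counts, off = {}, 0
    for _name, size in slots:
        for line in range(off // LINE, (off + size - 1) // LINE + 1):
            counts[line] = counts.get(line, 0) + 1
        off += size
    return counts


def expected_lines_touched(slots, k):
    """Expected number of distinct cache lines read when k distinct slots are
    accessed, every k-subset of slots being equally likely. A line holding m
    slots is skipped with probability C(n - m, k) / C(n, k)."""
    n = len(slots)
    return sum(1 - comb(n - m, k) / comb(n, k)
               for m in line_slot_counts(slots).values())


# Hypothetical tuple: six 16-byte slots and eight small ones (118 bytes of slots).
big = [(f"big{i}", 16) for i in range(6)]
small = [("s8", 8), ("s4a", 4), ("s4b", 4), ("s2", 2),
         ("s1a", 1), ("s1b", 1), ("s1c", 1), ("s1d", 1)]
sorted_layout = sorted(big + small, key=lambda s: s[1], reverse=True)
interleaved = [big[0], small[0], big[1], small[1], big[2], small[2], big[3],
               small[3], big[4], small[4], big[5], small[5], small[6], small[7]]

print(line_slot_counts(sorted_layout))   # {0: 4, 1: 10}  (sparse line, dense line)
print(line_slot_counts(interleaved))     # {0: 6, 1: 8}
print(round(expected_lines_touched(sorted_layout, 3), 3))  # 1.659
print(round(expected_lines_touched(interleaved, 3), 3))    # 1.791
```

Both layouts use two cache lines, but the size-sorted one leaves the first line with only 4 slots, so that line is skipped more often when only a few slots are read.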

Note that this is a far from optimal strategy IMO: it could be improved
by considering when a slot will be used, e.g. slots used in predicates
could be moved near the null indicator bytes. If the predicate fails, the
rest of the slots (and the cache lines that contain them) are not accessed at all.
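A sketch of the predicate-aware idea, again under hypothetical assumptions: made-up slots, a 64-byte cache line, a tuple starting on a line boundary, and null-indicator bytes stored after all slots (one bit per slot, rounded up to whole bytes). `predicate_aware_order` is an illustrative name, not an existing Impala function. When the predicate on `big0` fails, only that slot and the null bytes are read.

```python
LINE = 64  # assumed cache-line size in bytes


def offsets_with_nulls(slots):
    """Pack (name, size) slots back to back in the given order, then append the
    null-indicator bytes (one bit per slot, rounded up to whole bytes; an
    assumption for illustration). Returns {name: (offset, size)}; the tuple is
    assumed to start on a cache-line boundary."""
    layout, off = {}, 0
    for name, size in slots:
        layout[name] = (off, size)
        off += size
    layout["<nulls>"] = (off, (len(slots) + 7) // 8)
    return layout


def lines_touched(layout, names):
    """Set of cache-line indices containing any byte of the named slots."""
    lines = set()
    for name in names:
        off, size = layout[name]
        lines.update(range(off // LINE, (off + size - 1) // LINE + 1))
    return lines


def predicate_aware_order(slots, predicate_slots):
    """Hypothetical variant of the suggestion: size-sorted non-predicate slots
    first, then the predicate slots, so they end up next to the null bytes."""
    rest = [s for s in slots if s[0] not in predicate_slots]
    pred = [s for s in slots if s[0] in predicate_slots]
    return (sorted(rest, key=lambda s: s[1], reverse=True)
            + sorted(pred, key=lambda s: s[1], reverse=True))


# Hypothetical tuple: six 16-byte slots and eight small ones
# (120 bytes including the null-indicator bytes).
slots = [(f"big{i}", 16) for i in range(6)] + [
    ("s8", 8), ("s4a", 4), ("s4b", 4), ("s2", 2),
    ("s1a", 1), ("s1b", 1), ("s1c", 1), ("s1d", 1)]
predicate = {"big0"}             # e.g. a filter on the column stored in big0
accessed = ["big0", "<nulls>"]   # bytes read when the predicate fails

size_sorted = sorted(slots, key=lambda s: s[1], reverse=True)
aware = predicate_aware_order(slots, predicate)
print(sorted(lines_touched(offsets_with_nulls(size_sorted), accessed)))  # [0, 1]
print(sorted(lines_touched(offsets_with_nulls(aware), accessed)))        # [1]
```

In this example, moving the predicate slot next to the null bytes lets a failed predicate touch one cache line instead of two.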




On Mon, Feb 8, 2021 at 2:24 PM Zoltán Borók-Nagy <bo...@apache.org>
wrote:

> According to the comment in
> https://github.com/apache/impala/blob/81d5377c27f1940235db332e43f1d0f073cf3d2f/be/src/runtime/tuple.h#L61-L63
> we don't require tuples to have any memory alignment, but I do believe we
> sort slots to get a packed and aligned memory layout for the tuples in most
> cases. CPU operations on aligned addresses are more efficient than
> operations on unaligned addresses.
>
> BR,
>     Zoltan

Re: Why does tuple memory need to be sorted by slot size?

Posted by Zoltán Borók-Nagy <bo...@apache.org>.
According to the comment in
https://github.com/apache/impala/blob/81d5377c27f1940235db332e43f1d0f073cf3d2f/be/src/runtime/tuple.h#L61-L63
we don't require tuples to have any memory alignment, but I do believe we
sort slots to get a packed and aligned memory layout for the tuples in most
cases. CPU operations on aligned addresses are more efficient than
operations on unaligned addresses.
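A small sketch of why descending size order tends to give a layout that is both packed and aligned, assuming power-of-two slot sizes (the helper names and sizes are hypothetical): once the larger power-of-two slots are placed, the running offset is a multiple of every smaller power of two, so no padding is needed for natural alignment.

```python
def packed_offsets(sizes):
    """Offsets of slots packed back to back with no padding, in the given order."""
    offsets, off = [], 0
    for size in sizes:
        offsets.append(off)
        off += size
    return offsets


def naturally_aligned(sizes):
    """True if every slot's offset (relative to the tuple start) is a multiple of its size."""
    return all(off % size == 0 for off, size in zip(packed_offsets(sizes), sizes))


sizes = [1, 4, 16, 8, 2, 8, 1]  # hypothetical power-of-two slot sizes in bytes
print(naturally_aligned(sizes))                        # False
print(naturally_aligned(sorted(sizes, reverse=True)))  # True
print(packed_offsets(sorted(sizes, reverse=True)))     # [0, 16, 24, 32, 36, 38, 39]
```

This only guarantees alignment relative to the start of the tuple; since tuples are stored back to back without trailing padding, absolute addresses are aligned only if the tuple itself starts at a suitable address. If some slot sizes are not powers of two, the guarantee does not hold in general.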

BR,
    Zoltan


On Mon, Feb 8, 2021 at 10:12 AM 许益铭 <x1...@gmail.com> wrote:

> Why does tuple memory need to be sorted by slot size? Does it provide any optimization?
>