You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by Veeral Shah <ve...@gmail.com> on 2016/03/07 14:48:09 UTC

Question on LSMified secondary indexes

   Had a small question around asterixdb LSMified secondry index structure
(Disclaimer : I dont have complete familiarity with the LSM file formats in
asterixdb - hence below also sort of validates my understanding),

In a classic LSM implementation, files (aka SST files in leveldb) at level
L1 onwards (L1, L2, L3,...), are sorted based on keys (assuming key types
where linear order can be imposed) and the key ranges of any two SST files,
in a level, are mutually exclusive. Since the LSM files are immutable, a
file's key range at a level remains fixed (unless compaction happens). So
taking a case of a asterixdb LSMified secondary index, assuming the keys
are byte strings, all keys should be alphabetically sorted.

 If above is true, there may be a merit in creating an additional in-memory
"coarse-index" (for each level) which keeps info about key.max and key.min
for each file region (say an SST file is broken into multiple smaller
regions of fixed size). This would help in quickly figuring out where a key
could fall, in  a given LSM level. (The coarse-index is to be
created/recreated during startup and compaction)

 Thanks and regards
Veeral Shah

Re: Question on LSMified secondary indexes

Posted by Chen Li <ch...@gmail.com>.
The second feature (using metadata to describe the content of each LSM
component) is described in Sattam's PhD thesis.

Chen

On Mon, Mar 7, 2016 at 6:28 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Hi Veeral,
> Two points:
> 1. Keys in our LSM index are not mutually exclusive between different
> components.
> 2. We have the concept of filtered LSM index in which we store (min-max) of
> a field(not necessarily the key) and we can use them then when answering
> queries to know which components to look at.
>
> Hope this helps,
> Abdullah.
>
> On Mon, Mar 7, 2016 at 4:48 PM, Veeral Shah <ve...@gmail.com> wrote:
>
> >    Had a small question around asterixdb LSMified secondry index
> structure
> > (Disclaimer : I dont have complete familiarity with the LSM file formats
> in
> > asterixdb - hence below also sort of validates my understanding),
> >
> > In a classic LSM implementation, files (aka SST files in leveldb) at
> level
> > L1 onwards (L1, L2, L3,...), are sorted based on keys (assuming key types
> > where linear order can be imposed) and the key ranges of any two SST
> files,
> > in a level, are mutually exclusive. Since the LSM files are immutable, a
> > file's key range at a level remains fixed (unless compaction happens). So
> > taking a case of a asterixdb LSMified secondary index, assuming the keys
> > are byte strings, all keys should be alphabetically sorted.
> >
> >  If above is true, there may be a merit in creating an additional
> in-memory
> > "coarse-index" (for each level) which keeps info about key.max and
> key.min
> > for each file region (say an SST file is broken into multiple smaller
> > regions of fixed size). This would help in quickly figuring out where a
> key
> > could fall, in  a given LSM level. (The coarse-index is to be
> > created/recreated during startup and compaction)
> >
> >  Thanks and regards
> > Veeral Shah
> >
>

Re: Question on LSMified secondary indexes

Posted by Veeral Shah <ve...@gmail.com>.
Thanks ABdullah for clarifying this. I read about LSM filters in Sattam's
paper, now looking at the code.

Thanks and regards
Veeral Shah

On Mon, Mar 7, 2016 at 7:58 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Hi Veeral,
> Two points:
> 1. Keys in our LSM index are not mutually exclusive between different
> components.
> 2. We have the concept of filtered LSM index in which we store (min-max) of
> a field(not necessarily the key) and we can use them then when answering
> queries to know which components to look at.
>
> Hope this helps,
> Abdullah.
>
> On Mon, Mar 7, 2016 at 4:48 PM, Veeral Shah <ve...@gmail.com> wrote:
>
> >    Had a small question around asterixdb LSMified secondry index
> structure
> > (Disclaimer : I dont have complete familiarity with the LSM file formats
> in
> > asterixdb - hence below also sort of validates my understanding),
> >
> > In a classic LSM implementation, files (aka SST files in leveldb) at
> level
> > L1 onwards (L1, L2, L3,...), are sorted based on keys (assuming key types
> > where linear order can be imposed) and the key ranges of any two SST
> files,
> > in a level, are mutually exclusive. Since the LSM files are immutable, a
> > file's key range at a level remains fixed (unless compaction happens). So
> > taking a case of a asterixdb LSMified secondary index, assuming the keys
> > are byte strings, all keys should be alphabetically sorted.
> >
> >  If above is true, there may be a merit in creating an additional
> in-memory
> > "coarse-index" (for each level) which keeps info about key.max and
> key.min
> > for each file region (say an SST file is broken into multiple smaller
> > regions of fixed size). This would help in quickly figuring out where a
> key
> > could fall, in  a given LSM level. (The coarse-index is to be
> > created/recreated during startup and compaction)
> >
> >  Thanks and regards
> > Veeral Shah
> >
>

Re: Question on LSMified secondary indexes

Posted by abdullah alamoudi <ba...@gmail.com>.
Hi Veeral,
Two points:
1. Keys in our LSM index are not mutually exclusive between different
components.
2. We have the concept of filtered LSM index in which we store (min-max) of
a field(not necessarily the key) and we can use them then when answering
queries to know which components to look at.

Hope this helps,
Abdullah.

On Mon, Mar 7, 2016 at 4:48 PM, Veeral Shah <ve...@gmail.com> wrote:

>    Had a small question around asterixdb LSMified secondry index structure
> (Disclaimer : I dont have complete familiarity with the LSM file formats in
> asterixdb - hence below also sort of validates my understanding),
>
> In a classic LSM implementation, files (aka SST files in leveldb) at level
> L1 onwards (L1, L2, L3,...), are sorted based on keys (assuming key types
> where linear order can be imposed) and the key ranges of any two SST files,
> in a level, are mutually exclusive. Since the LSM files are immutable, a
> file's key range at a level remains fixed (unless compaction happens). So
> taking a case of a asterixdb LSMified secondary index, assuming the keys
> are byte strings, all keys should be alphabetically sorted.
>
>  If above is true, there may be a merit in creating an additional in-memory
> "coarse-index" (for each level) which keeps info about key.max and key.min
> for each file region (say an SST file is broken into multiple smaller
> regions of fixed size). This would help in quickly figuring out where a key
> could fall, in  a given LSM level. (The coarse-index is to be
> created/recreated during startup and compaction)
>
>  Thanks and regards
> Veeral Shah
>