You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/06/03 04:54:11 UTC

Storing and loading the FST directly from disk

Is it possible to iterate over the FST while it's still on disk?  If
not is that type of functionality planned?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> Wow you're right, the FST size in RAM with 50 mil date 1 ms
> incremented keys is less than 1K.  That's insane!

This does sound insane. Are you sure you're building everything right
(not pruning anything)? You could always enumerate the FST to get the
keys back to make sure it's actually working. Or check for exist(key)
for every key in the input.

The compression ratio should be good for shared prefixes, but 1K seems
a bit too small for 50mil entries...

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> here you should rather store pointers to another file and mmap that
> file. Keep your FST as lean and compact as possible and make sure its
> in memory. The compression should do a good job for you here!

Wow you're right, the FST size in RAM with 50 mil date 1 ms
incremented keys is less than 1K.  That's insane!

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> IMO the byte[] representation is so compact
> that it doesn't really matter if you use FS Cache memory or JVM memory
> so I'd rather go for jvm memory here too.

In principle I agree, however in practice HBase users will likely have
a lot of keys per region and server.  If we stored every single key,
I'm not sure what the memory requirements would be, I'll probably run
some tests to get an idea of number of keys -> memory size.  I guess a
linear scan would still be fine, eg, the key would be constructed via
the FST (which may not be performant), and the value would be loaded
from the MMap, which would be sequential.

> here you should rather store pointers to another file and mmap that
> file. Keep your FST as lean and compact as possible and make sure its
> in memory. The compression should do a good job for you here!

Right, we'll see.

On Fri, Jun 3, 2011 at 12:00 AM, Simon Willnauer
<si...@googlemail.com> wrote:
> I agree with Robert and Dawid that once you go past the page fault
> border you will loose performance. The problem here is that you don't
> realize it immediately. IMO the byte[] representation is so compact
> that it doesn't really matter if you use FS Cache memory or JVM memory
> so I'd rather go for jvm memory here too.
>
> it seems you are having a slightly different usecase here:
>
>> On a related note, is it OK to place lets say 1K byte[]s into the
>> Output portion of the FST?
>
> here you should rather store pointers to another file and mmap that
> file. Keep your FST as lean and compact as possible and make sure its
> in memory. The compression should do a good job for you here!
>
> simon
> On Fri, Jun 3, 2011 at 8:19 AM, Dawid Weiss
> <da...@cs.put.poznan.pl> wrote:
>>> In any case I browsed the code a bit more, it'd be easy to test out
>>> the MMap'ing because the FST simply reads directly from a byte[],
>>> which could be converted to a ByteBuffer (that's MMap'd).  Nice!
>>
>> The FST is basically a byte[], but like Robert said -- the layout of
>> transitions and states will require nearly random access patterns all
>> around it. There are some optimizations one can make to lay out
>> initial nodes close to each other, for example (and I've done it as
>> part of another project) -- these do optimize CPU cache utilization
>> and I would assume MMAP'ped buffers as well -- but once you go past
>> page fault zone the performance will be terrible. And I assume the
>> degradation will not be smooth, but very sharp.
>>
>> This said, I'd be interested in looking at the benchmarks if you want
>> to invest some time and do it ;)
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> (hint: try to provide a representation
> that will share as many suffixes and prefixes as possible since these
> conflate into a single path, no matter how many sequences you have)

It's just be user created keys, which will be sorted at least, and
probably will be highly likely to share large portions of the prefix.
Eg, I think timestamp is a common key type.

On Fri, Jun 3, 2011 at 12:06 AM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:
>> here you should rather store pointers to another file and mmap that
>> file. Keep your FST as lean and compact as possible and make sure its
>> in memory. The compression should do a good job for you here!
>
> Yes, this is a good idea. If you can share a sample of that data that
> you want to keep in an FST I may be able to recommend something to
> keep it smaller in the FST (hint: try to provide a representation
> that will share as many suffixes and prefixes as possible since these
> conflate into a single path, no matter how many sequences you have).
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> here you should rather store pointers to another file and mmap that
> file. Keep your FST as lean and compact as possible and make sure its
> in memory. The compression should do a good job for you here!

Yes, this is a good idea. If you can share a sample of that data that
you want to keep in an FST I may be able to recommend something to
keep it smaller in the FST (hint: try to provide a representation
that will share as many suffixes and prefixes as possible since these
conflate into a single path, no matter how many sequences you have).

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Simon Willnauer <si...@googlemail.com>.

I agree with Robert and Dawid that once you go past the page fault
border you will loose performance. The problem here is that you don't
realize it immediately. IMO the byte[] representation is so compact
that it doesn't really matter if you use FS Cache memory or JVM memory
so I'd rather go for jvm memory here too.

it seems you are having a slightly different usecase here:

> On a related note, is it OK to place lets say 1K byte[]s into the
> Output portion of the FST?

here you should rather store pointers to another file and mmap that
file. Keep your FST as lean and compact as possible and make sure its
in memory. The compression should do a good job for you here!

simon
On Fri, Jun 3, 2011 at 8:19 AM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:
>> In any case I browsed the code a bit more, it'd be easy to test out
>> the MMap'ing because the FST simply reads directly from a byte[],
>> which could be converted to a ByteBuffer (that's MMap'd).  Nice!
>
> The FST is basically a byte[], but like Robert said -- the layout of
> transitions and states will require nearly random access patterns all
> around it. There are some optimizations one can make to lay out
> initial nodes close to each other, for example (and I've done it as
> part of another project) -- these do optimize CPU cache utilization
> and I would assume MMAP'ped buffers as well -- but once you go past
> page fault zone the performance will be terrible. And I assume the
> degradation will not be smooth, but very sharp.
>
> This said, I'd be interested in looking at the benchmarks if you want
> to invest some time and do it ;)
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> In any case I browsed the code a bit more, it'd be easy to test out
> the MMap'ing because the FST simply reads directly from a byte[],
> which could be converted to a ByteBuffer (that's MMap'd).  Nice!

The FST is basically a byte[], but like Robert said -- the layout of
transitions and states will require nearly random access patterns all
around it. There are some optimizations one can make to lay out
initial nodes close to each other, for example (and I've done it as
part of another project) -- these do optimize CPU cache utilization
and I would assume MMAP'ped buffers as well -- but once you go past
page fault zone the performance will be terrible. And I assume the
degradation will not be smooth, but very sharp.

This said, I'd be interested in looking at the benchmarks if you want
to invest some time and do it ;)

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

In any case I browsed the code a bit more, it'd be easy to test out
the MMap'ing because the FST simply reads directly from a byte[],
which could be converted to a ByteBuffer (that's MMap'd).  Nice!

On Thu, Jun 2, 2011 at 8:57 PM, Robert Muir <rc...@gmail.com> wrote:
> On Thu, Jun 2, 2011 at 11:49 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>> because of how the datastructure is laid out. if you use mmap, instead
>>> of lots of seeks, it will simply be lots of page faults instead.
>>
>> Ok, right, however assuming the pages are in RAM it could/should be
>> ok.  Eg, in HBase we'd want to store the keys alongside the column
>> family values, on disk.  Loading the keys into heap with the values
>> still on disk/system IO cache probably is non-optimal.
>>
>
> you don't have to believe me... if you want to use an FST, load it up in RAM.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Jun 2, 2011 at 11:49 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
>> because of how the datastructure is laid out. if you use mmap, instead
>> of lots of seeks, it will simply be lots of page faults instead.
>
> Ok, right, however assuming the pages are in RAM it could/should be
> ok.  Eg, in HBase we'd want to store the keys alongside the column
> family values, on disk.  Loading the keys into heap with the values
> still on disk/system IO cache probably is non-optimal.
>

you don't have to believe me... if you want to use an FST, load it up in RAM.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> because of how the datastructure is laid out. if you use mmap, instead
> of lots of seeks, it will simply be lots of page faults instead.

Ok, right, however assuming the pages are in RAM it could/should be
ok.  Eg, in HBase we'd want to store the keys alongside the column
family values, on disk.  Loading the keys into heap with the values
still on disk/system IO cache probably is non-optimal.

On Thu, Jun 2, 2011 at 8:45 PM, Robert Muir <rc...@gmail.com> wrote:
> On Thu, Jun 2, 2011 at 11:39 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>> MMAP changes nothing: as its not a sequential pattern.. and things like
>>> asking for a term range would be extra bad.
>>
>> Why's that?  If the associated pages are in the system IO cache it'd be fine.
>>
>
> because of how the datastructure is laid out. if you use mmap, instead
> of lots of seeks, it will simply be lots of page faults instead.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Jun 2, 2011 at 11:39 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
>> MMAP changes nothing: as its not a sequential pattern.. and things like
>> asking for a term range would be extra bad.
>
> Why's that?  If the associated pages are in the system IO cache it'd be fine.
>

because of how the datastructure is laid out. if you use mmap, instead
of lots of seeks, it will simply be lots of page faults instead.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> MMAP changes nothing: as its not a sequential pattern.. and things like
> asking for a term range would be extra bad.

Why's that?  If the associated pages are in the system IO cache it'd be fine.

On a related note, is it OK to place lets say 1K byte[]s into the
Output portion of the FST?

On Thu, Jun 2, 2011 at 8:36 PM, Robert Muir <rc...@gmail.com> wrote:
> On Thu, Jun 2, 2011 at 11:18 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>>> This would be very bad, if you were to lookup "foobar" you would have
>>> to do a seek on every byte
>>
>> I think it'd be MMap'd.  There was a lengthy discussion at
>> LUCENE-2843, and there's an open issue to in fact store all terms in
>> RAM as an FST.
>>
>
> all in RAM is completely different. all on disk will be bad, because
> when traversing terms in the FST, its random access all the way. MMAP
> changes nothing: as its not a sequential pattern.. and things like
> asking for a term range would be extra bad.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Jun 2, 2011 at 11:18 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
>> This would be very bad, if you were to lookup "foobar" you would have
>> to do a seek on every byte
>
> I think it'd be MMap'd.  There was a lengthy discussion at
> LUCENE-2843, and there's an open issue to in fact store all terms in
> RAM as an FST.
>

all in RAM is completely different. all on disk will be bad, because
when traversing terms in the FST, its random access all the way. MMAP
changes nothing: as its not a sequential pattern.. and things like
asking for a term range would be extra bad.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> if you want to use it for this purpose, you don't need an FST, you can
> just use an NFA/DFA of prefixes instead, as you only need to answer
> accept/reject.

Right, however if we already have the FST, then if it supports
accept/reject efficiently, we'd simply reuse it (thereby removing the
bloom filter as yet another heap consumer).

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Jun 2, 2011 at 11:18 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> How is the FST with bloom filter like access?
>

if you want to use it for this purpose, you don't need an FST, you can
just use an NFA/DFA of prefixes instead, as you only need to answer
accept/reject.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Jason Rutherglen <ja...@gmail.com>.

> This would be very bad, if you were to lookup "foobar" you would have
> to do a seek on every byte

I think it'd be MMap'd.  There was a lengthy discussion at
LUCENE-2843, and there's an open issue to in fact store all terms in
RAM as an FST.

I'm looking at the FST for HBase's key index which is the equivalent
of the terms index.  However it'd be interesting to see if in fact all
keys could be stored in the FST on disk, then very little would need
to be loaded into RAM.  How is the FST with bloom filter like access?

On Thu, Jun 2, 2011 at 8:06 PM, Robert Muir <rc...@gmail.com> wrote:
> This would be very bad, if you were to lookup "foobar" you would have
> to do a seek on every byte.
>
> instead, load up "part of the terms" in RAM, and the rest on disk
> sequentially... this is how the lucene index works (the terms index is
> in FST in ram, referring to terms dictionary on disk, so you do one
> seek instead).
>
> On Thu, Jun 2, 2011 at 10:54 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> Is it possible to iterate over the FST while it's still on disk?  If
>> not is that type of functionality planned?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Storing and loading the FST directly from disk

Posted by Robert Muir <rc...@gmail.com>.

This would be very bad, if you were to lookup "foobar" you would have
to do a seek on every byte.

instead, load up "part of the terms" in RAM, and the rest on disk
sequentially... this is how the lucene index works (the terms index is
in FST in ram, referring to terms dictionary on disk, so you do one
seek instead).

On Thu, Jun 2, 2011 at 10:54 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Is it possible to iterate over the FST while it's still on disk?  If
> not is that type of functionality planned?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org