You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2012/06/27 22:36:25 UTC

Count of keys of an FST

The FST class has a number of methods that return counts, which one returns
the total number of keys that have been encoded into the FST?

Re: Count of keys of an FST

Posted by Jason Rutherglen <ja...@gmail.com>.
Thanks, it's done!  :)

https://issues.apache.org/jira/browse/CASSANDRA-4324

On Thu, Jun 28, 2012 at 9:36 AM, Dawid Weiss
<da...@cs.put.poznan.pl>wrote:

> Let me know if you need that snippet of code to count the keys; or try
> it yourself -- should be good practice :)
>
> Dawid
>
> On Thu, Jun 28, 2012 at 3:32 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
> > I looked at the sources and didn't see a key count.
> >
> > Thanks Dawid and Mike.
> >
> > On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >>
> >> I believe node and arc count are stored, but not key count.  But check
> >> the sources to be sure!
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
> >> <da...@cs.put.poznan.pl> wrote:
> >> > If you need the count with constant time then yes, you should store it
> >> > separately. You could also make a transducer that would store it at
> >> > the root node as side-effect of values associated with keys, but it's
> >> > kind of ugly.
> >> >
> >> > Please check the fst header though -- I'm not sure, maybe Mike wrote
> >> > it so that the node count/ keys count is in there.
> >> >
> >> > Dawid
> >> >
> >> > On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
> >> > <ja...@gmail.com> wrote:
> >> >> Sounds like I should just count as the keys are added and store the
> >> >> count
> >> >> separately.
> >> >>
> >> >> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
> >> >> <da...@cs.put.poznan.pl>
> >> >> wrote:
> >> >>>
> >> >>> I don't think there is one that you could use out of the box... but
> >> >>> maybe I'm wrong and it's stored in the header somewhere (don't have
> >> >>> the source in front of me).
> >> >>>
> >> >>> To calculate it by hand the worst case is that you'll need a
> recursive
> >> >>> traversal, which would mean O(number of stored states) with
> >> >>> intermediate count caches or O(number of keys) without any caches
> and
> >> >>> memory overhead (just recursive traversal).
> >> >>>
> >> >>> Dawid
> >> >>>
> >> >>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
> >> >>> <ja...@gmail.com> wrote:
> >> >>> > The FST class has a number of methods that return counts, which
> one
> >> >>> > returns
> >> >>> > the total number of keys that have been encoded into the FST?
> >> >>>
> >> >>>
> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> >>> For additional commands, e-mail: dev-help@lucene.apache.org
> >> >>>
> >> >>
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: dev-help@lucene.apache.org
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Count of keys of an FST

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
Let me know if you need that snippet of code to count the keys; or try
it yourself -- should be good practice :)

Dawid

On Thu, Jun 28, 2012 at 3:32 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> I looked at the sources and didn't see a key count.
>
> Thanks Dawid and Mike.
>
> On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> I believe node and arc count are stored, but not key count.  But check
>> the sources to be sure!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
>> <da...@cs.put.poznan.pl> wrote:
>> > If you need the count with constant time then yes, you should store it
>> > separately. You could also make a transducer that would store it at
>> > the root node as side-effect of values associated with keys, but it's
>> > kind of ugly.
>> >
>> > Please check the fst header though -- I'm not sure, maybe Mike wrote
>> > it so that the node count/ keys count is in there.
>> >
>> > Dawid
>> >
>> > On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
>> > <ja...@gmail.com> wrote:
>> >> Sounds like I should just count as the keys are added and store the
>> >> count
>> >> separately.
>> >>
>> >> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
>> >> <da...@cs.put.poznan.pl>
>> >> wrote:
>> >>>
>> >>> I don't think there is one that you could use out of the box... but
>> >>> maybe I'm wrong and it's stored in the header somewhere (don't have
>> >>> the source in front of me).
>> >>>
>> >>> To calculate it by hand the worst case is that you'll need a recursive
>> >>> traversal, which would mean O(number of stored states) with
>> >>> intermediate count caches or O(number of keys) without any caches and
>> >>> memory overhead (just recursive traversal).
>> >>>
>> >>> Dawid
>> >>>
>> >>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
>> >>> <ja...@gmail.com> wrote:
>> >>> > The FST class has a number of methods that return counts, which one
>> >>> > returns
>> >>> > the total number of keys that have been encoded into the FST?
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>>
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Count of keys of an FST

Posted by Jason Rutherglen <ja...@gmail.com>.
I looked at the sources and didn't see a key count.

Thanks Dawid and Mike.

On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I believe node and arc count are stored, but not key count.  But check
> the sources to be sure!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
> <da...@cs.put.poznan.pl> wrote:
> > If you need the count with constant time then yes, you should store it
> > separately. You could also make a transducer that would store it at
> > the root node as side-effect of values associated with keys, but it's
> > kind of ugly.
> >
> > Please check the fst header though -- I'm not sure, maybe Mike wrote
> > it so that the node count/ keys count is in there.
> >
> > Dawid
> >
> > On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
> > <ja...@gmail.com> wrote:
> >> Sounds like I should just count as the keys are added and store the
> count
> >> separately.
> >>
> >> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss <
> dawid.weiss@cs.put.poznan.pl>
> >> wrote:
> >>>
> >>> I don't think there is one that you could use out of the box... but
> >>> maybe I'm wrong and it's stored in the header somewhere (don't have
> >>> the source in front of me).
> >>>
> >>> To calculate it by hand the worst case is that you'll need a recursive
> >>> traversal, which would mean O(number of stored states) with
> >>> intermediate count caches or O(number of keys) without any caches and
> >>> memory overhead (just recursive traversal).
> >>>
> >>> Dawid
> >>>
> >>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
> >>> <ja...@gmail.com> wrote:
> >>> > The FST class has a number of methods that return counts, which one
> >>> > returns
> >>> > the total number of keys that have been encoded into the FST?
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Count of keys of an FST

Posted by Michael McCandless <lu...@mikemccandless.com>.
I believe node and arc count are stored, but not key count.  But check
the sources to be sure!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:
> If you need the count with constant time then yes, you should store it
> separately. You could also make a transducer that would store it at
> the root node as side-effect of values associated with keys, but it's
> kind of ugly.
>
> Please check the fst header though -- I'm not sure, maybe Mike wrote
> it so that the node count/ keys count is in there.
>
> Dawid
>
> On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> Sounds like I should just count as the keys are added and store the count
>> separately.
>>
>> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss <da...@cs.put.poznan.pl>
>> wrote:
>>>
>>> I don't think there is one that you could use out of the box... but
>>> maybe I'm wrong and it's stored in the header somewhere (don't have
>>> the source in front of me).
>>>
>>> To calculate it by hand the worst case is that you'll need a recursive
>>> traversal, which would mean O(number of stored states) with
>>> intermediate count caches or O(number of keys) without any caches and
>>> memory overhead (just recursive traversal).
>>>
>>> Dawid
>>>
>>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
>>> <ja...@gmail.com> wrote:
>>> > The FST class has a number of methods that return counts, which one
>>> > returns
>>> > the total number of keys that have been encoded into the FST?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Count of keys of an FST

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
If you need the count with constant time then yes, you should store it
separately. You could also make a transducer that would store it at
the root node as side-effect of values associated with keys, but it's
kind of ugly.

Please check the fst header though -- I'm not sure, maybe Mike wrote
it so that the node count/ keys count is in there.

Dawid

On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Sounds like I should just count as the keys are added and store the count
> separately.
>
> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss <da...@cs.put.poznan.pl>
> wrote:
>>
>> I don't think there is one that you could use out of the box... but
>> maybe I'm wrong and it's stored in the header somewhere (don't have
>> the source in front of me).
>>
>> To calculate it by hand the worst case is that you'll need a recursive
>> traversal, which would mean O(number of stored states) with
>> intermediate count caches or O(number of keys) without any caches and
>> memory overhead (just recursive traversal).
>>
>> Dawid
>>
>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
>> <ja...@gmail.com> wrote:
>> > The FST class has a number of methods that return counts, which one
>> > returns
>> > the total number of keys that have been encoded into the FST?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Count of keys of an FST

Posted by Jason Rutherglen <ja...@gmail.com>.
Sounds like I should just count as the keys are added and store the count
separately.

On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
<da...@cs.put.poznan.pl>wrote:

> I don't think there is one that you could use out of the box... but
> maybe I'm wrong and it's stored in the header somewhere (don't have
> the source in front of me).
>
> To calculate it by hand the worst case is that you'll need a recursive
> traversal, which would mean O(number of stored states) with
> intermediate count caches or O(number of keys) without any caches and
> memory overhead (just recursive traversal).
>
> Dawid
>
> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
> > The FST class has a number of methods that return counts, which one
> returns
> > the total number of keys that have been encoded into the FST?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Count of keys of an FST

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
I don't think there is one that you could use out of the box... but
maybe I'm wrong and it's stored in the header somewhere (don't have
the source in front of me).

To calculate it by hand the worst case is that you'll need a recursive
traversal, which would mean O(number of stored states) with
intermediate count caches or O(number of keys) without any caches and
memory overhead (just recursive traversal).

Dawid

On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> The FST class has a number of methods that return counts, which one returns
> the total number of keys that have been encoded into the FST?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org