Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/05/12 13:38:15 UTC

[jira] [Created] (LUCENE-5667) Optimize common-prefix across all terms in a field

Michael McCandless created LUCENE-5667:
------------------------------------------

             Summary: Optimize common-prefix across all terms in a field
                 Key: LUCENE-5667
                 URL: https://issues.apache.org/jira/browse/LUCENE-5667
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.9, 5.0


I tested different UUID sources in Lucene
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
and I was surprised to see that Flake IDs were slower than UUID V1.
They use the same raw sources of info (timestamp, node id, sequence
counter) but Flake ID preserves total order by keeping the timestamp
"intact" in the leading 64 bits.

I suspect the reason is that Flake IDs typically share a longish
common prefix across all docs. We might be able to optimize this in
block-tree by storing that common prefix outside of the FST, or maybe
by just pre-computing the common prefix on init and storing the
"effective" start node for the FST.
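
To make the "pre-compute the common prefix" idea concrete, here is a minimal sketch. It is not Lucene code and does not use the block-tree or FST APIs; the class and method names (CommonPrefixSketch, commonPrefixLength, commonPrefix) are hypothetical. It assumes the terms are available as byte arrays already sorted in term order, in which case the prefix shared by all terms is simply the prefix shared by the first and last term.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only: not part of Lucene's block-tree or FST code.
final class CommonPrefixSketch {

    /** Returns the number of leading bytes that a and b have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /**
     * Returns the prefix shared by every term. Assumes sortedTerms is in
     * sorted term order, so comparing only the smallest and largest term
     * is enough: everything between them must share the same prefix.
     */
    static byte[] commonPrefix(byte[][] sortedTerms) {
        if (sortedTerms.length == 0) {
            return new byte[0];
        }
        byte[] first = sortedTerms[0];
        byte[] last = sortedTerms[sortedTerms.length - 1];
        return Arrays.copyOf(first, commonPrefixLength(first, last));
    }

    public static void main(String[] args) {
        // Made-up IDs with a long shared, timestamp-like prefix.
        byte[][] terms = {
            "0145f3a2-0001".getBytes(StandardCharsets.UTF_8),
            "0145f3a2-0002".getBytes(StandardCharsets.UTF_8),
            "0145f3a2-00ff".getBytes(StandardCharsets.UTF_8),
        };
        byte[] prefix = commonPrefix(terms);
        System.out.println(new String(prefix, StandardCharsets.UTF_8)); // prints "0145f3a2-00"
    }
}
```

With a prefix like this known up front, a lookup could in principle compare the incoming term against the prefix once and then only walk the remaining suffix bytes, which is the saving the issue is after.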




--
This message was sent by Atlassian JIRA
(v6.2#6252)
