Posted to dev@lucene.apache.org by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2012/02/01 18:28:58 UTC

[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED

     [ https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3729:
---------------------------------------

    Attachment: LUCENE-3729.patch

New patch. It is still just a prototype on FC (FieldCache), but all tests now pass.

I enabled packing, and the Wikipedia title data is now ~43.8% smaller than FC's current representation (PagedBytes + PackedInts).

Results are about the same as before:
{noformat}
                Task    QPS base StdDev base   QPS patch StdDev patch      Pct diff
            PKLookup      127.87        2.68      115.22        5.37  -15% -   -3%
       TermTitleSort       69.77        4.32       64.91        2.62  -15% -    3%
        TermBGroup1M       35.85        1.03       34.49        0.92   -8% -    1%
         TermGroup1M       26.58        0.72       26.13        0.38   -5% -    2%
             Respell       75.04        2.63       74.28        1.33   -6% -    4%
              Fuzzy1       86.35        1.93       86.27        1.34   -3% -    3%
              Phrase       18.92        0.57       18.94        0.57   -5% -    6%
            SpanNear        1.46        0.02        1.46        0.05   -4% -    5%
        SloppyPhrase       15.85        0.69       15.93        0.69   -7% -    9%
              Fuzzy2       31.37        0.61       31.65        0.53   -2% -    4%
      TermBGroup1M1P       44.99        1.32       45.47        0.74   -3% -    5%
          AndHighMed       40.22        1.00       41.43        0.32    0% -    6%
            Wildcard       26.11        1.15       27.15        0.21   -1% -    9%
          OrHighHigh        6.14        0.42        6.40        0.34   -7% -   17%
           OrHighMed       10.65        0.72       11.10        0.60   -7% -   17%
         AndHighHigh        9.16        0.33        9.56        0.04    0% -    8%
             Prefix3       43.07        2.32       45.34        0.55   -1% -   12%
                Term       34.11        1.60       36.09        1.19   -2% -   14%
              IntNRQ        7.66        0.64        8.22        0.57   -7% -   25%
{noformat}
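For anyone reading the table: the last column looks like a worst-case to best-case range rather than a single percent change. The sketch below is an assumption about how the benchmark tool derives it, not something stated in this comment. It reads the four numeric columns as baseline QPS, baseline StdDev, patched QPS and patched StdDev, and truncates the result toward zero. The class and method names are made up for illustration. The formula does reproduce the rows checked by hand (PKLookup, TermTitleSort, AndHighMed, OrHighHigh, Term, IntNRQ).

```java
public class PctDiffRange {

    // Pessimistic bound: patched run one StdDev slow, baseline one StdDev fast.
    // The (int) cast truncates toward zero, so -15.86 becomes -15.
    static int low(double baseQps, double baseStdDev, double patchQps, double patchStdDev) {
        double ratio = (patchQps - patchStdDev) / (baseQps + baseStdDev);
        return (int) (100.0 * (ratio - 1.0));
    }

    // Optimistic bound: patched run one StdDev fast, baseline one StdDev slow.
    static int high(double baseQps, double baseStdDev, double patchQps, double patchStdDev) {
        double ratio = (patchQps + patchStdDev) / (baseQps - baseStdDev);
        return (int) (100.0 * (ratio - 1.0));
    }

    public static void main(String[] args) {
        // PKLookup row from the table: 127.87  2.68  115.22  5.37
        int lo = low(127.87, 2.68, 115.22, 5.37);
        int hi = high(127.87, 2.68, 115.22, 5.37);
        System.out.println("PKLookup " + lo + "% - " + hi + "%"); // -15% - -3%, as in the table
    }
}
```

Read this way, a row whose range straddles 0% (most of them here) is within noise, while PKLookup, whose range stays entirely negative, is the one task that looks measurably slower with the patch.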
                
> Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
> --------------------------------------------------------------
>
>                 Key: LUCENE-3729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3729
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-3729.patch, LUCENE-3729.patch, LUCENE-3729.patch
>
>

