Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/06/08 18:47:23 UTC

[jira] [Created] (LUCENE-4123) Add CachingRAMDirectory

Michael McCandless created LUCENE-4123:
------------------------------------------

             Summary: Add CachingRAMDirectory
                 Key: LUCENE-4123
                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/store
            Reporter: Michael McCandless
            Assignee: Michael McCandless


The directory is very simple and useful if you have an index that you
know fully fits into available RAM.  You could also use FileSwitchDirectory
if you want to leave some files (e.g. stored fields or term vectors) on disk.

It wraps any other Directory and delegates all writing (IndexOutput) to
it, but for reading (IndexInput), it allocates a single byte[], reads the
file into it fully, and then serves requests off that single byte[].  It's
more GC friendly than RAMDirectory since it only allocates a single array
per file.
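
To make the read path concrete, here is a minimal sketch of the idea, not the
code in the patch.  WholeFileInput and its methods are hypothetical names, it
does not use Lucene's IndexInput API, and the explicit bounds checks are an
assumption made for the sketch:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the read side described above; not Lucene's API.
final class WholeFileInput {
  private final byte[] bytes; // the single array holding the entire file
  private int pos;            // current read position within the array

  private WholeFileInput(byte[] bytes) {
    this.bytes = bytes;
  }

  // Read the file fully, once, into one byte[].
  static WholeFileInput open(Path file) throws IOException {
    return new WholeFileInput(Files.readAllBytes(file));
  }

  long length() {
    return bytes.length;
  }

  void seek(int newPos) {
    if (newPos < 0 || newPos > bytes.length) {
      throw new IllegalArgumentException("seek out of range: " + newPos);
    }
    pos = newPos;
  }

  // All reads are served from the in-memory array; no further file I/O.
  byte readByte() throws IOException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }

  void readBytes(byte[] dst, int offset, int len) throws IOException {
    if (len > bytes.length - pos) {
      throw new EOFException("read past EOF");
    }
    System.arraycopy(bytes, pos, dst, offset, len);
    pos += len;
  }
}
```

Writing is not shown because, as described, it is simply delegated to the
wrapped Directory.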

It has a few nocommits still, but all tests pass if I use this to wrap the
delegate inside MockDirectoryWrapper.

I tested with a 1M doc English Wikipedia index (would like to test w/ 10M docs
but I don't have enough RAM...); it seems to give a nice speedup:

{noformat}
                Task    QPS base StdDev base  QPS cached StdDev cached      Pct diff
             Respell      197.00        7.27      203.19        8.17   -4% -   11%
            PKLookup      121.12        2.80      125.46        3.20   -1% -    8%
              Fuzzy2       66.62        2.62       69.91        2.85   -3% -   13%
              Fuzzy1      206.20        6.47      222.21        6.52    1% -   14%
       TermGroup100K      160.14        6.62      175.71        3.79    3% -   16%
              Phrase       34.85        0.40       38.75        0.61    8% -   14%
      TermBGroup100K      363.75       15.74      406.98       13.23    3% -   20%
            SpanNear       53.08        1.11       59.53        2.94    4% -   20%
    TermBGroup100K1P      222.53        9.78      252.86        5.96    6% -   21%
        SloppyPhrase       70.36        2.05       79.95        4.48    4% -   23%
            Wildcard      238.10        4.29      272.78        4.97   10% -   18%
           OrHighMed      123.49        4.85      149.32        4.66   12% -   29%
             Prefix3      288.46        8.10      350.40        5.38   16% -   26%
          OrHighHigh       76.46        3.27       93.13        2.96   13% -   31%
              IntNRQ       92.25        2.12      113.47        5.74   14% -   32%
                Term      757.12       39.03      958.62       22.68   17% -   36%
         AndHighHigh      103.03        4.48      133.89        3.76   21% -   39%
          AndHighMed      376.36       16.58      493.99       10.00   23% -   40%
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291886#comment-13291886 ] 

Michael McCandless commented on LUCENE-4123:
--------------------------------------------

bq. I don't think it buys anything to code dup the readVInt/readVLong here. It should be compiled to the same code; e.g. MMapDirectory doesn't do this.

I think you're right!  Here are the results w/ the code dup removed (same static seed as previous 5M doc results):

{noformat}
                Task    QPS base StdDev base  QPS cached StdDev cached      Pct diff
              IntNRQ       16.36        0.86       16.92        0.75   -6% -   14%
      TermBGroup1M1P       91.71        3.03       95.07        3.94   -3% -   11%
         TermGroup1M       58.14        1.00       60.38        1.53    0% -    8%
        TermBGroup1M      103.11        1.76      108.14        2.63    0% -    9%
             Prefix3      108.83        0.97      115.05        2.89    2% -    9%
            Wildcard       67.27        0.72       71.22        1.71    2% -    9%
             Respell      102.29        7.78      109.08        7.22   -7% -   23%
              Fuzzy2       42.46        2.95       45.51        3.31   -7% -   23%
              Fuzzy1       72.46        3.55       77.96        4.51   -3% -   19%
                Term      247.45       17.73      268.17       12.28   -3% -   22%
           OrHighMed       22.38        1.19       24.47        1.64   -3% -   23%
          OrHighHigh       18.01        0.92       19.71        1.20   -2% -   22%
         AndHighHigh       30.79        0.35       33.80        0.37    7% -   12%
            PKLookup       84.71        2.40       93.95        2.32    5% -   16%
            SpanNear       10.54        0.13       12.02        0.13   11% -   16%
          AndHighMed      119.18        1.05      136.64        1.80   12% -   17%
        SloppyPhrase       15.50        0.15       18.26        0.30   14% -   20%
              Phrase       20.64        0.12       24.94        0.48   17% -   23%
{noformat}

So I'll remove the code dup.
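
For readers unfamiliar with the point being discussed, here is a rough sketch
of what "no code dup" looks like structurally.  ByteInput and ArrayByteInput
are hypothetical names, not Lucene classes: the variable-length int decoding
(seven payload bits per byte, low-order group first, high bit marking a
continuation) is written once in the base class on top of readByte, and the
array-backed subclass only supplies readByte.

```java
// Hypothetical base class, not Lucene's DataInput: readVInt is written once
// in terms of readByte.
abstract class ByteInput {
  abstract byte readByte();

  // Decodes a variable-length int: seven payload bits per byte, low-order
  // group first, high bit set on every byte except the last.
  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}

// Array-backed implementation: overrides only readByte and inherits readVInt
// instead of carrying its own copy of the decoding loop.
final class ArrayByteInput extends ByteInput {
  private final byte[] bytes;
  private int pos;

  ArrayByteInput(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  byte readByte() {
    return bytes[pos++];
  }
}
```

Whether the JIT really produces equivalent code for the inherited method is
the commenter's expectation; the sketch only shows the shape of the change,
and the benchmark above is the evidence for its effect.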
                


[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4123:
---------------------------------------

    Attachment: LUCENE-4123.patch

New patch, also overriding createSlicer so that opening CFS (compound) files is more efficient.  But this resulted in a new nocommit: how to [efficiently] enforce the slice length so you get an EOFException if you try to read beyond your slice ...
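
To illustrate why the slice case is awkward: a slice shares the parent's
byte[] and ends before the array does, so the array's own bounds check cannot
detect a read past the end of the slice.  The sketch below shows the most
obvious option, an explicit check on every read.  It is only an assumption
about one possible approach (ArraySlice is a hypothetical name), not what the
patch does, and it does not settle the efficiency question raised above.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical slice view over a shared byte[]; names are illustrative only.
final class ArraySlice {
  private final byte[] bytes; // shared with the parent, not copied
  private final int end;      // exclusive upper bound of this slice
  private int pos;            // absolute position within the shared array

  ArraySlice(byte[] bytes, int offset, int length) {
    if (offset < 0 || length < 0 || length > bytes.length - offset) {
      throw new IllegalArgumentException("slice out of range");
    }
    this.bytes = bytes;
    this.pos = offset;
    this.end = offset + length;
  }

  byte readByte() throws IOException {
    // Explicit per-read check: the array itself may extend past the slice.
    if (pos >= end) {
      throw new EOFException("read past end of slice");
    }
    return bytes[pos++];
  }
}
```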

Tests pass.
                


[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448829#comment-13448829 ] 

Shai Erera commented on LUCENE-4123:
------------------------------------

Besides Uwe's ideas for improvements, is this Directory operable? I.e., if you chose to commit what you have accomplished so far, do tests fail? Is it safe to use?

I'm thinking "progress, not perfection" -- we can always introduce improvements later.
                


[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4123:
---------------------------------------

    Attachment: LUCENE-4123.patch

New patch, catching ArrayIndexOutOfBoundsException (AIOOBE) and throwing EOFException, and removing the specialized impls.
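
A minimal sketch of that exception translation, assuming a reader over a
whole byte[] (ArrayReader is a hypothetical name, not the class in the
patch): the read does no bounds check of its own and relies on the JVM's
array check, converting the resulting exception into the EOFException that
callers expect.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical reader over a byte[]; shows the exception translation only.
final class ArrayReader {
  private final byte[] bytes;
  private int pos;

  ArrayReader(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws IOException {
    try {
      // No explicit bounds check here: the array access itself throws on an
      // overrun, so the common path stays as small as possible.
      byte b = bytes[pos];
      pos++;
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      throw new EOFException("read past EOF (pos=" + pos + ")");
    }
  }
}
```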

I moved it to core temporarily to make it easier to test (add -Dtests.directory=CachingRAMDirectory).  I'll move it back to misc/ before committing ...
                


[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4123:
---------------------------------------

    Attachment: LUCENE-4123.patch

Thanks Robert!  That's awesome feedback ... new patch.

I also added a check in SimpleFSIndexInput.clone() to throw AlreadyClosedException (ACE) if it was closed already.
                


[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291884#comment-13291884 ] 

Michael McCandless commented on LUCENE-4123:
--------------------------------------------

Results for 5M doc index:

{noformat}
                Task    QPS base StdDev base  QPS cached StdDev cached      Pct diff
             Respell      104.06        7.63      108.59        7.55   -9% -   20%
         TermGroup1M       57.94        1.59       60.70        0.30    1% -    8%
        TermBGroup1M      103.28        2.54      108.51        2.54    0% -   10%
              Fuzzy2       43.07        2.96       45.32        3.06   -8% -   20%
              Fuzzy1       72.64        4.73       76.92        4.38   -6% -   19%
      TermBGroup1M1P       90.14        3.03       95.95        3.81   -1% -   14%
              IntNRQ       16.01        0.95       17.17        0.33    0% -   16%
            PKLookup       86.21        2.51       92.55        2.59    1% -   13%
            Wildcard       65.51        3.13       71.00        1.45    1% -   16%
           OrHighMed       21.64        1.83       23.56        1.24   -4% -   25%
             Prefix3      105.33        4.94      114.75        2.46    1% -   16%
          OrHighHigh       17.39        1.45       18.97        0.95   -4% -   24%
         AndHighHigh       30.05        1.14       33.42        0.88    4% -   18%
                Term      243.13        9.03      273.92        8.26    5% -   20%
        SloppyPhrase       15.80        0.28       17.84        0.78    6% -   19%
            SpanNear       10.52        0.14       11.97        0.25    9% -   17%
          AndHighMed      117.60        3.54      135.91        2.49   10% -   21%
              Phrase       20.15        0.78       24.22        0.26   14% -   26%
{noformat}

                


[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448842#comment-13448842 ] 

Michael McCandless commented on LUCENE-4123:
--------------------------------------------

I believe it is safe ... e.g. all tests pass if I wrap MockDirectoryWrapper's delegate w/ this in newDirectory ...

I'll update the patch w/ Uwe and Robert's suggestions ...
                

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291857#comment-13291857 ] 

Simon Willnauer commented on LUCENE-4123:
-----------------------------------------

bq. I tested with 1M Wikipedia english index (would like to test w/ 10M docs
but I don't have enough RAM...); it seems to give a nice speedup:

#fail! :)
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404452#comment-13404452 ] 

Uwe Schindler commented on LUCENE-4123:
---------------------------------------

When thinking more about the patch:
Can we make this IndexInput impl extend ByteArrayDataInput somehow? I would also like to fix ByteArrayDataInput to correctly rethrow AIOOBE and remove the vint methods. We already did tests with FSTs that showed that the code duplication is not helpful.
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449931#comment-13449931 ] 

Michael McCandless commented on LUCENE-4123:
--------------------------------------------

bq. I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buf size of RAMDir configurable (depending on IOContext while writing) and when you do new RAMDirectory(otherDir) - to cache the whole dir in RAM - it will use the maximum possible buffer size for the underlying file (2 GiB) - as we don't write and need no smaller buf size.

Actually I think the two dirs have different use cases.

So I think we should do both: 1) fix RAMDir to do better buffering
(LUCENE-3659) and 2) add this new dir.

RAMDir is good for pure in-memory indices (for testing, or transient
usage, etc.) or for pulling in a read-only index from disk, while
CachingRAMDir (I think we should rename it to CachingDirWrapper) is
good if you want to write to the index but also want persistence,
since all writes go straight to the wrapped directory.

I don't think the limitations of this dir (max 2.1 GB file size) need
to block committing ... the javadocs call this out, and we can improve
it later.  It could be that wrapping the byte[] in a ByteBuffer and using
ByteBufferII doesn't lose any perf; that would be great. But we can
explore that after committing.

But definitely +1 to get LUCENE-3659 in...
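
To make the write-through / cached-read split concrete, here is a minimal sketch in plain Java. It is only an analogue of the idea: CachingFileStore and its methods are hypothetical names, and it uses java.nio.file rather than Lucene's Directory API. Writes go straight to disk (so they persist), while the first read of a file loads it into a single byte[] that later reads reuse.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plain-Java analogue of the wrapper idea; not Lucene code.
final class CachingFileStore {
  private final Path dir;
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  CachingFileStore(Path dir) {
    this.dir = dir;
  }

  // Writes are delegated to the underlying storage, so they are persistent.
  void write(String name, byte[] data) throws IOException {
    Files.write(dir.resolve(name), data);
    cache.remove(name); // drop any stale cached copy
  }

  // The first read loads the whole file into one byte[]; later reads reuse it.
  byte[] read(String name) throws IOException {
    byte[] cached = cache.get(name);
    if (cached == null) {
      cached = Files.readAllBytes(dir.resolve(name));
      cache.put(name, cached);
    }
    return cached;
  }
}
```

Because each file lives in one Java array, this sketch shares the roughly 2 GB per-file limit mentioned above: Files.readAllBytes cannot return an array for a larger file.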

                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448884#comment-13448884 ] 

Robert Muir commented on LUCENE-4123:
-------------------------------------

Also, readBytes should not catch ArrayIndexOutOfBoundsException; it must be the more general IndexOutOfBoundsException.
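
A minimal sketch of the readBytes point, assuming the bulk copy is done with System.arraycopy (its Javadoc declares IndexOutOfBoundsException, not the array-specific subclass). ByteArraySliceReader is a hypothetical stand-in, not the class from the patch.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for the patch's IndexInput; not Lucene code.
final class ByteArraySliceReader {
  private final byte[] bytes;
  private int pos;

  ByteArraySliceReader(byte[] bytes) {
    this.bytes = bytes;
  }

  void readBytes(byte[] dest, int offset, int len) throws IOException {
    try {
      // arraycopy checks the bounds itself and copies nothing on failure.
      System.arraycopy(bytes, pos, dest, offset, len);
      pos += len;
    } catch (IndexOutOfBoundsException e) {
      // Catching the general type covers whichever subclass the JVM throws.
      // Note: a too-small dest would also end up here in this simple sketch.
      throw new EOFException("read past EOF: pos=" + pos + " len=" + len
          + " length=" + bytes.length);
    }
  }
}
```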
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch, LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449225#comment-13449225 ] 

Uwe Schindler commented on LUCENE-4123:
---------------------------------------

Mike,
I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buf size of RAMDir configurable (depending on IOContext while writing) and when you do new RAMDirectory(otherDir) - to cache the whole dir in RAM - it will use the maximum possible buffer size for the underlying file (2 GiB) - as we don't write and need no smaller buf size.

We should really get LUCENE-3659 in. The only missing parts are:
- make RAMFile visible to ConcurrentMap after IndexOutput is closed, so we don't need synchronization on RAMFile
- use maybe Robert's cool ByteBufferIndexInput from LUCENE-4364
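
For readers wondering how a multi-buffer layout gets past the 2 GiB array limit: a rough sketch, under the assumption of power-of-two buffer sizes, is to split a long position into a buffer index and an offset. ChunkedBytes is a hypothetical illustration only, not the RAMDirectory or LUCENE-3659 code.

```java
// Hypothetical illustration of long-addressed storage over several byte[] buffers.
final class ChunkedBytes {
  private final int chunkBits;  // log2 of the buffer size; must be <= 30
  private final int chunkMask;
  private final byte[][] chunks;

  ChunkedBytes(long length, int chunkBits) {
    this.chunkBits = chunkBits;
    this.chunkMask = (1 << chunkBits) - 1;
    int chunkSize = 1 << chunkBits;
    int numChunks = (int) ((length + chunkSize - 1) >>> chunkBits);
    chunks = new byte[numChunks][];
    for (int i = 0; i < numChunks; i++) {
      // The last buffer only needs to hold the remaining bytes.
      long remaining = length - ((long) i << chunkBits);
      chunks[i] = new byte[(int) Math.min(chunkSize, remaining)];
    }
  }

  // High bits of the position select the buffer, low bits the offset inside it.
  void set(long pos, byte b) {
    chunks[(int) (pos >>> chunkBits)][(int) (pos & chunkMask)] = b;
  }

  byte get(long pos) {
    return chunks[(int) (pos >>> chunkBits)][(int) (pos & chunkMask)];
  }
}
```

With a large buffer size, a file that fits in one buffer is served from a single array and only larger files span several.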
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404429#comment-13404429 ] 

Uwe Schindler commented on LUCENE-4123:
---------------------------------------

You should make the II correctly throw IOExceptions like MMap does, so catch the AIOOBE and rethrow as EOFException (just copy the code). This does not have any speed effect. Otherwise some tests will definitely fail.
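
As an illustration of the catch-and-rethrow pattern described above, here is a minimal sketch. ByteArrayInput is a hypothetical, simplified stand-in for the IndexInput in the patch; the real code may differ.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical, simplified stand-in for the patch's IndexInput; not Lucene code.
final class ByteArrayInput {
  private final byte[] bytes;
  private int pos;

  ByteArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  // No explicit bounds check on the normal path: the JVM already checks the
  // array access, so the failure is only translated into an IOException.
  byte readByte() throws IOException {
    try {
      return bytes[pos++];
    } catch (ArrayIndexOutOfBoundsException e) {
      pos = bytes.length; // undo the increment past the end
      throw new EOFException("read past EOF: length=" + bytes.length);
    }
  }
}
```

Callers then see an ordinary IOException subclass at end of file instead of an unchecked exception.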
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch

[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291860#comment-13291860 ] 

Robert Muir commented on LUCENE-4123:
-------------------------------------

I don't think it buys anything to code dup the readVInt/readVLong here. It should be compiled to the same code; e.g. MMapDirectory doesn't do this.
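
A small sketch of what avoiding the duplication looks like: the variable-length decoder lives once in a base class and is written purely in terms of readByte(), so a byte[]-backed subclass only supplies readByte(). SimpleDataInput and ArrayDataInput are hypothetical names; the encoding shown (7 data bits per byte, high bit as continuation flag) is the usual vInt layout.

```java
// Hypothetical base class: the vInt decoder is written once, on top of readByte().
abstract class SimpleDataInput {
  abstract byte readByte();

  // 7 data bits per byte, low-order group first; a set high bit means
  // another byte follows.
  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}

// The byte[]-backed input only implements readByte() and inherits readVInt().
final class ArrayDataInput extends SimpleDataInput {
  private final byte[] bytes;
  private int pos;

  ArrayDataInput(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  byte readByte() {
    return bytes[pos++];
  }
}
```

For example, the two bytes 0xAC 0x02 decode to 300 (44 + (2 << 7)).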
                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch

[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4123:
---------------------------------------

    Attachment: LUCENE-4123.patch
    
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch


[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448870#comment-13448870 ] 

Robert Muir commented on LUCENE-4123:
-------------------------------------

looks good... i don't really like that close() is a no-op and that seek() has no checks (since it's deferred, if you seek somewhere negative you won't know until later).

you could probably fix both of these, e.g. keep the byte[] final but let close() set the position negative, then catch the out-of-bounds exception that a later read hits and throw AlreadyClosedException (ACE).
then just throw IllegalArgumentException (IAE) on seek if the incoming long is negative at least, since you reserve it to mean closed.

I also don't like that it's a delegator.

should the underlying read check for BufferedIndexInput and pass useBuffer=false?
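
A minimal sketch of the close()/seek() suggestion above, written against plain Java rather than the real IndexInput API so that it stands alone. It is not the code from LUCENE-4123.patch: the class name ClosableByteArrayInput is hypothetical, IllegalStateException stands in for Lucene's AlreadyClosedException (which extends it), and clamping a seek past the end to the array length is an assumption made only to keep the position in int range.

```java
// Hypothetical sketch, not the patch: a byte[]-backed input where a
// negative position is reserved to mean "closed".
final class ClosableByteArrayInput {
  private final byte[] bytes; // stays final, as suggested in the comment
  private int pos;            // >= 0 while open, -1 once closed

  ClosableByteArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() {
    try {
      byte b = bytes[pos]; // no explicit open/closed check on the hot path
      pos++;
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      if (pos < 0) {
        // Stand-in for AlreadyClosedException in real Lucene code.
        throw new IllegalStateException("this input is closed", e);
      }
      throw e; // genuinely past the end of the data
    }
  }

  void seek(long newPos) {
    if (pos < 0) {
      throw new IllegalStateException("this input is closed");
    }
    if (newPos < 0) {
      // Negative positions are reserved for the closed state, so reject
      // them up front instead of failing on a later read.
      throw new IllegalArgumentException("negative seek position: " + newPos);
    }
    // Assumption: a seek past the end lands at the end, so the next read fails.
    pos = (int) Math.min(newPos, (long) bytes.length);
  }

  void close() {
    pos = -1; // the next read trips the bounds check and reports "closed"
  }
}
```

The point of the design is that reads pay nothing extra while the input is open: the array bounds check the JVM already performs doubles as the closed check, and only the failure path has to work out which case it hit.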



                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch, LUCENE-4123.patch
