Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/06/08 18:47:23 UTC
[jira] [Created] (LUCENE-4123) Add CachingRAMDirectory
Michael McCandless created LUCENE-4123:
------------------------------------------
Summary: Add CachingRAMDirectory
Key: LUCENE-4123
URL: https://issues.apache.org/jira/browse/LUCENE-4123
Project: Lucene - Java
Issue Type: Bug
Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
The directory is very simple and useful if you have an index that you
know fully fits into available RAM. You could also use FileSwitchDir if
you want to leave some files (e.g. stored fields or term vectors) on disk.
It wraps any other Directory and delegates all writing (IndexOutput) to
it, but for reading (IndexInput) it allocates a single byte[], reads
the whole file into it, and then serves requests off that single byte[].
It's more GC-friendly than RAMDir since it allocates only a single array
per file.
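As a rough illustration of the read/write split described above, here is a minimal sketch in plain Java. It is not the patch's code and does not use the Lucene Directory API: the class name CachingFileStore and its methods write and open are hypothetical, and a java.nio.file directory stands in for the wrapped delegate.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not the patch, not the Lucene Directory API):
 * writes go straight to a directory on disk, reads are served from one
 * byte[] per file that is loaded in full on first open.
 */
final class CachingFileStore {
  private final Path dir;
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  CachingFileStore(Path dir) {
    this.dir = dir;
  }

  /** Writes are delegated to disk; any stale cached copy is dropped. */
  void write(String name, byte[] data) throws IOException {
    Files.write(dir.resolve(name), data);
    cache.remove(name);
  }

  /** The first open reads the whole file into a single array; later opens reuse it. */
  byte[] open(String name) throws IOException {
    byte[] cached = cache.get(name);
    if (cached == null) {
      cached = Files.readAllBytes(dir.resolve(name));
      cache.put(name, cached);
    }
    return cached;
  }
}
```

Because each file lives in one Java array, a sketch like this cannot hold a file larger than a single array allows (about 2 GB).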
It has a few nocommits still, but all tests pass if I use it to wrap the
delegate inside MockDirectoryWrapper.
I tested with a 1M-doc English Wikipedia index (would like to test w/ 10M
docs but I don't have enough RAM...); it seems to give a nice speedup:
{noformat}
Task             QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
Respell            197.00         7.27      203.19           8.17   -4% - 11%
PKLookup           121.12         2.80      125.46           3.20   -1% -  8%
Fuzzy2              66.62         2.62       69.91           2.85   -3% - 13%
Fuzzy1             206.20         6.47      222.21           6.52    1% - 14%
TermGroup100K      160.14         6.62      175.71           3.79    3% - 16%
Phrase              34.85         0.40       38.75           0.61    8% - 14%
TermBGroup100K     363.75        15.74      406.98          13.23    3% - 20%
SpanNear            53.08         1.11       59.53           2.94    4% - 20%
TermBGroup100K1P   222.53         9.78      252.86           5.96    6% - 21%
SloppyPhrase        70.36         2.05       79.95           4.48    4% - 23%
Wildcard           238.10         4.29      272.78           4.97   10% - 18%
OrHighMed          123.49         4.85      149.32           4.66   12% - 29%
Prefix3            288.46         8.10      350.40           5.38   16% - 26%
OrHighHigh          76.46         3.27       93.13           2.96   13% - 31%
IntNRQ              92.25         2.12      113.47           5.74   14% - 32%
Term               757.12        39.03      958.62          22.68   17% - 36%
AndHighHigh        103.03         4.48      133.89           3.76   21% - 39%
AndHighMed         376.36        16.58      493.99          10.00   23% - 40%
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291886#comment-13291886 ]
Michael McCandless commented on LUCENE-4123:
--------------------------------------------
bq. I don't think it buys anything to code dup the readVInt/readVLong here. It should be compiled to the same code. E.g. MMapDir doesn't do this.
I think you're right! Here are the results w/ the code dup removed (same static seed as previous 5M doc results):
{noformat}
Task             QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
IntNRQ              16.36         0.86       16.92           0.75   -6% - 14%
TermBGroup1M1P      91.71         3.03       95.07           3.94   -3% - 11%
TermGroup1M         58.14         1.00       60.38           1.53    0% -  8%
TermBGroup1M       103.11         1.76      108.14           2.63    0% -  9%
Prefix3            108.83         0.97      115.05           2.89    2% -  9%
Wildcard            67.27         0.72       71.22           1.71    2% -  9%
Respell            102.29         7.78      109.08           7.22   -7% - 23%
Fuzzy2              42.46         2.95       45.51           3.31   -7% - 23%
Fuzzy1              72.46         3.55       77.96           4.51   -3% - 19%
Term               247.45        17.73      268.17          12.28   -3% - 22%
OrHighMed           22.38         1.19       24.47           1.64   -3% - 23%
OrHighHigh          18.01         0.92       19.71           1.20   -2% - 22%
AndHighHigh         30.79         0.35       33.80           0.37    7% - 12%
PKLookup            84.71         2.40       93.95           2.32    5% - 16%
SpanNear            10.54         0.13       12.02           0.13   11% - 16%
AndHighMed         119.18         1.05      136.64           1.80   12% - 17%
SloppyPhrase        15.50         0.15       18.26           0.30   14% - 20%
Phrase              20.64         0.12       24.94           0.48   17% - 23%
{noformat}
So I'll remove the code dup.
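For readers unfamiliar with the encoding under discussion, here is a minimal sketch of decoding a variable-length int on top of a plain readByte(), the generic form that remains once the specialized copy is dropped. The class name ByteArrayReader is hypothetical and this is not the patch's code. The format shown (7 payload bits per byte, low-order group first, high bit set when more bytes follow) is the VInt layout Lucene uses.

```java
/** Hypothetical sketch: a byte[]-backed reader where only readByte() touches the array. */
final class ByteArrayReader {
  private final byte[] bytes;
  private int pos;

  ByteArrayReader(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() {
    return bytes[pos++];
  }

  /**
   * Generic variable-length int decode written purely in terms of readByte():
   * each byte carries 7 payload bits, and a set high bit means another byte follows.
   */
  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}
```

For example, the two bytes 0xAC 0x02 decode to 300: the low group contributes 0x2C (44) and the second byte contributes 2 << 7 (256).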
[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4123:
---------------------------------------
Attachment: LUCENE-4123.patch
New patch, also overriding createSlicer so that opening CFS files is more efficient. But this resulted in a new nocommit: how to [efficiently] enforce the slice length so you get EOFE if you try to read beyond your slice ...
Tests pass.
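To make the open question concrete, here is a minimal sketch of the straightforward (not necessarily the efficient) answer: a slice that shares the parent's byte[] and checks its own end on every read. The class name ByteArraySlice is hypothetical, and nothing here is taken from the patch or from the createSlicer API.

```java
import java.io.EOFException;

/** Hypothetical sketch of a slice over a shared byte[]; names are illustrative only. */
final class ByteArraySlice {
  private final byte[] bytes; // shared with the parent file, never copied
  private final int end;      // exclusive absolute end of this slice
  private int pos;            // absolute read position

  ByteArraySlice(byte[] bytes, int offset, int length) {
    if (offset < 0 || length < 0 || offset > bytes.length - length) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    this.bytes = bytes;
    this.pos = offset;
    this.end = offset + length;
  }

  /** Explicit per-read check: costs one compare, but stops reads at the slice boundary. */
  byte readByte() throws EOFException {
    if (pos >= end) {
      throw new EOFException("read past end of slice");
    }
    return bytes[pos++];
  }
}
```

The explicit compare is needed here because the array's own bounds check only fires at the end of the whole file, not at the end of a sub-file inside it.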
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448829#comment-13448829 ]
Shai Erera commented on LUCENE-4123:
------------------------------------
Besides Uwe's ideas for improvements, is this Directory operable? I.e., if you chose to commit what you have accomplished so far, do tests fail? Is it safe to use?
I'm thinking "progress, not perfection" -- we can always introduce improvements later.
[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4123:
---------------------------------------
Attachment: LUCENE-4123.patch
New patch, catching AIOOBE and throwing EOFException, and removing the specialized impls.
I moved it to core temporarily to make it easier to test (add -Dtests.directory=CachingRAMDirectory). I'll move it back to misc/ before committing ...
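The AIOOBE-to-EOFException translation mentioned above can be sketched as follows. This is an assumption about the general shape, not the patch itself, and the class name WholeFileInput is hypothetical. The idea is to let the JVM's array bounds check do the work and only convert the exception type on the failure path.

```java
import java.io.EOFException;

/** Hypothetical sketch: reads over a whole-file byte[] with no explicit length check. */
final class WholeFileInput {
  private final byte[] bytes;
  private int pos;

  WholeFileInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws EOFException {
    try {
      // No "pos < length" test here: the JVM already bounds-checks the access.
      return bytes[pos++];
    } catch (ArrayIndexOutOfBoundsException e) {
      pos = bytes.length; // undo the overshoot so later reads fail the same way
      throw new EOFException("read past EOF");
    }
  }
}
```

This only detects the end of the backing array, so it fits a reader that spans the whole file rather than a slice of it.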
[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4123:
---------------------------------------
Attachment: LUCENE-4123.patch
Thanks Robert! That's awesome feedback ... new patch.
I also added a check in SimpleFSIndexInput.clone() to throw ACE if it was closed already.
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291884#comment-13291884 ]
Michael McCandless commented on LUCENE-4123:
--------------------------------------------
Results for 5M doc index:
{noformat}
Task             QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
Respell            104.06         7.63      108.59           7.55   -9% - 20%
TermGroup1M         57.94         1.59       60.70           0.30    1% -  8%
TermBGroup1M       103.28         2.54      108.51           2.54    0% - 10%
Fuzzy2              43.07         2.96       45.32           3.06   -8% - 20%
Fuzzy1              72.64         4.73       76.92           4.38   -6% - 19%
TermBGroup1M1P      90.14         3.03       95.95           3.81   -1% - 14%
IntNRQ              16.01         0.95       17.17           0.33    0% - 16%
PKLookup            86.21         2.51       92.55           2.59    1% - 13%
Wildcard            65.51         3.13       71.00           1.45    1% - 16%
OrHighMed           21.64         1.83       23.56           1.24   -4% - 25%
Prefix3            105.33         4.94      114.75           2.46    1% - 16%
OrHighHigh          17.39         1.45       18.97           0.95   -4% - 24%
AndHighHigh         30.05         1.14       33.42           0.88    4% - 18%
Term               243.13         9.03      273.92           8.26    5% - 20%
SloppyPhrase        15.80         0.28       17.84           0.78    6% - 19%
SpanNear            10.52         0.14       11.97           0.25    9% - 17%
AndHighMed         117.60         3.54      135.91           2.49   10% - 21%
Phrase              20.15         0.78       24.22           0.26   14% - 26%
{noformat}
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448842#comment-13448842 ]
Michael McCandless commented on LUCENE-4123:
--------------------------------------------
I believe it is safe ... eg all tests pass if I wrap MDW's delegate w/ this in newDirectory ...
I'll update the patch w/ Uwe and Robert's suggestions ...
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291857#comment-13291857 ]
Simon Willnauer commented on LUCENE-4123:
-----------------------------------------
bq.I tested with 1M Wikipedia english index (would like to test w/ 10M docs
but I don't have enough RAM...); it seems to give a nice speedup:
#fail! :)
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404452#comment-13404452 ]
Uwe Schindler commented on LUCENE-4123:
---------------------------------------
When thinking more about the patch:
Can we make this IndexInput impl extend ByteArrayDataInput somehow? I would also like to fix ByteArrayDataInput to correctly rethrow AIOOBE and remove the vint methods. We already did tests with FSTs that showed that the code duplication is not helpful.
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449931#comment-13449931 ]
Michael McCandless commented on LUCENE-4123:
--------------------------------------------
bq. I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buf size of RAMDir configurable (depending on IOContext while writing) and when you do new RAMDirectory(otherDir) - to cache the whole dir in RAM - it will use the maximum possible buffer size for the underlying file (2 GiB) - as we don't write and need no smaller buf size.
Actually I think the two dirs have different use cases.
So I think we should do both: 1) fix RAMDir to do better buffering
(LUCENE-3659) and 2) add this new dir.
RAMDir is good for pure in-memory indices (for testing, or transient
usage, etc.) or for pulling in a read-only index from disk, while
CachingRAMDir (I think we should rename it to CachingDirWrapper) is
good if you want to write to the index but also want persistence,
since all writes go straight to the wrapped directory.
I don't think the limitations of this dir (max 2.1 GB file size) need
to block committing ... the javadocs call this out, and we can improve
it later. It could be that wrapping the byte[] in a ByteBuffer and using
ByteBufferII doesn't lose any perf: that would be great. But we can
explore that after committing.
But definitely +1 to get LUCENE-3659 in...
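A minimal sketch of the write-through, read-cached behaviour described above, assuming plain java.nio files in place of a Lucene Directory. CachingFileStore and its methods are hypothetical names for illustration; this is not the patch's code and not a Lucene API. The size check mirrors the 2.1 GB limit, which comes from a Java array holding at most Integer.MAX_VALUE elements.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the wrapper idea; not a Lucene class. */
final class CachingFileStore {
  private final Path dir; // plays the role of the wrapped directory
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  CachingFileStore(Path dir) {
    this.dir = dir;
  }

  /** Writes go straight to the wrapped store, so they are persistent. */
  void write(String name, byte[] data) throws IOException {
    Files.write(dir.resolve(name), data);
    cache.remove(name); // drop any stale cached copy
  }

  /** The first read loads the whole file into one byte[]; later reads reuse it. */
  byte[] open(String name) throws IOException {
    byte[] cached = cache.get(name);
    if (cached == null) {
      Path file = dir.resolve(name);
      if (Files.size(file) > Integer.MAX_VALUE) {
        throw new IOException("file too large for a single byte[]: " + name);
      }
      cached = Files.readAllBytes(file);
      cache.put(name, cached);
    }
    return cached;
  }
}
```

Calling open() twice for the same name returns the same array, while the file itself stays on disk under the wrapped path.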
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448884#comment-13448884 ]
Robert Muir commented on LUCENE-4123:
-------------------------------------
Also, readBytes should not catch ArrayIndexOutOfBoundsException; it must be the more general IndexOutOfBoundsException.
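A sketch of that point, assuming readBytes is a bulk copy via System.arraycopy, whose documented failure type is IndexOutOfBoundsException. ArrayBulkInput is a hypothetical name, not the patch's class.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical byte[]-backed input; names are illustrative only. */
final class ArrayBulkInput {
  private final byte[] bytes;
  private int pos;

  ArrayBulkInput(byte[] bytes) {
    this.bytes = bytes;
  }

  void readBytes(byte[] dest, int offset, int len) throws IOException {
    try {
      // arraycopy performs the bounds check itself and copies nothing on failure.
      System.arraycopy(bytes, pos, dest, offset, len);
      pos += len;
    } catch (IndexOutOfBoundsException e) {
      // Catch the general type: the spec only promises IndexOutOfBoundsException.
      // Caveat of this sketch: a too-small dest is also reported as EOF.
      throw new EOFException("read past EOF: pos=" + pos + " len=" + len);
    }
  }
}
```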
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449225#comment-13449225 ]
Uwe Schindler commented on LUCENE-4123:
---------------------------------------
Mike,
I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buf size of RAMDir configurable (depending on IOContext while writing) and when you do new RAMDirectory(otherDir) - to cache the whole dir in RAM - it will use the maximum possible buffer size for the underlying file (2 GiB) - as we don't write and need no smaller buf size.
We should really get LUCENE-3659 in. The only missing parts are:
- make RAMFile visible to ConcurrentMap after IndexOutput is closed, so we don't need synchronization on RAMFile
- maybe use Robert's cool ByteBufferIndexInput from LUCENE-4364
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404429#comment-13404429 ]
Uwe Schindler commented on LUCENE-4123:
---------------------------------------
You should make the II correctly throw IOExceptions like MMap does, so catch the AIOOBE and rethrow as EOFException (just copy the code). This does not have any speed effect. Otherwise some tests will definitely fail.
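A sketch of the pattern being suggested, under the assumption that the input reads single bytes by indexing the array directly. SingleArrayInput is a hypothetical name; the real MMap code is not reproduced here. The hot path has no explicit bounds check: the JVM's own array check raises the exception, which is only translated in the rare failure case.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical byte[]-backed input; illustrative only. */
final class SingleArrayInput {
  private final byte[] bytes;
  private int pos;

  SingleArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws IOException {
    try {
      byte b = bytes[pos]; // the JVM bounds check is the only check on the fast path
      pos++;
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      // Translate to the IOException subtype that callers and tests expect.
      throw new EOFException("read past EOF: pos=" + pos);
    }
  }
}
```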
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291860#comment-13291860 ]
Robert Muir commented on LUCENE-4123:
-------------------------------------
I don't think it buys anything to duplicate the readVInt/readVLong code here; it should compile to the same code. E.g. MMapDirectory doesn't do this.
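To illustrate why duplication is unnecessary, here is a sketch of a generic variable-length int decoder written once against readByte(), with a byte[]-backed subclass that only supplies readByte(). The class names are hypothetical; the encoding assumed is 7 data bits per byte, low-order group first, with the high bit marking that more bytes follow.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical base class holding the shared decoder. */
abstract class ByteInput {
  abstract byte readByte() throws IOException;

  /** Decodes one variable-length int using only readByte(). */
  int readVInt() throws IOException {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}

/** Subclass supplies only the byte source and inherits readVInt() unchanged. */
final class ArrayByteInput extends ByteInput {
  private final byte[] bytes;
  private int pos;

  ArrayByteInput(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  byte readByte() throws IOException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }
}
```

For example, the two bytes 0xAC 0x02 decode to 300 (0x2C plus 2 shifted left by 7).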
[jira] [Updated] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4123:
---------------------------------------
Attachment: LUCENE-4123.patch
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448870#comment-13448870 ]
Robert Muir commented on LUCENE-4123:
-------------------------------------
Looks good... I don't really like that close() is a no-op and that seek() has no checks (since it's deferred, if you seek somewhere negative you won't know until later).
You could probably fix both of these, e.g. keep the byte[] final but let close() set the position negative, catch NegativeArray and throw ACE.
Then just throw IAE on seek if the incoming long is negative at least, since you reserve it to mean closed.
I also don't like that it's a delegator.
Should the underlying read check for BufferedII and pass useBuffer=false?
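One way the close()/seek() suggestion could look, as a sketch only. ClosableArrayInput is a hypothetical name, IllegalStateException stands in for Lucene's AlreadyClosedException (ACE), and "NegativeArray" is read here as the out-of-bounds exception raised by indexing with a negative position; all three are assumptions rather than the patch's code.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical input where a negative position is reserved to mean "closed". */
final class ClosableArrayInput {
  private final byte[] bytes; // stays final; close() does not null it
  private int pos;

  ClosableArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  void seek(long newPos) {
    if (pos < 0) {
      throw new IllegalStateException("already closed");
    }
    if (newPos < 0) {
      throw new IllegalArgumentException("negative position: " + newPos);
    }
    // Clamp so the cast is safe; seeking past the end still fails lazily with EOF.
    pos = (int) Math.min(newPos, bytes.length);
  }

  byte readByte() throws IOException {
    try {
      byte b = bytes[pos]; // no explicit check on the fast path
      pos++;
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      if (pos < 0) {
        throw new IllegalStateException("already closed"); // stand-in for ACE
      }
      throw new EOFException("read past EOF: pos=" + pos);
    }
  }

  void close() {
    pos = -1; // any later read trips the bounds check and reports "closed"
  }
}
```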