Posted to issues@lucene.apache.org by "LuYunCheng (Jira)" <ji...@apache.org> on 2022/06/27 13:09:00 UTC
[jira] [Created] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy
LuYunCheng created LUCENE-10627:
-----------------------------------
Summary: Using CompositeByteBuf to Reduce Memory Copy
Key: LUCENE-10627
URL: https://issues.apache.org/jira/browse/LUCENE-10627
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs, core/store
Reporter: LuYunCheng
I see that when Lucene flushes and merges stored fields, it performs many memory copies:
{code:java}
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable [0x00007f17718db000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
{code}
When Lucene's *CompressingStoredFieldsWriter* flushes documents, it performs several memory copies:
With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
# bufferedDocs.toArrayCopy copies the blocks into one contiguous array for chunk compression
# the compressor copies the dictionary and the data into one block buffer
# compress
# copy the compressed data out
With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
# bufferedDocs.toArrayCopy copies the blocks into one contiguous array for chunk compression
# compress
# copy the compressed data out
I think we can use CompositeByteBuf to reduce temporary memory copies:
# we do not have to call *bufferedDocs.toArrayCopy* when all we need is contiguous content for chunk compression
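The idea of skipping the contiguous copy can be illustrated with the JDK alone. The following is only a minimal sketch, not the patch proposed in this issue: the class name BlockwiseDeflate and its methods are hypothetical, it uses java.util.zip.Deflater rather than Lucene's compressors, and it does not use CompositeByteBuf or any Lucene API. It compares copying all blocks into one array before compressing with feeding the blocks to the compressor one at a time.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Lucene.
class BlockwiseDeflate {

  /** Baseline: copy every block into one contiguous array, then compress it. */
  static byte[] compressWithCopy(List<byte[]> blocks) {
    int total = 0;
    for (byte[] b : blocks) {
      total += b.length;
    }
    byte[] contiguous = new byte[total]; // the extra temporary copy
    int pos = 0;
    for (byte[] b : blocks) {
      System.arraycopy(b, 0, contiguous, pos, b.length);
      pos += b.length;
    }
    return compressBlocks(List.of(contiguous));
  }

  /** Alternative: hand each block to the Deflater in turn, no contiguous copy. */
  static byte[] compressBlocks(List<byte[]> blocks) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      for (byte[] block : blocks) {
        deflater.setInput(block);
        // Drain output until the deflater has consumed this block.
        while (!deflater.needsInput()) {
          int n = deflater.deflate(buf);
          out.write(buf, 0, n);
        }
      }
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  /** Inflates a stream produced above; originalLength must be known by the caller. */
  static byte[] decompress(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] result = new byte[originalLength];
      int pos = 0;
      while (pos < originalLength && !inflater.finished()) {
        int n = inflater.inflate(result, pos, originalLength - pos);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unexpected input; stop instead of spinning
        }
        pos += n;
      }
      return result;
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt deflate stream", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] a = "stored field one, ".getBytes(StandardCharsets.UTF_8);
    byte[] b = "stored field two".getBytes(StandardCharsets.UTF_8);
    byte[] expected = new byte[a.length + b.length];
    System.arraycopy(a, 0, expected, 0, a.length);
    System.arraycopy(b, 0, expected, a.length, b.length);

    byte[] viaCopy = compressWithCopy(List.of(a, b));
    byte[] viaBlocks = compressBlocks(List.of(a, b));

    // Both paths must decompress to the same original bytes.
    System.out.println(Arrays.equals(decompress(viaCopy, expected.length), expected));
    System.out.println(Arrays.equals(decompress(viaBlocks, expected.length), expected));
  }
}
```

The sketch checks only that both paths round-trip to the same bytes; it does not assume the two compressed outputs are byte-for-byte identical. Since Java 11, Deflater also accepts a java.nio.ByteBuffer through setInput(ByteBuffer), so the same loop works on ByteBuffer blocks.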
I wrote a simple mini benchmark in the test code:
*LZ4WithPresetDict run* Capacity: 41943040 bytes, 10 iterations: original elapsed 5391 ms, new elapsed 5297 ms
*DeflateWithPresetDict run* Capacity: 41943040 bytes, 10 iterations: original elapsed 115 ms, new elapsed 12 ms
I also ran runStoredFieldsBenchmark with doc_limit=-1, which shows:
||Msec to index||BEST_SPEED||BEST_COMPRESSION||
|Baseline|318877.00|606288.00|
|Candidate|314442.00|604719.00|