Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/07/22 09:25:21 UTC
[jira] [Created] (LUCENE-7390) Let BKDWriter use temp heap for
sorting points in proportion to IndexWriter's indexing buffer
Michael McCandless created LUCENE-7390:
------------------------------------------
Summary: Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer
Key: LUCENE-7390
URL: https://issues.apache.org/jira/browse/LUCENE-7390
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: master (7.0), 6.2
With Lucene's default codec, when writing dimensional points, we only give {{BKDWriter}} 16 MB heap to use for sorting, regardless of how large IW's indexing buffer is. A custom codec can change this but that's a little steep.
I've been testing indexing performance on a points-heavy dataset, 1.2 billion taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml , indexing with a 1 GB IW buffer. The small 16 MB heap limit causes clear performance problems: flushing the large segments forces {{BKDWriter}} to switch to offline sorting, which causes the DWPTs to take too long to flush. They then fall behind, and Lucene does a hard stall on incoming indexing threads until they catch up.
[~rcmuir] had a simple idea to let IW pass the allowed temp heap usage to {{PointsWriter.writeField}}.
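To make the idea concrete, here is a rough, self-contained sketch of what "in proportion to IW's indexing buffer" could mean. It is not Lucene code: the class {{PointsSortBudget}}, both of its methods, the 25% fraction, and the 16 bytes per point are all hypothetical values chosen for illustration. Only the 16 MB fixed limit and the 1 GB buffer come from the description above.

```java
/**
 * Illustrative sketch only. None of these names are Lucene APIs, and the
 * fraction below is an assumption, not a value taken from the issue.
 */
public final class PointsSortBudget {

    /** Fixed sort heap the default codec gives BKDWriter today, per the issue. */
    public static final double FIXED_SORT_HEAP_MB = 16.0;

    /** Assumed share of the indexing buffer granted for sorting (hypothetical). */
    public static final double ASSUMED_BUFFER_FRACTION = 0.25;

    private PointsSortBudget() {}

    /** Scales the sort heap with the indexing buffer, never below the old fixed limit. */
    public static double sortHeapMB(double ramBufferSizeMB) {
        return Math.max(FIXED_SORT_HEAP_MB, ramBufferSizeMB * ASSUMED_BUFFER_FRACTION);
    }

    /** True if the points can be sorted in heap; false means falling back to offline sorting. */
    public static boolean fitsInHeap(long pointCount, int bytesPerPoint, double sortHeapMB) {
        double neededMB = (double) pointCount * bytesPerPoint / (1024.0 * 1024.0);
        return neededMB <= sortHeapMB;
    }

    public static void main(String[] args) {
        double budgetMB = sortHeapMB(1024.0); // 1 GB IW buffer, as in the test above
        long points = 10_000_000L;            // hypothetical flushed segment
        int bytesPerPoint = 16;               // hypothetical per-point sort cost

        System.out.println("proportional budget (MB): " + budgetMB);
        System.out.println("fits in fixed 16 MB: "
            + fitsInHeap(points, bytesPerPoint, FIXED_SORT_HEAP_MB));
        System.out.println("fits in proportional budget: "
            + fitsInHeap(points, bytesPerPoint, budgetMB));
    }
}
```

Under these assumed numbers, the segment needs roughly 153 MB to sort, so it spills to offline sorting with the fixed 16 MB limit but stays in heap with the proportional budget. How the budget would actually reach {{PointsWriter.writeField}}, and what fraction is appropriate, is left open here.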
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)