Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2016/07/22 15:51:20 UTC

[jira] [Commented] (LUCENE-7390) Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer

    [ https://issues.apache.org/jira/browse/LUCENE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389727#comment-15389727 ] 

Robert Muir commented on LUCENE-7390:
-------------------------------------

+1

I have a little concern about this being a fairly sizeable amount of RAM, but I don't know if it's worth the effort to e.g. compute this somewhere else, reserve the space away, pass it through to PointValuesWriter, increase the default RAM buffer (else we reserve the whole thing by default), and so on. It seems messy no matter how I look at it.

It is a little annoying that performance is so sensitive to this change; we should look into that more somehow. Maybe we can improve it so it does not need so much RAM.

> Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer
> ---------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7390
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7390.patch
>
>
> With Lucene's default codec, when writing dimensional points, we only give {{BKDWriter}} 16 MB of heap to use for sorting, regardless of how large IW's indexing buffer is.  A custom codec can change this, but that's a little steep.
> I've been testing indexing performance on a points-heavy dataset, 1.2 billion taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml , indexing with a 1 GB IW buffer, and the small 16 MB heap limit causes clear performance problems: flushing the large segments forces {{BKDWriter}} to switch to offline sorting, which causes the DWPTs to take too long to flush.  They then fall behind, and Lucene does a hard stall on incoming indexing threads until they catch up.
> [~rcmuir] had a simple idea to let IW pass the allowed temp heap usage to {{PointsWriter.writeField}}.
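
For readers following along, here is a rough sketch of the idea in the description: size the temporary sort heap in proportion to the indexing buffer instead of fixing it at 16 MB, then check whether a segment's points would fit in that budget or would have to spill to an offline sort. Everything in it is hypothetical and not taken from the attached patch: the class name SortHeapBudget, the 1/8 fraction, and the flat bytes-per-point estimate are assumptions for illustration only, and the real BKDWriter accounting is more involved.

```java
/** Hypothetical helper for illustration only; this is not Lucene code. */
final class SortHeapBudget {

    /** The fixed sort heap mentioned in the issue description, in MB. */
    static final double DEFAULT_SORT_HEAP_MB = 16.0;

    /** Assumed share of the indexing buffer lent to the sort (illustrative value). */
    static final double ASSUMED_FRACTION = 1.0 / 8.0;

    private SortHeapBudget() {}

    /** Returns a sort heap budget in MB, scaled to the indexing buffer but never below 16 MB. */
    static double sortHeapMB(double indexingBufferMB) {
        return Math.max(DEFAULT_SORT_HEAP_MB, indexingBufferMB * ASSUMED_FRACTION);
    }

    /**
     * Returns true if the points would fit in the given budget.
     * Uses a flat bytes-per-point estimate, which is a simplification.
     */
    static boolean fitsInHeap(long pointCount, int bytesPerPoint, double sortHeapMB) {
        double neededMB = (double) pointCount * bytesPerPoint / (1024.0 * 1024.0);
        return neededMB <= sortHeapMB;
    }

    public static void main(String[] args) {
        double bufferMB = 1024.0;        // the 1 GB IW buffer from the description
        long points = 1_000_000L;        // made-up segment size
        int bytesPerPoint = 32;          // made-up per-point cost

        double scaled = sortHeapMB(bufferMB);
        System.out.println("scaled budget (MB): " + scaled);
        System.out.println("fits with fixed 16 MB: "
            + fitsInHeap(points, bytesPerPoint, DEFAULT_SORT_HEAP_MB));
        System.out.println("fits with scaled budget: "
            + fitsInHeap(points, bytesPerPoint, scaled));
    }
}
```

The floor at 16 MB is there so that small indexing buffers do not end up with less sort heap than the current fixed value; whether any such floor or fraction is the right choice is exactly the kind of tuning question raised in the comment above.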



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
