Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/03/03 15:20:30 UTC

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

mayya-sharipova commented on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1058148842


   I've benchmarked this PR with ann-benchmarks on glove-100-angular (M: 16, efConstruction: 100).
   
   - baseline: main branch with RAMBufferSizeMB left unset, so it defaults to **16 MB**; segments are force merged to 1 at the end.
   - candidate: this PR, with RAMBufferSizeMB likewise at **16 MB** and a force merge at the end.
   
   **Indexing**
   - baseline: index built in 1099 secs, around **18 mins**
   - candidate: index built in 586 secs, around **10 mins**
   - search performance is the same.
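
   As a quick check on the indexing numbers, the sketch below computes the speedup from the two "Built index in" values reported in the details sections. The variable names are illustrative only and are not part of ann-benchmarks or Lucene.

```python
# Wall-clock build times in seconds, copied from the "Built index in" log lines.
baseline_secs = 1099.0846738815308
candidate_secs = 586.944657087326

# Ratio of baseline to candidate time, and the share of time saved.
speedup = baseline_secs / candidate_secs
saved_pct = 100 * (1 - candidate_secs / baseline_secs)

print(f"baseline:  {baseline_secs / 60:.1f} min")
print(f"candidate: {candidate_secs / 60:.1f} min")
print(f"speedup:   {speedup:.2f}x ({saved_pct:.1f}% less time)")
```

   This is a single run per branch, so the ratio should be read as indicative rather than as a precise measurement.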
   
   2022-03-03T15:01:49.958373Z; main
   IW 1 [2022-03-03T15:14:33.924666Z; main]
   
   
   <details>
    <summary>Details on the search performance </summary>
   
   </details>
   
   <details>
    <summary>Details on the candidate </summary>
   
   Indexing output
   
   ```txt
   IW 0 [2022-03-03T14:30:49.413950Z; main]: init: create=true reader=null
      ramBufferSizeMB=16.0
       maxBufferedDocs=-1
   IW 0 [2022-03-03T14:30:49.424202Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1183514 documents; now flush
   IW 0 [2022-03-03T14:30:50.824200Z; main]: now flush at close
   IW 0 [2022-03-03T14:30:50.824401Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-03-03T14:30:50.824515Z; main]:   index before flush
   DW 0 [2022-03-03T14:30:50.824557Z; main]: startFullFlush
   DW 0 [2022-03-03T14:30:50.827209Z; main]: anyChanges? numDocsInRam=1183514 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-03-03T14:30:50.831053Z; main]: flush postings as segment _0 numDocs=1183514
   HNSW 0 [2022-03-03T14:30:52.334343Z; main]: build graph from 1183514 vectors
   ...
   HNSW 0 [2022-03-03T14:40:31.049504Z; main]: built 1180000 in 5585/578724 ms
   ...
   IW 0 [2022-03-03T14:40:33.492318Z; main]: 582671 msec to write vectors
   IFD 0 [2022-03-03T14:40:34.655718Z; main]: 20 msec to checkpoint
   Indexed 1183514 documents in 585s
   Force merge index in luceneknn-100-16-100.train-16-100.index
   IFD 1 [2022-03-03T14:40:34.671943Z; main]: 0 msec to checkpoint
   Built index in 586.944657087326
   ```
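
   To see where the candidate's time goes, the timestamps in the log above can be subtracted directly. The sketch below is a rough reading, assuming the interval between the "build graph" line and the "msec to write vectors" line is dominated by HNSW graph construction (it also includes writing the vector files); the regex and helper name are illustrative.

```python
import re
from datetime import datetime

# Three lines copied verbatim from the candidate's indexing output.
LOG = """\
DWPT 0 [2022-03-03T14:30:50.831053Z; main]: flush postings as segment _0 numDocs=1183514
HNSW 0 [2022-03-03T14:30:52.334343Z; main]: build graph from 1183514 vectors
IW 0 [2022-03-03T14:40:33.492318Z; main]: 582671 msec to write vectors
"""

# Matches the UTC timestamp inside "[...Z; main]".
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z; ")


def timestamps(log):
    """Return the timestamp of every log line, in order of appearance."""
    return [
        datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        for m in STAMP.finditer(log)
    ]


flush_start, graph_start, vectors_done = timestamps(LOG)
graph_secs = (vectors_done - graph_start).total_seconds()
flush_secs = (vectors_done - flush_start).total_seconds()
total_secs = 586.944657087326  # "Built index in" value from the same log

print(f"flush to vectors written: {flush_secs:.1f} s")
print(f"graph build to vectors written: {graph_secs:.1f} s")
print(f"share of total build time: {graph_secs / total_secs:.1%}")
```

   Under that assumption, almost all of the candidate's build time is spent in the single flush that builds the graph, which agrees with the 582671 msec figure logged by IndexWriter.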
   
   **Files in the index**
   
   ```txt
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0.fdm
    10080 -rw-r--r--  1 mayyasharipova  staff   4.6M  3 Mar 14:30 _0.fdt
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene90FieldsIndex-doc_ids_0.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene90FieldsIndexfile_pointers_1.tmp
   929304 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec
   924624 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec_temp_3.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vem
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vex
   953168 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_knn_buffered_vectors_temp_2.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 write.lock
   ```
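
   The 451M size of each vector file can be sanity-checked from the document count. The sketch below assumes 100-dimensional vectors (as the dataset name glove-100-angular suggests) stored as 4-byte floats, and that `ls` reports sizes in binary megabytes; it ignores any header or footer bytes in the files.

```python
# Inputs: document count from the log; dimension and element size are assumptions.
num_vectors = 1_183_514
dims = 100            # assumed from the dataset name glove-100-angular
bytes_per_float = 4   # assumed float32 storage

raw_bytes = num_vectors * dims * bytes_per_float
raw_mib = raw_bytes / (1024 * 1024)

# The listing shows three files of about this size at the same time:
# the final .vec, its temp copy, and the buffered-vectors temp file.
copies_on_disk = 3
peak_gib = copies_on_disk * raw_bytes / (1024 ** 3)

print(f"raw vector data: {raw_mib:.1f} MiB")
print(f"three copies:    {peak_gib:.2f} GiB")
```

   So in this listing the vector data occupies roughly three times its raw size on disk while the flush is in progress.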
   </details>
   
   
   <details>
    <summary>Details on the baseline </summary>
   
   Indexing output
   
   ```txt
   Built index in 1099.0846738815308
   ```
   
   **Files in the index**
   
   ```txt
   drwxr-xr-x  12 mayyasharipova  staff   384B  3 Mar 15:14 .
   drwxr-xr-x  42 mayyasharipova  staff   1.3K  3 Mar 15:14 ..
   -rw-r--r--   1 mayyasharipova  staff   201B  3 Mar 15:03 _w.fdm
   -rw-r--r--   1 mayyasharipova  staff   4.6M  3 Mar 15:03 _w.fdt
   -rw-r--r--   1 mayyasharipova  staff   3.5K  3 Mar 15:03 _w.fdx
   -rw-r--r--   1 mayyasharipova  staff   192B  3 Mar 15:14 _w.fnm
   -rw-r--r--   1 mayyasharipova  staff   532B  3 Mar 15:14 _w.si
   -rw-r--r--   1 mayyasharipova  staff   451M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vec
   -rw-r--r--   1 mayyasharipova  staff   309K  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vem
   -rw-r--r--   1 mayyasharipova  staff    82M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vex
   -rw-r--r--   1 mayyasharipova  staff   154B  3 Mar 15:14 segments_2
   -rw-r--r--   1 mayyasharipova  staff     0B  3 Mar 14:56 write.lock
   ```
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org