Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/04/02 01:50:00 UTC

[jira] [Commented] (LUCENE-9855) Reconsider codec name VectorFormat

    [ https://issues.apache.org/jira/browse/LUCENE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313509#comment-17313509 ] 

Julie Tibshirani commented on LUCENE-9855:
------------------------------------------

To me it seems best to avoid tying the format name to HNSW. It's very possible that we'll evolve the implementation, as ANN is a developing area. I don't think we typically mention a specific algorithm or data structure in format names; for example, {{PointsFormat}} doesn't mention BKD trees.

{{NeighborsFormat}} also doesn't feel precise to me. We already support NN search on points, so the name doesn't clearly distinguish this format. And in the future, the format might offer other operations on high-dimensional vectors, such as radius queries.

My current favorite is {{NumericVectorsFormat}}, followed by {{VectorValuesFormat}}. {{DenseVectorFormat}} could work too (as long as we don't add sparse high-dimensional vectors!), but I understand [~sokolov]'s concern around 'dense' having multiple meanings.

> Reconsider codec name VectorFormat
> ----------------------------------
>
>                 Key: LUCENE-9855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9855
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Tomoko Uchida
>            Priority: Blocker
>
> There is some discussion about the codec name for ANN search.
> https://lists.apache.org/thread.html/r3a6fa29810a1e85779de72562169e72d927d5a5dd2f9ea97705b8b2e%40%3Cdev.lucene.apache.org%3E
> The main points are 1) use the plural form for consistency, and 2) use a more specific name for ANN search (the second point could be optional).
> A few alternatives were proposed:
> - VectorsFormat
> - VectorValuesFormat
> - NeighborsFormat
> - DenseVectorsFormat


