Posted to dev@lucene.apache.org by MyCoy Z <my...@gmail.com> on 2023/07/11 18:42:22 UTC
Benefits of using bytes vector for HNSW
Hi, Lucene Dev Community:
I'm wondering what benefits we could get from indexing byte vectors, rather
than float vectors, when building an HNSW graph.
I can think of storage and performance improvements.
However, due to some internal platform limitations, we cannot actually try
to build such a graph on production data.
So it would be great if anyone could share some production experience:
for example, how much storage can be saved and how much can performance be
improved?
Thanks
Re: Benefits of using bytes vector for HNSW
Posted by Alessandro Benedetti <a....@sease.io>.
Hi!
You are spot on: you deal with data 4 times smaller (of course, each
dimension then has 8 bits instead of 32, so it can represent only 1/4 of
the information).
But if you are OK with that, you may achieve a lighter memory footprint (not
4 times lighter overall, as there are other index structures, such as the
graph itself, whose size does not change, but still a decent improvement).
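
To make the "4 times smaller" figure concrete, here is a rough sketch of the arithmetic for the raw vector values only. The class name, the corpus size (1,000,000 vectors) and the dimension count (768) are assumptions chosen for illustration, not numbers from this thread. The quantize method is a naive linear mapping that assumes input values lie in [-1, 1]; it is not Lucene's API, only one hypothetical way an application might produce byte vectors before indexing them.

```java
public final class VectorFootprint {

    /**
     * Bytes needed for the vector values alone. This excludes the HNSW
     * graph links and any other index structures, which is why the overall
     * saving is smaller than the per-vector ratio suggests.
     */
    static long rawVectorBytes(long numVectors, int dimensions, int bytesPerDimension) {
        return numVectors * dimensions * bytesPerDimension;
    }

    /**
     * Naive linear quantization (illustrative assumption): clamps each value
     * to [-1, 1] and scales it to the signed byte range [-127, 127].
     */
    static byte[] quantize(float[] vector) {
        byte[] out = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(-1f, Math.min(1f, vector[i]));
            out[i] = (byte) Math.round(clamped * 127f);
        }
        return out;
    }

    public static void main(String[] args) {
        long numVectors = 1_000_000L; // hypothetical corpus size
        int dimensions = 768;         // hypothetical embedding size

        long floatBytes = rawVectorBytes(numVectors, dimensions, Float.BYTES);
        long byteBytes = rawVectorBytes(numVectors, dimensions, Byte.BYTES);

        System.out.println("float vectors: " + floatBytes + " bytes");
        System.out.println("byte vectors:  " + byteBytes + " bytes");
        System.out.println("ratio:         " + (floatBytes / byteBytes));
    }
}
```

Under these assumptions the float values take 3,072,000,000 bytes and the byte values 768,000,000 bytes, a ratio of 4. How much recall is lost by the coarser representation depends on the data and the quantization scheme, so it would need to be measured on a representative sample.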
Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*
e-mail: a.benedetti@sease.io