Posted to dev@lucene.apache.org by MyCoy Z <my...@gmail.com> on 2023/07/11 18:42:22 UTC

Benefits of using bytes vector for HNSW

Hi, Lucene Dev Community:

I'm wondering what benefits we could get by indexing byte vectors to build
an HNSW graph instead of float vectors.

I can think of storage and performance improvements.
However, due to some internal platform limitations, we cannot actually try
to build such a graph on production data.

So it would be great if anyone could share some production experience:
for example, how much storage can be saved, and how much does performance
improve?

Thanks
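To make the per-vector saving concrete, here is a minimal Python sketch, assuming a simple symmetric linear quantization of float components into signed bytes. `quantize`, `dequantize` and `scale = 127.0` are hypothetical illustrations, not Lucene's API, and the scale assumes every component already lies in [-1, 1] (e.g. normalized embeddings).

```python
import struct

def quantize(vec, scale):
    # Hypothetical helper: symmetric linear quantization to signed bytes.
    # Each component is scaled, rounded to the nearest integer and clamped to [-128, 127].
    return [max(-128, min(127, round(x * scale))) for x in vec]

def dequantize(qvec, scale):
    # Approximate reconstruction; the rounding error is the precision given up.
    return [q / scale for q in qvec]

vec = [0.12, -0.53, 0.98, -1.0]
scale = 127.0  # assumption: components lie in [-1, 1]

qvec = quantize(vec, scale)
float_bytes = struct.pack(f"<{len(vec)}f", *vec)   # float32: 4 bytes per dimension
byte_bytes = struct.pack(f"<{len(qvec)}b", *qvec)  # signed byte: 1 byte per dimension

print(qvec)                              # [15, -67, 124, -127]
print(len(float_bytes), len(byte_bytes))  # 16 4
print(max(abs(a - b) for a, b in zip(vec, dequantize(qvec, scale))))
```

Each dimension drops from 4 bytes to 1, and for inputs within [-1, 1] the round trip loses at most half a quantization step (0.5 / scale) per component.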

Re: Benefits of using bytes vector for HNSW

Posted by Alessandro Benedetti <a....@sease.io>.
Hi!
You are spot on: you deal with data 4 times smaller (each dimension takes
1 byte instead of a 4-byte float, so of course it can also represent only
1/4 of the information).
But if you are OK with that, you can achieve a lighter memory footprint (not
4 times lighter, as there are a lot of other structures as well, such as the
HNSW graph's neighbor lists, which don't shrink, but still a decent
improvement).
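Why the saving stays below 4x can be seen with a rough Python estimate. This is a hypothetical back-of-the-envelope model, not Lucene's actual file format: `index_bytes`, `max_conn=16` and the fixed 4-byte neighbor ids are assumptions, and it counts only the bottom graph layer with up to 2 × max_conn neighbors per node (a common HNSW setting), ignoring upper layers, offsets and on-disk compression.

```python
def index_bytes(num_vectors, dims, bytes_per_dim, max_conn=16, bytes_per_neighbor=4):
    # Hypothetical model of an HNSW index, not Lucene's file format.
    # Raw vector data scales with the encoding: 4 bytes/dim for float32, 1 for bytes.
    vectors = num_vectors * dims * bytes_per_dim
    # Graph data does not depend on the vector encoding. Assumes only the bottom
    # layer, with up to 2 * max_conn neighbor ids per node stored as fixed-width ints.
    graph = num_vectors * 2 * max_conn * bytes_per_neighbor
    return vectors + graph

for dims in (128, 768):
    float_total = index_bytes(1_000_000, dims, bytes_per_dim=4)
    byte_total = index_bytes(1_000_000, dims, bytes_per_dim=1)
    # Saving factor of byte vectors over float vectors for the whole index.
    print(dims, f"{float_total / byte_total:.2f}")
```

With 1M vectors this prints a saving factor of 2.50 for 128 dimensions and 3.57 for 768 dimensions: the larger the vectors are relative to the graph, the closer the saving gets to 4x.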

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benedetti@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>