Posted to dev@lucene.apache.org by Klaus Schaefers <kl...@gmail.com> on 2017/12/13 09:28:48 UTC
By pass text processing
Hi,
I would like to build an extension to use Lucene for image retrieval. I
would represent each image as a binary vector (visual bag of words). For now
I can construct a string like "F1 F2 F10..." to insert my bit vector into
Lucene. Of course this adds quite some overhead, so I was wondering if I
can write directly into the underlying storage engine?
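For illustration, the string encoding mentioned above could be produced roughly as follows. This is only a sketch: the class and method names are hypothetical, and the "F<i>" token format is assumed from the "F1 F2 F10..." example, with i being the index of a set bit.

```java
import java.util.BitSet;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene.
final class BitVectorTokens {

    // Turns every set bit i into the token "F<i>", separated by single spaces.
    static String toTokenString(BitSet bits) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            joiner.add("F" + i);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(2);
        bits.set(10);
        System.out.println(toTokenString(bits)); // prints "F1 F2 F10"
    }
}
```

The resulting string grows with the number of set bits rather than with the vector length, so the overhead depends on how sparse the vectors are.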
Cheers,
Klaus
--
“Overfitting” is not about an excessive amount of physical exercise...
Re: By pass text processing
Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Klaus,
Don't you use clustering and vector quantization to build the visual bag of words?
If you do, I don't think you need to worry about the overhead of storing the vectors in Lucene,
because the number of distinct words is bounded by the number of clusters.
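As a rough sketch of the quantization step: each local descriptor of an image is assigned to its nearest cluster centroid, and the image's bag of words is the set of cluster ids that were hit. Everything here is an assumption made for illustration (the class name, plain squared Euclidean distance, centroids supplied as a double[][]); it is not the code used in Apache alike.

```java
import java.util.BitSet;

// Hypothetical quantizer; the centroids would come from a clustering step such as k-means.
final class VisualWordQuantizer {
    private final double[][] centroids; // one row per cluster

    VisualWordQuantizer(double[][] centroids) {
        this.centroids = centroids;
    }

    // Returns the index of the centroid closest to the descriptor (squared Euclidean distance).
    int nearestCentroid(double[] descriptor) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = 0.0;
            for (int d = 0; d < descriptor.length; d++) {
                double diff = descriptor[d] - centroids[k][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    // Sets bit k if at least one descriptor of the image falls into cluster k.
    BitSet quantize(double[][] descriptors) {
        BitSet words = new BitSet(centroids.length);
        for (double[] descriptor : descriptors) {
            words.set(nearestCentroid(descriptor));
        }
        return words;
    }
}
```

With K centroids an image can produce at most K distinct words, which is the bound referred to above.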
I used this technique in Apache alike, which is part of Apache Labs [1].
Apache alike uses Mahout to cluster visual descriptors and Lucene to search for
similar pictures. The architecture is described in [2].
Koji
[1] http://labs.apache.org/labs.html
[2] http://svn.apache.org/repos/asf/labs/alike/trunk/alike-architecture.pptx
Re: By pass text processing
Posted by Adrien Grand <jp...@gmail.com>.
The StringField class is probably what you are after.
NOTE: please use the java-user list for such questions in the future.