Posted to dev@lucene.apache.org by Klaus Schaefers <kl...@gmail.com> on 2017/12/13 09:28:48 UTC

Bypass text processing

Hi,

I would like to build an extension that uses Lucene for image retrieval. I
would represent each image as a binary vector (visual bag of words). For now
I can construct a string like "F1 F2 F10..." to insert my bit vector into
Lucene. Of course this adds quite some overhead, so I was wondering whether I
can write directly into the underlying storage engine?

Cheers,

Klaus

-- 
“Overfitting” is not about an excessive amount of physical exercise...
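
For reference, the string encoding described in the question can be sketched in plain Java as below. This is only an illustration: the class and method names are hypothetical, and the "F" prefix simply follows the "F1 F2 F10..." example from the message.

```java
import java.util.BitSet;
import java.util.StringJoiner;

// Hypothetical helper: turns a bit vector into a whitespace-separated token string.
final class BitVectorTokens {
    // Emits one token "F<i>" for every set bit i, in ascending order.
    static String toTokenString(BitSet bits) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            joiner.add("F" + i);
        }
        return joiner.toString();
    }
}
```

A string built this way still has to pass through an analyzer when it is indexed as a tokenized text field, which is the overhead the question refers to.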

Re: Bypass text processing

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Klaus,

Don't you use clustering to quantize the vectors into a visual bag of words?
If you do, I don't think you need to worry about the overhead of storing the vectors in Lucene,
because the number of clusters is an upper bound on the number of distinct words.
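
As a rough sketch of the quantization step mentioned above (not the code used in Apache alike), each descriptor can be mapped to the index of its nearest cluster centroid, so an image never produces more distinct words than there are clusters. The class and method names are hypothetical, the centroids are assumed to come from a separate clustering run, and squared Euclidean distance is an assumption.

```java
import java.util.TreeSet;

// Hypothetical sketch: quantize descriptors against precomputed centroids.
final class VisualWordQuantizer {
    // Returns the index of the centroid closest to the descriptor
    // (squared Euclidean distance), or -1 if there are no centroids.
    static int nearestCentroid(double[] descriptor, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < descriptor.length; i++) {
                double d = descriptor[i] - centroids[c][i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Collects the distinct visual word ids of one image; the result can
    // never hold more entries than there are centroids.
    static TreeSet<Integer> visualWords(double[][] descriptors, double[][] centroids) {
        TreeSet<Integer> words = new TreeSet<>();
        for (double[] descriptor : descriptors) {
            words.add(nearestCentroid(descriptor, centroids));
        }
        return words;
    }
}
```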

I used this technique in Apache alike, which is part of Apache Labs [1].
Apache alike uses Mahout to cluster visual descriptors and Lucene to search for
similar pictures. The architecture is described in [2].

Koji

[1] http://labs.apache.org/labs.html
[2] http://svn.apache.org/repos/asf/labs/alike/trunk/alike-architecture.pptx



Re: Bypass text processing

Posted by Adrien Grand <jp...@gmail.com>.
The StringField class is probably what you are after.
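
A minimal sketch of that suggestion, assuming lucene-core is on the classpath: StringField indexes its value verbatim as a single term, without running an analyzer, so each set bit can be added as its own field value. The class name and the field names "id" and "vw" are hypothetical choices for illustration.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;

// Hypothetical helper: one Lucene document per image, one term per set bit.
final class VisualWordDocuments {
    static Document toDocument(String imageId, int[] setBits) {
        Document doc = new Document();
        // Stored so the image id can be read back from search hits.
        doc.add(new StringField("id", imageId, Field.Store.YES));
        for (int bit : setBits) {
            // Indexed as-is (no tokenization); not stored.
            doc.add(new StringField("vw", "F" + bit, Field.Store.NO));
        }
        return doc;
    }
}
```

A document built this way can be handed to IndexWriter.addDocument, and individual visual words can then be matched with a TermQuery on the "vw" field.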

NOTE: please use the java-user list for such questions in the future.
