Posted to pylucene-dev@lucene.apache.org by Chris Spencer <ch...@gmail.com> on 2011/04/06 23:53:19 UTC

Indexing Non-Textual Data

Hi,

I'm new to PyLucene, so forgive me if this is a newbie question. I have a
dataset composed of several thousand lists of 128 integer features, each
list associated with a class label. Would it be possible to use Lucene as a
classifier, by indexing the label with respect to these integer features,
and then classify a new list by finding the most similar labels with Lucene?
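For illustration only, the idea in the question (classify a new list by the labels of its most similar stored lists) can be sketched in plain Python as a brute-force nearest-neighbour vote. This does not use Lucene at all; the function name, the squared Euclidean distance and the tiny 4-feature data set are assumptions made up for the example, standing in for the 128-feature lists.

```python
from collections import Counter


def classify(query, labeled, k=3):
    """Return the majority label among the k stored vectors closest to query.

    `labeled` is a list of (feature_list, label) pairs. Distance is squared
    Euclidean, which is an assumption; the original post names no metric.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Sort every stored vector by distance to the query and keep the k nearest.
    nearest = sorted(labeled, key=lambda item: sq_dist(query, item[0]))[:k]
    # Vote: the most common label among the nearest neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 4-feature stand-in for the 128-feature lists.
training = [
    ([0, 1, 0, 2], "a"),
    ([1, 0, 1, 2], "a"),
    ([9, 8, 9, 7], "b"),
    ([8, 9, 8, 8], "b"),
]

print(classify([1, 1, 0, 2], training, k=3))
```

A linear scan like this is workable for several thousand lists; an index becomes interesting mainly when the collection grows much larger.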

I've been going through the PyLucene samples, but they only seem to involve
indexing text, not continuous features (understandably). Could anyone point
me to an example that indexes non-textual data?

I think the project Lire (http://www.semanticmetadata.net/lire/) is using
Lucene to do something similar to this, although with an emphasis on image
features. I've dug into their code a little, but I'm not a strong Java
programmer, so I'm not sure how they're pulling it off, nor how I might
translate this into the PyLucene API. In your opinion, is this a practical
use of Lucene?

Regards,
Chris

Re: Indexing Non-Textual Data

Posted by Andi Vajda <va...@apache.org>.
  Hi,

On Wed, 6 Apr 2011, Chris Spencer wrote:

> I'm new to PyLucene, so forgive me if this is a newbie question. I have a
> dataset composed of several thousand lists of 128 integer features, each
> list associated with a class label. Would it be possible to use Lucene as a
> classifier, by indexing the label with respect to these integer features,
> and then classify a new list by finding the most similar labels with Lucene?

I believe there is support in Lucene for indexing numeric values using a 
Trie. Please ask on java-user@lucene.apache.org (subscribe first by sending 
mail to java-user-subscribe@lucene.apache.org). There are many more Lucene 
experts with answers there.

For example, this class may be relevant:
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/document/NumericField.html
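
As a rough illustration of the trie idea, each number can be indexed at several precisions, so that a range query can match a few coarse terms instead of every individual value. The plain-Python sketch below is conceptual only and is not Lucene's actual term encoding; the function name trie_terms and its parameters are hypothetical, loosely modelled on the notion of a precision step.

```python
def trie_terms(value, precision_step=4, bits=32):
    """Return (shift, prefix) pairs for a non-negative integer.

    Each pair is the value with its lowest `shift` bits dropped, so larger
    shifts give coarser terms that are shared by neighbouring values.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range for the given bit width")
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


# With a 12-bit width and a step of 4, each value yields three terms.
print(trie_terms(300, precision_step=4, bits=12))
print(trie_terms(303, precision_step=4, bits=12))
```

The two values differ only in their full-precision term; their coarser terms are identical, which is what lets a range covering both be expressed with fewer terms.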

Andi..
