Posted to java-user@lucene.apache.org by Mandy Günther <ma...@googlemail.com> on 2009/02/11 19:05:45 UTC

What's the best way to store metadata?

Hi all,
I want to use Lucene in my project but I have the following problem:
My goal is to store metadata in the index.

Here is an example of the data I need to index:

1; My; determiner;			
2; project; noun; 1
3; uses; verb; 1
4; Lucene; noun; 1
6; I; noun;		
7; need; verb; 6
8; help; noun; 6

The file contains rows with four columns.
The first column is a sequential number. The second column is the text
that needs to be indexed; in my original file it is more than one word.
The third column classifies the second column, and the fourth column
tells me where the sentence containing the text in the second column
starts.
I will have large data files (up to 30 GB) to index, so indexing and
searching performance are very important.
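
To make the row layout concrete, here is a minimal sketch of a parser for one line of the file. It is only an illustration under stated assumptions: the class name TokenRow and its fields are hypothetical, the text column is assumed not to contain a semicolon, and an empty fourth column (as in rows 1 and 6 of the example) is represented as null.

```java
/** One row of the four-column input file (hypothetical helper, not part of Lucene). */
final class TokenRow {
    final int id;                // column 1: sequential number
    final String text;           // column 2: text to be indexed
    final String category;       // column 3: classification of the text
    final Integer sentenceStart; // column 4: where the sentence starts, or null if empty

    TokenRow(int id, String text, String category, Integer sentenceStart) {
        this.id = id;
        this.text = text;
        this.category = category;
        this.sentenceStart = sentenceStart;
    }

    /** Parses a line such as "2; project; noun; 1". */
    static TokenRow parse(String line) {
        // A limit of -1 keeps a trailing empty column instead of dropping it.
        String[] cols = line.split(";", -1);
        if (cols.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 columns: " + line);
        }
        int id = Integer.parseInt(cols[0].trim());
        String text = cols[1].trim();
        String category = cols[2].trim();
        // The fourth column may be missing or contain only whitespace.
        String start = cols.length > 3 ? cols[3].trim() : "";
        Integer sentenceStart = start.isEmpty() ? null : Integer.valueOf(start);
        return new TokenRow(id, text, category, sentenceStart);
    }
}
```

For files of this size the rows would be read and indexed one at a time rather than collected in memory first.
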
I tried to store the metadata (columns 3 and 4) in the field names, e.g.:

doc.add(new Field("1|determiner", "My", ...));
doc.add(new Field("1|noun", "project", ...));
...
doc.add(new Field("2|noun", "I", ...));

But that's not an elegant way to do it!

I was thinking about using payloads. But the API documentation marks
payloads as experimental, and performance reportedly degrades when
payloads are used.
I am not even sure whether payloads are meant for this kind of data.
Is it possible to store this kind of metadata as a payload?
Is there another way to store the metadata? Should I use a database
to store the extra information?
Has anybody had the same problem, or does anyone have experience with
payloads?
So what would be the best way to store the metadata?

Thanks for your advice.
Mandy

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: What's the best way to store metadata?

Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 11, 2009, at 1:05 PM, Mandy Günther wrote:

> Hi all,
> I want to use Lucene in my project but I have the following problem:
> My goal is to store metadata in the index.
>
> Here is an example of the stuff I need to index
>
> 1; My; determiner;			
> 2; project; noun; 1
> 3; uses; verb; 1
> 4; Lucene; noun; 1
> 6; I; noun;		
> 7; need; verb; 6
> 8; help; noun; 6
>
> It's a file containing rows with 4 columns.
> The first column is a sequential number. The second column is the text
> that need to be indexed. In my original file it's more than one word.
> The third column
> classifies the second column and the fourth column tells me where the
> sentence of the word in the second column starts.
> I will have large datafiles(up to 30GB) to index. Indexing and
> searching performance is very important.
> I tried to store the meta data(column no. 3 and 4) over the field  
> names, e.g.
>
> doc.add(new Field("1|determiner", My, ...));
> doc.add(new Field("1|noun", project, ...));
> ...
> doc.add(new Field("2|noun", I, ...));
>
> But that's not a classy way!
>
> I was thinking about using Payloads. But the API says the Payload
> Status is experimental and the performance is getting bad using
> payloads.
>
> I am not even sure if Payloads are meant to be used to store that  
> kind of data.
> Is it possible to store that kind of metadata as payload?

I think it makes sense to store these as payloads.  You might even  
look at the 2.9-dev trunk version for the new AttributeSource  
capabilities that allow you to, essentially, strongly type payloads  
and provide custom serialization capabilities.
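
A payload is just a small byte array attached to a term position, so the two metadata columns would need to be serialized into bytes. The following is a minimal sketch of one possible layout, using only the Java standard library. The class name TokenMetadataCodec and the byte layout (a 4-byte sentence start followed by the UTF-8 category) are assumptions made for illustration, not anything Lucene prescribes. How the bytes are attached to each token depends on the Lucene version, so that part is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical codec that packs columns 3 and 4 into a payload-sized byte array. */
final class TokenMetadataCodec {

    /** Layout (an assumption): 4-byte big-endian sentence start, then the category as UTF-8. */
    static byte[] encode(String category, int sentenceStart) {
        byte[] cat = category.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + cat.length);
        buf.putInt(sentenceStart);
        buf.put(cat);
        return buf.array();
    }

    /** Reads the sentence start back from the first four bytes. */
    static int decodeSentenceStart(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** Reads the category back from the bytes after the sentence start. */
    static String decodeCategory(byte[] payload) {
        return new String(payload, 4, payload.length - 4, StandardCharsets.UTF_8);
    }
}
```

With a small, fixed set of categories, mapping each category to a single byte instead of a string would keep the payloads shorter, which may matter for an index built from 30GB of input.
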

As for the "experimental" status of payloads, I'd say they are here to
stay; it's just that the exact methods may change.

How do you plan to search using the info?  Note that the new
AttributeSource stuff is not yet searchable, whereas the Payload stuff
is.  Performance, I would think, should be roughly equivalent to using
Spans, but don't quote me on it.

HTH,
Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

