Posted to java-user@lucene.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2010/03/12 13:51:52 UTC

Question on number of fields in a document

Hello there! We are indexing metadata for our media. One idea is that each
user adds their own metadata, so each document may have a different
number/name/type of fields. Is this OK in Lucene? I mean, is Lucene OK with
this relaxed approach?

Also, considering that each user may define its own metadata, we may have
several different types of fields. Is there a limit for this?

Regards

-- 
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.

RE: Question on number of fields in a document

Posted by Uwe Schindler <uw...@thetaphi.de>.
You can run into memory problems if you turn on norms for all those fields (norms are stored as a large byte[] array per field). This is not a hard limit, but you should take care.
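
A rough back-of-the-envelope sketch of the memory point above. It assumes, as I understand the Lucene releases current at the time of this thread, that norms cost one byte per document for each field that has norms enabled. The class and method names are hypothetical and nothing here calls Lucene.

```java
// Hypothetical helper, not part of Lucene: estimates heap used by norms.
final class NormsEstimate {
    // Assumption: 1 byte per document per field with norms enabled.
    static long normsBytes(long fieldsWithNorms, long maxDoc) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(fieldsWithNorms, maxDoc);
    }

    // Converts a byte count to mebibytes for easier reading.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

Under that assumption, NormsEstimate.normsBytes(1_000, 5_000_000) comes to 5,000,000,000 bytes, so 1,000 user-defined fields over 5 million documents would already need several gigabytes for norms alone. Omitting norms on fields that do not need length normalization or index-time boosts avoids that cost.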

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Question on number of fields in a document

Posted by Renaud Delbru <re...@deri.org>.
There is a potential bottleneck when you have a large number of fields
and of words. Each field has its own list of terms, which means that the
dictionary could, in the worst case, hold n*m entries (with n the number
of fields and m the number of terms).
This can add overhead to term lookup when n and m are large (a term
lookup occurs for each keyword in a query).
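
To make the n*m worst case concrete, here is a toy model (plain Java, not Lucene code) of a dictionary keyed by the pair (field, term). The class name and the key encoding are assumptions made for illustration only; Lucene's real term dictionary is an on-disk structure with its own format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model, not Lucene code: a term dictionary keyed by (field, term).
final class ToyTermDictionary {
    // Sorted set of keys built as field + NUL + term; one entry per distinct pair.
    private final TreeSet<String> entries = new TreeSet<>();

    // Registers every (field, term) pair that occurs in one document.
    void addDocument(Map<String, List<String>> fieldsToTerms) {
        for (Map.Entry<String, List<String>> field : fieldsToTerms.entrySet()) {
            for (String term : field.getValue()) {
                entries.add(field.getKey() + '\u0000' + term);
            }
        }
    }

    // Number of distinct (field, term) pairs seen so far.
    int size() {
        return entries.size();
    }

    boolean contains(String field, String term) {
        return entries.contains(field + '\u0000' + term);
    }
}
```

If the same m words show up under each of n different field names, size() reaches n*m, because the same word under two fields counts as two separate entries.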

Another problem (for the end user) with using an arbitrary number of
fields is that the user has to know exactly which field names to
query. By default, Lucene cannot search efficiently across an arbitrary
number of fields unless you create a "content" field into which you
index the values from all the other fields. This duplicates the data
inside the index (even if indexing the same data twice is cheap, it can
be problematic for a very large index).
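
The catch-all field idea can be sketched with a toy in-memory inverted index (plain Java, not Lucene code). Every name below is hypothetical except the "content" field name, which comes from the paragraph above; the whitespace tokenizer and lowercasing are simplifying assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index, not Lucene code: copies every value into a catch-all field.
final class CatchAllIndex {
    static final String CATCH_ALL = "content";

    // field name -> term -> ids of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    void addDocument(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                post(field.getKey(), term, docId); // posting under the user's field
                post(CATCH_ALL, term, docId);      // duplicated posting under "content"
            }
        }
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    // Returns the ids of documents that contain the term in the given field.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Map.of())
                       .getOrDefault(term.toLowerCase(), Set.of());
    }
}
```

A search on "content" finds a word regardless of which user-defined field it came from, at the price of storing each posting twice, which is the duplication described above.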

We recently released a plugin for Lucene (SIREn [1]) that tackles
this particular problem. It was initially developed to build a
search engine for RDF data (a standard model for data interchange on the
web). It lets you index an arbitrary number of fields without facing
the two problems above, while keeping web-scale performance. In
addition, it supports keyword search on field names and has better
support for multi-valued fields.

I think the best approach is to give it a try: run a benchmark with Lucene
and SIREn, and see which one better meets your needs (in terms of response
time and also search capabilities). If your index stays relatively small (a
few thousand or maybe millions of documents), then Lucene may be a good
choice, but if you expect to have a large index (millions of documents)
with an arbitrary number of fields (thousands or even tens of
thousands), then SIREn may be more suitable.

[1] http://siren.sindice.com/
-- 
Renaud Delbru





Re: Question on number of fields in a document

Posted by Erick Erickson <er...@gmail.com>.
There's no requirement that all documents have the same
fields; Lucene is fine with different docs having different
fields.

There's no limit on the number of different fields allowed
that I know of, but I'm sure someone will chime in if there
is.

HTH
Erick
