Posted to solr-user@lucene.apache.org by Travis Low <tl...@4centurion.com> on 2011/11/01 18:26:32 UTC

Multivalued fields question

Greetings.  We're finally kicking off our little Solr project.  We're
indexing a paltry 25,000 records but each has MANY documents attached, so
we're using Tika to parse those documents into a big long string, which we
use in a call to solrj.addField("relateddoccontents",
bigLongStringOfDocumentContents).  We don't care about search results
pointing back to a particular document, just one of the 25K records, so
this should work.

Now my question.  Many of these records have related records in other
tables, and there are several types of these related records.  For example,
we have record #100 that may have blue records with numbers 1111, 2222,
3333, and 4444, and red records with numbers 5555, 6666, 7777, 8888.
Currently we're just handling these the same way as related document
contents -- we concatenate them, separated by spaces, into one long string,
then we do solrj.addField("redRecords", stringOfRedRecordNumbers).  That
is, stringOfRedRecordNumbers is "5555 6666 7777 8888".

We have no need to show these records to the user in Solr search results,
because we're going to use the database to display detailed
information for any records found.  Is there any reason to specify
redRecords and blueRecords as multivalued fields in schema.xml?  And if we
did that, we'd call solrj.addField() once for each value, would we not?

cheers,

Travis

Re: Multivalued fields question

Posted by Travis Low <tl...@4centurion.com>.
Thanks much, Erick.  Between your explanation, and what I read at
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html,
the utility of multiValued fields is clear.

On Thu, Nov 3, 2011 at 8:26 AM, Erick Erickson <er...@gmail.com>wrote:

> multiValued has nothing to do with how many tokens are in the field;
> it only determines whether you can call document.add("field1", val1) more
> than once for the same field. Equivalently, it determines whether an input
> document in XML may contain two <field> entries with the same name
> attribute. So it strictly depends upon whether you want to take it upon
> yourself to build these long strings or call document.add once for each
> value in the field.
>
> The field is returned as an array if it's multiValued....
>
> Just to make your life interesting: if you define your positionIncrementGap
> as 0, there is no difference between how multiValued fields are searched
> as opposed to single-valued fields.
>
> FWIW
> Erick




Re: Multivalued fields question

Posted by Erick Erickson <er...@gmail.com>.
multiValued has nothing to do with how many tokens are in the field;
it only determines whether you can call document.add("field1", val1) more
than once for the same field. Equivalently, it determines whether an input
document in XML may contain two <field> entries with the same name
attribute. So it strictly depends upon whether you want to take it upon
yourself to build these long strings or call document.add once for each
value in the field.
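
A minimal sketch of the two options described above, written in plain Java so
it runs without a Solr dependency. The class and method names are hypothetical,
and the XML is built by hand only to show the shape of the update message. In
SolrJ, the per-value form corresponds to calling SolrInputDocument.addField
once for each value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ: it only illustrates how the
// same values look as one concatenated field versus repeated <field> entries.
class MultiValuedXmlSketch {

    // Minimal escaping for XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One <field> entry per value. With more than one value, the field must be
    // declared multiValued="true" in schema.xml.
    static String fieldPerValue(String name, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append("<field name=\"").append(name).append("\">")
              .append(escape(v)).append("</field>");
        }
        return sb.toString();
    }

    // A single <field> entry holding all values joined by spaces. This works
    // with a single-valued field.
    static String concatenated(String name, List<String> values) {
        return "<field name=\"" + name + "\">"
                + escape(String.join(" ", values)) + "</field>";
    }

    // Wraps the field entries in an <add><doc> update message.
    static String addDoc(String id, String fieldsXml) {
        return "<add><doc><field name=\"id\">" + escape(id) + "</field>"
                + fieldsXml + "</doc></add>";
    }

    public static void main(String[] args) {
        List<String> red = Arrays.asList("5555", "6666", "7777", "8888");
        System.out.println(addDoc("100", concatenated("redRecords", red)));
        System.out.println(addDoc("100", fieldPerValue("redRecords", red)));
    }
}
```

Either message indexes the same tokens; the difference is whether the schema
has to allow repeated entries for redRecords.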

The field is returned as an array if it's multiValued....

Just to make your life interesting: if you define your positionIncrementGap
as 0, there is no difference between how multiValued fields are searched
as opposed to single-valued fields.
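
To see why the gap matters, here is a simplified model of token positions in
plain Java. It is not Lucene's code: the class and method names are
hypothetical, tokens are split on whitespace only, and the exact position
offsets Lucene assigns may differ. The assumption is that consecutive tokens
are one position apart and that the gap adds extra positions between values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of how a position gap separates the values
// of a multiValued field. Not taken from Lucene or Solr.
class PositionGapSketch {

    // Maps each token position to its token. Tokens inside a value are
    // consecutive; `gap` extra positions are skipped between values.
    static Map<Integer, String> index(List<String> values, int gap) {
        Map<Integer, String> byPosition = new TreeMap<>();
        int next = 0;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                next += gap;
            }
            for (String tok : values.get(i).trim().split("\\s+")) {
                if (!tok.isEmpty()) {
                    byPosition.put(next++, tok);
                }
            }
        }
        return byPosition;
    }

    // True if `second` sits at the position right after `first`, which is
    // what an exact two-word phrase match needs in this model.
    static boolean adjacent(Map<Integer, String> idx, String first, String second) {
        for (Map.Entry<Integer, String> e : idx.entrySet()) {
            if (e.getValue().equals(first) && second.equals(idx.get(e.getKey() + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> multi = Arrays.asList("5555", "6666", "7777", "8888");
        List<String> single = Arrays.asList("5555 6666 7777 8888");

        // With a gap of 0, four separate values get the same positions as
        // one concatenated string.
        System.out.println(index(multi, 0).equals(index(single, 0)));

        // With a large gap, tokens from different values are no longer adjacent.
        System.out.println(adjacent(index(multi, 0), "6666", "7777"));
        System.out.println(adjacent(index(multi, 100), "6666", "7777"));
    }
}
```

In this model a phrase such as "6666 7777" matches across value boundaries
only when the gap is 0, which is the sense in which the two field styles
search the same way.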

FWIW
Erick
