Posted to solr-user@lucene.apache.org by Andrew Houghton <aa...@roarmouse.org> on 2010/11/28 17:30:29 UTC

multi-valued with metadata?

I've done my best to search through the archives for this problem, and
found at least one person dealing with a similar issue (with no
responses).  I'm sure this has been asked more than once before, but
my search-fu is apparently lacking.

Essentially, I need to be able to retrieve some metadata (author IDs)
along with the multi-valued fields holding document authors; e.g., I
search in document titles, I get the doc ID, the list of authors, and
*their* IDs, so I can drill down to other papers by these authors.

A basic sample of the XML data:

<docs>
<doc>
  <id>A001</id>
  <title>This is a title</title>
  <authors>
    <author>
      <id>111</id>
      <name>John Smith</name>
    </author>
    <author>
      <id>222</id>
      <name>Mary Johnson</name>
    </author>
  </authors>
</doc>
<doc>
  <id>A002</id>
  <title>And this is another title</title>
  <authors>
    <author>
      <id>222</id>
      <name>Mary Johnson</name>
    </author>
    <author>
      <id>333</id>
      <name>Alice Pocahontas</name>
    </author>
  </authors>
</doc>
</docs>

The only reasonable option I've thought of is to index the authors
twice, with one list of names and one of IDs.  It's not clear to me
that SOLR guarantees ordered results of these multi-valued fields,
though.  Another option would be to delimit the ID in some manner so
that I could search on and pull the ID from the author fields, but
only display the name.

A final option, and the one I'm hoping for, is to find that SOLR has
some built-in support for this kind of thing.

- Andrew
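
For illustration, the "index the authors twice" option above can be sketched in Python. This is only a rough sketch under assumptions: the field names author_id and author_name and both helper functions are hypothetical, and pairing by position is only valid if Solr returns both multi-valued fields in the order they were indexed, which is exactly the open question in this message.

```python
import xml.etree.ElementTree as ET

# One <doc> from the sample data above.
SOURCE = """
<doc>
  <id>A001</id>
  <title>This is a title</title>
  <authors>
    <author><id>111</id><name>John Smith</name></author>
    <author><id>222</id><name>Mary Johnson</name></author>
  </authors>
</doc>
"""

def parallel_author_fields(doc_xml):
    """Return (ids, names): two lists in the same source order."""
    doc = ET.fromstring(doc_xml)
    authors = doc.findall("./authors/author")
    ids = [a.findtext("id") for a in authors]      # hypothetical author_id field
    names = [a.findtext("name") for a in authors]  # hypothetical author_name field
    return ids, names

def pair_up(ids, names):
    """Re-associate IDs and names by position (assumes order is preserved)."""
    if len(ids) != len(names):
        raise ValueError("author_id and author_name lists differ in length")
    return list(zip(ids, names))

ids, names = parallel_author_fields(SOURCE)
print(pair_up(ids, names))
```

The length check catches only the most obvious mismatch; a reordered list of the same length would still pair names with the wrong IDs, which is why the delimited approach may be safer.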

Re: multi-valued with metadata?

Posted by Erick Erickson <er...@gmail.com>.
Two things come to mind, neither optimal, but...

First, index both author and ID with a delimiter, something like
Mary Johnson | 222
and deal with breaking that info up for display when you display
the documents. Make sure your analyzer breaks this up appropriately or
your searching will be "interesting".

The other would be to do the above in a different field, perhaps stored only
so you'd have info that's never displayed to the user but the info would
still be available from the doc.

I'm pretty sure that order is preserved for multi-valued fields, but I'm not
100% sure that behavior is guaranteed in the future.


Best
Erick
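
As a rough sketch of the display-side split suggested above, assuming values are stored exactly in the "Mary Johnson | 222" form with the ID last. The function name split_author is hypothetical and nothing here is Solr API; it is plain string handling on values already retrieved from a document.

```python
def split_author(value, delimiter="|"):
    """Split 'Mary Johnson | 222' into ('Mary Johnson', '222').

    rpartition splits on the LAST delimiter, so a name that happens to
    contain the delimiter still yields the trailing ID correctly.
    """
    name, sep, author_id = value.rpartition(delimiter)
    if not sep:
        raise ValueError(f"no {delimiter!r} delimiter in {value!r}")
    return name.strip(), author_id.strip()

stored = ["Mary Johnson | 222", "Alice Pocahontas | 333"]
for name, author_id in map(split_author, stored):
    print(f"{name} (id {author_id})")
```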

Re: multi-valued with metadata?

Posted by "Binkley, Peter" <Pe...@ualberta.ca>.
The best Solr solution often involves indexing the same source fields into
several Solr fields for different purposes. In this case I'd go with your
idea of delimiting the author id, along with indexing it separately:

<field name="id">A001</field>
<field name="author_id">111</field>
<field name="author_name">John Smith</field>
<field name="author_id_name">111|John Smith</field>

The field author_id_name could be stored but not indexed. You would retrieve
the id and author_id_name fields, and then parse the author id out of the
author_id_name field and use it for drill-down searches against the
author_id field. The link pointing to the drill-down search would use the
name part of the author_id_name as its anchor.

Peter
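
A minimal Python sketch of the layout described above, under a few assumptions: the three author fields would need to be multi-valued in the schema to hold several authors per document, the helper names solr_doc and drill_down are hypothetical, and only the query-string part of the drill-down link is built because the host and request path depend on the installation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def solr_doc(doc_id, authors):
    """Build a <doc> element carrying the three author fields per author."""
    doc = ET.Element("doc")

    def add(name, value):
        ET.SubElement(doc, "field", name=name).text = value

    add("id", doc_id)
    for author_id, author_name in authors:
        add("author_id", author_id)      # indexed, used for drill-down
        add("author_name", author_name)  # indexed, used for name search
        add("author_id_name", f"{author_id}|{author_name}")  # stored only
    return doc

def drill_down(author_id_name):
    """Return (anchor text, query string) for one stored author_id_name value."""
    # partition splits on the FIRST '|', so the ID must not contain one.
    author_id, _, name = author_id_name.partition("|")
    return name, urlencode({"q": f"author_id:{author_id}"})

doc = solr_doc("A001", [("111", "John Smith"), ("222", "Mary Johnson")])
print(ET.tostring(doc, encoding="unicode"))
print(drill_down("111|John Smith"))
```

Putting the ID first means a name containing the delimiter cannot corrupt the ID, as long as the split is done on the first delimiter only.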

