Posted to solr-user@lucene.apache.org by sam ” <sk...@gmail.com> on 2012/03/26 21:00:07 UTC

document inside document?

Hey,

I am making an image search engine where people can tag images with various
items that are themselves tagged.
For example, http://example.com/abc.jpg is tagged with the following three
items:
- item1 that is tagged with: tall blond woman
- item2 that is tagged with: yellow purse
- item3 that is tagged with: gucci red dress

Querying for +yellow +purse should return the example image. But querying
for +gucci +purse should not, because the image does not have an item tagged
with both gucci and purse.

In addition to "items", each image has various metadata, such as alt text,
location, description, and photo credit, that should be available for
search.

How should I write my schema.xml?
If imageUrl is the primary key, do I implement my own fieldType for items, so
that I can write:
<field name="items" type="myItemType" multiValued="true"/>
What would myItemType look like so that Solr knows the example image
should not match the query +gucci +purse?

If itemId is the primary key, I can use result grouping (
http://wiki.apache.org/solr/FieldCollapsing). But then I need to repeat alt
text and other image metadata for each item.

Or should I create separate schemas for item search and metadata search?

Thanks.
Sam.

Re: document inside document?

Posted by Erick Erickson <er...@gmail.com>.
For your tagging, think about using multiValued="true" with
a positionIncrementGap of, say, 100. Then your searches
on this field can be phrase queries with a slop smaller than the gap,
e.g. "tall woman"~90 would match, but "purse gucci"~90
would not because "purse" and "gucci" are in different values and so
are not within 90 positions of each other.
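
A minimal schema.xml sketch of that idea follows. The type name "text_tags"
and the analyzer chain are assumptions for illustration; only the "items"
field name comes from the question. Each item's tag string goes in as one
value of the multiValued field, and the gap keeps values about 100 positions
apart.

```xml
<!-- Hypothetical field type: "text_tags" is an illustrative name. -->
<!-- positionIncrementGap="100" inserts a 100-position gap between -->
<!-- successive values of a multiValued field.                     -->
<fieldType name="text_tags" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumed analysis chain: split on whitespace, then lowercase. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One value per item, e.g. "tall blond woman", "yellow purse", -->
<!-- "gucci red dress".                                           -->
<field name="items" type="text_tags" indexed="true" stored="true" multiValued="true"/>
```

With a field like this, a query such as items:"gucci purse"~90 should not
match the example image, while items:"yellow purse"~90 should, as long as the
slop stays below the gap.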

As far as the metadata is concerned, this is just a matter of specifying
which fields should be queried; see the "qf" parameter
in edismax.
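
As a rough sketch, the queried fields could be set as defaults on a request
handler in solrconfig.xml. The handler name and the metadata field names
(alt_text, location, description, photo_credit) are assumptions based on the
metadata listed in the question, not names from an actual schema.

```xml
<!-- Hypothetical handler defaults; field names are illustrative. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the extended dismax query parser. -->
    <str name="defType">edismax</str>
    <!-- qf lists the fields that plain query terms are searched against. -->
    <str name="qf">alt_text location description photo_credit</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed on each request instead of being set
as defaults.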

As far as fieldType goes, spend some time with admin/analysis to understand
what the various tokenizers and filters do. Your question is really
too broad to answer. I'd start with one of the text types and iterate.

Grouping on the primary key is a pretty useless thing to do. What is your
use case?

And you'll just have to get used to denormalizing data with Solr/Lucene,
which is hard for a DB person. It just feels icky <G>.

Best
Erick
