Posted to solr-user@lucene.apache.org by Saïd Radhouani <r....@gmail.com> on 2010/06/26 00:01:13 UTC

Setting many properties for a multivalued field. Schema.xml ? External file?

Hi,

I'm trying to index data containing a multivalued field "picture", which has three properties: url, caption, and description:

<picture>
	<url/>
	<caption/>
	<description/>
</picture>

Thus, each indexed document might have many pictures, each of which has a url, a caption, and a description.

I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet and am waiting for your suggestions...

Thanks,
-Saïd


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks Geert-Jan, this is indeed very helpful.

The delimiters I gave were just for the sake of the example. I will use delimiters that rarely occur in the data.

Cheers,
-Saïd

On Jun 26, 2010, at 1:53 PM, Geert-Jan Brits wrote:

>> If I understand your suggestion correctly, you said that there's NO need to
>> have many Dynamic Fields; instead, we can have one definitive field name,
>> which can store a long string (concatenation of information about tens of
>> pictures), e.g., using "-" and "%" delimiters:
>> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
>> I don't clearly see the reason for doing this. Is there a gain in terms of
>> performance? Or does this make programming on the client-side easier? Or
>> something else?
> 
> I think you should ask the exact opposite question. If you don't do anything
> with these fields that Solr is particularly good at (searching / filtering
> / faceting / sorting), why go through the trouble of creating dynamic fields?
> (More fields mean more overhead and tracking cost, no matter how you look at
> it.)
> 
> Moreover, from a client view it is indeed easier the way I suggested, since
> otherwise:
> - you would have to ask (through SolrJ) for all dynamic fields to be
> returned via the fl parameter (
> http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult,
> because a priori you don't know how many dynamic fields to query. In
> other words, you can't just ask Solr (through SolrJ, like you asked) to
> return all dynamic fields beginning with pic_*. (afaik)
> - your client iteration code (looping over the pics) is a bit more involved.
> 
> HTH, Cheers,
> 
> Geert-Jan


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Geert-Jan Brits <gb...@gmail.com>.
btw, be careful with your delimiters: pic_url may well contain a '-',
etc.
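
To illustrate the delimiter concern, here is a minimal Python sketch (not SolrJ). It assumes a single stored string field holding all pictures of a document; the function names and sample values are hypothetical. It packs one picture with the "-" and "%" delimiters from the example in this thread and shows that the round trip breaks as soon as a URL or description contains those characters. Serializing with JSON instead is one possible way around this, since JSON escapes its own delimiters.

```python
import json


def naive_pack(pictures):
    # Join properties with "-" and pictures with "%", as in the thread's example.
    return "%".join("-".join(pic) for pic in pictures)


def naive_unpack(packed):
    # Split on the same delimiters; this is only safe if no value contains them.
    return [tuple(chunk.split("-")) for chunk in packed.split("%")]


def json_pack(pictures):
    # JSON handles quoting and escaping, so any character may appear in a value.
    return json.dumps([list(pic) for pic in pictures])


def json_unpack(packed):
    return [tuple(pic) for pic in json.loads(packed)]


# One picture whose URL contains "-" and whose description contains "%" and "-".
pictures = [("http://example.com/my-pic.jpg", "Sunset", "50% off-season view")]

broken = naive_unpack(naive_pack(pictures))  # no longer equal to `pictures`
intact = json_unpack(json_pack(pictures))    # equal to `pictures`
```

With the naive delimiters the single picture comes back as two mangled entries, while the JSON round trip returns the original tuple unchanged.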


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Geert-Jan Brits <gb...@gmail.com>.
> If I understand your suggestion correctly, you said that there's NO need to
> have many Dynamic Fields; instead, we can have one definitive field name,
> which can store a long string (concatenation of information about tens of
> pictures), e.g., using "-" and "%" delimiters:
> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
> I don't clearly see the reason of doing this. Is there a gain in terms of
> performance? Or does this make programming on the client-side easier? Or
> something else?

I think you should ask the exact opposite question. If you don't do anything
with these fields that Solr is particularly good at (searching / filtering
/ faceting / sorting), why go through the trouble of creating dynamic fields?
(More fields mean more overhead and tracking cost, no matter how you look at
it.)

Moreover, from a client view it is indeed easier the way I suggested, since
otherwise:
- you would have to ask (through SolrJ) for all dynamic fields to be
returned via the fl parameter (
http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult,
because a priori you don't know how many dynamic fields to query. In
other words, you can't just ask Solr (through SolrJ, like you asked) to
return all dynamic fields beginning with pic_*. (afaik)
- your client iteration code (looping over the pics) is a bit more involved.
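
The difference in client code can be sketched roughly as follows. This is plain Python over dicts standing in for returned documents, not SolrJ. The numbered names pic_url_N, pic_caption_N and pic_description_N follow the dynamic fields discussed in this thread; the single field "pics" and the "|" and newline delimiters are assumptions made up for the illustration, and are presumed never to occur in the values themselves.

```python
def pics_from_dynamic_fields(doc):
    """Collect pictures from numbered fields pic_url_1, pic_url_2, ..."""
    pics = []
    n = 1
    # The client cannot know the count up front, so it probes until a gap.
    while f"pic_url_{n}" in doc:
        pics.append((
            doc[f"pic_url_{n}"],
            doc.get(f"pic_caption_{n}", ""),
            doc.get(f"pic_description_{n}", ""),
        ))
        n += 1
    return pics


def pics_from_single_field(doc, prop_sep="|", pic_sep="\n"):
    """Collect pictures from one stored string (hypothetical field "pics")."""
    packed = doc.get("pics", "")
    if not packed:
        return []
    return [tuple(pic.split(prop_sep)) for pic in packed.split(pic_sep)]


# Plain dicts standing in for documents returned by a query.
dynamic_doc = {
    "id": "1",
    "pic_url_1": "http://example.com/a.jpg",
    "pic_caption_1": "A",
    "pic_description_1": "first",
    "pic_url_2": "http://example.com/b.jpg",
    "pic_caption_2": "B",
    "pic_description_2": "second",
}
single_doc = {
    "id": "1",
    "pics": "http://example.com/a.jpg|A|first\nhttp://example.com/b.jpg|B|second",
}

from_dynamic = pics_from_dynamic_fields(dynamic_doc)
from_single = pics_from_single_field(single_doc)
```

Both functions yield the same list of (url, caption, description) tuples; the single-field variant only needs one field name in the request, whereas the dynamic-field variant has to probe field names one by one.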

HTH, Cheers,

Geert-Jan

> >> On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:
> >>
> >>> Saïd,
> >>>
> >>> Dynamic fields could help here, for example imagine a doc with:
> >>> id
> >>> pic_url_*
> >>> pic_caption_*
> >>> pic_description_*
> >>>
> >>> See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
> >>>
> >>> So, for you:
> >>>
> >>> <dynamicField name="pic_url_*"  type="string"  indexed="true" stored="true"/>
> >>> <dynamicField name="pic_caption_*"  type="text"  indexed="true" stored="true"/>
> >>> <dynamicField name="pic_description_*"  type="text"  indexed="true" stored="true"/>
> >>>
> >>> Then you can add docs with unlimited number of pic_(url|caption|description)_* fields, e.g.
> >>>
> >>> id
> >>> pic_url_1
> >>> pic_caption_1
> >>> pic_description_1
> >>>
> >>> id
> >>> pic_url_2
> >>> pic_caption_2
> >>> pic_description_2
> >>>
> >>>
> >>> Otis
> >>> ----
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>>
> >>>
> >>> ----- Original Message ----
> >>>> From: Saïd Radhouani <r....@gmail.com>
> >>>> To: solr-user@lucene.apache.org
> >>>> Sent: Fri, June 25, 2010 6:01:13 PM
> >>>> Subject: Setting many properties for a multivalued field. Schema.xml ? External file?
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm trying to index data containing a multivalued field "picture",
> >>>> that has three properties: url, caption and description:
> >>>>
> >>>> <picture/>
> >>>>   <url/>
> >>>>   <caption/>
> >>>>   <description/>
> >>>>
> >>>> Thus, each indexed document might have many pictures, each of them has a url, a
> >>>> caption, and a description.
> >>>>
> >>>> I wonder whether it's possible to store this data using only schema.xml.
> >>>> I couldn't figure it out so far. Instead, I'm thinking of using an external
> >>>> file to store the properties of each picture, but I haven't tried yet this
> >>>> solution, waiting for your suggestions...
> >>>>
> >>>> Thanks,
> >>>> -Saïd

Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks, Geert-Jan, for the detailed answer. Actually, I don't search at all on these fields. I'm only filtering (w/ vs. w/o pic) and sorting (based on the number of pictures). Thus, your suggestion of adding an extra field NrOfPics [0,N] would be the best solution.
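
A minimal sketch of the extra-field idea, in Python rather than SolrJ: the count is computed once at index time and stored on the document, and the query then filters and sorts on that one field. NrOfPics is the field name suggested in this thread; the boolean flag hasAtLeastOnePic, the helper names, and the "*:*" base query are assumptions for illustration. The q, fq and sort parameters are standard Solr request parameters, and sorting presumes NrOfPics is defined as an indexed, single-valued numeric field.

```python
from urllib.parse import urlencode


def add_picture_counts(doc, pictures):
    # Store the count on the document so filtering and sorting need one field.
    enriched = dict(doc)
    enriched["NrOfPics"] = len(pictures)
    enriched["hasAtLeastOnePic"] = len(pictures) > 0
    return enriched


def with_pictures_query(base_query="*:*"):
    # Only docs with at least one picture, those with the most pictures first.
    return urlencode([
        ("q", base_query),
        ("fq", "NrOfPics:[1 TO *]"),
        ("sort", "NrOfPics desc"),
    ])


doc = add_picture_counts(
    {"id": "42"},
    [("http://example.com/a.jpg", "A", "first")],
)
params = with_pictures_query()
```

The resulting `params` string can be appended to a select URL; the with/without facet would work the same way by faceting on the flag field instead of probing pic_url_1.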

Regarding the other suggestion:

> If you don't need search at all on these fields, the best thing imo is to
> store all pic-related info of all pics together by concatenating them with
> some delimiter which you know how to separate at the client-side.
> That or just store it in an external RDB since Solr is just sitting on the
> data and not doing anything intelligent with it.

If I understand your suggestion correctly, you said that there's NO need to have many Dynamic Fields; instead, we can have one definitive field name, which can store a long string (concatenation of information about tens of pictures), e.g., using "-" and "%" delimiters: pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...

I don't clearly see the reason for doing this. Is there a gain in terms of performance? Or does this make programming on the client-side easier? Or something else?


My other question was: in case we use Dynamic Fields, is there any documentation about using SolrJ for this purpose?

Thanks
-Saïd

On Jun 26, 2010, at 12:29 PM, Geert-Jan Brits wrote:

> You can treat dynamic fields like any other field, so you can facet, sort,
> filter, etc on these fields (afaik)
> 
> I believe the confusion arises because the use case for dynamic fields
> is sometimes misunderstood, i.e., as being able to use them for some kind of
> wildcard search, e.g., searching for a value in any of the dynamic fields at
> once, like pic_url_*. This, however, is NOT possible.
> 
> As far as your question goes:
> 
>> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o
>> pic
>> To the best of my knowledge, everyone is saying that faceting cannot be
>> done on dynamic fields (only on definitive field names). Thus, I tried the
>> following and it's working: I assume that the stored pictures have a
>> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
>> means that the underlying doc has at least one picture:
>> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>> While this is working fine, I'm wondering whether there's a cleaner way to
>> do the same thing without assuming that pictures have a sequential number.
> 
> If I understand your question correctly: faceting on docs with and without
> pics could of course be done like you mention; however, it would be more
> efficient to define an extra field, hasAtLeastOnePic, with values (0 | 1),
> and use that to facet / filter on.
> 
> You can extend this to NrOfPics [0,N) if you need to filter / facet on docs
> with a certain number of pics.
> 
> Also, I wondered what else you wanted to do with this pic-related info. Do
> you want to search on pic-description / pic-caption, for instance? In that
> case the dynamic-fields approach may not be what you want: how would you
> know in which dynamic field to search for a particular term? Would it be
> pic_desc_1, or pic_desc_x? Of course you could OR over all dynamic fields,
> but you would need to know an upper bound for the number of pics, and it
> really doesn't feel right, to me at least.
> 
> If you need to search on pic_description, for instance, but don't mind which
> pic matches, you could create a single field pic_description and put in the
> concatenation of all pic-descriptions and search on that, or just make it a
> multi-valued field.
> 
> If you don't need search at all on these fields, the best thing imo is to
> store all pic-related info of all pics together by concatenating them with
> some delimiter which you know how to separate at the client side.
> That, or just store it in an external RDB, since Solr is just sitting on the
> data and not doing anything intelligent with it.
> 
> I assume btw that you don't want to sort / facet on pic-desc / pic_caption /
> pic_url either (I have a hard time thinking of a useful use case for that).
> 
> HTH,
> 
> Geert-Jan


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Geert-Jan Brits <gb...@gmail.com>.
You can treat dynamic fields like any other field, so you can facet, sort,
filter, etc on these fields (afaik)

I believe the confusion arises because the use case for dynamic fields is
sometimes misunderstood: people expect to use them for some kind of
wildcard search, e.g. searching for a value in any of the dynamic fields at
once, like pic_url_*. This, however, is NOT possible.

As far as your question goes:

> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o
> pic
> To the best of my knowledge, everyone is saying that faceting cannot be
> done on dynamic fields (only on definitive field names). Thus, I tried the
> following and it's working: I assume that the stored pictures have a
> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
> means that the underlying doc has at least one picture:
> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
> While this is working fine, I'm wondering whether there's a cleaner way to
> do the same thing without assuming that pictures have a sequential number.

If I understand your question correctly: faceting on docs with and without
pics could of course be done the way you mention. However, it would be more
efficient to define an extra field, hasAtLeastOnePic, with values (0 | 1),
and use that to facet / filter on.

You can extend this to NrOfPics [0,N) if you need to filter / facet on docs
with a certain number of pics.
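
A minimal sketch of how the flag and count fields suggested above could be derived at index time. This is plain Python on a dict, not SolrJ; the helper name add_picture_summary_fields and the document layout are hypothetical, and the two fields would still have to be declared in schema.xml.

```python
def add_picture_summary_fields(doc, pictures):
    """Return a copy of doc with a has-picture flag and a picture count.

    `pictures` is assumed to be a list with one entry per picture.
    """
    enriched = dict(doc)
    enriched["NrOfPics"] = len(pictures)
    # 1 if the doc has at least one picture, 0 otherwise
    enriched["hasAtLeastOnePic"] = 1 if pictures else 0
    return enriched


with_pics = add_picture_summary_fields(
    {"id": "doc-1"}, [{"url": "http://example.com/a.jpg"}]
)
without_pics = add_picture_summary_fields({"id": "doc-2"}, [])
```

With such a field in the index, the facet request no longer depends on picture numbering: facet.field=hasAtLeastOnePic instead of facet.field=pic_url_1.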

Also, I wondered what else you wanted to do with this pic-related info. Do
you want to search on pic-description / pic-caption, for instance? In that
case the dynamic-fields approach may not be what you want: how would you
know which dynamic field to search for a particular term? Would it be
pic_desc_1, or pic_desc_x? Of course you could OR over all dynamic fields,
but you would need to know an upper bound for the number of pics, and it
really doesn't feel right, to me at least.
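
To make the "OR over all dynamic fields" workaround concrete, here is a rough Python sketch that builds such a query string. The helper name or_over_dynamic_fields is hypothetical, the field prefix follows the pic_description_* pattern from this thread, and the search term is assumed to be a single plain word that needs no escaping.

```python
def or_over_dynamic_fields(prefix, term, max_pics):
    """Build 'prefix1:term OR prefix2:term OR ...' up to max_pics.

    max_pics is the upper bound on pictures per doc that the caller
    has to know in advance, which is the weak point of this approach.
    """
    clauses = [f"{prefix}{n}:{term}" for n in range(1, max_pics + 1)]
    return " OR ".join(clauses)


query = or_over_dynamic_fields("pic_description_", "sunset", 3)
```

The query grows with the upper bound, and any doc with more pictures than the bound is silently missed for the extra ones.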

If you need to search on pic_description, for instance, but don't mind which
pic matches, you could create a single field pic_description and put in the
concatenation of all pic-descriptions and search on that, or just make it a
multi-valued field.

If you don't need search at all on these fields, the best thing imo is to
store all pic-related info of all pics together by concatenating them with
some delimiter which you know how to separate at the client side.
That, or just store it in an external RDB, since Solr is just sitting on the
data and not doing anything intelligent with it.
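
A small sketch of the concatenate-and-split idea, in Python. The delimiters (tab between properties, newline between pictures) and the helper names pack_pictures / unpack_pictures are assumptions made for illustration; the sketch only works if no url, caption or description contains a tab or a newline.

```python
FIELD_SEP = "\t"   # assumed: separates url / caption / description
RECORD_SEP = "\n"  # assumed: separates one picture from the next


def pack_pictures(pictures):
    """Concatenate all pictures into one string for a single stored field."""
    return RECORD_SEP.join(
        FIELD_SEP.join((p["url"], p["caption"], p["description"]))
        for p in pictures
    )


def unpack_pictures(packed):
    """Client-side inverse of pack_pictures."""
    if not packed:
        return []
    pictures = []
    for record in packed.split(RECORD_SEP):
        url, caption, description = record.split(FIELD_SEP)
        pictures.append(
            {"url": url, "caption": caption, "description": description}
        )
    return pictures


pics = [
    {"url": "http://example.com/a.jpg", "caption": "A", "description": "First picture"},
    {"url": "http://example.com/b.jpg", "caption": "B", "description": "Second picture"},
]
packed = pack_pictures(pics)
restored = unpack_pictures(packed)
```

The packed string goes into one stored field, and the client gets the pictures back in their original order after splitting.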

I assume btw that you don't want to sort / facet on pic-desc / pic_caption /
pic_url either (I have a hard time thinking of a useful use case for that).

HTH,

Geert-Jan



2010/6/26 Saïd Radhouani <r....@gmail.com>

> Thanks so much Otis. This is working great.
>
> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o
> pic
>
> To the best of my knowledge, everyone is saying that faceting cannot be
> done on dynamic fields (only on definitive field names). Thus, I tried the
> following and it's working: I assume that the stored pictures have a
> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
> means that the underlying doc has at least one picture:
>
> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>
> While this is working fine, I'm wondering whether there's a cleaner way to
> do the same thing without assuming that pictures have a sequential number.
>
> Also, do you have any documentation about handling Dynamic Fields using
> SolrJ? So far, I have found only JIRA issues about that, but no documentation.
>
> Thanks a lot.
>
> -Saïd

Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Saïd Radhouani <r....@gmail.com>.
Thanks so much Otis. This is working great.

Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic

To the best of my knowledge, everyone is saying that faceting cannot be done on dynamic fields (only on definitive field names). Thus, I tried the following and it's working: I assume that the stored pictures have a sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it means that the underlying doc has at least one picture: 

...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*

While this is working fine, I'm wondering whether there's a cleaner way to do the same thing without assuming that pictures have a sequential number.

Also, do you have any documentation about handling Dynamic Fields using SolrJ? So far, I have found only JIRA issues about that, but no documentation.

Thanks a lot.

-Saïd

On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:

> Saïd,
> 
> Dynamic fields could help here. For example, imagine a doc with:
> id
> pic_url_*
> pic_caption_*
> pic_description_*
> 
> See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
> 
> So, for you:
> 
> <dynamicField name="pic_url_*"  type="string"  indexed="true"  stored="true"/>
> <dynamicField name="pic_caption_*"  type="text"  indexed="true"  stored="true"/>
> <dynamicField name="pic_description_*"  type="text"  indexed="true"  stored="true"/>
> 
> Then you can add docs with an unlimited number of pic_(url|caption|description)_* fields, e.g.
> 
> id
> pic_url_1
> pic_caption_1
> pic_description_1
> 
> id
> pic_url_2
> pic_caption_2
> pic_description_2
> 
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Saïd,

Dynamic fields could help here. For example, imagine a doc with:
id
pic_url_*
pic_caption_*
pic_description_*

See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields

So, for you:

<dynamicField name="pic_url_*"  type="string"  indexed="true"  stored="true"/>
<dynamicField name="pic_caption_*"  type="text"  indexed="true"  stored="true"/>
<dynamicField name="pic_description_*"  type="text"  indexed="true"  stored="true"/>

Then you can add docs with an unlimited number of pic_(url|caption|description)_* fields, e.g.

id
pic_url_1
pic_caption_1
pic_description_1

id
pic_url_2
pic_caption_2
pic_description_2
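
As a rough illustration of the numbering scheme above, here is a Python sketch that flattens a list of pictures into pic_url_N / pic_caption_N / pic_description_N fields before the doc is sent to Solr. The helper name to_dynamic_fields and the input layout are hypothetical, and the sketch assumes one doc carrying all of its pictures with numbering starting at 1.

```python
def to_dynamic_fields(doc_id, pictures):
    """Flatten a list of picture dicts into numbered pic_* fields."""
    doc = {"id": doc_id}
    # Number pictures from 1 so the names match pic_*_1, pic_*_2, ...
    for n, pic in enumerate(pictures, start=1):
        doc[f"pic_url_{n}"] = pic["url"]
        doc[f"pic_caption_{n}"] = pic["caption"]
        doc[f"pic_description_{n}"] = pic["description"]
    return doc


doc = to_dynamic_fields("doc-1", [
    {"url": "http://example.com/a.jpg", "caption": "A", "description": "First picture"},
    {"url": "http://example.com/b.jpg", "caption": "B", "description": "Second picture"},
])
```

Every generated name matches one of the three dynamicField patterns, so no further schema change is needed as the number of pictures grows.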


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Saïd Radhouani <r....@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, June 25, 2010 6:01:13 PM
> Subject: Setting many properties for a multivalued field. Schema.xml ? External file?
> 
> Hi,
> 
> I'm trying to index data containing a multivalued field "picture", that has three properties: url, caption and description:
> 
> <picture/>
>     <url/>
>     <caption/>
>     <description/>
> 
> Thus, each indexed document might have many pictures, each of them has a url, a caption, and a description.
> 
> I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet, waiting for your suggestions...
> 
> Thanks,
> -Saïd