Posted to solr-user@lucene.apache.org by Gunaranjan Chandraraju <ch...@apple.com> on 2009/01/23 08:38:17 UTC

How to make Relationships work for Multi-valued Index Fields?

Hi,
I may be completely off on this, being new to SOLR, but I am not sure
how to index related groups of fields in a document and preserve
their 'grouping'.  I would appreciate any help on this.  A detailed
description of the problem is below.

I am trying to index an entity that can have multiple occurrences in  
the same document - e.g. Address.  The address could be Shipping,  
Home, Office etc.   Each address element has multiple values in it  
like street, state etc.    Thus each address element is a group with  
the state and street in one address element being related to each other.

It looks like this in my source xml

<record>
    <coreInfo id="123" , .../>
    <address street="XYZ1" State="CA" ...type="home" />
    <address street="XYZ2" state="CA" ... type="Office"/>
    <address street="XYZ3" state="CA" ....type="Other"/>
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <entity name ="f" processor="FileListEntityProcessor"
              baseDir="***"
              fileName=".*xml"
              rootEntity="false"
              dataSource="null" >
         <entity
            name="record"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record"
            url="${f.fileAbsolutePath}">
                 <field column="ID" xpath="/record/@id" />

                 <!-- Address  -->
                 <entity
                    name="record_adr"
                    processor="XPathEntityProcessor"
                    stream="false"
                    forEach="/record/address"
                    url="${f.fileAbsolutePath}">
                       <field column="address_street" xpath="/record/address/@street" />
                       <field column="address_state"  xpath="/record/address//@state" />
                       <field column="address_type"   xpath="/record/address//@type" />
                 </entity>
         </entity>
      </entity>
    </document>
</dataConfig>


The problem is as follows.  DIH seems to treat these as entities but  
solr seems to flatten them out on indexing to fields in a document  
(losing the entity part).

So when I search for an ID, in the response all the street fields
are bunched together, followed by all the state fields, type fields etc.
Thus I can't tell which street address corresponds to which
address type in the response.

What seems harder is this - say I need to query on 'Street' = XYZ1 and
type="Office".  This should NOT return a document, since the street for
the office address is "XYZ2" and not "XYZ1".  However, when I query for
address_street:"XYZ1" and address_type:"Office" I get back this document.

The problem seems to be that while DIH allows 'entities' within a  
document  the SOLR schema does not preserve them - it 'flattens' all  
of them out as indices for the document.
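
The cross-matching described above can be reproduced outside Solr. The following is a minimal sketch in plain Python, not Solr's query engine; the dictionary layout and both function names are assumptions made only for illustration. It shows why testing each multi-valued field independently matches the document, while pairing the values by position does not.

```python
# Hypothetical flattened document: one parallel list per multi-valued field.
flattened_doc = {
    "ID": "123",
    "address_street": ["XYZ1", "XYZ2", "XYZ3"],
    "address_state": ["CA", "CA", "CA"],
    "address_type": ["home", "Office", "Other"],
}


def matches_flattened(doc, street, addr_type):
    """Check each clause against its own field, independently of the other."""
    return street in doc["address_street"] and addr_type in doc["address_type"]


def matches_grouped(doc, street, addr_type):
    """Pair values by position so both clauses must hold for the same address."""
    pairs = zip(doc["address_street"], doc["address_type"])
    return any(s == street and t == addr_type for s, t in pairs)


print(matches_flattened(flattened_doc, "XYZ1", "Office"))  # True
print(matches_grouped(flattened_doc, "XYZ1", "Office"))    # False
```

The first result is the unwanted hit: "XYZ1" and "Office" both occur somewhere in the document, just not in the same address.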

I could work around the problem by creating SOLR fields like  
"home_address_street" and "office_address_street" and do some xpath  
mapping.  However I don't want to do it as we can have multiple  
'other' addresses.  Also I have other fields whose type is not easily  
distinguished like address.

As I mentioned being new to SOLR I might have completely goofed on a  
way to set it up - much appreciate any direction on it. I am using  
SOLR 1.3

Regards,
Guna

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

Hi Fergus,
XPathEntityprocessor can read multivalued fields easily

eg
<dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name ="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null" >
        <entity
          name="record"
          processor="XPathEntityProcessor"
          forEach="/record"
          url="${f.fileAbsolutePath}">
                <!-- changed: commonField added -->
                <field column="ID" xpath="/record/@id" commonField="true"/>
                <field column="address_street" xpath="/record/address/@street" />
                <field column="address_state"  xpath="/record/address/@state" />
                <field column="address_type"   xpath="/record/address/@type" />
        </entity>
     </entity>
   </document>
</dataConfig>


In this case address_street, address_state and address_type will each be
returned as a separate list while parsing. If you wish to put them into
multiple fields you can write a transformer that iterates through the lists
and puts the values into separate fields. If there are 3 <address> tags then
you get a List<String> for each field, where the length of each
list is 3. If an item is missing it will be added as a null.
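
A real DIH transformer would be a Java class, so the following is only a sketch of the regrouping logic in plain Python. The `row` dictionary, the sample null and the `regroup` helper are assumptions for illustration, not part of DIH. It walks the parallel lists by position and rebuilds one record per <address> element, keeping nulls where an attribute was missing.

```python
from itertools import zip_longest

# Hypothetical parsed row: one list per column, with None where an
# attribute was missing on an <address> element.
row = {
    "address_street": ["XYZ1", "XYZ2", "XYZ3"],
    "address_state": ["CA", None, "CA"],
    "address_type": ["home", "Office", "Other"],
}


def regroup(row, columns):
    """Turn parallel per-column lists back into one dict per address."""
    lists = [row.get(col, []) for col in columns]
    # zip_longest pads with None if one list is shorter than the others.
    return [dict(zip(columns, values)) for values in zip_longest(*lists)]


addresses = regroup(row, ["address_street", "address_state", "address_type"])
for address in addresses:
    print(address)
```

Because the lists all have the same length and missing items are nulls, position i in every list refers to the same address.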

Ensure that the fields are marked as multiValued="true" in
schema.xml; otherwise it does not return a List<String>. If there is
no corresponding mapping in schema.xml you can set it explicitly here
in the dataconfig.xml,
e.g.: <field column="address_state" multiValued="true" xpath="/record/address/@state" />


I saw the syntax '/record/address//@state'. '//' is not supported .
You will have to explicitly give the full path.
--Noble






-- 
--Noble Paul

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

Nesting an XPathEntityProcessor inside another XPathEntityProcessor
is possible only if a field in the XML is a filename/URL.
What is the purpose of nesting like this?
Is it because you have multiple addresses? The possible solutions are
discussed elsewhere in this thread.




-- 
--Noble Paul

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Fergus McMenemie <fe...@twig.me.uk>.

Hello,

I am also a newbie and was wanting to do almost the exact same thing.
I was planning on doing the equivalent of:-

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <entity name ="f" processor="FileListEntityProcessor"
              baseDir="***"
              fileName=".*xml"
              rootEntity="false"
              dataSource="null" >
         <entity
            name="record"
            processor="XPathEntityProcessor"
            stream="false"
            rootEntity="false"
            forEach="/record"
            url="${f.fileAbsolutePath}">
                 <!-- changed: rootEntity="false" above, commonField="true" below -->
                 <field column="ID" xpath="/record/@id" commonField="true"/>
                 <!-- Address  -->
                 <entity
                    name="record_adr"
                    processor="XPathEntityProcessor"
                    stream="false"
                    forEach="/record/address"
                    url="${f.fileAbsolutePath}">
                       <field column="address_street" xpath="/record/address/@street" />
                       <field column="address_state"  xpath="/record/address//@state" />
                       <field column="address_type"   xpath="/record/address//@type" />
                 </entity>
         </entity>
      </entity>
    </document>
</dataConfig>

ID is no longer unique within Solr. There would be multiple "documents"
with a given ID, one for each address. You can then search on ID and get
the three addresses; you can also search on an address more sensibly.
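
The one-document-per-address idea can be sketched in plain Python as follows. The `record` layout and both helper functions are assumptions for illustration, and the sketch assumes ID is not declared as the schema's uniqueKey, since otherwise later documents would replace earlier ones with the same ID.

```python
# Hypothetical source record, mirroring the <record> example in this thread.
record = {
    "ID": "123",
    "addresses": [
        {"street": "XYZ1", "state": "CA", "type": "home"},
        {"street": "XYZ2", "state": "CA", "type": "Office"},
        {"street": "XYZ3", "state": "CA", "type": "Other"},
    ],
}


def one_document_per_address(record):
    """Emit one flat document per address, repeating the shared ID on each."""
    return [
        {
            "ID": record["ID"],
            "address_street": addr["street"],
            "address_state": addr["state"],
            "address_type": addr["type"],
        }
        for addr in record["addresses"]
    ]


def search(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


docs = one_document_per_address(record)
print(len(search(docs, ID="123")))
print(search(docs, address_street="XYZ1", address_type="Office"))
```

Searching on the ID returns all three address documents, while the street/type combination that belongs to two different addresses returns nothing.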

I have not been able to try this yet as other issues are still to be
dealt with.

Comments?????


-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

I thought 1.3 supported dynamic fields in schema.xml?

Guna

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>
> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
> shalinmangar@gmail.com> wrote:
>
>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>> chandraraju@apple.com> wrote:
>>
>>>
>>> <record>
>>> <coreInfo id="123" , .../>
>>> <address street="XYZ1" State="CA" ...type="home" />
>>> <address street="XYZ2" state="CA" ... type="Office"/>
>>> <address street="XYZ3" state="CA" ....type="Other"/>
>>> </record>
>>>
>>> I have setup my DIH to treat these as entities as below
>>>
>>> <dataConfig>
>>> <dataSource type="FileDataSource" encoding="UTF-8" />
>>> <document>
>>>   <entity name ="f" processor="FileListEntityProcessor"
>>>           baseDir="***"
>>>           fileName=".*xml"
>>>           rootEntity="false"
>>>           dataSource="null" >
>>>      <entity
>>>         name="record"
>>>         processor="XPathEntityProcessor"
>>>         stream="false"
>>>         forEach="/record"
>>>         url="${f.fileAbsolutePath}">
>>>              <field column="ID" xpath="/record/@id" />
>>>
>>>              <!-- Address  -->
>>>               <entity
>>>                   name="record_adr"
>>>                   processor="XPathEntityProcessor"
>>>                   stream="false"
>>>                   forEach="/record/address"
>>>                   url="${f.fileAbsolutePath}">
>>>                       <field column="address_street"
>>> xpath="/record/address/@street" />
>>>                       <field column="address_state"
>>> xpath="/record/address//@state" />
>>>                       <field column="address_type"
>>> xpath="/record/address//@type" />
>>>              </entity>
>>>         </entity>
>>>   </entity>
>>> </document>
>>> </dataConfig>
>>>
>>
>> I think the only way is to create a dynamic field for each attribute
>> (street, state etc.). Write a transformer to copy the fields from your
>> data config to appropriately named dynamic fields (e.g. street_1,
>> state_1, etc.). To maintain this counter you will need to get/store it with
>> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>
>> I can't think of an easier way.
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.
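
The numbered-field suggestion quoted above can be sketched in plain Python. This is only an illustration of the naming scheme: the function name and the input layout are assumptions, and a real implementation would be a Java transformer keeping its counter in the DIH Context, with the schema presumably declaring matching dynamic field patterns.

```python
def to_numbered_fields(addresses):
    """Copy each address's attributes into fields suffixed with a 1-based counter."""
    doc = {}
    for counter, address in enumerate(addresses, start=1):
        for attribute, value in address.items():
            # e.g. "street" on the second address becomes "street_2"
            doc[f"{attribute}_{counter}"] = value
    return doc


addresses = [
    {"street": "XYZ1", "state": "CA", "type": "home"},
    {"street": "XYZ2", "state": "CA", "type": "Office"},
]
doc = to_numbered_fields(addresses)
print(doc)
```

Fields that share a suffix belong to the same address, so street_2 and type_2 stay associated.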

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

Thanks
Much appreciate the guidance. I think I will go with the single field  
approach for now.  Also will take a look at the URL below and come  
back if I have any ideas.


Guna

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Sun, Jan 25, 2009 at 2:05 PM, Gunaranjan Chandraraju <
chandraraju@apple.com> wrote:

> Thanks
> This sounds redundant to me - to store the fields separately and then
> concat all of them to one copy field again.
>

Sometimes that may be the only way. For example, if you want to facet on
some of those fields, as well as to search them all.


>
> My XML is like this
> <address street="XYZ" state="CA" country="1" type="shipping" ...>
>
> I am currently using XPATH or XSL to separate them into individual indexed
> fields like: address_state_1, address_type_1 etc. in SOLR.
>
> From what you say, it looks to me that I might as well just treat the
> entire address as a single 'text field' and search within the text after
> tokenizing.  This way I don't need to have the _1, _2 as the single text
> field will contain the information together (and thus grouped - so I know
> which is shipping/billing etc?).    Will there be any performance difference
> between this and the copy field approach?
>

No, I think one field may even be better, since you are creating fewer
fields. If you never need to do faceting and you don't need to get the
contents of each address field separately, this is your best option.


>
> Is there no other way (programmatic) to search across multiple fields?  I
> did take a quick look at dismax but again it needs the field names to be
> specifically mentioned in the config file or in the query.  I can't do this
> as I am not able to predict the number of fields (e.g. credit cards a person
> can have?).
>
>  I like SOLR, but to me, this seems to be a very common and simple search
> scenario/pattern - however its implementation in SOLR is appearing to be not
> very straightforward.   (My apologies, if I on the wrong track here because
> I don't understand SOLR well.  )


There has been some discussion on having wildcards in field names, but I
guess nobody contributed (or had the need for) the complete proposal. Copy
fields give a lot of flexibility, which is what most people use.

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Alexander Ramos Jardim <al...@gmail.com>.

Hey Gunaranjan,

I have the same scenario as you.

A Lucene index is denormalized. It should not contain entity relationships.
When I need to do something like what you are doing, I group the related
values in one field.

Let's say we have 2 credit cards. The first has id 30459673 and taxes at
1.5%/month, and the second has id 56305 and taxes at 2.5%. What I do is
create a multivalued field in which I index each value as "id ^ taxes". On the
client side I put the logic to parse the string in a convenient way to work
with the values. I hope that helps you.
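
A minimal sketch of that encode/parse round trip, in plain Python. The exact separator spacing and the function names are assumptions; the post only gives the "id ^ taxes" pattern.

```python
SEPARATOR = " ^ "


def encode(card_id, taxes):
    """Join the related values into one string for a multivalued field."""
    return f"{card_id}{SEPARATOR}{taxes}"


def decode(value):
    """Split a stored value back into its (id, taxes) parts on the client side."""
    card_id, taxes = value.split(SEPARATOR, 1)
    return card_id, taxes


stored_values = [encode("30459673", "1.5%"), encode("56305", "2.5%")]
cards = [decode(v) for v in stored_values]
print(stored_values)
print(cards)
```

Each stored string carries one complete group, so the id and its tax rate cannot be mixed up with those of another card.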

2009/1/25 Gunaranjan Chandraraju <ch...@apple.com>

> Paul
> It's not just about merging the fields or resource usage.  If you look at
> the scenario below, the issue is that it mixes up my fields (shipping and
> billing address, for instance).  I can't merge them and still keep the
> 'distinction' for search.    Your case is a 'generalization' field.  Thus
> the search will work.   I know mine is a trivial example and can be overcome
> by just two fields (shipping_address & billing_address), but I am
> talking of cases when we have many such 'groups of fields'.
>
> In general such one-to-many relationships for indices in a 'document' are
> also really really common :).  Again I am not trying to argue a point - I
> would be happy to get some idea on how to do it and be corrected if I'm
> wrong.
>
> Lastly (while that's not my worry point right now), I tend to be careful
> with resources. When dealing with very large data, I will avoid any
> unnecessary overhead as far as possible and take every optimization I get :)
>
> Guna
>
>
> On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:
>
>  Guna,
>>
>> it's really really normal to duplicate stuffs to be merged into a field.
>>
>> We do this all the time, for example to have a field
>> "text-in-any-language" while a field "text-in-english" is also there and the
>> queries boost matches in text-in-any-language less than text-in-english (if
>> user is in english).
>>
>> This difference in weighting is the gold of Lucene I feel (of retrieval
>> generally).
>> Also, depending on the field you make different indexing, while still
>> copying it in solr (for example use a different analyzer per language).
>>
>> paul
>>
>> PS: don't be scared with resources, this is the side of the world where
>> the resource is the least the problem! (typically a "catch-all-field"
>> wouldn't be stored though as this would then load the memory).
>>
>>
>> Le 25-janv.-09 à 09:35, Gunaranjan Chandraraju a écrit :
>>
>>  Thanks
>>> This sounds redundant to me - to store the fields separately and then
>>> concat all of them to one copy field again.
>>>
>>> My XML is like this
>>> <address street="XYZ" state="CA" country="1" type="shipping" ...>
>>>
>>> I am currently using XPATH or XSL to separate them into individual
>>> indexed fields like: address_state_1, address_type_1 etc. in SOLR.
>>>
>>> From what you say, it looks to me that I might as well just treat the
>>> entire address as a single 'text field' and search within the text after
>>> tokenizing.  This way I don't need to have the _1, _2 as the single text
>>> field will contain the information together (and thus grouped - so I know
>>> which is shipping/billing etc?).    Will there be any performance difference
>>> between this and the copy field approach?
>>>
>>> Is there no other way (programmatic) to search across multiple fields?  I
>>> did take a quick look at dismax but again it needs the field names to be
>>> specifically mentioned in the config file or in the query.  I can't do this
>>> as I am not able to predict the number of fields (e.g. credit cards a person
>>> can have?).
>>>
>>> I like SOLR, but to me, this seems to be a very common and simple search
>>> scenario/pattern - however its implementation in SOLR is appearing to be not
>>> very straightforward.   (My apologies, if I on the wrong track here because
>>> I don't understand SOLR well.  )
>>>
>>> Regards,
>>> Guna
>>> On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>  for searching you need to put them in a single field . use <copyField>
>>>> in schema.xml to achieve that
>>>>
>>>> On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
>>>> <ch...@apple.com> wrote:
>>>>
>>>>> I make this approach work with XPATH and XSL.   However, this approach
>>>>> creates multiple fields of like this
>>>>>
>>>>> address_state_1
>>>>> address_state_2
>>>>> ...
>>>>> address_state_10
>>>>>
>>>>> and
>>>>>
>>>>> credit_card_1
>>>>> credit_card_2
>>>>> credit_card_3
>>>>>
>>>>>
>>>>> How do I search for a credit_card.    The query syntax does not seem to
>>>>> support wild cards in field names.   For e.g. I cant seem to do this ->
>>>>> credit_card*:1234 4567 7890 1234
>>>>>
>>>>> On the search side I would not know how many credit card fields  got
>>>>> created
>>>>> for a document and so I need that to be dynamic.
>>>>>
>>>>> -g
>>>>>
>>>>>
>>>>> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>>>>>
>>>>>  Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>>>>>
>>>>>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
>>>>>> shalinmangar@gmail.com> wrote:
>>>>>>
>>>>>>  On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>>>>>>> chandraraju@apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> <record>
>>>>>>>> <coreInfo id="123" , .../>
>>>>>>>> <address street="XYZ1" State="CA" ...type="home" />
>>>>>>>> <address street="XYZ2" state="CA" ... type="Office"/>
>>>>>>>> <address street="XYZ3" state="CA" ....type="Other"/>
>>>>>>>> </record>
>>>>>>>>
>>>>>>>> I have setup my DIH to treat these as entities as below
>>>>>>>>
>>>>>>>> <dataConfig>
>>>>>>>> <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>>>>> <document>
>>>>>>>> <entity name ="f" processor="FileListEntityProcessor"
>>>>>>>>       baseDir="***"
>>>>>>>>       fileName=".*xml"
>>>>>>>>       rootEntity="false"
>>>>>>>>       dataSource="null" >
>>>>>>>>  <entity
>>>>>>>>     name="record"
>>>>>>>>     processor="XPathEntityProcessor"
>>>>>>>>     stream="false"
>>>>>>>>     forEach="/record"
>>>>>>>>     url="${f.fileAbsolutePath}">
>>>>>>>>          <field column="ID" xpath="/record/@id" />
>>>>>>>>
>>>>>>>>          <!-- Address  -->
>>>>>>>>           <entity
>>>>>>>>               name="record_adr"
>>>>>>>>               processor="XPathEntityProcessor"
>>>>>>>>               stream="false"
>>>>>>>>               forEach="/record/address"
>>>>>>>>               url="${f.fileAbsolutePath}">
>>>>>>>>                   <field column="address_street"
>>>>>>>> xpath="/record/address/@street" />
>>>>>>>>                   <field column="address_state"
>>>>>>>> xpath="/record/address//@state" />
>>>>>>>>                   <field column="address_type"
>>>>>>>> xpath="/record/address//@type" />
>>>>>>>>          </entity>
>>>>>>>>     </entity>
>>>>>>>> </entity>
>>>>>>>> </document>
>>>>>>>> </dataConfig>
>>>>>>>>
>>>>>>>>
>>>>>>> I think the only way is to create a dynamic field for each attribute
>>>>>>> (street, state etc.). Write a transformer to copy the fields from
>>>>>>> your
>>>>>>> data
>>>>>>> config to appropriately named dynamic field (e.g. street_1, state_1,
>>>>>>> etc).
>>>>>>> To maintain this counter you will need to get/store it with
>>>>>>> Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and
>>>>>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>>>>>
>>>>>>> I cant't think of an easier way.
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Shalin Shekhar Mangar.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>
>>>
>>
>


-- 
Alexander Ramos Jardim

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

Paul
It's not just about merging the fields or resource usage.  If you look
at the scenario below, the issue is that merging mixes up my fields
(shipping and billing address, for instance).  I can't merge them and
still keep the 'distinction' for search.  Your case is a
'generalization' field, so the search will work.  I know mine is a
trivial example and can be overcome with just two fields
(shipping_address & billing_address), but I am talking about cases
where we have many such 'groups of fields'.
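
To make the mix-up concrete, here is a small plain-Python sketch (not Solr
code). The sample streets, states, and types are invented, and the two
matching functions are only a simplified model of how independent field
clauses behave compared with a per-address check.

```python
# One record with two addresses, flattened into parallel multivalued fields.
flattened = {
    "address_street": ["1 Main St", "9 Oak Ave"],
    "address_state": ["CA", "NY"],
    "address_type": ["shipping", "billing"],
}

# The same record with each address kept together as its own group.
grouped = [
    {"street": "1 Main St", "state": "CA", "type": "shipping"},
    {"street": "9 Oak Ave", "state": "NY", "type": "billing"},
]


def matches_flattened(doc, state, addr_type):
    """Each clause is checked against a whole field, so the matching
    values may come from different addresses."""
    return state in doc["address_state"] and addr_type in doc["address_type"]


def matches_grouped(addresses, state, addr_type):
    """Both values must come from the same address."""
    return any(
        a["state"] == state and a["type"] == addr_type for a in addresses
    )


# This record has no billing address in CA.
false_hit = matches_flattened(flattened, "CA", "billing")  # matches anyway
true_miss = matches_grouped(grouped, "CA", "billing")      # correctly rejected
```

The flattened form reports a match because "CA" and "billing" each occur
somewhere in the record, which is exactly the lost 'distinction'.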

In general, such a one-to-many relationship for indices in a 'document'
is also really, really common :).  Again, I am not trying to argue a
point - I would be happy to get some ideas on how to do it and to be
corrected if I'm wrong.

Lastly (while that's not my main worry right now), I tend to be
careful with resources.  When dealing with very large data, I avoid
any unnecessary overhead as far as possible and take every
optimization I can get :)

Guna

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Paul Libbrecht <pa...@activemath.org>.

Guna,

It's really, really normal to duplicate content so that it can be merged
into one field.

We do this all the time, for example to have a field
"text-in-any-language" while a field "text-in-english" is also there,
and the queries boost matches in text-in-any-language less than matches
in text-in-english (if the user is using English).

This difference in weighting is, I feel, the gold of Lucene (and of
retrieval generally).
Also, you can index each field differently while still copying it in
Solr (for example, use a different analyzer per language).
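
One way such per-field weighting could be written down, sketched in Python.
It assumes the dismax handler's qf parameter, which takes a space-separated
list of field^boost entries; the boost values and the helper name are
illustrative only and are not taken from the setup described above.

```python
def build_qf(boosts: dict) -> str:
    """Render {field: boost} as a dismax-style 'field^boost' list."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


# English matches count for more than matches in the catch-all field.
qf = build_qf({"text-in-english": 2.0, "text-in-any-language": 0.5})
```

The resulting string would be passed as the qf request parameter or set as
a default in the handler configuration.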

paul

PS: don't be scared about resources; this is the side of the world
where resources are the least of the problems! (Typically a
"catch-all-field" wouldn't be stored, though, as this would then load
the memory.)


Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

Thanks
This sounds redundant to me - storing the fields separately and then
concatenating all of them into one copy field again.

My XML is like this
<address street="XYZ" state="CA" country="1" type="shipping" ...>

I am currently using XPath or XSL to separate them into individually
indexed fields like address_state_1, address_type_1 etc. in SOLR.

From what you say, it looks to me that I might as well just treat the
entire address as a single 'text field' and search within the text
after tokenizing.  This way I don't need the _1, _2 suffixes, as the
single text field will contain the information together (and thus
grouped, so I know which is shipping/billing etc.).  Will there be
any performance difference between this and the copy field approach?

Is there no other (programmatic) way to search across multiple
fields?  I did take a quick look at dismax, but again it needs the
field names to be specifically mentioned in the config file or in the
query.  I can't do this as I am not able to predict the number of
fields (e.g. how many credit cards can a person have?).

I like SOLR, but to me this seems to be a very common and simple
search scenario/pattern, yet its implementation in SOLR does not
appear to be very straightforward.  (My apologies if I am on the
wrong track here because I don't understand SOLR well.)

Regards,
Guna

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

For searching, you need to put them in a single field. Use <copyField>
in schema.xml to achieve that.


-- 
--Noble Paul

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

I made this approach work with XPath and XSL.  However, it creates
multiple fields like this:

address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3


How do I search for a credit_card?  The query syntax does not seem
to support wildcards in field names.  For example, I can't seem to do
this ->   credit_card*:1234 4567 7890 1234

On the search side I would not know how many credit card fields got
created for a document, so I need that to be dynamic.
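
For reference, a client-side workaround can be sketched in Python: build an
explicit OR query over every numbered field up to some assumed upper bound.
The helper name and the max_fields bound are hypothetical, and the sketch
assumes credit_card_* is declared as a dynamic field so that numbers no
document uses are still valid field names. It does not remove the need to
pick a bound, which is the limitation described above.

```python
def numbered_field_query(prefix: str, value: str, max_fields: int) -> str:
    """Build 'prefix_1:"value" OR prefix_2:"value" ...' for 1..max_fields."""
    if max_fields < 1:
        raise ValueError("max_fields must be at least 1")
    # Escape backslashes and double quotes so the value stays one phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{prefix}_{i}:"{escaped}"' for i in range(1, max_fields + 1)]
    return " OR ".join(clauses)


query = numbered_field_query("credit_card", "1234 4567 7890 1234", max_fields=3)
```

The generated string uses only the standard field:"phrase" and OR syntax,
so it can be sent as the q parameter of an ordinary search request.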

-g


On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>
> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
> shalinmangar@gmail.com> wrote:
>
>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>> chandraraju@apple.com> wrote:
>>
>>>
>>> <record>
>>>  <coreInfo id="123" , .../>
>>>  <address street="XYZ1" State="CA" ...type="home" />
>>>  <address street="XYZ2" state="CA" ... type="Office"/>
>>>  <address street="XYZ3" state="CA" ....type="Other"/>
>>> </record>
>>>
>>> I have setup my DIH to treat these as entities as below
>>>
>>> <dataConfig>
>>>  <dataSource type="FileDataSource" encoding="UTF-8" />
>>>  <document>
>>>    <entity name ="f" processor="FileListEntityProcessor"
>>>            baseDir="***"
>>>            fileName=".*xml"
>>>            rootEntity="false"
>>>            dataSource="null" >
>>>       <entity
>>>          name="record"
>>>          processor="XPathEntityProcessor"
>>>          stream="false"
>>>          forEach="/record"
>>>          url="${f.fileAbsolutePath}">
>>>               <field column="ID" xpath="/record/@id" />
>>>
>>>               <!-- Address  -->
>>>                <entity
>>>                    name="record_adr"
>>>                    processor="XPathEntityProcessor"
>>>                    stream="false"
>>>                    forEach="/record/address"
>>>                    url="${f.fileAbsolutePath}">
>>>                        <field column="address_street"
>>> xpath="/record/address/@street" />
>>>                        <field column="address_state"
>>> xpath="/record/address//@state" />
>>>                        <field column="address_type"
>>> xpath="/record/address//@type" />
>>>               </entity>
>>>          </entity>
>>>    </entity>
>>>  </document>
>>> </dataConfig>
>>>
>>
>> I think the only way is to create a dynamic field for each attribute
>> (street, state etc.). Write a transformer to copy the fields from  
>> your data
>> config to appropriately named dynamic field (e.g. street_1,  
>> state_1, etc).
>> To maintain this counter you will need to get/store it with
>> Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and
>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>
>> I cant't think of an easier way.
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Yes, Solr does. But the DataImportHandler in the 1.3 release does not
support it.

However, you can use the trunk DataImportHandler jar with Solr 1.3 if you
do not feel comfortable using Solr 1.4 trunk.

On Fri, Jan 23, 2009 at 1:36 PM, Gunaranjan Chandraraju <
chandraraju@apple.com> wrote:

>
> I thought 1.3 supported dynamic fields in schema.xml?
>
> Guna
>


-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Gunaranjan Chandraraju <ch...@apple.com>.

I thought 1.3 supported dynamic fields in schema.xml?

Guna

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> I think the only way is to create a dynamic field for each attribute
> (street, state, etc.). Write a transformer to copy the fields from your data
> config to appropriately named dynamic fields (e.g. street_1, state_1, etc.).
> To maintain the counter, you will need to read and store it with
> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>
> I can't think of an easier way.
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: How to make Relationships work for Multi-valued Index Fields?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
chandraraju@apple.com> wrote:

>
> <record>
>   <coreInfo id="123" , .../>
>   <address street="XYZ1" State="CA" ...type="home" />
>   <address street="XYZ2" state="CA" ... type="Office"/>
>   <address street="XYZ3" state="CA" ....type="Other"/>
> </record>
>
> I have setup my DIH to treat these as entities as below
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name ="f" processor="FileListEntityProcessor"
>             baseDir="***"
>             fileName=".*xml"
>             rootEntity="false"
>             dataSource="null" >
>        <entity
>           name="record"
>           processor="XPathEntityProcessor"
>           stream="false"
>           forEach="/record"
>           url="${f.fileAbsolutePath}">
>                <field column="ID" xpath="/record/@id" />
>
>                <!-- Address  -->
>                 <entity
>                     name="record_adr"
>                     processor="XPathEntityProcessor"
>                     stream="false"
>                     forEach="/record/address"
>                     url="${f.fileAbsolutePath}">
>                         <field column="address_street"
>  xpath="/record/address/@street" />
>                         <field column="address_state"
> xpath="/record/address//@state" />
>                         <field column="address_type"
>  xpath="/record/address//@type" />
>                </entity>
>           </entity>
>     </entity>
>   </document>
> </dataConfig>
>

I think the only way is to create a dynamic field for each attribute
(street, state, etc.). Write a transformer to copy the fields from your data
config to appropriately named dynamic fields (e.g. street_1, state_1, etc.).
To maintain the counter, you will need to read and store it with
Context#getSessionAttribute(name, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
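
To make the idea concrete, here is a rough sketch of the renaming logic only.
It is not tied to the DIH API: the class name AddressFieldNumberer, the
"address_counter" key, and the plain Map standing in for the document-scoped
session are all made up for illustration. In a real transformer you would
extend org.apache.solr.handler.dataimport.Transformer and keep the counter in
the Context session attributes instead of the map used here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renames address_* columns to numbered dynamic-field
// names (street_1, state_1, ...) so each address keeps its grouping.
public class AddressFieldNumberer {

    // Hypothetical key under which the per-document counter is stored.
    static final String COUNTER_KEY = "address_counter";

    // 'docSession' is a stand-in for the document-scoped session
    // (Context.SCOPE_DOC in DIH); here it is just a plain map.
    public static Map<String, Object> numberRow(Map<String, Object> row,
                                                Map<String, Object> docSession) {
        // Read the counter for the current document, starting at 1.
        Object stored = docSession.get(COUNTER_KEY);
        int next = (stored == null) ? 1 : ((Integer) stored) + 1;
        docSession.put(COUNTER_KEY, next);

        // Copy every address_* column to <attribute>_<counter>;
        // leave all other columns untouched.
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String column = e.getKey();
            if (column.startsWith("address_")) {
                String attribute = column.substring("address_".length());
                out.put(attribute + "_" + next, e.getValue());
            } else {
                out.put(column, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One session map per document; a new document needs a fresh map.
        Map<String, Object> docSession = new HashMap<>();

        Map<String, Object> home = new LinkedHashMap<>();
        home.put("address_street", "XYZ1");
        home.put("address_state", "CA");
        home.put("address_type", "home");

        Map<String, Object> office = new LinkedHashMap<>();
        office.put("address_street", "XYZ2");
        office.put("address_state", "CA");
        office.put("address_type", "Office");

        System.out.println(numberRow(home, docSession));
        System.out.println(numberRow(office, docSession));
    }
}
```

The first address row ends up with the keys street_1, state_1 and type_1, the
second with street_2, state_2 and type_2. The schema would then need matching
dynamic fields (e.g. street_*, state_*, type_*) for these names to be indexed.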
-- 
Regards,
Shalin Shekhar Mangar.