Posted to solr-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/09/13 21:57:01 UTC

[DIH] Multiple repeat XPath stmts

I'm trying to import several RSS feeds using DIH and running into a
bit of a problem. Some feeds define a GUID value that I map to my
Solr ID, while others don't. I also have a link field, which I fill in
with the RSS link field. For the feeds that don't have the GUID value
set, I want to use the link field as the id. However, if I define the
same XPath twice but map it to two different columns, the id value
is not set.

For instance, I want to do:
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I can't use copyFields,
unless, of course, I implement conditional copy fields (copy only if
the target field is not already defined), which I would rather not do.

How do I solve this?

Thanks,
Grant

Re: [DIH] Multiple repeat XPath stmts

Posted by alesp <pe...@gmail.com>.
TNX. A lifesaver...

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Multiple-repeat-XPath-stmts-tp499770p3989439.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: [DIH] Multiple repeat XPath stmts

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
If you wish to use a conditional copy, you can use a RegexTransformer
(transformer="RegexTransformer" on an entity named "entityname"):

<field column="guid" xpath="/rss/channel/item/guid"/>
<field column="id" regex="^.*$" sourceColName="guid"
       replaceWith="${entityname.guid}"/>

This means that if guid != null, 'id' will be set to the guid; if guid
is null, 'id' is left untouched.
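Combined with a link fallback, the regex copy might sit in one entity roughly as sketched below. This is untested config under stated assumptions: the feed URL is a placeholder, a `URLDataSource` is assumed to be declared elsewhere in the data-config, `forEach="/rss/channel/item"` assumes RSS 2.0 feeds, and, as in the message above, `${entityname.guid}` in `replaceWith` is assumed to resolve against the current row. Transformers named in the `transformer` attribute run in the order listed, so the template sets `id` from the link first and the regex then overwrites it only when `guid` is present.

```xml
<entity name="entityname"
        processor="XPathEntityProcessor"
        url="http://example.com/feed.rss"
        forEach="/rss/channel/item"
        transformer="TemplateTransformer,RegexTransformer">
  <!-- Each XPath is mapped exactly once. -->
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="guid" xpath="/rss/channel/item/guid" />
  <!-- TemplateTransformer runs first: id = link (the fallback). -->
  <field column="id" template="${entityname.link}" />
  <!-- RegexTransformer runs second and skips this field when guid is null;
       otherwise it replaces id with the guid. The pattern is anchored so
       it matches the whole value exactly once. -->
  <field column="id" regex="^.*$" sourceColName="guid"
         replaceWith="${entityname.guid}" />
</entity>
```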


On Mon, Sep 14, 2009 at 4:16 PM, Grant Ingersoll <gs...@apache.org> wrote:
> As I said, copying is not an option.  That will break everything else.



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: [DIH] Multiple repeat XPath stmts

Posted by Grant Ingersoll <gs...@apache.org>.
As I said, copying is not an option.  That will break everything else.

On Sep 14, 2009, at 1:07 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> The XPathRecordReader supports only one mapping per XPath, so copying
> is the best solution.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: [DIH] Multiple repeat XPath stmts

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
The XPathRecordReader supports only one mapping per XPath, so copying is
the best solution.
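Within DIH itself, one way to "copy" under the one-mapping-per-XPath limit is to map the XPath once and derive the second column from it with a TemplateTransformer. Unlike a schema-level copyField, this only affects documents produced by this entity. A minimal sketch, assuming an entity named `x`, a placeholder feed URL, a `URLDataSource` declared elsewhere, and RSS 2.0 item structure:

```xml
<entity name="x"
        processor="XPathEntityProcessor"
        url="http://example.com/feed.rss"
        forEach="/rss/channel/item"
        transformer="TemplateTransformer">
  <!-- The link XPath is mapped only once... -->
  <field column="link" xpath="/rss/channel/item/link" />
  <!-- ...and its value is copied into id from the current row. -->
  <field column="id" template="${x.link}" />
</entity>
```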

On Mon, Sep 14, 2009 at 2:54 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> How about this:
>
> <entity name="x" ... transformer="TemplateTransformer">
>   <field column="link" xpath="/rss/channel/item/link" />
>   <field column="GUID" xpath="/rss/channel/item/guid" />
>   <field column="id"   template="${x.link}" />
>   <field column="id"   template="${x.GUID}" />
> </entity>
>
> The TemplateTransformer does nothing if the value its template refers
> to is null. So the first template assigns the fallback value (the link)
> to id, and this is overwritten by the GUID if it is defined.



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: [DIH] Multiple repeat XPath stmts

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>I'm trying to import several RSS feeds using DIH and running into a  
>bit of a problem.  Some feeds define a GUID value that I map to my  
>Solr ID, while others don't.  I also have a link field which I fill in  
>with the RSS link field.  For the feeds that don't have the GUID value  
>set, I want to use the link field as the id.  However, if I define the  
>same XPath twice, but map it to two diff. columns I don't get the id  
>value set.
>
>For instance, I want to do:
>schema.xml
><field name="id" type="string" indexed="true" stored="true"  
>required="true"/>
><field name="link" type="string" indexed="true" stored="false"/>
>
>DIH config:
><field column="id" xpath="/rss/channel/item/link" />
><field column="link" xpath="/rss/channel/item/link" />
>
>Because I am consolidating multiple fields, I'm not able to do  
>copyFields, unless of course, I wanted to implement conditional copy  
>fields (only copy if the field is not defined) which I would rather not.
>
>How do I solve this?
>

How about this:

<entity name="x" ... transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="GUID" xpath="/rss/channel/item/guid" />
  <field column="id"   template="${x.link}" />
  <field column="id"   template="${x.GUID}" />
</entity>

The TemplateTransformer does nothing if the value its template refers
to is null. So the first template assigns the fallback value (the link)
to id, and this is overwritten by the GUID if it is defined.
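A complete data-config.xml built around this suggestion might look like the sketch below. The feed URL is a placeholder, `URLDataSource` and `XPathEntityProcessor` are assumed to be the data source and processor in use, and `forEach="/rss/channel/item"` assumes RSS 2.0 feeds, where `guid` and `link` are children of `item`. The order of the two `id` fields matters: the later one wins whenever its template resolves.

```xml
<dataConfig>
  <!-- Fetches each feed over HTTP. -->
  <dataSource type="URLDataSource" />
  <document>
    <!-- One Solr document per RSS <item>. -->
    <entity name="x"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item"
            transformer="TemplateTransformer">
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="GUID" xpath="/rss/channel/item/guid" />
      <!-- Fallback: resolvable whenever the item has a link. -->
      <field column="id" template="${x.link}" />
      <!-- Skipped when GUID is null; otherwise overwrites id. -->
      <field column="id" template="${x.GUID}" />
    </entity>
  </document>
</dataConfig>
```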

You can now sort of do if-then-else using a combination of template
and regex transformers. Add a bit of maths to the transformers and
I think we will have a Turing-complete language :-)

fergus.

>Thanks,
>Grant

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================