Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/03/12 11:23:30 UTC

Re: Problem using DIH templatetransformer to create uniqueKey: solved

Folks,

TemplateTransformer will fail to return a row if a variable is undefined;
the RegexTransformer, however, still returns one. So where the
following would fail:-

<field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />

This can be used instead:-

<field column="id" regex="(.*)" replaceWith="$1${x.vurl}" sourceColName="fileAbsolutePath" />

So I guess we have the best of both worlds!
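
For anyone wanting to see the workaround in context, here is a rough
sketch of how it might sit in the entity from my original post (quoted
further down). The entity attributes are copied from that post; I have
left out the fileWebPath field for brevity and added a closing tag.
It assumes, as the fileWebPath field already does, that the
fileAbsolutePath column has been filled in by the time RegexTransformer
reads it. Treat it as an illustration rather than a tested config.

```xml
<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">

  <!-- jc.fileAbsolutePath is defined for every row, so the template is safe here -->
  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />

  <!-- Workaround: copy fileAbsolutePath via regex and append vurl.
       As reported above, the row is still returned when x.vurl is undefined. -->
  <field column="id" regex="(.*)" replaceWith="$1${x.vurl}" sourceColName="fileAbsolutePath" />

  <!-- vurl only exists for the /record/mediaBlock rows -->
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
</entity>
```

With this, the parent "/record" row should get the bare file path as
its id, and each "/record/mediaBlock" row the file path plus its vurl.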

Fergus.

>Hmmm. Just gave that a go! No luck
>But how many layers of defaults do we need?
>
>
>Rgds Fergus
>
>>What about having the template transformer support ${field:default}  
>>syntax?  I'm assuming it doesn't support that currently right?  The  
>>replace stuff in the config files does though.
>>
>>	Erik
>>
>>
>>On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote:
>>
>>> Paul,
>>>
>>> Following up your usenet suggestion:
>>>
>>> <field column="id" template="${jc.fileAbsolutePath}${x.vurl}"
>>> ignoreMissingVariables="true"/>
>>>
>>> and to add more to what I was thinking...
>>>
>>> if the field is undefined in the input document, but the schema.xml
>>> does allow a default value, then TemplateTransformer can use the
>>> default value. If there is no default value defined in schema.xml
>>> then it can fail as at present. This would allow "" or any other
>>> value to be fed into TemplateTransformer, and still enable avoidance
>>> of the partial strings you referred to.
>>>
>>> Regards Fergus.
>>>
>>>>> Hello,
>>>>>
>>>>> templatetransformer behaves rather ungracefully if one of the replacement
>>>>> fields is missing.
>>>>
>>>> Looking at TemplateString.java I see that left to itself fillTokens would
>>>> replace a missing variable with "". It is an extra check in TemplateTransformer
>>>> that is throwing the warning and stopping the row being returned. Commenting
>>>> out the check seems to solve my problem.
>>>>
>>>> Having done this, an undefined replacement string in TemplateTransformer
>>>> is replaced with "". However a neater fix would probably involve making
>>>> use of the default value which can be assigned to a row? in schema.xml.
>>>>
>>>>> I am parsing a single XML document into multiple separate solr documents.
>>>>> It turns out that none of the source document's fields can be used alone
>>>>> to create a uniqueKey. I need to combine two, using template transformer
>>>>> as follows:
>>>>>
>>>>> <entity name="x"
>>>>> dataSource="myfilereader"
>>>>> processor="XPathEntityProcessor"
>>>>> url="${jc.fileAbsolutePath}"
>>>>> rootEntity="true"
>>>>> stream="false"
>>>>> forEach="/record | /record/mediaBlock"
>>>>> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
>>>>>
>>>>> <field column="fileAbsolutePath"    template="${jc.fileAbsolutePath}" />
>>>>> <field column="fileWebPath"         regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
>>>>> <field column="id"                  template="${jc.fileAbsolutePath}${x.vurl}" />
>>>>> <field column="vurl"                xpath="/record/mediaBlock/mediaObject/@vurl" />
>>>>>
>>>>> The trouble is that vurl is only defined as a child of "/record/mediaBlock",
>>>>> so my attempt to create id, the uniqueKey, fails for the parent document
>>>>> "/record".
>>>>>
>>>>> I am hacking around with "TemplateTransformer.java" to sort this but was
>>>>> wondering if there was a good reason for this behavior.
>>>>>

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================