Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/03/12 11:23:30 UTC
Re: Problem using DIH templatetransformer to create uniqueKey: solved
Folks,
TemplateTransformer will fail to return a row if a variable is undefined;
however, the RegexTransformer does still return one. So where the
following would fail:-
<field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
This can be used instead:-
<field column="id" regex="(.*)" replaceWith="$1${x.vurl}" sourceColName="fileAbsolutePath" />
So I guess we have the best of both worlds!
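
For anyone finding this later, here is a rough sketch of how the workaround might sit in the full entity from my original post (quoted below). The attribute values are copied from that post; the closing </entity> tag, the field ordering and the comments are my assumptions, so treat it as a starting point rather than a tested config.

```xml
<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">

  <!-- Only present for /record/mediaBlock rows; undefined for /record rows. -->
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />

  <!-- Always defined, so TemplateTransformer is safe to use here. -->
  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />

  <!-- RegexTransformer builds the uniqueKey from fileAbsolutePath and still
       returns a row when x.vurl is undefined. -->
  <field column="id" regex="(.*)" replaceWith="$1${x.vurl}" sourceColName="fileAbsolutePath" />
</entity>
```

One thing worth checking in the output: Java's replaceAll can match (.*) a second time against the empty string at the end of the input, which would append the suffix twice. If that shows up, anchoring the pattern as ^(.*)$ is one possible adjustment.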
Fergus.
>Hmmm. Just gave that a go! No luck
>But how many layers of defaults do we need?
>
>
>Rgds Fergus
>
>>What about having the template transformer support ${field:default}
>>syntax? I'm assuming it doesn't support that currently right? The
>>replace stuff in the config files does though.
>>
>> Erik
>>
>>
>>On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote:
>>
>>> Paul,
>>>
>>> Following up your usenet suggestion:
>>>
>>> <field column="id" template="${jc.fileAbsolutePath}${x.vurl}"
>>> ignoreMissingVariables="true"/>
>>>
>>> and to add more to what I was thinking...
>>>
>>> if the field is undefined in the input document, but the schema.xml
>>> does allow a default value, then TemplateTransformer can use the
>>> default value. If there is no default value defined in schema.xml
>>> then it can fail as at present. This would allow "" or any other
>>> value to be fed into TemplateTransformer, and still enable avoidance
>>> of the partial strings you referred to.
>>>
>>> Regards Fergus.
>>>
>>>>> Hello,
>>>>>
>>>>> templatetransformer behaves rather ungracefully if one of the
>>>>> replacement fields is missing.
>>>>
>>>> Looking at TemplateString.java I see that left to itself fillTokens
>>>> would replace a missing variable with "". It is an extra check in
>>>> TemplateTransformer that is throwing the warning and stopping the row
>>>> being returned. Commenting out the check seems to solve my problem.
>>>>
>>>> Having done this, an undefined replacement string in TemplateTransformer
>>>> is replaced with "". However a neater fix would probably involve making
>>>> use of the default value which can be assigned to a row? in schema.xml.
>>>>
>>>>> I am parsing a single XML document into multiple separate solr
>>>>> documents. It turns out that none of the source document's fields can
>>>>> be used alone to create a uniqueKey. I need to combine two, using
>>>>> template transformer as follows:
>>>>>
>>>>> <entity name="x"
>>>>> dataSource="myfilereader"
>>>>> processor="XPathEntityProcessor"
>>>>> url="${jc.fileAbsolutePath}"
>>>>> rootEntity="true"
>>>>> stream="false"
>>>>> forEach="/record | /record/mediaBlock"
>>>>> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
>>>>>
>>>>> <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>>>>> <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
>>>>> <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
>>>>> <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
>>>>>
>>>>> The trouble is that vurl is only defined as a child of
>>>>> "/record/mediaBlock", so my attempt to create id, the uniqueKey, fails
>>>>> for the parent document "/record".
>>>>>
>>>>> I am hacking around with "TemplateTransformer.java" to sort this but
>>>>> was wondering if there was a good reason for this behavior.
>>>>>
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================