Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/02/21 14:32:50 UTC
Re: DIH transformers - sect 2 - SOLR-1033
I have created SOLR-1033 in JIRA to address this issue.
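
For anyone following along, here is a rough, self-contained sketch of the
behaviour under discussion. It is not the DIH code: the class
TemplateReuseSketch, its methods and the sample path are all made up for
illustration, and treating an unresolved ${...} token as an empty string is
an assumption of the sketch, not a statement about VariableResolver. It only
shows why a templated column has to be written back into the (temporary)
resolver map before a later template can see it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the template step; NOT the real TemplateTransformer.
public class TemplateReuseSketch {
    // Matches ${entity.column} style tokens.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value from vars.
    // Assumption of this sketch: unknown names become the empty string.
    public static String replaceTokens(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Apply the templates in declaration order and return the resulting row.
    // When saveToResolver is true, each result is also stored in a working
    // copy of the variables under "<entity>.<column>", so later templates can
    // reference it. The caller's map is never modified.
    public static Map<String, String> transformRow(String entity,
                                                   Map<String, String> templates,
                                                   Map<String, String> vars,
                                                   boolean saveToResolver) {
        Map<String, String> row = new LinkedHashMap<>();
        Map<String, String> resolverCopy = new LinkedHashMap<>(vars);
        for (Map.Entry<String, String> e : templates.entrySet()) {
            String value = replaceTokens(e.getValue(), resolverCopy);
            row.put(e.getKey(), value);
            if (saveToResolver) {
                resolverCopy.put(entity + "." + e.getKey(), value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("jc.fileAbsolutePath", "/content/a.xml"); // made-up sample value

        Map<String, String> templates = new LinkedHashMap<>();
        templates.put("fileAbsolutePath", "${jc.fileAbsolutePath}");
        templates.put("key", "${x.fileAbsolutePath}#1");

        // Without the write-back, ${x.fileAbsolutePath} is unknown to the second template.
        System.out.println(transformRow("x", templates, vars, false));
        // With the write-back, the second template can reuse the first column.
        System.out.println(transformRow("x", templates, vars, true));
    }
}
```

Because the write-back goes into a copy, the caller's variables are left
untouched, which is the point I was trying to make about there being no
permanent change of state.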
At 13:32 +0000 21/2/09, Fergus McMenemie wrote:
>>On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie <fe...@twig.me.uk> wrote:
>>>
>>> 2) Having used TemplateTransformer to assign a value to an
>>> entity column, that column cannot be used in other
>>> TemplateTransformer operations. In my project I am
>>> attempting to reuse "x.fileWebPath". To fix this, the
>>> last line of transformRow() in TemplateTransformer.java
>>> needs to be replaced with the following, which, as well as
>>> 'putting' the templated string in 'row', also saves it
>>> into the 'resolver'.
>>>
>>> **originally**
>>> row.put(column, resolver.replaceTokens(expr));
>>> }
>>>
>>> **new**
>>> String columnName = map.get(DataImporter.COLUMN);
>>> expr=resolver.replaceTokens(expr);
>>> row.put(columnName, expr);
>>> resolverMapCopy.put(columnName, expr);
>>> }
>>
>>Isn't it better to write a custom transformer to achieve this? I did
>>not want a standard component to change the state of the
>>VariableResolver.
>>
>>I am not sure what is the best way.
>>
>
>Noble, (Good to have email working :-)
>
>Hmm, I am not sure why this requires a custom transformer. Is this not
>more in the nature of a bug fix? Also, the current behavior temporarily
>adds all the column names into the resolver for the duration of the
>TemplateTransformer's operation, removing them again at the end. I
>do not think there is any permanent change to the state of the
>VariableResolver.
>
>Surely, if we have defined a value for a column, that value should be
>temporarily available in subsequent template or regex operations?
>
>Fergus.
>
>>>
>>>
>>> <dataConfig>
>>> <dataSource name="myfilereader" type="FileDataSource"/>
>>> <document>
>>> <entity name="jc"
>>> processor="FileListEntityProcessor"
>>> fileName="^.*\.xml$"
>>> newerThan="'NOW-1000DAYS'"
>>> recursive="true"
>>> rootEntity="false"
>>> dataSource="null"
>>> baseDir="/Volumes/spare/ts/solr/content"
>>> >
>>> <entity name="x"
>>> dataSource="myfilereader"
>>> processor="XPathEntityProcessor"
>>> url="${jc.fileAbsolutePath}"
>>> rootEntity="true"
>>> stream="false"
>>> forEach="/record | /record/mediaBlock"
>>> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
>>>
>>> <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>>> <field column="fileWebPath" regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
>>> <field column="title" xpath="/record/title" />
>>> <field column="para1" name="para" xpath="/record/sect1/para" />
>>> <field column="para2" name="para" xpath="/record/list/listitem/para" />
>>> <field column="pubdate" xpath="/record/metadata/date[@qualifier='pubDate']" dateTimeFormat="yyyyMMdd" />
>>>
>>> <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
>>> <field column="imgSrcArticle" template="${dataimporter.request.fordinstalldir}" />
>>> <field column="imgCpation" xpath="/record/mediaBlock/caption" />
>>>
>>> <field column="test" template="${dataimporter.request.contentinstalldir}" />
>>> <!-- **problem is that vurl is just a fragment of the info needed to access the picture. -->
>>> <field column="imgWebPathICON" regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg" sourceColName="fileWebPath"/>
>>> <field column="imgWebPathFULL" regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}.jpg" sourceColName="fileWebPath"/>
>>> <field column="vdkvgwkey" template="${jc.fileAbsolutePath}#${x.vurl}" />
>>> </entity>
>>> </entity>
>>> </document>
>>> </dataConfig>
--
===============================================================
Fergus McMenemie Email:fergus@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================