Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/02/21 14:32:50 UTC

Re: DIH transformers - sect 2 - SOLR-1033

I have created SOLR-1033 in JIRA to address this issue.

At 13:32 +0000 21/2/09, Fergus McMenemie wrote:
>>On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie <fe...@twig.me.uk> wrote:
>>>
>>>  2) Having used TemplateTransformer to assign a value to an
>>>     entity column, that column cannot be used in other
>>>     TemplateTransformer operations. In my project I am
>>>     attempting to reuse "x.fileWebPath". To fix this, the
>>>     last line of transformRow() in TemplateTransformer.java
>>>     needs to be replaced with the following, which as well as
>>>     'putting' the templated string in 'row' also saves it
>>>     into the 'resolver'.
>>>
>>>     **originally**
>>>      row.put(column, resolver.replaceTokens(expr));
>>>      }
>>>
>>>     **new**
>>>      String columnName = map.get(DataImporter.COLUMN);
>>>      expr=resolver.replaceTokens(expr);
>>>      row.put(columnName, expr);
>>>      resolverMapCopy.put(columnName, expr);
>>>      }
>>
>>Isn't it better to write a custom transformer to achieve this? I did
>>not want a standard component to change the state of the
>>VariableResolver.
>>
>>I am not sure what is the best way.
>>
>
>Noble, (Good to have email working :-)
>
>Hmm, not sure why this requires a custom transformer. Why is this not
>more in the nature of a bug fix? Also the current behavior temporarily
>adds all the column names into the resolver for the duration of the 
>TemplateTransformer's operation, removing them again at the end. I
>do not think there is any permanent change to the state of the 
>VariableResolver.
>
>Surely if we have defined a value for a column, that value should be
>temporarily available in subsequent template or regexp operations?
>
>Fergus.
>
>>>
>>>
>>>   <dataConfig>
>>>   <dataSource name="myfilereader" type="FileDataSource"/>
>>>    <document>
>>>    <entity name="jc"
>>>               processor="FileListEntityProcessor"
>>>               fileName="^.*\.xml$"
>>>               newerThan="'NOW-1000DAYS'"
>>>               recursive="true"
>>>               rootEntity="false"
>>>               dataSource="null"
>>>               baseDir="/Volumes/spare/ts/solr/content"
>>>               >
>>>    <entity name="x"
>>>                  dataSource="myfilereader"
>>>                  processor="XPathEntityProcessor"
>>>                  url="${jc.fileAbsolutePath}"
>>>                  rootEntity="true"
>>>                  stream="false"
>>>                  forEach="/record | /record/mediaBlock"
>>>                  transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
>>>
>>> <field column="fileAbsolutePath"       template="${jc.fileAbsolutePath}" />
>>> <field column="fileWebPath"            regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
>>> <field column="title"                  xpath="/record/title" />
>>> <field column="para1" name="para"      xpath="/record/sect1/para" />
>>> <field column="para2" name="para"      xpath="/record/list/listitem/para" />
>>> <field column="pubdate"                xpath="/record/metadata/date[@qualifier='pubDate']" dateTimeFormat="yyyyMMdd"   />
>>>
>>> <field column="vurl"                   xpath="/record/mediaBlock/mediaObject/@vurl" />
>>> <field column="imgSrcArticle"          template="${dataimporter.request.fordinstalldir}" />
>>> <field column="imgCpation"             xpath="/record/mediaBlock/caption"  />
>>>
>>> <field column="test"                   template="${dataimporter.request.contentinstalldir}" />
>>> <!-- **problem is that vurl is just a fragment of the info needed to access the picture. -->
>>> <field column="imgWebPathICON"         regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg" sourceColName="fileWebPath"/>
>>> <field column="imgWebPathFULL"         regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}.jpg"  sourceColName="fileWebPath"/>
>>> <field column="vdkvgwkey"              template="${jc.fileAbsolutePath}#${x.vurl}" />
>>>       </entity>
>>>       </entity>
>>>       </document>
>>>    </dataConfig>
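
To make the behaviour under discussion concrete, here is a minimal,
self-contained sketch. It is a toy stand-in, not the real DIH code: the
class TemplateChainSketch, its methods and the "publish" flag are all
hypothetical names invented for illustration, and resolving an unknown
variable to an empty string is an assumption of the sketch. It shows a
templated column becoming visible to later templates in the same row,
using a per-row copy of the variables so that the outer state is never
changed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only; NOT Solr's TemplateTransformer or VariableResolver.
class TemplateChainSketch {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value from vars.
    // Assumption of this sketch: unknown names resolve to "".
    static String replaceTokens(String expr, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(expr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Apply templates in declaration order. When publish is true, each result
    // is also stored in a per-row copy of the variables as "<entity>.<column>",
    // so later templates in the same row can refer to it.
    static Map<String, String> transformRow(String entity,
                                            Map<String, String> templates,
                                            Map<String, String> outerVars,
                                            boolean publish) {
        // Work on a copy: outerVars itself is never mutated.
        Map<String, String> scope = new LinkedHashMap<>(outerVars);
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> t : templates.entrySet()) {
            String value = replaceTokens(t.getValue(), scope);
            row.put(t.getKey(), value);
            if (publish) {
                scope.put(entity + "." + t.getKey(), value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> outer = new LinkedHashMap<>();
        outer.put("jc.fileAbsolutePath", "/content/a.xml");

        Map<String, String> templates = new LinkedHashMap<>();
        templates.put("fileAbsolutePath", "${jc.fileAbsolutePath}");
        templates.put("vdkvgwkey", "${x.fileAbsolutePath}#1");

        // Without publishing, the second template cannot see the first column.
        System.out.println(transformRow("x", templates, outer, false).get("vdkvgwkey"));
        // With publishing, the value set by the first template is reused.
        System.out.println(transformRow("x", templates, outer, true).get("vdkvgwkey"));
    }
}
```

In this sketch the first call yields "#1" and the second yields
"/content/a.xml#1", while the "outer" map still holds only its original
entry afterwards. That is the sense in which the added visibility is
temporary rather than a permanent change of state.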

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================