Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2014/03/03 18:54:19 UTC
RegexTransformer and xpath in DataImportHandler
Good afternoon,
I have this DIH (DataImportHandler) config:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="blogFeed"
            pk="id"
            url="https://redacted/"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item"
            transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
      <field column="id" xpath="/rss/channel/item/id" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$"
             replaceWith="$1" sourceColName="blogtitle" />
      <field column="pubdateiso" xpath="/rss/channel/item/pubDate"
             dateTimeFormat="yyyy-MM-dd" />
      <field column="category" xpath="/rss/channel/item/category" />
      <field column="author" xpath="/rss/channel/item/author" />
      <field column="authoremail" xpath="/rss/channel/item/authoremail" />
      <field column="content" xpath="/rss/channel/item/content" />
      <field column="summary" xpath="/rss/channel/item/summary" />
      <field column="index_category" template="ConnectionsBlogs"/>
    </entity>
  </document>
</dataConfig>
I can't seem to populate BOTH blogtitle and short_blogtitle from the same
xpath. I can only get one or the other; why can't I use the same xpath for
two different fields?
When I remove the short_blogtitle line that has the xpath and keep only the
regex line, blogtitle gets populated and short_blogtitle is passed through my
update.chain (to the autocomplete index), but the short_blogtitle field itself
is blank in this index.
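One thing worth checking alongside the DIH wiring is what the short_blogtitle regex does to a typical title. The sketch below runs the same pattern through plain java.util.regex. It assumes, without confirming it from the DIH source, that RegexTransformer's replaceWith behaves like Matcher.replaceAll applied to the blogtitle value; the class name ShortTitleRegexDemo and the method shorten are hypothetical, made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortTitleRegexDemo {
    // Same pattern as the short_blogtitle field in the DIH config,
    // with backslashes escaped for a Java string literal.
    static final Pattern SHORT_TITLE = Pattern.compile("^(.{250})([^\\.]*\\.)(.*)$");

    // ASSUMPTION: mimics replaceWith="$1" as a plain replaceAll.
    // If the pattern matches (it can only match the whole title, because of
    // ^ and $), the title is replaced by group 1, i.e. its first 250 characters.
    // If it does not match, replaceAll returns the title unchanged.
    static String shorten(String title) {
        Matcher m = SHORT_TITLE.matcher(title);
        return m.replaceAll("$1");
    }

    public static void main(String[] args) {
        String shortTitle = "A typical blog title.";
        String longTitle = "x".repeat(300) + ". Rest of the title"; // String.repeat needs Java 11+
        String longNoPeriod = "y".repeat(300);

        System.out.println(shorten(shortTitle));            // prints "A typical blog title." (no match)
        System.out.println(shorten(longTitle).length());    // prints 250 (cut to group 1)
        System.out.println(shorten(longNoPeriod).length()); // prints 300 (no period after char 250, no match)
    }
}
```

Under that assumption, only titles longer than 250 characters with a period somewhere after the 250th character get shortened; every other title passes through unchanged rather than coming out empty.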
If I leave the DIH as above, then blogtitle doesn't get populated but
short_blogtitle does.
What am I doing wrong here? Is there a way to populate both?
And I CANNOT use copyField here, because then the update.chain won't work.
Thanks,
--
View this message in context: http://lucene.472066.n3.nabble.com/RegexTransformer-and-xpath-in-DataImportHandler-tp4120946.html
Sent from the Solr - User mailing list archive at Nabble.com.