Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2014/03/03 18:54:19 UTC

RegexTransformer and xpath in DataImportHandler

Good afternoon,
I have this DIH config:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="blogFeed"
            pk="id"
            url="https://redacted/"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item"
            transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">

      <field column="id" xpath="/rss/channel/item/id" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$"
             replaceWith="$1" sourceColName="blogtitle" />
      <field column="pubdateiso" xpath="/rss/channel/item/pubDate"
             dateTimeFormat="yyyy-MM-dd" />
      <field column="category" xpath="/rss/channel/item/category" />
      <field column="author" xpath="/rss/channel/item/author" />
      <field column="authoremail" xpath="/rss/channel/item/authoremail" />
      <field column="content" xpath="/rss/channel/item/content" />
      <field column="summary" xpath="/rss/channel/item/summary" />
      <field column="index_category" template="ConnectionsBlogs" />

    </entity>
  </document>
</dataConfig>

I can't seem to populate both blogtitle and short_blogtitle from the same
xpath; I only ever get one or the other. Why can't I use the same xpath in
two different fields?

When I remove the short_blogtitle line that has the xpath and keep only the
regex line, blogtitle gets populated and short_blogtitle is passed to my
update.chain (which feeds the autocomplete index), but the field itself is
blank in this index.
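
To make that second attempt concrete, the two title fields would look
roughly like the sketch below. It is reconstructed from the description
above rather than copied from the actual file, it shows only the lines that
changed, and every other field is assumed to be the same as in the config
shown earlier.

```xml
<!-- Sketch of the second attempt: only blogtitle reads the title xpath -->
<field column="blogtitle" xpath="/rss/channel/item/title" />
<!-- short_blogtitle has no xpath of its own here; it is meant to be
     derived from blogtitle by RegexTransformer via sourceColName -->
<field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$"
       replaceWith="$1" sourceColName="blogtitle" />
```

As written, the pattern can only match a title that has at least 250
characters followed somewhere by a period.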

If I leave the DIH config as shown above, short_blogtitle gets populated
but blogtitle doesn't.

What am I doing wrong here? Is there a way to populate both?
I CANNOT use copyField here, because then the update.chain won't work.

Thanks,




