Posted to solr-user@lucene.apache.org by penela <pe...@gmail.com> on 2011/08/09 12:38:03 UTC

XPathProcessor foreach not working properly inside another entity

Hi!

What I'm trying to do is get RSS URLs from a MySQL DB of my own and use them
as the url endpoint for indexing the feed articles (mixing the db and rss core
DIH examples to some extent).

My data-config looks like this:
<dataConfig>
    <dataSource type="URLDataSource" name="rss-ds" />
    <dataSource type="JdbcDataSource" name="db-ds"
                driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1/tapmeme" />
    <document>
        <entity name="feed" dataSource="db-ds"
                query="SELECT TAPmeme.urls.urlID as 'id', TAPmeme.urls.url as 'entity-name', TAPmeme.urls.source as 'url' FROM TAPmeme.urls">

            <entity name="rss" dataSource="rss-ds"
                    pk="link"
                    url="${feed.url}"
                    processor="XPathEntityProcessor"
                    forEach="/rss/channel | /rss/channel/item"
                    transformer="DateFormatTransformer">

                <field column="source" xpath="/rss/channel/title" commonField="true" />
                <field column="source-link" xpath="/rss/channel/link" commonField="true" />
                <field column="subject" xpath="/rss/channel/subject" commonField="true" />

                <field column="title" xpath="/rss/channel/item/title" />
                <field column="link" xpath="/rss/channel/item/link" />
                <field column="description" xpath="/rss/channel/item/description" />
                <field column="creator" xpath="/rss/channel/item/creator" />
                <field column="item-subject" xpath="/rss/channel/item/subject" />
                <field column="date" xpath="/rss/channel/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
            </entity>
        </entity>
    </document>
</dataConfig>

(The table schema is a bit messed up, with wrongly named keys after too much
testing, but that shouldn't be an issue here.)

The problem is that forEach does not work properly when the RSS entity is
nested like this (it works when only the URLDataSource is used): only the
first article of each RSS feed gets indexed.

Any ideas?

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/XPathProcessor-foreach-not-working-properly-inside-another-entity-tp3238535p3238535.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: XPathProcessor foreach not working properly inside another entity

Posted by penela <pe...@gmail.com>.
After a better-targeted search of the forum, I've found this solution
by Noble Paul:
http://lucene.472066.n3.nabble.com/DIH-Http-input-bug-problem-with-two-level-RSS-walker-tp491046p491047.html

Using rootEntity="false" in the outer entity seems to make it work as
expected.
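
For anyone finding this later, here is a minimal sketch of the change, based on
my config from the first message. Only the rootEntity attribute on the outer
"feed" entity is new; the field list is shortened here for brevity, and the
comments reflect my understanding of the behaviour rather than anything
authoritative.

```xml
<dataConfig>
    <dataSource type="URLDataSource" name="rss-ds" />
    <dataSource type="JdbcDataSource" name="db-ds"
                driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1/tapmeme" />
    <document>
        <!-- rootEntity="false": the outer entity only supplies the feed URLs.
             The nested "rss" entity is then treated as the root, so each row
             it emits should become its own Solr document. -->
        <entity name="feed" dataSource="db-ds" rootEntity="false"
                query="SELECT TAPmeme.urls.urlID as 'id', TAPmeme.urls.url as 'entity-name', TAPmeme.urls.source as 'url' FROM TAPmeme.urls">

            <entity name="rss" dataSource="rss-ds"
                    pk="link"
                    url="${feed.url}"
                    processor="XPathEntityProcessor"
                    forEach="/rss/channel | /rss/channel/item"
                    transformer="DateFormatTransformer">

                <field column="source" xpath="/rss/channel/title" commonField="true" />
                <field column="title" xpath="/rss/channel/item/title" />
                <field column="link" xpath="/rss/channel/item/link" />
                <!-- remaining field definitions unchanged from the original config -->
            </entity>
        </entity>
    </document>
</dataConfig>
```

As far as I can tell, without the attribute DIH builds one document per row of
the outer entity (one per feed), which would explain why only the first article
of each feed showed up.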

Thanks!
