Posted to dev@lucene.apache.org by "Alexandre Rafalovitch (JIRA)" <ji...@apache.org> on 2017/03/16 03:56:41 UTC

[jira] [Updated] (SOLR-7383) DIH rss example is broken again

     [ https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Rafalovitch updated SOLR-7383:
----------------------------------------
    Attachment: atom_20170315.tgz

Attached is a replacement example that uses the StackOverflow ATOM feed and demonstrates all the features of the original RSS example, plus a few more (as far as I can tell). Some features (e.g. commonField) now actually work.

It has a different directory name, so it can be decompressed alongside the other DIH examples.

It is not cleaned up yet: I still need to double-check camelCase vs. dashes vs. underscores and spaces vs. tabs, maybe add another comment or two, and remove the checklist comment at the top of the DIH definition file.

But it should work and make a nice example. The solrconfig.xml file is super-minimal, similar to the work in SOLR-9601. It also uses the new updateProcessors syntax.

If this looks good, the RSS example will simply be deleted and this will become the new one.

I would appreciate reviews and comments, as this example is 15 (!) times smaller than the RSS one.
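
For readers without the attachment at hand, here is a rough sketch of what a DIH definition for an ATOM feed could look like. It is not the content of atom_20170315.tgz: the entity name, the feed URL, and the target field names are assumptions chosen for illustration. Only the general shape (URLDataSource, XPathEntityProcessor, commonField) follows the RSS config quoted below.

```xml
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <!-- Entity name and feed URL are illustrative assumptions -->
        <entity name="stackoverflow"
                url="https://stackoverflow.com/feeds/tag/solr"
                processor="XPathEntityProcessor"
                forEach="/feed|/feed/entry">

            <!-- Feed-level value copied onto every entry via commonField;
                 /feed must appear in forEach for this to be picked up -->
            <field column="lastchecked_dt" xpath="/feed/updated" commonField="true" />

            <!-- Per-entry values; the column names are hypothetical and must
                 match whatever the target schema defines -->
            <field column="id" xpath="/feed/entry/id" />
            <field column="title" xpath="/feed/entry/title" />
            <field column="author" xpath="/feed/entry/author/name" />
            <field column="category" xpath="/feed/entry/category/@term" />
            <field column="published_dt" xpath="/feed/entry/published" />
        </entity>
    </document>
</dataConfig>
```

Because ATOM is a single, stable format, a definition along these lines would not depend on a feed that switches between RDF/RSS and pure RSS, which is what keeps breaking the current example.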

> DIH rss example is broken again
> -------------------------------
>
>                 Key: SOLR-7383
>                 URL: https://issues.apache.org/jira/browse/SOLR-7383
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.0, 6.0
>            Reporter: Upayavira
>            Assignee: Alexandre Rafalovitch
>            Priority: Minor
>         Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure RSS. Perhaps we should depend upon something more static, rather than an external service that is free to change as it desires.
> <dataConfig>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="slashdot"
>                 pk="link"
>                 url="http://rss.slashdot.org/Slashdot/slashdot"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/item"
>                 transformer="DateFormatTransformer">
>
>             <field column="source" xpath="/RDF/channel/title" commonField="true" />
>             <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>             <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>
>             <field column="title" xpath="/RDF/item/title" />
>             <field column="link" xpath="/RDF/item/link" />
>             <field column="description" xpath="/RDF/item/description" />
>             <field column="creator" xpath="/RDF/item/creator" />
>             <field column="item-subject" xpath="/RDF/item/subject" />
>             <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>             <field column="slash-department" xpath="/RDF/item/department" />
>             <field column="slash-section" xpath="/RDF/item/section" />
>             <field column="slash-comments" xpath="/RDF/item/comments" />
>         </entity>
>     </document>
> </dataConfig>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)