Posted to dev@manifoldcf.apache.org by "Kate McGonigal (JIRA)" <ji...@apache.org> on 2011/08/03 00:36:27 UTC

[jira] [Issue Comment Edited] (CONNECTORS-235) item description element not indexed

    [ https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078466#comment-13078466 ] 

Kate McGonigal edited comment on CONNECTORS-235 at 8/2/11 10:35 PM:
--------------------------------------------------------------------

I also tried setting "Dechromed Content" to "if present, in 'description' field", but that just seems to hang the ingestion process at the beginning: the job status gets to "Running", but the job never finishes, nothing is ever sent to Solr, and the number of "Active" documents never decreases.


The log file shows:
Error tossed: java.lang.String cannot be cast to org.apache.manifoldcf.core.interfaces.CharacterInput
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.manifoldcf.core.interfaces.CharacterInput
	at org.apache.manifoldcf.crawler.jobs.Carrydown.getDataValuesAsFiles(Carrydown.java:595)
	at org.apache.manifoldcf.crawler.jobs.JobManager.retrieveParentDataAsFiles(JobManager.java:4274)
	at org.apache.manifoldcf.crawler.system.WorkerThread$VersionActivity.retrieveParentDataAsFiles(WorkerThread.java:1220)
	at org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.getDocumentVersions(RSSConnector.java:827)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:342)
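
One way to read the trace above: a value held as a plain String reaches code that casts it to a stream-like type. The sketch below only illustrates that failure mode with stand-in types. CharStream, readUnchecked and readChecked are hypothetical names invented for this example; they are not ManifoldCF classes or methods, and this is not a proposed patch.

```java
class CarrydownCastSketch {
    // Hypothetical stand-in for a stream-like value type (NOT the ManifoldCF interface).
    interface CharStream {
        String readAll();
    }

    // Casts blindly: throws ClassCastException when the stored value is a String.
    static String readUnchecked(Object storedValue) {
        return ((CharStream) storedValue).readAll();
    }

    // Checks the runtime type first, so both representations are handled.
    static String readChecked(Object storedValue) {
        if (storedValue instanceof CharStream) {
            return ((CharStream) storedValue).readAll();
        }
        if (storedValue instanceof String) {
            return (String) storedValue;
        }
        throw new IllegalArgumentException("Unsupported value type: " + storedValue);
    }

    public static void main(String[] args) {
        Object stored = "item description text"; // value stored as a plain String
        System.out.println(readChecked(stored));
        try {
            readUnchecked(stored);
        } catch (ClassCastException e) {
            System.out.println("Unchecked cast failed with ClassCastException");
        }
    }
}
```

If the real cause is along these lines, the exception would be thrown on the first document processed, which would be consistent with the job staying in "Running" without sending anything to Solr.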

      was (Author: kmcgonig):
    I also tried setting "Dechromed Content" to "if present, in 'description' field", but that just seems to hang the ingestion process at the beginning: the job status gets to "Running", but it never finishes and nothing is ever sent to Solr and the number of "Active" documents never decreases.
  
> item description element not indexed
> ------------------------------------
>
>                 Key: CONNECTORS-235
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-235
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: RSS connector
>    Affects Versions: ManifoldCF 0.2
>            Reporter: Kate McGonigal
>
> The RSS feed's *item* description is not written to any field in the Solr index. 
> I have a typical RSS feed with the general structure:
> <rss>
>     <channel>
>         <title></title>
>         <link></link>
>         <description></description>
>         <item>
>             <title></title>
>             <link></link>
>             <pubDate></pubDate>
>             <description> *** the description I do want *** </description>
>             <author></author>
>             <category></category>
>         </item>
>     </channel>
> </rss>
> Example:
> For the RSS feed: http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/
> the rss/channel/item/description field is not indexed into Solr.
> Example notes:
>   - what does get written to the Solr "description" field is the description metadata from the website, i.e. "Jazz radio show from Winnipeg on CKUW 95.9 FM, hosted by Maurice Hogue." in this case.
>   - on the "Dechromed Content" tab of the job, "No dechromed content" is selected. I'm not sure if that is relevant.
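
To make the distinction in the report concrete, here is a minimal sketch that pulls only the rss/channel/item/description values out of a feed with the structure shown above, using the JDK's built-in DOM and XPath APIs. The class name, method name and sample feed text are assumptions made for illustration; this is not how the RSS connector itself parses feeds.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssItemDescriptions {
    // Returns the text of every <description> that is a direct child of an <item>.
    // The channel-level <description> is deliberately not matched by the path.
    static List<String> itemDescriptions(String rssXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/rss/channel/item/description", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed following the structure in the report.
        String feed =
            "<rss><channel>"
            + "<description>channel-level description</description>"
            + "<item><title>t</title>"
            + "<description>item-level description</description></item>"
            + "</channel></rss>";
        System.out.println(itemDescriptions(feed)); // prints [item-level description]
    }
}
```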
