Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2012/10/23 23:36:14 UTC
[jira] [Resolved] (CONNECTORS-557) web crawler feed into solr with all html tag removed
[ https://issues.apache.org/jira/browse/CONNECTORS-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-557.
------------------------------------
Resolution: Not A Problem
Fix Version/s: ManifoldCF 1.1
Assignee: Karl Wright
> web crawler feed into solr with all html tag removed
> ----------------------------------------------------
>
> Key: CONNECTORS-557
> URL: https://issues.apache.org/jira/browse/CONNECTORS-557
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Affects Versions: ManifoldCF 1.0
> Environment: ManifoldCF1.0 --> Solr4
> Reporter: Gene Liu
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.1
>
>
> All HTML tags are removed when ManifoldCF feeds the content into Solr.
> I am new to Solr. I use ManifoldCF to crawl web pages and send the content to Solr for indexing. I found that all the HTML tags are missing from the query results I get back from Solr. I am not sure whether ManifoldCF removes them before sending the content to Solr or Solr removes them.
> P.S. I could not find a way to send email to the user list, so I opened a ticket here.
> I would appreciate any suggestions or comments.
> Gene
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira