Posted to dev@nutch.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/11/27 16:01:43 UTC
[jira] Created: (NUTCH-583) FeedParser empty links for items
FeedParser empty links for items
--------------------------------
Key: NUTCH-583
URL: https://issues.apache.org/jira/browse/NUTCH-583
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 1.0.0
FeedParser in the feed plugin simply discards an item if it does not have a <link> element. However, RSS 2.0 does not require a <link> element for each <item>.
Moreover, the link is sometimes given in the <guid> element, which is a globally unique identifier for the item. I think we can search for the item's URL first; if it is still not found, we can use the feed's URL, merging all the parse texts into one Parse object.
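
The proposed fallback could look roughly like the sketch below. It is only an illustration under stated assumptions, not the FeedParser code: the Item class, looksLikeUrl, itemUrl, and textByUrl are hypothetical names, and a plain Map of URL to text stands in for Nutch's Parse objects. Note that in RSS 2.0 a <guid> is only a URL when it acts as a permalink, so the sketch accepts it only if it looks like an http(s) URL.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types come from Nutch or a feed library.
class FeedItemUrlSketch {

    /** Stand-in for a parsed RSS <item>. */
    static final class Item {
        final String link; // contents of <link>, or null if absent
        final String guid; // contents of <guid>, or null if absent
        final String text; // text extracted from the item

        Item(String link, String guid, String text) {
            this.link = link;
            this.guid = guid;
            this.text = (text == null) ? "" : text;
        }
    }

    /** Assumption: a very rough check that a value is usable as a URL. */
    static boolean looksLikeUrl(String s) {
        if (s == null) {
            return false;
        }
        String t = s.trim();
        return t.startsWith("http://") || t.startsWith("https://");
    }

    /** Returns the item's own URL from <link>, then <guid>, or null if neither fits. */
    static String itemUrl(Item item) {
        if (looksLikeUrl(item.link)) {
            return item.link.trim();
        }
        if (looksLikeUrl(item.guid)) {
            return item.guid.trim();
        }
        return null;
    }

    /**
     * Groups item text by URL. Items without a URL of their own are not
     * discarded; their text is merged under the feed's URL.
     */
    static Map<String, String> textByUrl(String feedUrl, List<Item> items) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Item item : items) {
            String url = itemUrl(item);
            String key = (url != null) ? url : feedUrl;
            out.merge(key, item.text, (a, b) -> a + "\n" + b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("http://example.com/a", null, "has a link"),
            new Item(null, "http://example.com/b", "guid is a permalink"),
            new Item(null, "item-123", "guid is not a URL"),
            new Item(null, null, "no link and no guid"));
        Map<String, String> parsed = textByUrl("http://example.com/feed.xml", items);
        for (Map.Entry<String, String> e : parsed.entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue().replace("\n", " | "));
        }
    }
}
```

In this sketch the last two items end up under the feed's URL as one merged text, which mirrors the "one Parse object" idea; how that would map onto the real parse result types is left open.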
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-583) FeedParser empty links for items
Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren updated NUTCH-583:
-----------------------------
Fix Version/s: 1.1 (was: 1.0.0)
Pushing this to 1.1.
[jira] Updated: (NUTCH-583) FeedParser empty links for items
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated NUTCH-583:
------------------------------------
Fix Version/s: (was: 1.1)
- pushing this out per http://bit.ly/c7tBv9