Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2013/11/14 13:41:20 UTC

[jira] [Commented] (CONNECTORS-805) Crawling author metadata from feeds

    [ https://issues.apache.org/jira/browse/CONNECTORS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822378#comment-13822378 ] 

Karl Wright commented on CONNECTORS-805:
----------------------------------------

r1541890 adds support for author name and author email constructs at the item/entry level.

Still working on supporting these constructs at the channel/source level.


> Crawling author metadata from feeds
> -----------------------------------
>
>                 Key: CONNECTORS-805
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-805
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: RSS connector
>    Affects Versions: ManifoldCF 1.4
>            Reporter: Benjamin Brandmeier
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.5
>
>
> Functionality for retrieving the author of an RSS entry.
> The RSS specifications treat this differently:
> RSS 2.0 (Source -> http://www.rssboard.org/rss-specification#ltauthorgtSubelementOfLtitemgt):
> <author> sub-element of <item>
> It's the email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the <item> describes.
> Atom (Source -> http://www.ietf.org/rfc/rfc4287.txt):
> The "atom:author" element is a Person construct that indicates the author of the entry or feed.
> atomAuthor = element atom:author { atomPersonConstruct }
> If an atom:entry element does not contain atom:author elements, then
> the atom:author elements of the contained atom:source element are
> considered to apply.  In an Atom Feed Document, the atom:author
> elements of the containing atom:feed element are considered to apply
> to the entry if there are no atom:author elements in the locations
> described above.
> The atomPersonConstruct looks like this:
>    atomPersonConstruct =
>       atomCommonAttributes,
>       (element atom:name { text }
>        & element atom:uri { atomUri }?
>        & element atom:email { atomEmailAddress }?
>        & extensionElement*)
> where atomCommonAttributes is defined like this:
> atomCommonAttributes =
>       attribute xml:base { atomUri }?,
>       attribute xml:lang { atomLanguageTag }?,
>       undefinedAttribute*
> Furthermore, there is an atom:contributor element:
> The "atom:contributor" element is a Person construct that indicates a person or other entity who contributed to the entry or feed.
> atomContributor = element atom:contributor { atomPersonConstruct }
> For further information, please check the specification.
> Dublin Core (Source -> http://dublincore.org/documents/dcmi-type-vocabulary/index.shtml#elements-creator):
> <dc:creator>
> The primary individual responsible for the content of the resource.
> The element can be at the <item>, <image> or <channel> level.
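
The Atom fallback rule quoted above (entry authors first, then the authors of the contained atom:source, then the authors of the containing atom:feed) can be sketched as follows. This is only an illustration in Python using the standard library, not the ManifoldCF RSS connector's code; the helper names parse_person and entry_authors and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Atom namespace in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_person(elem):
    """Read an Atom Person construct: name is required, uri and email are optional."""
    return {
        "name": (elem.findtext(ATOM + "name") or "").strip(),
        "uri": elem.findtext(ATOM + "uri"),
        "email": elem.findtext(ATOM + "email"),
    }


def entry_authors(feed, entry):
    """Resolve authors for one entry: the entry itself, then its atom:source, then the feed."""
    for scope in (entry, entry.find(ATOM + "source"), feed):
        if scope is None:
            continue  # entry has no atom:source child
        authors = scope.findall(ATOM + "author")  # direct children only
        if authors:
            return [parse_person(a) for a in authors]
    return []


# Hypothetical sample feed covering all three cases.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <author><name>Feed Author</name><email>feed@example.org</email></author>
  <entry>
    <title>Has its own author</title>
    <author><name>Entry Author</name></author>
  </entry>
  <entry>
    <title>Inherits from source</title>
    <source><author><name>Source Author</name></author></source>
  </entry>
  <entry>
    <title>Inherits from feed</title>
  </entry>
</feed>"""

feed = ET.fromstring(SAMPLE)
resolved = [
    [person["name"] for person in entry_authors(feed, entry)]
    for entry in feed.findall(ATOM + "entry")
]
print(resolved)
# prints [['Entry Author'], ['Source Author'], ['Feed Author']]
```

The lookup stops at the first level that has any atom:author elements, so feed-level authors are used only when neither the entry nor its atom:source names one.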


