Posted to dev@any23.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/03/26 21:21:15 UTC

[jira] [Commented] (ANY23-168) RDFa properties in <meta> elements not picked up

    [ https://issues.apache.org/jira/browse/ANY23-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948403#comment-13948403 ] 

Lewis John McGibbney commented on ANY23-168:
--------------------------------------------

[~rubenverborgh], did you try setting the boolean configuration property 'any23.extraction.head.meta' to true?

By default org.apache.any23.extractor.html.HTMLMetaExtractor is disabled.
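
For illustration, here is a minimal sketch of treating that property as an opt-in boolean flag. It uses only java.util.Properties; the class name HeadMetaFlagSketch and the helper headMetaEnabled are hypothetical, and how the override is actually handed to Any23 is an assumption left out here. The only names taken from this comment are the property key and the "disabled by default" behaviour.

```java
import java.util.Properties;

class HeadMetaFlagSketch {
    // Property key quoted in the comment above.
    static final String HEAD_META = "any23.extraction.head.meta";

    // Hypothetical helper: a missing key falls back to "false",
    // mirroring the statement that the extractor is disabled by default.
    static boolean headMetaEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(HEAD_META, "false"));
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        System.out.println("before override: " + headMetaEnabled(overrides));

        // Opt in explicitly, as suggested in the comment.
        overrides.setProperty(HEAD_META, "true");
        System.out.println("after override: " + headMetaEnabled(overrides));
    }
}
```

If the flag is passed as a JVM system property instead, the same key would apply; whether Any23 reads it from there is not confirmed by this thread.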

> RDFa properties in <meta> elements not picked up
> ------------------------------------------------
>
>                 Key: ANY23-168
>                 URL: https://issues.apache.org/jira/browse/ANY23-168
>             Project: Apache Any23
>          Issue Type: Bug
>            Reporter: Ruben Verborgh
>              Labels: meta-tags, rdfa
>             Fix For: 1.0.0
>
>
> RDFa annotations in <meta> elements are not picked up:
> http://ruben.verborgh.org/tmp/dctitle-test.html
> http://any23.org/any23/?uri=http%3A%2F%2Fruben.verborgh.org%2Ftmp%2Fdctitle-test.html
> The Structured Data Testing Tool finds them:
> http://www.google.com/webmasters/tools/richsnippets?q=http%3A%2F%2Fruben.verborgh.org%2Ftmp%2Fdctitle-test.html
> Additionally, I wonder whether it's a good idea to drop the dcterms:title property extracted from <title> if an actual dc:title property is present. This allows for more meaningful titles, for instance:
>     <title>HTML Title – Website Name</title>
>     <meta property="dc:title" content="DC Title"/>
> This would overcome the common situation in which the HTML <title> also contains the website name etc. and is therefore not suited for a "clean" dc:title. I would thus say that an actual dc:title should take precedence over a dc:title implied by <title>.
> Furthermore, I'm confused by the double appearance of
> <http://ruben.verborgh.org/tmp/dctitle-test.html> dcterms:title "HTML Title – Website Name" .
> AND
> <http://ruben.verborgh.org/tmp/dctitle-test.html> <http://www.w3.org/1999/xhtml/microdata#item> _:nodecfcd208495d565ef66e7dff9f98764da ;
> 	dcterms:title "HTML Title – Website Name" .
> Should the page itself AND some blank node have this dcterms:title? (And what happens if the <meta> tags are parsed?)
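
The precedence rule proposed in the issue description could be sketched roughly as follows. This is a hypothetical illustration, not Any23's actual behaviour: the class TitlePrecedenceSketch and the method chooseTitle are made-up names, and the inputs are assumed to be the already-extracted <title> text and the content of a <meta property="dc:title"> element (null when absent).

```java
import java.util.Optional;

class TitlePrecedenceSketch {
    // Hypothetical rule: an explicit dc:title from <meta> wins;
    // the text of <title> is used only as a fallback.
    static Optional<String> chooseTitle(String htmlTitle, String metaDcTitle) {
        if (metaDcTitle != null && !metaDcTitle.trim().isEmpty()) {
            return Optional.of(metaDcTitle.trim());
        }
        if (htmlTitle != null && !htmlTitle.trim().isEmpty()) {
            return Optional.of(htmlTitle.trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sample values taken from the issue description (\u2013 is the en dash).
        String htmlTitle = "HTML Title \u2013 Website Name";
        System.out.println(chooseTitle(htmlTitle, "DC Title").orElse("(none)"));
        System.out.println(chooseTitle(htmlTitle, null).orElse("(none)"));
    }
}
```

With this rule a page would carry a single title value, which would also sidestep part of the duplication question raised in the description.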



--
This message was sent by Atlassian JIRA
(v6.2#6252)