Posted to dev@any23.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2018/08/17 20:36:00 UTC

[jira] [Commented] (ANY23-389) RDFa extraction breaks when base element uses relative href

    [ https://issues.apache.org/jira/browse/ANY23-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584355#comment-16584355 ] 

Hudson commented on ANY23-389:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1618 (See [https://builds.apache.org/job/Any23-trunk/1618/])
ANY23-389 fix html base elements for RDFa (hans: rev ef7826df5e4ff9a2d32d1b9105760760a0293581)
* (edit) test-resources/src/test/resources/html/rdfa/opengraph-structured-properties.html
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java


> RDFa extraction breaks when base element uses relative href
> -----------------------------------------------------------
>
>                 Key: ANY23-389
>                 URL: https://issues.apache.org/jira/browse/ANY23-389
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: extractors
>    Affects Versions: 2.3
>            Reporter: Hans Brende
>            Assignee: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>
> I noticed that when extracting from html such as this:
> {code:html}
> <html prefix="og: http://ogp.me/ns#">
> <head>
>     <base href="">
>     <link rel="icon" type="image/x-icon" href="https://static1.squarespace.com/static/55085720e4b0813599644fae/t/56291c91e4b0377cf53e5981/favicon.ico"/>
>     <meta property="og:site_name" content="36°N"/>
>     <meta property="og:title" content="36°N Friends &amp; Family Night"/>
>     <meta property="og:latitude" content="36.1604966"/>
>     <meta property="og:longitude" content="-95.9889172"/>
>     <meta property="og:street-address" content="201 North Elgin Avenue"/>
>     <meta property="og:locality" content="Tulsa"/>
>     <meta property="og:region" content="OK"/>
>     <meta property="og:postal-code" content="74120"/>
>     <meta property="og:country-name" content="United States"/>
>     <meta property="og:url" content="https://www.36degreesnorth.co/events/2018/8/2/36n-friends-family-night"/>
>     <meta property="og:type" content="website"/>
>     <meta property="og:description" content="Hey 36°N Members! Grab your family or a close friend, and join us for a fun night at the ballpark. We reserved the Coors Light Refinery Deck at ONEOK Field, so we can all hang out, enjoy a buffet and watch the game in the shade.  Dinner starts at 6:30. Game starts at 7:00.  $5/person. $20/family (co"/>
>     <meta property="og:image" content="http://static1.squarespace.com/static/55085720e4b0813599644fae/5768549715d5db9b150af935/5a62695653450a1e55940197/1528903903136/DRILLERS+FAMILY+NIGHT-+square.png?format=1000w"/>
>     <meta property="og:image:width" content="800"/>
>     <meta property="og:image:height" content="800"/>
> </head><body></body>
> </html>
> {code}
> none of the expected rdfa11 triples (neither the og properties nor the icon property) are extracted. The apparent cause is that the underlying rdfa11 parser requires an *absolute base href* and cannot handle a relative one.
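The diagnosis above points to one remedy: make the base absolute before it reaches the rdfa11 parser. The actual change to BaseRDFExtractor.java in rev ef7826df is not reproduced here. The following is only a minimal sketch, and it assumes the extractor knows the absolute URI the page was fetched from. The class BaseHrefResolver and its method resolveBase are hypothetical names, not Any23 API. The sketch resolves the <base> href against the document URI with java.net.URI. It handles an empty href (as in the example above) as a special case, because java.net.URI.resolve("") returns the document's directory (dropping the last path segment and any query) rather than the document URI that RFC 3986 section 5.2.2 calls for.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Any23): turns the href of an HTML
 * <base> element into an absolute base URI before it is handed to an
 * RDFa parser that only accepts absolute bases.
 */
final class BaseHrefResolver {

    private BaseHrefResolver() {}

    /**
     * @param documentUri absolute URI the page was fetched from (assumed known)
     * @param baseHref    raw href attribute of <base>, or null if there is none
     * @return an absolute base URI
     * @throws IllegalArgumentException if documentUri is relative or
     *         baseHref is not a valid URI reference
     */
    static URI resolveBase(URI documentUri, String baseHref) {
        if (!documentUri.isAbsolute()) {
            throw new IllegalArgumentException("document URI must be absolute: " + documentUri);
        }
        if (baseHref == null || baseHref.trim().isEmpty()) {
            // No usable href: per RFC 3986 section 5.2.2 an empty reference
            // resolves to the base URI itself, minus its fragment.
            // java.net.URI.resolve("") would instead return the parent
            // directory and lose the query, so it is not used here.
            return stripFragment(documentUri);
        }
        // Absolute hrefs are returned unchanged by resolve(); relative ones
        // (e.g. "other" or "../") are resolved against the document URI.
        return documentUri.resolve(URI.create(baseHref.trim()));
    }

    /** Removes a trailing "#fragment", if present. */
    private static URI stripFragment(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }
}
```

For example, with a document URI of https://example.org/events/page?id=7, an href of "" yields that same URI, "other" yields https://example.org/events/other, and an absolute href such as https://cdn.example.net/ is returned unchanged.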



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)