Posted to dev@any23.apache.org by "Hans Brende (JIRA)" <ji...@apache.org> on 2018/08/17 17:03:00 UTC

[jira] [Created] (ANY23-389) RDFa extraction breaks when base element uses relative href

Hans Brende created ANY23-389:
---------------------------------

             Summary: RDFa extraction breaks when base element uses relative href
                 Key: ANY23-389
                 URL: https://issues.apache.org/jira/browse/ANY23-389
             Project: Apache Any23
          Issue Type: Bug
          Components: extractors
    Affects Versions: 2.3
            Reporter: Hans Brende
             Fix For: 2.3


I noticed that when extracting from HTML such as this:

{code:html}
<html prefix="og: http://ogp.me/ns#">
<head>
    <base href="">
    <link rel="icon" type="image/x-icon" href="https://static1.squarespace.com/static/55085720e4b0813599644fae/t/56291c91e4b0377cf53e5981/favicon.ico"/>
    <meta property="og:site_name" content="36°N"/>
    <meta property="og:title" content="36°N Friends &amp; Family Night"/>
    <meta property="og:latitude" content="36.1604966"/>
    <meta property="og:longitude" content="-95.9889172"/>
    <meta property="og:street-address" content="201 North Elgin Avenue"/>
    <meta property="og:locality" content="Tulsa"/>
    <meta property="og:region" content="OK"/>
    <meta property="og:postal-code" content="74120"/>
    <meta property="og:country-name" content="United States"/>
    <meta property="og:url" content="https://www.36degreesnorth.co/events/2018/8/2/36n-friends-family-night"/>
    <meta property="og:type" content="website"/>
    <meta property="og:description" content="Hey 36°N Members! Grab your family or a close friend, and join us for a fun night at the ballpark. We reserved the Coors Light Refinery Deck at ONEOK Field, so we can all hang out, enjoy a buffet and watch the game in the shade.  Dinner starts at 6:30. Game starts at 7:00.  $5/person. $20/family (co"/>
    <meta property="og:image" content="http://static1.squarespace.com/static/55085720e4b0813599644fae/5768549715d5db9b150af935/5a62695653450a1e55940197/1528903903136/DRILLERS+FAMILY+NIGHT-+square.png?format=1000w"/>
    <meta property="og:image:width" content="800"/>
    <meta property="og:image:height" content="800"/>
</head><body></body>
</html>
{code}

none of the expected rdfa11 triples (neither the og properties nor the icon property) are extracted. The apparent cause is that the underlying rdfa11 parser requires an *absolute base href* and does not handle a relative one, such as the empty href on the base element above.
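
One possible way to work around this, sketched here only as an illustration and not as the actual Any23 fix, would be to resolve the base element's href against the document URL before handing it to the parser. The class name BaseHrefResolver, the method resolveBase, and the example.org URLs are hypothetical; only java.net.URI from the standard library is used. An empty href is handled explicitly because URI.resolve does not map an empty reference back to the document URL itself.

```java
import java.net.URI;

// Hypothetical helper (not part of Any23): turns a possibly relative
// <base href> value into an absolute URI by resolving it against the
// URL the document was fetched from.
final class BaseHrefResolver {

    private BaseHrefResolver() {
    }

    // documentUrl: absolute URL of the page being extracted.
    // baseHref: raw value of the href attribute on <base>; may be null,
    // empty, relative, or already absolute.
    // Throws IllegalArgumentException if either string is not a valid URI.
    static URI resolveBase(String documentUrl, String baseHref) {
        URI documentUri = URI.create(documentUrl);
        if (baseHref == null || baseHref.trim().isEmpty()) {
            // Assumption: a missing or empty href means "use the document URL".
            return documentUri;
        }
        // For an absolute href, resolve() returns the href unchanged;
        // for a relative one, it is resolved against the document URL.
        return documentUri.resolve(URI.create(baseHref.trim()));
    }
}
```

With this sketch, resolveBase("https://example.org/a/b.html", "") yields the document URL itself, so a parser that insists on an absolute base would still receive one.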


