Posted to dev@any23.apache.org by lewismc <gi...@git.apache.org> on 2015/03/20 17:13:08 UTC

[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

GitHub user lewismc opened a pull request:

    https://github.com/apache/any23/pull/16

    ANY23-226 Extract JSON-LD embedded in HTML

    Initial patch for this support.
    It is not working correctly yet. @ansell, can you have a look at the parsing of the JSON-LD textual content?
    I've added a '//' comment where I can see the correct parser being selected. It does not seem to parse and extract the JSON-LD, so I know I am doing something wrong.
    Thank you very much, @ansell, if you can have a wee look.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-226

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16
    
----
commit 1e3eb9c31af2f93906eee1081179d73c30a0881b
Author: Lewis John McGibbney <le...@jpl.nasa.gov>
Date:   2015-03-20T15:55:29Z

    ANY23-226 Extract JSON-LD embedded in HTML

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/any23/pull/16



[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/16#issuecomment-84058410
  
    Important to state: this is largely based on our existing META extractor. We are merely looking for the presence of /HTML/HEAD/SCRIPT/.
    Therefore, this initial effort needs to be followed by a fully functional implementation that can also catch JSON-LD in the body.
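
As a rough illustration of the head-versus-body point above, the sketch below collects every script element typed application/ld+json regardless of where it sits in the document. It is not the Any23 extractor: the class and method names are hypothetical, it uses only the JDK's DOM API, and it assumes well-formed XHTML input, whereas a real extractor would be handed an already-built DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Any23.
class JsonLdScriptFinder {

    /** Returns every script element typed application/ld+json, in head or body. */
    static List<Element> findJsonLdScripts(Document doc) {
        List<Element> found = new ArrayList<>();
        // getElementsByTagName searches the whole tree, not just /HTML/HEAD.
        NodeList scripts = doc.getElementsByTagName("script");
        for (int i = 0; i < scripts.getLength(); i++) {
            Element script = (Element) scripts.item(i);
            // getAttribute returns "" when the attribute is absent.
            if ("application/ld+json".equalsIgnoreCase(script.getAttribute("type").trim())) {
                found.add(script);
            }
        }
        return found;
    }

    /** Parses well-formed XHTML; failures are rethrown rather than hidden. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<script type=\"application/ld+json\">{\"@id\": \"http://example.org/a\"}</script>"
                + "</head><body>"
                + "<script type=\"text/javascript\">var x = 1;</script>"
                + "<script type=\"application/ld+json\">{\"@id\": \"http://example.org/b\"}</script>"
                + "</body></html>";
        System.out.println(findJsonLdScripts(parse(page)).size()); // prints 2
    }
}
```

Filtering on the type attribute rather than on the element's position is what lets one pass cover both head and body.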



[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/16#issuecomment-84706349
  
    The test failures are in the Microdata parsing code, not JSONLD-Java, so I thought it was fine to push this even though it was going to break the Jenkins build (it was already silently broken before due to the swallowed exception). The JSONLD parsing now works; the key fix to what you had done was to send the first child of the script element, which is the actual JSON code.
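
The "first child of the script element" fix can be pictured with the JDK's DOM API. This is a minimal sketch under assumptions, not the committed Any23 code: the class and method names are hypothetical, the input is assumed to be well-formed XHTML, and the hand-off to JSONLD-Java is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Any23.
class ScriptPayload {

    /** Returns the JSON text held by the script element's first child node. */
    static String jsonLdPayload(Element script) {
        Node first = script.getFirstChild();
        if (first == null) {
            return ""; // empty <script/> element
        }
        // getNodeValue() yields the text for text nodes and null for element nodes.
        String text = first.getNodeValue();
        return text == null ? "" : text.trim();
    }

    /** Parses well-formed XHTML and returns its first script element. */
    static Element firstScript(String xhtml) {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("script").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Element script = firstScript("<html><head><script type=\"application/ld+json\">"
                + "{\"@context\": \"http://schema.org\", \"@type\": \"Person\"}"
                + "</script></head><body/></html>");
        // Only this string, not the element itself, is what a JSON-LD parser should receive.
        System.out.println(jsonLdPayload(script));
    }
}
```

Handing the parser the element instead would mean it sees markup rather than JSON, which matches the symptom described in this thread.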



[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

Posted by ansell <gi...@git.apache.org>.
Github user ansell commented on the pull request:

    https://github.com/apache/any23/pull/16#issuecomment-84257478
  
    The main bug was that the entire script node was being sent to JSONLD-Java, and not just its content.
    
    However, I also made a few other changes while doing that testing.
    
    It turned out that the JSON-LD was invalid, but at some point the exception thrown when parsing fails had been changed to be silently swallowed, so the only indication was that the count was 0. I turned exception propagation back on (there is no reason for it to be swallowed outside of temporary testing).
    
    However, in addition to the 4 tests currently failing on the core tests, there are now other tests failing due to an inability to parse "<div itemscope>"
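
To make the swallowed-exception point above concrete, the sketch below contrasts the two behaviours. Everything in it is hypothetical: parseStatements is a toy stand-in that only checks brace balance, not JSONLD-Java, and the method names do not come from Any23.

```java
// Hypothetical illustration, not part of Any23.
class ExtractionErrors {

    /** Toy stand-in for a real JSON-LD parser: rejects input whose braces do not balance. */
    static int parseStatements(String json) {
        int depth = 0;
        for (char c : json.toCharArray()) {
            if (c == '{') depth++;
            if (c == '}') depth--;
            if (depth < 0) {
                throw new IllegalArgumentException("unexpected '}' in JSON-LD payload");
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unbalanced braces in JSON-LD payload");
        }
        return json.trim().isEmpty() ? 0 : 1;
    }

    /** Swallows the failure: invalid input is indistinguishable from "nothing found". */
    static int extractSwallowing(String json) {
        try {
            return parseStatements(json);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    /** Propagates the failure so the caller, or a test, sees why extraction failed. */
    static int extractPropagating(String json) {
        return parseStatements(json);
    }

    public static void main(String[] args) {
        String invalid = "{\"@id\": \"http://example.org/a\""; // missing closing brace
        System.out.println(extractSwallowing(invalid)); // prints 0
        try {
            extractPropagating(invalid);
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

With the swallowing variant, a count of 0 is the only clue that anything went wrong, which is the situation reported here.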



[GitHub] any23 pull request: ANY23-226 Extract JSON-LD embedded in HTML

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on the pull request:

    https://github.com/apache/any23/pull/16#issuecomment-84384643
  
    Ok Peter, thank you for looking. This is great. I have not seen the test
    failures. Can you please tell me if they are in Any23 or in JSONLD-Java?
    We could also upgrade the JSONLD-Java implementation to the 0.5.1
    release.
    
    On Saturday, March 21, 2015, Peter Ansell <no...@github.com> wrote:
    
    > The main bug was that the entire script node was being sent to
    > JSONLD-Java, and not just its content.
    >
    > However, I also made a few other changes while doing that testing.
    >
    > It turned out that the jsonld was invalid, but somehow the exception when
    > parses fail was changed to be silently swallowed, so the only indication
    > was that the count was 0. I turned on the exception propagation again (no
    > reason it should be swallowed outside of temporary testing).
    >
    > However, in addition to the 4 tests currently failing on the core tests,
    > there are now other tests failing due to an inability to parse
    > "<div itemscope>"
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/any23/pull/16#issuecomment-84257478>.
    >
    
    
    -- 
    *Lewis*


