Posted to dev@any23.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/02 08:40:00 UTC
[jira] [Commented] (ANY23-336) Parsing json-ld content takes prohibitively long time
[ https://issues.apache.org/jira/browse/ANY23-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422022#comment-16422022 ]
ASF GitHub Bot commented on ANY23-336:
--------------------------------------
GitHub user HansBrende opened a pull request:
https://github.com/apache/any23/pull/69
ANY23-336 Hacky patch to tide us over until jsonldjava 0.11.2 release
This is a hacky patch for Any23 that does exactly the same thing my [JSONLD-JAVA pull request](https://github.com/jsonld-java/jsonld-java/pull/228) does.
With this patch in place, the amortized extraction time for the site https://www.guthriegreen.com dropped from 30 seconds to 0.3 seconds, a difference of **two orders of magnitude**!!!
This patch should be removed as soon as we can upgrade to JSONLD-JAVA version 0.11.2.
mvn clean test -> all tests pass.
@lewismc @ansell Any comments before I merge this?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HansBrende/any23 ANY23-336
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/69.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #69
----
commit 80a5c5230ba55fa1f36166baccf4a5ac5d157a73
Author: Hans <fi...@...>
Date: 2018-04-02T08:05:34Z
ANY23-336 Hacky patch to tide us over until jsonldjava 0.11.2 release
----
> Parsing json-ld content takes prohibitively long time
> -----------------------------------------------------
>
> Key: ANY23-336
> URL: https://issues.apache.org/jira/browse/ANY23-336
> Project: Apache Any23
> Issue Type: Bug
> Components: core, extractors
> Affects Versions: 2.2
> Reporter: Hans Brende
> Assignee: Peter Ansell
> Priority: Critical
> Fix For: 2.3
>
> Attachments: Screen Shot 2018-03-27 at 2.52.15 PM.png, Screen Shot 2018-03-27 at 2.54.43 PM.png
>
>
> Using the page [https://www.guthriegreen.com|https://www.guthriegreen.com/] as a benchmark, a page fetch took about 100 ms, while simply *parsing* the json-ld content on that page took a *staggering 27400 ms*. For reference, I'm using Java 8, build 162, on a Macbook Pro (early 2015).
> The bad news is that this is not our fault.
> I've profiled this behavior down to the {{com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient)}} function. 94% of the parsing time is spent there. This function is called when trying to load remote json-ld contexts.
> In order to avoid loading remote contexts repeatedly, this function tries to *cache* them by using a {{CachingHttpClient}} from the httpclient-osgi library.
> Unfortunately, that strategy is *not* working, as I have recorded exactly *zero* cache hits, meaning that *every* retrieval is a cache miss and a remote context is re-fetched via http every single time it's accessed.
>
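For illustration only, the intended behavior described in the issue above (fetch each remote context once, then serve repeats from memory) can be sketched as follows. This is a minimal sketch, not the code from the pull request or from jsonld-java: the class `ContextCacheSketch`, its `fetcher` parameter, and the hit/miss counters are hypothetical names introduced here, and a real fetcher would perform the HTTP request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper; not part of jsonld-java or Any23. */
final class ContextCacheSketch {
    // Parsed contexts keyed by their URL.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    // Assumption: stands in for the real remote lookup; must not return null.
    private final Function<String, Object> fetcher;
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicInteger misses = new AtomicInteger();

    ContextCacheSketch(Function<String, Object> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached context for url, fetching it only on a miss. */
    Object load(String url) {
        Object cached = cache.get(url);
        if (cached != null) {
            hits.incrementAndGet();
            return cached;
        }
        misses.incrementAndGet();
        Object fetched = fetcher.apply(url);
        // If another thread stored a value first, keep that one.
        Object raced = cache.putIfAbsent(url, fetched);
        return raced != null ? raced : fetched;
    }

    int hits() {
        return hits.get();
    }

    int misses() {
        return misses.get();
    }
}
```

With a loader like this, the hit counter makes the symptom reported above easy to check: repeated loads of the same context URL should produce one miss followed only by hits, rather than zero hits.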
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)