Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/16 01:08:00 UTC

[jira] [Commented] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

    [ https://issues.apache.org/jira/browse/NUTCH-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476724#comment-17476724 ] 

ASF GitHub Bot commented on NUTCH-2938:
---------------------------------------

lewismc opened a new pull request #725:
URL: https://github.com/apache/nutch/pull/725


   PR addresses https://issues.apache.org/jira/browse/NUTCH-2938
   We could improve the performance of this plugin by reusing the repository connection. However, I am not entirely sure how to do that right now, because the connection is created down in the Any23 layer.
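
   A minimal sketch of the reuse idea follows. It assumes hypothetical stand-in types (`SketchRepository`, `SketchConnection`) and is not the real Any23 or RDF4J API. It only contrasts opening a connection per URL with opening one connection for a whole batch. How a shared connection would actually be passed through the Any23 layer is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a triple store; NOT the RDF4J Repository API.
final class SketchRepository {
    private final List<String> stored = new ArrayList<>();
    private int connectionsOpened = 0;

    // Opening a connection is the cost we want to pay as rarely as possible.
    SketchConnection connect() {
        connectionsOpened++;
        return new SketchConnection(stored);
    }

    int connectionsOpened() { return connectionsOpened; }

    int storedCount() { return stored.size(); }
}

// Hypothetical stand-in for a repository connection; NOT the RDF4J API.
final class SketchConnection implements AutoCloseable {
    private final List<String> sink;
    private boolean closed = false;

    SketchConnection(List<String> sink) { this.sink = sink; }

    void add(String statement) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        sink.add(statement);
    }

    @Override
    public void close() { closed = true; }
}

final class ConnectionReuseSketch {
    // Behaviour described above: a new connection is created for each URL.
    static void writePerUrl(SketchRepository repo, List<String> urls) {
        for (String url : urls) {
            try (SketchConnection conn = repo.connect()) {
                conn.add("extraction for " + url);
            }
        }
    }

    // Possible improvement: open one connection and reuse it for every URL.
    static void writeReusing(SketchRepository repo, List<String> urls) {
        try (SketchConnection conn = repo.connect()) {
            for (String url : urls) {
                conn.add("extraction for " + url);
            }
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "https://example.org/a", "https://example.org/b", "https://example.org/c");

        SketchRepository perUrl = new SketchRepository();
        writePerUrl(perUrl, urls);

        SketchRepository reused = new SketchRepository();
        writeReusing(reused, urls);

        System.out.println("per-URL connections opened: " + perUrl.connectionsOpened());
        System.out.println("reused connections opened: " + reused.connectionsOpened());
    }
}
```

   Both variants store the same number of entries; only the number of connections opened differs. In a real plugin the shared connection would also need a defined lifecycle (where it is opened and closed), which this sketch does not address.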


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Use Any23's RepositoryWriter to write structured data to Rdf4j repository
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2938
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2938
>             Project: Nutch
>          Issue Type: Improvement
>          Components: any23, plugin
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>
> I have been running a patch which leverages [Any23's RepositoryWriter|https://any23.apache.org/apidocs/org/apache/any23/writer/RepositoryWriter.html] (implemented as one of a number of TripleHandlers via [CompositeTripleHandler|https://any23.apache.org/apidocs/org/apache/any23/writer/CompositeTripleHandler.html]) to write Any23 extractions to [GraphDB|https://www.ontotext.com/products/graphdb/]. This enables us to build a content graph from data across the enterprise.
> This feature is turned off by default, so it will not change existing Any23 behaviour. I have concerns about the performance of this patch because, right now, we need to create a new repository connection for each URL. This is not great, so I will definitely improve on it.
> PR coming up.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)