Posted to dev@nutch.apache.org by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/16 00:44:00 UTC
[jira] [Created] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository
Lewis John McGibbney created NUTCH-2938:
-------------------------------------------
Summary: Use Any23's RepositoryWriter to write structured data to Rdf4j repository
Key: NUTCH-2938
URL: https://issues.apache.org/jira/browse/NUTCH-2938
Project: Nutch
Issue Type: Improvement
Components: any23, plugin
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.19
I have been running a patch which leverages [Any23's RepositoryWriter|https://any23.apache.org/apidocs/org/apache/any23/writer/RepositoryWriter.html] (implemented as one of a number of TripleHandlers via [CompositeTripleHandler|https://any23.apache.org/apidocs/org/apache/any23/writer/CompositeTripleHandler.html]) to write Any23 extractions to [GraphDB|https://www.ontotext.com/products/graphdb/]. This enables us to build a content graph from data across the enterprise.
This feature is turned off by default, so it will not change existing Any23 behaviour. I have concerns about the performance of this patch because it currently creates a new repository connection for each URL. That is inefficient, so I will definitely improve on it.
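To illustrate the connection concern, here is a minimal sketch of the difference between opening a connection per URL and sharing one connection across a batch. It does not use the Any23 or RDF4J APIs and is not taken from the patch: FakeRepository, FakeConnection, writePerUrl and writeShared are hypothetical stand-ins, and the example.org URLs and predicate are placeholders. Sharing a connection is only one possible direction for the improvement.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectionReuseSketch {

    /** Hypothetical stand-in for a triple store; counts how many connections were opened. */
    static final class FakeRepository {
        int connectionsOpened = 0;
        final List<String> triples = new ArrayList<>();

        FakeConnection connect() {
            connectionsOpened++;
            return new FakeConnection(this);
        }
    }

    /** Hypothetical stand-in for a repository connection (NOT the RDF4J interface). */
    static final class FakeConnection implements AutoCloseable {
        private final FakeRepository repo;
        private boolean open = true;

        FakeConnection(FakeRepository repo) {
            this.repo = repo;
        }

        void add(String triple) {
            if (!open) {
                throw new IllegalStateException("connection is closed");
            }
            repo.triples.add(triple);
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Placeholder triple for a URL; real extractions would come from Any23. */
    static String tripleFor(String url) {
        return "<" + url + "> <http://example.org/extractedFrom> \"placeholder\" .";
    }

    /** Pattern described in the issue: a new connection for every URL. */
    static int writePerUrl(FakeRepository repo, List<String> urls) {
        for (String url : urls) {
            try (FakeConnection conn = repo.connect()) {
                conn.add(tripleFor(url));
            }
        }
        return repo.connectionsOpened;
    }

    /** One possible alternative: a single connection shared by the whole batch. */
    static int writeShared(FakeRepository repo, List<String> urls) {
        try (FakeConnection conn = repo.connect()) {
            for (String url : urls) {
                conn.add(tripleFor(url));
            }
        }
        return repo.connectionsOpened;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://example.org/a", "http://example.org/b", "http://example.org/c");
        System.out.println("per-URL connections opened: " + writePerUrl(new FakeRepository(), urls));
        System.out.println("shared connections opened:  " + writeShared(new FakeRepository(), urls));
    }
}
```

Both methods store the same triples; only the number of connections opened differs, which is where the per-URL overhead would come from if connection setup is expensive.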
PR coming up.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)