Posted to dev@nutch.apache.org by "Lewis John McGibbney (Jira)" <ji...@apache.org> on 2022/01/16 00:44:00 UTC

[jira] [Created] (NUTCH-2938) Use Any23's RepositoryWriter to write structured data to Rdf4j repository

Lewis John McGibbney created NUTCH-2938:
-------------------------------------------

             Summary: Use Any23's RepositoryWriter to write structured data to Rdf4j repository
                 Key: NUTCH-2938
                 URL: https://issues.apache.org/jira/browse/NUTCH-2938
             Project: Nutch
          Issue Type: Improvement
          Components: any23, plugin
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
             Fix For: 1.19


I have been running a patch which uses [Any23's RepositoryWriter|https://any23.apache.org/apidocs/org/apache/any23/writer/RepositoryWriter.html] (registered as one of several TripleHandlers via [CompositeTripleHandler|https://any23.apache.org/apidocs/org/apache/any23/writer/CompositeTripleHandler.html]) to write Any23 extractions to [GraphDB|https://www.ontotext.com/products/graphdb/]. This enables us to build a content graph from data across the enterprise.
This feature is turned off by default, so it will not change existing Any23 behaviour. I have concerns about the performance of this patch because, right now, it has to create a new repository connection for each URL. That is inefficient, and I will definitely improve on it.
PR coming up.
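
For illustration only, here is a minimal, self-contained sketch of the connection concern described above. It is not the patch, and it does not use the Any23 or RDF4J APIs: FakeConnection, Handler and CompositeHandler are hypothetical stand-ins invented for this example, and the triples are made-up strings. It only contrasts opening one connection per URL with sharing a single connection across URLs, under the assumption that a composite handler fans each triple out to a writer bound to that connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionReuseSketch {

    /** Hypothetical stand-in for a repository connection (NOT the RDF4J API). */
    static final class FakeConnection implements AutoCloseable {
        // Counts how many connections have been opened in this JVM.
        static final AtomicInteger OPENED = new AtomicInteger();
        final List<String> statements = new ArrayList<>();

        FakeConnection() {
            OPENED.incrementAndGet();
        }

        void add(String triple) {
            statements.add(triple);
        }

        @Override
        public void close() {
            // A real connection would release network resources here.
        }
    }

    /** Hypothetical handler interface (NOT Any23's TripleHandler). */
    interface Handler {
        void receive(String triple);
    }

    /** Hypothetical composite: forwards each triple to every child handler. */
    static final class CompositeHandler implements Handler {
        private final List<Handler> children = new ArrayList<>();

        void addChild(Handler child) {
            children.add(child);
        }

        @Override
        public void receive(String triple) {
            for (Handler child : children) {
                child.receive(triple);
            }
        }
    }

    // Made-up triple for the example; real extractions would come from the parser.
    private static String tripleFor(String url) {
        return "<" + url + "> <urn:example:seen> \"true\" .";
    }

    /** Opens a fresh connection for every URL; returns how many were opened. */
    static int perUrl(List<String> urls) {
        int before = FakeConnection.OPENED.get();
        for (String url : urls) {
            try (FakeConnection conn = new FakeConnection()) {
                CompositeHandler handler = new CompositeHandler();
                handler.addChild(conn::add);
                handler.receive(tripleFor(url));
            }
        }
        return FakeConnection.OPENED.get() - before;
    }

    /** Opens one connection and reuses it for all URLs; returns how many were opened. */
    static int shared(List<String> urls) {
        int before = FakeConnection.OPENED.get();
        try (FakeConnection conn = new FakeConnection()) {
            CompositeHandler handler = new CompositeHandler();
            handler.addChild(conn::add);
            for (String url : urls) {
                handler.receive(tripleFor(url));
            }
        }
        return FakeConnection.OPENED.get() - before;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "https://example.org/a", "https://example.org/b", "https://example.org/c");
        System.out.println("per-URL connections: " + perUrl(urls));
        System.out.println("shared connections:  " + shared(urls));
    }
}
```

With the three example URLs, the per-URL variant opens three connections and the shared variant opens one. Whether a shared connection is practical inside the plugin depends on its lifecycle and threading, which this sketch does not model.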



--
This message was sent by Atlassian Jira
(v8.20.1#820001)