Posted to dev@jena.apache.org by "Timothy Higinbottom (Jira)" <ji...@apache.org> on 2021/04/09 07:03:00 UTC

[jira] [Created] (JENA-2083) Support skipping/ignoring errors with tdbloader

Timothy Higinbottom created JENA-2083:
-----------------------------------------

             Summary: Support skipping/ignoring errors with tdbloader
                 Key: JENA-2083
                 URL: https://issues.apache.org/jira/browse/JENA-2083
             Project: Apache Jena
          Issue Type: New Feature
          Components: TDB, TDB2
            Reporter: Timothy Higinbottom


Hi all,

I have a fairly large number (~22,000) of N-Triples files that I hope to import into TDB2 and query with Fuseki.

I increased the memory allotted to the JVM and used the parallel mode of tdb2.tdbloader. It whizzed through the first 1,000 files.

However, some of the files are incorrectly serialized, so they caused errors when Jena tried to read them. It is not feasible right now to sort out the defective files from the good ones before running tdbloader.

It would be great if tdbloader had an option to skip files that fail to parse, so that it can continue processing the remaining files.

The main reason this should be part of tdbloader itself is that the alternative (running xargs or a loop in Bash) decreases performance: the loading is then effectively sequential, and the user can't take advantage of the tdbloader modes and batching.
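
For illustration only, here is a minimal Python sketch of a possible interim workaround, for cases where a separate validation pass is affordable: check the files in parallel, then hand only the good ones to a single tdbloader run so the loader's batching is preserved. It is not a substitute for the requested option. The names partition_files and command_check are hypothetical helpers, not part of Jena, and the sketch assumes the chosen validator command exits with a non-zero status when a file fails to parse (for example Jena's riot in validate mode, though that exit-code behavior should be checked against your Jena version).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def command_check(command: List[str]) -> Callable[[Path], bool]:
    """Build a check that runs `command + [path]` and treats exit code 0 as valid.

    Assumption: the external validator signals parse errors via a non-zero exit code.
    """
    def check(path: Path) -> bool:
        result = subprocess.run(
            command + [str(path)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    return check


def partition_files(
    paths: Iterable[Path],
    is_valid: Callable[[Path], bool],
    workers: int = 8,
) -> Tuple[List[Path], List[Path]]:
    """Split paths into (good, bad) by running is_valid on each file concurrently.

    Input order is preserved within each of the two returned lists.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(is_valid, paths))
    good = [p for p, ok in zip(paths, results) if ok]
    bad = [p for p, ok in zip(paths, results) if not ok]
    return good, bad


# Hypothetical usage (directory name and validator command are assumptions):
#   good, bad = partition_files(Path("data").glob("*.nt"),
#                               command_check(["riot", "--validate"]))
#   Then pass `good` to one tdb2.tdbloader invocation and inspect `bad` separately.
```

With around 22,000 files, the list of good paths may exceed the operating system's command-line length limit, so it may need to be split into a few large batches. Validation also parses every file once more, which is the cost a built-in skip option would avoid.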

Thanks for this great project!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)