Posted to dev@any23.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2017/12/27 20:05:00 UTC

[jira] [Created] (ANY23-318) ExtractionException handling in BaseRDFExtractor.java kills entire extraction

Lewis John McGibbney created ANY23-318:
------------------------------------------

             Summary: ExtractionException handling in BaseRDFExtractor.java kills entire extraction
                 Key: ANY23-318
                 URL: https://issues.apache.org/jira/browse/ANY23-318
             Project: Apache Any23
          Issue Type: Bug
          Components: core, extractors
    Affects Versions: 2.1
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
            Priority: Blocker
             Fix For: 2.2


Currently, the following catch block in BaseRDFExtractor.java aborts the entire extraction whenever an RDFParseException is thrown. I propose that we log the error and continue with the extraction instead.

{code}
         } catch (RDFParseException ex) {
-            throw new ExtractionException("Error while parsing RDF document.", ex, extractionResult);
+            LOG.error("Error while parsing RDF document.", ex);
         }
     }
{code}

The parsing strictness is inherited from the underlying semargl parsers, which expect syntactically perfect input. Unfortunately, data found in the 'wild' rarely meets that standard.
The solution is for us to log the exception and any related issues, then carry on with the extraction.
Patch coming up.
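
To illustrate the intent, here is a minimal, self-contained sketch of the log-and-continue pattern. It is not the Any23 code: ParseException, Extractor, parseStrict, parseLenient and runAll are all hypothetical stand-ins, and java.util.logging is used only to keep the example dependency-free. The point is that a parse failure in one extractor gets logged while the remaining extractors still run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LenientExtractionSketch {

    private static final Logger LOG =
            Logger.getLogger(LenientExtractionSketch.class.getName());

    /** Hypothetical stand-in for a parser exception such as RDFParseException. */
    static class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical extractor abstraction: one parse attempt over one document. */
    interface Extractor {
        List<String> extract(String document) throws ParseException;
    }

    /** Toy strict parser: fails on the first line that does not end with " ." */
    static List<String> parseStrict(String document) throws ParseException {
        List<String> statements = new ArrayList<>();
        for (String line : document.split("\n")) {
            if (!line.endsWith(" .")) {
                throw new ParseException("Malformed statement: " + line);
            }
            statements.add(line);
        }
        return statements;
    }

    /** Toy lenient parser: silently skips lines that do not end with " ." */
    static List<String> parseLenient(String document) {
        List<String> statements = new ArrayList<>();
        for (String line : document.split("\n")) {
            if (line.endsWith(" .")) {
                statements.add(line);
            }
        }
        return statements;
    }

    /** Runs every extractor; a parse failure is logged, not rethrown. */
    static List<String> runAll(List<Extractor> extractors, String document) {
        List<String> collected = new ArrayList<>();
        for (Extractor extractor : extractors) {
            try {
                collected.addAll(extractor.extract(document));
            } catch (ParseException ex) {
                // Log with the exception as the Throwable argument so the
                // stack trace is kept, then move on to the next extractor.
                LOG.log(Level.SEVERE, "Error while parsing RDF document.", ex);
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        List<Extractor> extractors = new ArrayList<>();
        extractors.add(LenientExtractionSketch::parseStrict);
        extractors.add(LenientExtractionSketch::parseLenient);

        // The second line is deliberately malformed.
        String document = "<a> <b> <c> .\nbroken line";
        System.out.println(runAll(extractors, document));
    }
}
```

With the strict extractor rethrowing instead of logging, the loop would stop at the first failure and the second extractor would never run, which is the behaviour described in this issue.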




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)