Posted to dev@any23.apache.org by GitBox <gi...@apache.org> on 2019/09/13 01:47:15 UTC

[GitHub] [any23] HansBrende edited a comment on issue #104: Any23 295: Implement ability to use librdfa

URL: https://github.com/apache/any23/pull/104#issuecomment-531068423
 
 
   @lewismc My first thought is: if the performance of this module is not as good as that of our current implementation, then in its current form, what is the added value?
   
   My second thought is: the benchmarks test only the underlying rdf4j parsers, not the Any23 `Extractor` wrappers around them. However, in Any23's `BaseRDFExtractor`, because of a number of bugs in the semargl HTML parser, we had to preprocess the input stream with jsoup before passing "clean HTML" into the underlying parser. I am curious whether the `librdfa` parser has any of those same HTML parsing bugs. If it does _not_, then I could move the preprocessing logic out of `BaseRDFExtractor` and into the semargl parser specifically. And **if** the librdfa parser can still pass the entire test suite without the jsoup-preprocessed stream, there would be a much better case for including it (without the preprocessing overhead, its performance would then likely eclipse our current RDFa performance).
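
To make the proposed refactoring concrete, here is a minimal sketch of moving the preprocessing step out of a shared base class and into the one parser that needs it. Every name in it (`ExtractorSketch`, `SemarglSketch`, `LibrdfaSketch`, `preprocess`) is hypothetical and does not reflect the actual `BaseRDFExtractor` code. The "parser" only echoes its input, and the cleaner is a plain function standing in for the jsoup step so that the example runs on the JDK alone (Java 9+).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

final class PreprocessSketch {

    /** Hypothetical base class; NOT Any23's actual BaseRDFExtractor. */
    abstract static class ExtractorSketch {
        /** Hook: by default the parser sees the raw stream unchanged. */
        protected InputStream preprocess(InputStream in) {
            return in;
        }

        /** Stand-in for the underlying parser; it only reports what it received. */
        protected String parse(InputStream in) {
            return readAll(in);
        }

        public final String run(InputStream in) {
            return parse(preprocess(in));
        }
    }

    /** The parser with HTML bugs opts in to the cleanup step. */
    static final class SemarglSketch extends ExtractorSketch {
        // Assumption: in the real code, the jsoup-based cleanup would go here.
        private final UnaryOperator<String> cleaner;

        SemarglSketch(UnaryOperator<String> cleaner) {
            this.cleaner = cleaner;
        }

        @Override
        protected InputStream preprocess(InputStream in) {
            String cleaned = cleaner.apply(readAll(in));
            return stream(cleaned);
        }
    }

    /** Inherits the no-op hook, so it pays no preprocessing cost. */
    static final class LibrdfaSketch extends ExtractorSketch { }

    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder cleaner: trimming whitespace is not real HTML cleaning.
        UnaryOperator<String> toyCleaner = String::trim;
        String raw = "  <p property=\"name\">x</p>  ";
        System.out.println("[" + new SemarglSketch(toyCleaner).run(stream(raw)) + "]");
        System.out.println("[" + new LibrdfaSketch().run(stream(raw)) + "]");
    }
}
```

With this shape, only the semargl path pays for the cleanup, and the librdfa path can be run against the test suite on the raw stream. Whether librdfa actually passes without preprocessing is exactly the open question above.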
