Posted to dev@poi.apache.org by Tim Allison <ta...@apache.org> on 2019/04/10 18:59:06 UTC

regression results

All,

  Again, my apologies for being late, but the results might still be
useful for work towards 4.1.1.

http://162.242.228.174/reports/poi-4.1.0-reports.zip

Some tentative observations:
1) There was a new, non-replicable set of problems with the XSSFBParser.

2) The emf/wmf regressions are responsible for the decrease in
attachments and common words.

3) It looks like there are spacing/newline problems with the
updated emf/wmf code, but that might be on Tika's side.

4) The large increase in common words in ooxml files that were formerly
tika-ooxml is caused by ZipSalvager.  On the Tika side, we're now
creating a valid zip from truncated zips and rerunning the parse.
Previously, we got that content via the PkgParser, and it would
have gone into "attachments".
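Tika's ZipSalvager itself is not reproduced here. As a rough illustration of the salvage-and-reparse idea in point 4, the sketch below uses only java.util.zip and assumes that "salvaging" means copying every entry that can be read in full from the truncated stream into a new zip with a proper central directory, which a parser can then open normally. The class ZipSalvageSketch and its salvage method are hypothetical names, not part of POI or Tika.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper, NOT Tika's ZipSalvager: rebuilds a valid zip from a truncated one. */
public final class ZipSalvageSketch {

    private ZipSalvageSketch() {}

    /**
     * Copies every entry that can be read in full from a possibly truncated zip
     * into a new zip that has a proper central directory. Only entry names are
     * carried over; timestamps and other metadata are dropped in this sketch.
     */
    public static byte[] salvage(InputStream truncated) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Set<String> seen = new HashSet<>();
        try (ZipInputStream zin = new ZipInputStream(truncated);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            while (true) {
                ZipEntry entry;
                byte[] body;
                try {
                    // ZipInputStream walks the local file headers from the front,
                    // so it does not need the central directory that truncation cut off.
                    entry = zin.getNextEntry();
                    if (entry == null) {
                        break; // no more local entries
                    }
                    body = readEntry(zin);
                } catch (IOException e) {
                    break; // stream ends mid-entry: drop the partial entry and stop
                }
                if (!seen.add(entry.getName())) {
                    continue; // ZipOutputStream rejects duplicate entry names
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(body);
                zout.closeEntry();
            }
        } // closing zout writes the central directory into 'out'
        return out.toByteArray();
    }

    /** Reads the current entry's decompressed bytes until the entry ends. */
    private static byte[] readEntry(ZipInputStream zin) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = zin.read(buf)) != -1) {
            body.write(buf, 0, n);
        }
        return body.toByteArray();
    }
}
```

A truncated zip usually loses its central directory, which sits at the end of the file, so readers that rely on it fail. Walking the local headers from the start recovers the complete entries; in this sketch the last, partially written entry is simply dropped rather than passed on half-decompressed, which is one possible design choice and not necessarily what Tika does.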
