Posted to commits@tika.apache.org by ta...@apache.org on 2017/04/20 01:41:29 UTC

[tika] 02/02: update CHANGES.txt in prep for release. reorder changes to most significant first...changes in default behavior then new parsers...Completely subjective, and I'm open to reordering!

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git

commit 941d61a3258d92b8a69ac651586ad0490e313532
Author: tballison <ta...@mitre.org>
AuthorDate: Wed Apr 19 21:41:18 2017 -0400

    update CHANGES.txt in prep for release.
    reorder changes to most significant first...changes in default behavior
     then new parsers...Completely subjective, and I'm open to reordering!
---
 CHANGES.txt | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 610c186..50f2b0e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,25 +5,39 @@ Release 1.15 - ??
     Users who wish to parse only the container document should set
     an EmptyParser as the Parser.class in the ParseContext.
 
-  * Add support for the XLSB format (TIKA-1195).
-
   * Change default behavior of Office Parsers to _not_ extract
     Macros.  User needs to setExtractMacros to "true" (TIKA-2302).
 
+  * Added tika-eval module (TIKA-1332).
+
   * Unified logging across Tika: SLF4J as logging API, Apache Log4j as
     implementation with JCL and JUL bridges in standalone tools like
     tika-app, tika-batch and tika-server (TIKA-2245).
 
-  * Extract images and thumbnails from ODT via Sam Bayer (TIKA-2295).
+  * Add parser for XLSB files (TIKA-1195).
+
+  * Add parsers for EMF/WMF files (TIKA-2246/TIKA-2247).
+
+  * Add parsers for WordPerfect and QuattroPro (.qpw) files.
+    Contributed by Pascal Essiembre (TIKA-1946 and TIKA-2228).
+
+  * Add experimental SAX parser for .pptx files. To select this parser,
+    set useSAXPptxExtractor(true) on OfficeParserConfig (TIKA-2210).
+
+  * Add experimental SAX parser for .docx files. To select this parser,
+    set useSAXDocxExtractor(true) on OfficeParserConfig (TIKA-1321, TIKA-2191).
+
+  * Add mime detection and parser for Word 2006ML format (TIKA-2179).
 
   * Enabled configuration of the EncodingDetector used by
     parsers that extend AbstractEncodingDetectorParser (TIKA-2273).
 
-  * Added tika-eval module (TIKA-1332).
+  * Prevent easily preventable OOMs for both detection and parsing
+    of some compression formats (TIKA-2330).
 
-  * Fix potential NPE in FeedParser via Julien Nioche (TIKA-2269).
+  * Extract images and thumbnails from ODT via Sam Bayer (TIKA-2295).
 
-  * Add parsers for EMF/WMF files (TIKA-2246/TIKA-2247).
+  * Fix potential NPE in FeedParser via Julien Nioche (TIKA-2269).
 
   * Official mime types for BMP, EMF and WMF have been registered with
     IANA, so switch to these (image/bmp image/emf image/wmf) (TIKA-2250)
@@ -45,15 +59,9 @@ Release 1.15 - ??
   * Mime magic for the OneNote family (.one / .onetoc / .onepkg), no parser
     (TIKA-2224).
 
-  * Add parsers for WordPerfect and QuattroPro (.qpw) files.
-    Contributed by Pascal Essiembre (TIKA-1946 and TIKA-2228).
-
   * Add configurability of "preserve-interword-spacing" to
     TesseractOCRParser (TIKA-2190).
 
-  * Added experimental SAX parser for .pptx files. To select this parser,
-    set useSAXPptxExtractor(true) on OfficeParserConfig (TIKA-2210).
-
   * Upgrade to PDFBox 2.0.5 and JempBox 1.8.13 (TIKA-2209/TIKA-2236).
 
   * Refactor MockParser to consolidate service loading
@@ -63,16 +71,12 @@ Release 1.15 - ??
     footnotes, endnotes and comments in legacy .docx parser (TIKA-2192).
 
   * Allow extraction of PDActions (including Javascript) from
-    PDFs (TIKA-2090).
+    PDFs (TIKA-2090).  This is turned off by default.  Users
+    must setExtractActions(true) on the PDFParserConfig.
 
   * Change default behavior in experimental .docx parser to ignore
     deleted text to align with .doc (TIKA-2187).
 
-  * Added experimental SAX parser for .docx files. To select this parser,
-    set useSAXDocxExtractor(true) on OfficeParserConfig (TIKA-1321, TIKA-2191).
-
-  * Add mime detection and parser for Word 2006ML format (TIKA-2179).
-
   * Upgrade to POI 3.16 (TIKA-2116, TIKA-2181, TIKA-2329).
 
   * Allow configuration of timeout for ForkParser (TIKA-2170).
@@ -82,7 +86,7 @@ Release 1.15 - ??
 
   * Add .jpx, .jp2, .ppm to formats handled by Tesseract (TIKA-2174).
 
-  * Upgrade SQLite "provided" dependency to 3.15.1.
+  * Upgrade SQLite "provided" dependency to 3.16.1 (TIKA-2334).
 
   * Update Apache CXF version to 3.0.12 (TIKA-2292).
 
