Posted to commits@tika.apache.org by ta...@apache.org on 2019/02/28 02:53:03 UTC

[tika] branch TIKA-2833 created (now d3317f9)

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch TIKA-2833
in repository https://gitbox.apache.org/repos/asf/tika.git.


      at d3317f9  TIKA-2833 -- initial commit with csv detection and swapping out the TXTParser in favor of the CSVParser

This branch includes the following new commits:

     new 19ab44f  TIKA-1: Standard {trunk,branches,tags} setup
     new f10cd42  TIKA-1: Standard README, NOTICE, and LICENSE files.
     new 3794e9a  TIKA-4: Basic Maven 2 POM and source tree for Tika.
     new 6e750bb  TIKA-4: Ignore Eclipse project files.
     new c5417be  TIKA-2: Basic web site based on Maven 2.
     new c914984  TIKA-2: The site is deployed to the incubator/tika/site directory in svn.
     new 99b1e06  TIKA-4: Added brief Maven build instructions and some other project documentation.
     new 747610b  - update POM to include additional developer attributes for mattmann
     new 3aff751  - placeholder for unit tests
     new c88c4da  Add Rida Benjelloun id and email
     new 7b7ac41  Changelog for Tika.
     new 77428b0  patch for TIKA-5
     new 6e3ee16  TIKA-7: Added the Lius Lite code from Rida. External dependencies are not included, need to update the POM with proper dependency settings.
     new 4e56c5e  pom.xml: Replaced tabs with spaces, fixed indentation.
     new b27dfe0  TIKA-7: Added missing dependencies to POM.
     new 2b86daf  TIKA-8: Replaced the jmimeinfo dependency with a trivial mime type detector.
     new d363b82  TIKA-10 Remove MimeInfoException catch clauses and import from TestParsers. Contributed by Keith R. Bennett.
     new f42035e  TIKA-13 Fix obsolete package names in config.xml. Contributed by Keith R. Bennett.
     new 3c85d05  - fix for TIKA-11
     new eb9d2e9  - addendum to TIKA-11 (move /src/main/test -> moved to /src/test)
     new a2b47b8  - fixed typo (K. Bennett via mattmann)
     new c47f57c  TIKA-12: Support MIME type detection based on a URL. Patch from Keith Bennett.
     new 74f807d  TIKA-12: Added MimeTypesUtils test case contributed by Keith Bennett.
     new d7dabee  TIKA-19: fix org.apache.tika.TestParsers, test more file types and improve exception handling in LiusConfig and ParserFactory. Includes fixes from TIKA-16 and TIKA-14 which were contributed by Keith R. Bennett, thanks!
     new 1e2373c  remove redundant sourceDirectory statements, we're using the standard Maven layout now
     new 346d584  TIKA-15: Applied patch from Keith Bennett.
     new 9dab155  - bring CHANGES.txt up to date
     new 21cf8be  - fix for TIKA-18
     new 780f13d  TIKA-12 - Decouple Parser from ParserConfig
     new 3762413  - patch for TIKA-6
     new 64c1e0e  - patch for TIKA-6 (cont.)
     new 773fbd2  - fix license header: d'oh
     new 53f61c8  TIKA-25 - Removed hardcoded reference to C:\oo.xml, as suggested by Keith Bennett.
     new 9f4c38c  - fix for TIKA-17
     new 32fda63  - fix for TIKA-22
     new 033a07c  TIKA-21 - Simplified configuration code     - LiusConfig is now instantiated as: new LiusConfig("config.file");     - Dropped use of static caching and maps for config objects     - Made configuration objects immutable (except for Content values)
     new f8183f2  TIKA-17 - Rename all "Luis" classes to be "Tika" classes
     new 0c20384  TIKA-27 - Replaced more "lius" references with "tika"
     new 83cb301  - remove Hadoop/Nutch Configuration and Configurable interfaces - wire together MimeUtils and tika config.xml file - wire together MimeUtils and TikaConfig
     new b2c0c6d  TIKA-30 - Added utility constructors to TikaConfig     - TikaConfig(String), calls TikaConfig(File)     - TikaConfig(File), calls TikaConfig(Document)     - TikaConfig(URL), calls TikaConfig(Document)     - TikaConfig(InputStream), calls TikaConfig(Document)     - TikaConfig(Document), calls TikaConfig(Element)     - TikaConfig(Element), the base implementation
     new bcc9f0c  - fix for TIKA-28
     new 4707526  TIKA-26 - Implemented Parser.getContent(String) in the base class
     new 6244bec  TIKA-26 - Implemented Parser.getStrContent() in the base class
     new 522f4f3  TIKA-26 - Use Map<String, Content> instead of List<Content>
     new da4fde4  typo
     new cfe3527  TIKA-31 - protected Parser.parse(InputStream stream, Iterable<Content> contents)
     new d3e678b  TIKA-32 - remove useless CDATA clauses, and code cleanup - contributed by Keith R. Bennett, thanks!
     new 6db2b1d  - fix for TIKA-36
     new 23c4ddd  Tika-38. TXTParser Keith contribution
     new c30bbcb  TIKA-33 - Stateless parsers
     new 3db2089  Update CHANGES.txt file
     new 7bdb1c8  TIKA-35 Extract MsOffice properties. I have implement a method in Utils class that allows to copy InputStream in memory.
     new 29eedb3  - make Tika=>TIKA to be consistent with JIRA key names - use apache committer ID if Tika committer
     new 701d470  - fix for TIKA-34 (contributed by K. Bennett)
     new a03498c  TIKA-35 - Extract MsOffice properties, use RereadableInputStream developed by K. Bennett
     new 53d14c5  TIKA-35 Close RereadableInputStream in MSExtractor and RereadableInputStreamTest classes.
     new 9a00212  ZIP extraction. Three methods has been added to ParseUtils class. getParsersFromZip() methods return a list of parsers. consult unit test class to see how it works.
     new d4bb41f  TIKA-44 - Spaces for indentation
     new aab5b10  TIKA-42 - Content class needs (String, String, String) constructor     - Patch from Keith Bennett.
     new ee2c3b9  TIKA-43 - Parser interface
     new dcd8975  TIKA-43 - Parser interface
     new f00e6fb  TIKA-47 - Remove TikaLogger     - Removed org.apache.tika.log     - Moved log4j/log4j.properties to src/test/resources     - Use a system property instead of code to configure Log4J
     new 6d37de9  TIKA-46 - Use Metadata in Parser     - With improvements by Chris Mattmann
     new d840df1  Set svn:eol-style to native
     new 58c2360  TIKA-46 - Use Metadata in Parser     - Use Metadata.TITLE as suggested by Chris
     new 62e58ea  TIKA-46 - Use Metadata in Parser     - Moved metadata configuration to the Parser classes     - Removed the Content class
     new aceff84  TIKA-48 - Merge MS Extractors and Parsers     - Moved MSExtractor base class to org.apache.tika.ms.MSParser     - Extracted the PropertiesReaderListener class to a top level class     - Merged MS Extractor classes to MS Parsers     - Refactored the Excel parsing functionality into smaller methods     - Various cleanups (indentation, formatting, etc.)
     new 810b1d4  TIKA-45 - RereadableInputStream needs to be able to read to the end of the original stream on first rewind     - Committed patch from Keith Bennett
     new e01051b  TIKA-41 - Resource files occur twice in jar file     - Use declarative constructs to put the resources in the correct place
     new 09b699a  - update to include Jukka's update for TIKA-41
     new 54aa413  TIKA-49 - use the correct Apache license headers, thanks to Robert Burrell Donkin
     new fa555fa  TIKA-49 - use the correct Apache license headers, thanks to Robert Burrell Donkin
     new aefc60f  svn:ignore more files
     new 838fe5f  TIKA-51, Leftover temp files after running Tika tests, fixed. Also added TIKA_ prefix to all File.createTempFile() calls
     new 7d91d37  TIKA-40 - Tika needs to support diverse character encodings     - Use ICU4J to parse text content     - Support Metadata.CONTENT_ENCODING hints in TXTParser     - Added specific test cases for TXTParser
     new e7d7a1c  - fix for TIKA-55 (contributed by K. Bennett)
     new 423f67e  TIKA-52 - RereadableInputStream needs to support not closing the input stream it wraps
     new c943f69  TIKA-53 - XHTML SAX events from parsers
     new d064cb2  TIKA-57 - Rename org.apache.tika.ms to org.apache.tika.parser.ms
     new a6ca816  update issueManagement section
     new ee39fc8  TIKA-62 - Use TikaConfig.getDefaultConfig() instead of a hardcoded config path in TestParsers
     new 70517c3  TIKA-58 - Replace jtidy html parser with nekohtml based parser
     new 5e0a2b3  add acknowledgment as required by NekoHTML license
     new 3fb58b7  TIKA-60 - Rename Microsoft parser classes
     new e759bbb  TIKA-60 - Rename Microsoft parser classes
     new 1081cb5  TIKA-63 - Avoid multiple passes over the input stream in Microsoft parsers     - Use POIFSFileSystem as the source of both metadata and text content     - Added separate test case classes for the Microsoft parsers     - Got rid of some extra listeners and exceptions
     new 9fce256  TIKA-66 - Use Java 5 features in org.apache.tika.mime     - Use Java 5 generics and foreach constructs to simplify code     - Removed some unused variables and method parameters     - Other minor cleanups
     new 9477c5e  - make test case class name consistent with other names (i.e., start with "Test...")
     new b12c01d  - fix for TIKA-56
     new a328f9c  TIKA-65 - Add encode detection support for HTML parser
     new b0a87ad  remove failing test temporarily
     new 1d2e41f  TIKA-68 - Add dummy parser classes to be used as sentinels
     new 703c4b0  TIKA-67 - Add an auto-detecting Parser implementation
     new 8004791  TIKA-70 - Better MIME information for the Open Document formats
     new 0038570  TIKA-70 - Better MIME information for the Open Document formats
     new e1da9a1  Removed an extra debug print
     new 580824e  TIKA-71 - Remove ParserConfig and ParserFactory
     new 67c79ba  Testing new committer status; added my name.
     new f7079fd  Moved name to its correct position in alphabetical order.  (Sorry!)
     new 9ffdd54  TIKA-72: Added Metadata.RESOURCE_NAME_KEY, and changed uses of "filename" to it.
     new a8d1e67  TIKA-72: As per Chris' suggestion, moved RESOURCE_NAME_KEY from Metadata to new interface TikaMetadataKeys, and changed Metadata to implement TikaMetadataKeys.
     new bd68bd6  Added clearer error message if a stream cannot be opened from a URL.
     new a73e6cc  TIKA-78 - AutoDetectParserTest should include tests for bad MIME types and resource names.
     new fb3290d  TIKA-77.  The ParserPostProcessor is no longer used to wrap the parser.
     new 2e72c03  The use of Utils was there because the method was originally in the Utils class. Now that it is in TikaConfig, using TikaConfig is preferable.
     new 2b09fac  TIKA-72: The use of "filename" is replaced with "resource name", since we may be dealing with file names, URL's, etc.
     new 30d4072  TIKA-78: In AutoDetectParserTest, put each document type test in its own method so that one failure would not prevent the other document types from being tested.
     new 579bee4  Correct indenting (four spaces instead of one as the first indent on line)
     new c176570  Set svn:eol-style to native
     new 91f76e6  TIKA-75: Provides a MimeUtils.getType(URL) method that will determine MIME type based on the stream and, if necessary, the name.
     new a7d091b  TIKA-81.  Added default constructor to MimeUtils.
     new 076d9ef  TIKA-82. Disabling a log call.
     new 4c8f0b8  TIKA-83 - Create a org.apache.tika.sax package for SAX utilities
     new 9db6b38  TIKA-84 - Add MimeTypes.getMimeType(InputStream)
     new 6b827ba  Add news about Keith's committership, and document the website update steps
     new 001e1f7  TIKA-84 - Add MimeTypes.getMimeType(InputStream)     - Added also getMimeType(String, InputStream)     - Extracted common code to readMagicHeader(InputStream)     - Javadoc improvements
     new f383585  TIKA-85 - Add glob patterns from the ASF svn:eol-style documentation     - Added patterns based on svn:eol-style and svn:mime-type defaults     - Many of the patterns should be assigned to appropriate MIME subtypes
     new 5087833  TIKA-87 - MimeTypes should allow modification of MIME types     - Reversed the MimeTypes -> MimeTypesReader dependency     - Work in progress
     new 2291cab  TIKA-88: Moved all nonredundant functionality from MimeUtils to MimeTypes. Moved test code from MimeUtilsTest to MimeTypesTest accordingly. Deleted MimeUtils class and its test class. Modified URL for MIME type config file in default tika-config.xml to have leading "/". Created MimeTypesFactory class as a public factory and adapter to package protected MimeTypesReader.
     new b1bcf42  TIKA-87 - MimeTypes should allow modification of MIME types     - Merged MimeInfo and MimeType     - Made MimeType Comparable
     new 05b7bb7  TIKA-87 - MimeTypes should allow modification of MIME types     - Made Magic Comparable
     new 4b126e8  TIKA-87 - MimeTypes should allow modification of MIME types     - MimeType.addAlias(String) can now be used to add new aliases     - MimeType.addPattern(String) can now be used to add new patterns     - MimeTypes.forName(String) validates the name     - MimeTypes.forName(String) creates and registers the type if needed     - Simplified type name handling and validation     - New test cases
     new 3be8f6c  TIKA-87 - MimeTypes should allow modification of MIME types     - MimeType.setSuperType(MimeType) can now be used to modify inheritance
     new a01dcbe  TIKA-87 - MimeTypes should allow modification of MIME types     - Streamlined pattern handling
     new 5cbfe94  TIKA-100 - Structured PDF parsing     - Customized the PdfTextStripper class to produce XHTML SAX events       (there's a somewhat similar PdfText2HTML class in PDFBox, but       that class produces a character stream instead of SAX events)
     new c13e78d  - fix for JDK 6 reliance introduced in TIKA-100 commit
     new add1d56  - fix for TIKA-101 (contributed by Niall Pemberton)
     new b6ba8b7  set svn:eol-style to native
     new c839f7e  - move bin.xml, src.xml to src/main/assembly to make mvn assembly:assembly work correctly (by default)
     new 1bf1bed  TIKA-91: Add proper attribution for code from textmining.org
     new 91913d3  TIKA-102 - Parser implementations loading a large amount of content into a single String could be problematic     - Patch by Niall Pemberton
     new 0dd035b  TIKA-102 - Parser implementations loading a large amount of content into a single String could be problematic     - Forgot to include the new files in Niall Pemberton's patch
     new fa6bb7e  TIKA-107 - Remove use of assertions for argument checking     - Committed patch from Niall Pemberton
     new 2e18733  TIKA-104 - Add utility methods to throw IOException with the caused intialized     - Added an IOException subclass instead     - Adapted code from Niall Pemberton's patch
     new 1f2b716  TIKA-106 - Remove dependency on Jakarta ORO - use JDK 1.4 Regex     - Patch from Niall Pemberton
     new ae5915a  TIKA-105 - Excel parser implementation based on POI's Event API     - New class contributed by Niall Pemberton
     new a424f6c  - prep for 0.1-incubating release
     new 4ca79ab  TIKA-110: Add KEYS file for Tika
     new c1fec08  TIKA-111: Missing license headers     - Added license headers where needed     - Merged src/site/SITE-README.txt to README.txt     - Added a HEADER.txt file with the standard header
     new 1cb374a  - add my gpg key to KEYS file in prep for release
     new f722083  - prep for release
     new 08e456a  pom.xml: Updated trunk version to 0.2-SNAPSHOT
     new 9c97684  - add download link for tika releases to website
     new b04f706  - update site with news of first tika release (0.1-incubating)
     new 67dbe57  - update CHANGES.txt to reflect new Tika dev version (0.2-incubating)
     new 0da63b1  - Replace XMLParser by XMLParserUtils - Create Class DcXMLParser that extends XMLParserUtils and implements Parser. This class allows DublinCore metadata parsing - Add method setXMLParserNameSpaceContext() in XMLParserUtils. - Improvement of OpenOfficeParser to extract document content from office:body. - OpenOfficeParser extends XMLParserUtils - Modification to tika-config to use DcXMLParser instead of XMLParser
     new ebe7868  TIKA-112 XMLParser improvement - Replace XMLParser by XMLParserUtils - Create Class DcXMLParser that extends XMLParserUtils and implements Parser. This class allows DublinCore metadata parsing - Add method setXMLParserNameSpaceContext() in XMLParserUtils. - Improvement of OpenOfficeParser to extract document content from office:body. - OpenOfficeParser extends XMLParserUtils - Modification to tika-config to use DcXMLParser instead of XMLParser
     new 314a53b  add license header
     new 7f80b3f  remove unused imports
     new d795c5f  TIKA-109: WordParser fails on some Word files     - Applied WordParser patch from Dave Meikle     - Removed the now unused WordTextPiece class
     new 7fdba7e  TIKA-105: Excel parser implementation based on POI's Event API     - Replaced ExcelParser with ExcelEventParser     - Use a setter for listenForAllRecords       (JavaBean properties are more flexible       than constructor arguments)     - Use debug logging for all output     - Removed some of the explicit log.isDebugEnabled() checks       (simplicity over insignificant performance gains)     - Inlined the trivial debug(Record) method
     new 93411a7  TIKA-105: Excel parser implementation based on POI's Event API     - Added a changelog entry for revisions 606141 and 613566.
     new 4b33a0d  TIKA-109: WordParser fails on some Word files     - The patch was from Dave Meikle and not from Mats.       I'm sorry for the mistake.
     new 5e97d46  TIKA-116: Streaming parser for OpenDocument files     - Streaming XPath implementation in o.a.tika.sax.xpath     - New o.a.tika.sax utility classes     - Streaming XML parser     - Avoid closing the input stream while parsing XML     - Streaming OpenDocument parser     - Extract correct OpenDocument MIME type while parsing
     new 88d8173  TIKA-117: Drop JDOM and Jaxen dependencies     - Note the signature changes in TikaConfig constructors!     - Dropped a few obsolete Utils methods
     new 110abef  TIKA-115: Tika package with all the dependencies     - The bin assembly now contains all runtime dependencies     - Reviewed the dependency licenses and updated the       NOTICE and LICENSE files accordingly
     new e448a95  TIKA-97: Tika GUI     - Added a simple Swing GUI for Tika
     new ed76d40  TIKA-97: Tika GUI     - Dropped Java 6 methods
     new 50dd486  TIKA-97: Tika GUI     - Make the extracted text content scrollable
     new 4b0936c  TIKA-97: Tika GUI     - Dropped another Java 6 dependency
     new 5030b11  TIKA-116 - isolate test that uses accented chars, which currently fails
     new c4d64f8  TIKA-116 - DcXMLParserTest.testXMLParserNonAsciiChars fixed
     new b0ddfce  TIKA-96: Tika CLI     - Added the o.a.tika.cli.TikaCLI command line class     - Initial features:       + four output formats (xml, html, text, metadata)       + three input sources (files, URLs, standard input)       + two logging levels (info and debug)       + usage message       + GUI mode     - Added simple Unix and DOS start scripts     - Added required packaging and manifest settings
     new 6774cd7  TIKA-118: Bouncy Castle binaries require US exports regulation compliance     - Added export control information in the README
     new 4ed2d4b  TIKA-123: Structured MS Office parsing     - Changed OfficeParser to allow structured parsing in subclasses     - ExcelParser now outputs XHTML tables with nice tabs and line breaks     - Dropped unused formatting code from ExcelParser (TODO fix that)     - Streamlined PowerPointParser and started using Java 5 features     - No functional changes (yet) in PowerPointParser     - No functional changes (yet) in WordParser
     new b2b79ce  TIKA-123: Structured MS Office parsing     - New utility methods in XHTMLContentHandler
     new 8e3a5f5  TIKA-123: Structured MS Office parsing     - Fixed incorrect test case
     new 71aee51  TIKA-123: Structured MS Office parsing     - Close the PowerPoint <p/> element properly
     new b8bad51  TIKA-103: Excel parsing ignores cell formating     - Added test document contributed by Niall Pemberton
     new e6fa719  TIKA-123: Structured MS Office parsing     - Upgraded POI dependency to 3.0.2-FINAL and added poi-scratchpad
     new 9398c06  TIKA-123: Structured MS Office parsing     - Replaced custom PowerPoint parser with PowerPointExtractor from POI HSLF
     new 1cd1b27  TIKA-123: Structured MS Office parsing     - Replaced custom Word parsing code with WordExtractor from POI HWPF
     new 80866ff  TIKA-122: Use Commons IO 1.4     - Introduced Commons IO 1.4 dependency     - Use the new dependency in the obvious places
     new a738abc  TIKA-123: Structured MS Office parsing     - We no longer use the textmining.org code
     new 7158881  TIKA-123: Structured MS Office parsing     - Moved property file parsing to a separate Parser class
     new 75fb47b  TIKA-123: Structured MS Office parsing     - Consolidated all MS Office parsing to a single class     - Reliable MIME magic for pseudo type application/x-tika-msoffice     - Added MIME magic for RTF
     new 3a743ef  TIKA-126: Add Parser.parse(InputStream, Metadata) for metadata extraction
     new c26e7b3  TIKA-127: Add support for Visio files
     new 5cb14de  TIKA-129: node() support for the streaming XPath utility
     new 68628ea  TIKA-130: self-or-descendant axis does not match self in streaming XPath     - Also added @Override annotations to SubtreeMatcher
     new aa271ef  TIKA-131: Lazy XHTML prefix generation
     new c370c52  TIKA-128: HTML parser should produce XHTML SAX events
     new 2907d8f  TIKA-133: TeeContentHandler constructor should use varargs
     new 3e05c07  TIKA-97: Tika GUI     - New tabs for different views of the parser output     - Improved drag-and-drop support     - Improved error handling
     new 49028e3  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Patch by Niall Pemberton
     new f81d990  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Use a TreeMap instead of custom linked lists for the sparse matrix
     new cdd4cf3  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Replace TikaExcelCell with a modular/extensible set of classes that       encapsulate the functionality of rendering the cell content to XHTML
     new a8c7b38  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Refactored processCellValue to a getCellValue factory method
     new 474c19f  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Added NumberCell for formatted numbers
     new 9da6dd5  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Further refactoring to simplify cell value handling
     new f6d4c07  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Merged the two sid case statements to one
     new 9028d6f  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Improved formatting of internalProcessRecord
     new 6f20d1a  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Improved exception handling, now all subsequent HSSF events are simply ignored
     new f198ac1  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - Removed the insideWorksheet flag     - Improved javadocs     - Extracted PointComparator to an explicit utility class
     new 77e1d38  TIKA-97: Tika GUI     - Simplify the HTML output for JEditorPane to better understand it
     new 9624d5b  Reformatted NOTICE to be less verbose
     new 8740cb5  TIKA-132: Refactor Excel extractor to parse per sheet and add hyperlink support     - The numbers are now correctly formatted thanks to the default       NumberFormat being used instead of Double.toString()     - Updated the test case accordingly, and added assertions to prevent regressions     - TODO: Proper number formatting based on Excel formatting settings
     new 4228d37  TIKA-123: Structured MS Office parsing     - Commented out failing test case.     - TODO: Improve getMimeType to better support MS Office files
     new ddfe1a5  TIKA-123: Structured MS Office parsing     - More failing MS Office detection test cases
     new f45c4b7  TIKA-134: mvn package does not produce packages for bin/src     - Based on a patch by Karl Heinz Marbaise
     new 315b7e1  TIKA-138: Ignore HTML style and script content     - Added a set of elements to discard, currently style and script
     new 64e1e51  TIKA-113: Metadata (such as title) should not be part of content     - Added BodyContentHandler that only processes XHTML body events     - Added utility constructors for WriteOutContentHandler and BodyContentHandler     - Updated test cases and related code to use BodyContentHandler where appropriate     - Removed AppendableAdaptor class as it's not used anymore
     new 759cf17  Replaced tabs with spaces in tika-mimetypes.xml
     new 25150e0  TIKA-139: Add a composite parser
     new a5c897e  TIKA-87: MimeTypes should allow modification of MIME types TIKA-89: Rename MimeType and MimeTypes     - Trying to decouple the MIME type registry from Tika configuration     - Work in progress
     new 7c83be9  TIKA-92: Image metadata extraction     - Added a simple ImageParser based on ImageIO     - Currently only supports custom "width" and "height" metadata fields     - Included a few test images
     new 068d81f  Simplified log4j configuration for unit tests
     new d368623  - fix for TIKA-142
     new 284f644  TIKA-143: Add ParsingReader
     new 83d6421  Modified svn:ignore to cover things like ".checkstyle". Also there is no longer need to ignore log files.
     new fd62fb0  TIKA-115: Tika package with all the dependencies    - Create a runnable standalone jar instead of a bin package
     new 6d73404  TIKA-115: Tika package with all the dependencies    - Shell scripts no longer needed as we have a runnable jar
     new e0a48ab  typo
     new e3631d0  TIKA-118: Bouncycastle binaries requires US exports regulation compliance     - Added a download page with an export notice
     new 6c031eb  TIKA-144: Upgrade nekohtml dependency     - Upgraded to version 1.9.7     - This version is ALv2 and has no NOTICE file
     new fa3b197  TIKA-145: Separate NOTICEs and LICENSEs for binary and source packages
     new 73cffdb  TIKA-146: Upgrade to POI 3.1     - Upgraded POI dependency
     new 91d1077  TIKA-146: Upgrade to POI 3.1     - Enable Excel hyperlink support available in POI 3.1
     new 73603f4  TIKA-54: Outlook msg parser     - Patch by Dave Meikle     - Test file by Rida Benjelloun
     new c9f08c2  TIKA-99: Support external parser programs
     new eb79973  TIKA-149: Parser for zip files
     new 28c1369  TIKA-149: Parser for zip files
     new 70ca6ce  TIKA-149: Parser for zip files
     new 9b506be  TIKA-149: Parser for zip files
     new 39ab46a  TIKA-149: Parser for zip files
     new e12b04d  TIKA-149: Parser for zip files
     new d1ab05c  TIKA-149: Parser for zip files
     new 424c708  TIKA-150: Parser for tar files
     new 46f2fb8  TIKA-151: Stream compression support
     new c22581f  TIKA-156: Some MIME magic patterns are ignored by MimeTypes
     new e889dfa  TIKA-155: Java class file parser
     new 19f04ad  Removed debug prints from test cases.
     new 2a15448  Disabled the spell checking performance test as there is no assertion to check.
     new 6abd44b  TIKA-155: Java class file parser
     new 2a405d3  TIKA-155: Java class file parser
     new b2a441f  TIKA-108: New Tika logos
     new 982c97b  Documentation, first draft...
     new bd23146  Add the documentation page (just one so far :-) to navigation.
     new c2b41c4  Missing license header.
     new 9ff3f2d  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new 72b945e  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new 8080fca  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new e97ee49  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new e0c59b5  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new ce4095f  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new 58401bc  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new 058e1fe  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new d2df838  TIKA-120: Add support for retrieving ID3 tags from MP3 files
     new 79a5048  TIKA-54: Outlook msg parser
     new df019d9  TIKA-54: Outlook msg parser
     new 4d37376  TIKA-157: List all the document formats supported by Tika
     new 892eebd  TIKA-151: Stream compression support
     new 8956153  TIKA-157: List all the document formats supported by Tika
     new 0482529  TIKA-114: PDFParser : Getting content of the document using "writer.ToString ()" , some words are stuck together
     new a766f5c  TIKA-157: List all the document formats supported by Tika
     new ea4bcc3  TIKA-157: List all the document formats supported by Tika
     new 57260ff  TIKA-157: List all the document formats supported by Tika
     new 50eaa3b  TIKA-157: List all the document formats supported by Tika
     new 5f6e66f  TIKA-108: New Tika logos
     new f792637  TIKA-161: Enable PMD reports
     new 7d8686c  TIKA-126: Add Parser.parse(InputStream, Metadata) for metadata extraction
     new ca427a1  TIKA-159 - Add support for parsing basic audio types: wav, aiff, au, midi
     new 4937653  Removed the unused <distributionManagement/> entry
     new 836c4b9  TIKA-140: HTML parser unable to extract text
     new c6d2f09  TIKA-163: GUI does not support drag and drop in Gnome or KDE
     new 032f752  Improved/extended documentation
     new 871a44c  Fix typo, merge paragraphs.
     new d7fc922  Updated index page with documentation and download links in body text.
     new 90fb111  Tested new committer status by adding my name.
     new 9d5c3d8  TIKA-166: Updated HTMLParser to parse HTML meta tags into Metadata
     new 3a7faf9  TIKA-166: Updated HTMLParser to parse HTML meta tags into Metadata
     new 6e87254  Graduate Tika from Incubator to Lucene
     new 669b0e8  TIKA-170: Graduate Tika
     new 0dc4fa2  TIKA-170: Graduate Tika
     new 8cb0cb8  Added code-signing key in preparation for a release
     new b5ff62e  TIKA-170: Graduate Tika
     new 3a58cc2  Updated formats page to finish some todos on supported formats
     new 1b9fd8c  TIKA-170 - updated version number to reflect graduated status
     new da3388c  TIKA-170: Graduate Tika
     new 40ba713  TIKA-170: Graduate Tika
     new a094d5c  Added some recent news.
     new 37c987a  Upgraded version number to 0.3-SNAPSHOT now that 0.2 is branched.
     new 9c86a5d  TIKA-175: Retrotranslate Tika for use in Java 1.4 environments
     new 647fdac  TIKA-176: Getting Started guide
     new 8dfa33c  TIKA-171: New ContentHandler for plain text output that has no problem with missing white space after XHTML block tags
     new 1fb92c3  TIKA-164: Update nekohtml version
     new 00253e0  TIKA-165: Update icu4j
     new c1b35b7  TIKA-176: Getting Started guide
     new 6adc68b  Added missing license information on HTML, XML and SVG files
     new 2b4b0f7  TIKA-177: Improved build instructions in README
     new 3079789  Add missing svn:eol-style settings.
     new 49d2f2b  Add missing svn:eol-style settings.
     new 031a1bd  TIKA-176: Getting Started guide
     new 27fdf2c  TIKA-172: New Open Document Parser that emits structured XHTML content
     new f969a26  TIKA-172: New Open Document Parser that emits structured XHTML content
     new 3c15d9c  TIKA-172: New Open Document Parser that emits structured XHTML content
     new aab5571  Updates to CHANGES.txt to reflect re-creation of 0.2 release from trunk
     new ed92e58  TIKA-170 : Updated mailing lists to reflect graduation
     new a0886c3  TIKA-170 : Updated tika-user address
     new aac081e  TIKA-152: Support for Office XML files
     new e43a19e  TIKA-179: Tika stand alone CLI --text output mostly not working, other output formats are fine
     new 1e81622  TIKA-181: Retrotranslator plugin fails if using a 1.0-SNAPSHOT version
     new 8e8d2f8  Updated website, CHANGES.txt and README.txt for 0.2 release
     new 2e937c5  TIKA-183: Fix Maven plugin versions
     new 9506373  TIKA-184: Avoid the <resource/> entry on ${basedir}
     new 717833a  TIKA-180: XHTMLContentHandler unable to extract text from MSWord file
     new 235e807  TIKA-180: XHTMLContentHandler unable to extract text from MSWord file
     new aefe614  TIKA-180: XHTMLContentHandler unable to extract text from MSWord file
     new a9fb614  TIKA-188: Automatic whitespace for block elements in XHTMLContentHandler
     new e77e233  TIKA-185: XML files with (unsatisfied) SYSTEM entities can not be extracted
     new aef10d5  CHANGES.txt: Added a higher level summary of some of the more notable changes in the upcoming 0.3 release.
     new 312721c  CHANGES.txt: Added credits for all people who show up in the 0.3 contribution report.
     new 9b7b2af  CHANGES.txt: Added a pointer to the contribution report.
     new 8757a35  TIKA-154: Better detection of plain text versus binary formats with a text header
     new a77cf07  TIKA-95: Pluggable magic header detectors
     new 7d67ade  TIKA-95: Pluggable magic header detectors
     new 006831e  TIKA-95: Pluggable magic header detectors
     new 2e25dc0  TIKA-95: Pluggable magic header detectors
     new 2a6d726  TIKA-95: Pluggable magic header detectors
     new e11ead4  TIKA-95: Pluggable magic header detectors
     new d806c4d  TIKA-95: Pluggable magic header detectors
     new 816eb26  TIKA-95: Pluggable magic header detectors
     new 2699cf4  TIKA-190: wrong handling of ignorableWhitespace/characters in SafeContentHandler and WriteoutContentHandler
     new dd352d2  TIKA-189: Text extraction from Excel files juxtaposes cells
     new b6bcbd9  TIKA-95: Pluggable magic header detectors
     new 59ec6ce  TIKA-95: Pluggable magic header detectors
     new de1a353  TIKA-95: Pluggable magic header detectors
     new 32d2989  TIKA-95: Pluggable magic header detectors
     new 073a36a  TIKA-95: Pluggable magic header detectors
     new e167f83  TIKA-95: Pluggable magic header detectors
     new 6397cde  TIKA-189: Text extraction from Excel files juxtaposes cells
     new 8b7a1d4  TIKA-95: Pluggable magic header detectors
     new 7789252  TIKA-95: Pluggable magic header detectors
     new 16669e3  TIKA-95: Pluggable magic header detectors
     new f807af0  TIKA-192: Add glob and magic patterns for image types
     new ab6aec8  TIKA-192: Add glob and magic patterns for image types
     new 741c575  TIKA-192: Add glob and magic patterns for image types
     new 820841c  TIKA-192: Add glob and magic patterns for image types
     new de1c077  TIKA-192: Add glob and magic patterns for image types
     new b8a131f  TIKA-192: Add glob and magic patterns for image types
     new 16f436c  TIKA-192: Add glob and magic patterns for image types
     new 418284d  TIKA-192: Add glob and magic patterns for image types
     new d966d35  TIKA-192: Add glob and magic patterns for image types
     new a93976f  TIKA-192: Add glob and magic patterns for image types
     new 411c880  TIKA-192: Add glob and magic patterns for image types
     new 4429727  TIKA-192: Add glob and magic patterns for image types
     new d3a3286  TIKA-192: Add glob and magic patterns for image types
     new 8dbb649  TIKA-192: Add glob and magic patterns for image types
     new 0fa61dc  TIKA-192: Add glob and magic patterns for image types
     new ce6ac91  TIKA-196: Configuration parser fails in Java 1.4
     new 9758d20  TIKA-199: Improved audio detection and parsing
     new 215473a  Reverted changes that were accidentally included in revision 741674 (TIKA-199).
     new 922fb5f  TIKA-199: Improved audio detection and parsing
     new 2e85556  TIKA-199: Improved audio detection and parsing
     new e678e32  TIKA-201: Extract lyrics and other text from MIDI audio files
     new 2ba3c7e  TIKA-201: Extract lyrics and other text from MIDI audio files
     new 2a77cf3  TIKA-202: Warnings during Site generation
     new f6f9d4f  TIKA-197: Microsoft Outlook (msg) files get parsed multiple times
     new b669279  add apachecon promo
     new d9575ae  TIKA-203: Earlier metadata extraction in ParsingReader
     new 8a80ee5  TIKA-152: Support for Office XML files
     new a981a84  TIKA-192: Add glob and magic patterns for image types
     new 91146f5  Acknowledge more Tika 0.3 contributors
     new 1588c38  TIKA-186: Refactor the MS Office property names to MSOffice.java
     new 78ac637  TIKA-152: Support for Office XML files
     new 3b01147  TIKA-152: Support for Office XML files
     new 74df811  TIKA-152: Support for Office XML files
     new 0688994  Updated year in copyright notices.
     new cc9a94f  - fix for TIKA-194
     new 29c5b52  - fix for TIKA-205
     new c5f0d09  - remove extraneous System.out
     new 8c2c6bc  - 0.3 RC version bump
     new 59402a9  - reflect 0.3 RC (even though release date will change, will make final updates in branch)
     new f322f4c  TIKA-206: Improved pipe mode in Tika CLI
     new 9b541f4  TIKA-200 Allow drag and drop of URLs in TikaGUI
     new 3b4986c  Updated trunk version to 0.4-SNAPSHOT
     new b75cfe3  - update to 0.4 unreleased changes
     new 972e57a  apache tika 0.3 docs update
     new 035d8d3  TIKA-211: memory issue in ExcelExtractor
     new b7e960e  TIKA-211: memory issue in ExcelExtractor
     new 60c9b0e  TIKA-210: html content directly under body node not parsed correctly
     new d016d32  TIKA-208: Special characters in HTML file are not parsed correctly
     new 23e3bae  TIKA-217: TikaConfig fails when a parser can't be loaded due to an Error
     new dd22d43  TIKA-216: Zip bomb prevention
     new f29a52a  TIKA-216: Zip bomb prevention
     new 896aaaf  TIKA-216: Zip bomb prevention
     new b22616e  TIKA-216: Zip bomb prevention
     new 1c0109f  TIKA-216: Zip bomb prevention
     new 4b64d9a  TIKA-216: Zip bomb prevention
     new 3040a36  TIKA-215: Use a thread pool in ParsingReader
     new 12ec68e  TIKA-209: Language detection is weak
     new 4ab12d7  Improved documentation about support for audio formats
     new 8ab4ea1  Improved documentation formatting.
     new 5ff52d6  TIKA-219: Split Tika to separate modules
     new 378ccee  TIKA-219: Split Tika to separate modules
     new 9b7e6ea  TIKA-219: Split Tika to separate modules
     new 14f7707  TIKA-219: Split Tika to separate modules
     new a2371ab  TIKA-219: Split Tika to separate modules
     new 1caa599  TIKA-219: Split Tika to separate modules
     new 77abee6  TIKA-221: Drop log4j dependency from tika-core
     new 049cace  TIKA-222: Drop commons-codec dependency from tika-core
     new cd088d3  TIKA-226 - Change to generate javadocs and source references for each module
     new 6be2c4a  - TIKA-227 Make MimeType JavaDoc match behaviour (Robert Burrell Donkin via mattmann)
     new 8189f14  TIKA-220: Remove obsolete utility code
     new 368e6fa  TIKA-225: [PATCH] Various bugfixes for MIME detection
     new e5d034d  TIKA-225: [PATCH] Various bugfixes for MIME detection
     new 27a100a  TIKA-225: [PATCH] Various bugfixes for MIME detection
     new e10b21c  TIKA-230 : Addition of Parent POM File. Patch by Robert Burrell Donkin
     new 199fd5f  TIKA-230: Add the parent POM to the multimodule build
     new ea35b47  TIKA-230: [PATCH] Parent pom
     new d96821e  TIKA-230: More POM cleanups.
     new 985a604  Ignore generated and hidden files.
     new fd995b9  TIKA-229: Per-component LICENSE and NOTICE files
     new d9493ee  TIKA-233: Inline the ICU4J charset detection logic
     new 7151759  TIKA-228: Add OSGi metadata to Tika
     new fc3c880  TIKA-228: Add OSGi metadata to Tika
     new f86934f  TIKA-228: Add OSGi metadata to Tika
     new f9d22f8  TIKA-219: Split Tika to separate modules
     new 1cd949d  TIKA-228: Add OSGi metadata to Tika
     new e5bf650  TIKA-204: Use commons-compress for parsing packages
     new f6ead4d  TIKA-233: Inline the ICU4J charset detection logic
     new 206dc7a  TIKA-198: Better distinction between IOException and TikaException
     new 3864f42  TIKA-193: PDFParser adds mime-type twice
     new 1991c32  TIKA-231: Difference between Web-Site and real working code
     new 3864ebb  TIKA-231: Difference between Web-Site and real working code
     new 6aaa141  TIKA-237: Better distinction between SAXException and TikaException
     new 26a8225  TIKA-237: Better distinction between SAXException and TikaException
     new 3a6036e  TIKA-87: MimeTypes should allow modification of MIME types
     new fd4a584  TIKA-234: Drop SpellCheckedMetadata
     new fa0955e  TIKA-238: Better handling of delegating parser implementations
     new bb1ed46  Replace a tab with spaces.
     new 2fa2699  TIKA-238: Better handling of delegating parser implementations
     new f670711  TIKA-235: Site search powered by Lucene/Solr
     new fd0470f  TIKA-225: [PATCH] Various bugfixes for MIME detection
     new 3efc2cf  TIKA-204: Use commons-compress for parsing packages
     new a70b99f  TIKA-204: Use commons-compress for parsing packages
     new 96742ab  TIKA-248: No logging in tika-core
     new 87a45f9  TIKA-249: Inline key commons-io classes
     new 2d893ab  TIKA-247: parse language and category from MS Office properties
     new 410d493  TIKA-148: The ExcelParsing should scan the cell comments
     new 74947b6  TIKA-244: Missing Header/Footer text for Word'97 documents
     new 226aaf4  TIKA-255: Embedded Visio Content Crashes PPT Parser
     new 0236e01  TIKA-253: Better mime type for ooxml files
     new cccb797  TIKA-254: parse ooxml templates and macro-enabled formats
     new f04edc5  TIKA-240: Drop the BOM when extracting plain text
     new 15c8343  TIKA-258: AutoDetectParser does not allow to use alternative mime detector
     new acf76a6  - fix for TIKA-121 MimeType.clean method no longer exists as a capability
     new f676cc0  - fix for TIKA-74 Test Resources should be loaded by the class loader (e.g. getResourceAsStream())
     new 2be0726  - cleanup javadoc
     new 3d8a58a  TIKA-257: Uncorrect mime-type detection for ooxml
     new 471655e  TIKA-260: Weird transitive dependencies from commons-logging
     new 7da2b34  - prep for release
     new fc8bdb7  - prep for 0.4 RC (wow there are a lot of these poms now!)
     new 401af9c  Update version number to 0.5-SNAPSHOT
     new 9b4ca8f  Improved instructions for getting started with Tika.
     new ace4928  TIKA-262: ParsingReader does not parse metadata for larger MS Office documents
     new f4ebc4b  - update site documentation to reflect release of 0.4
     new 458a03d  - prepare for next release
     new 576101f  - update to reflect 0.4 release
     new 8c130ba  TIKA-209: Language detection is weak.
     new 29af90d  TIKA-263: Core parser classes duplicated in the tika-parser and tika-core jar files.
     new 62697af  TIKA-209: Language detection is weak.
     new 27e6e47  TIKA-209: Language detection is weak.
     new 655afb5  TIKA-209: Language detection is weak.
     new 0c8e965  TIKA-209: Language detection is weak.
     new b4e1a57  TIKA-265: Web-Site http://lucene.apache.org/tika/gettingstarted.html does not correspond to current release
     new 985e93d  TIKA-264: Getting Started: change "source directory" to "base directory" or similar
     new 5099b33  TIKA-250: XLS parser does not extract empty sheet names
     new 3acc27c  TIKA-266: Empty tika-core jar
     new 290bbf8  Code style: Reindent at four spaces, remove unused access modifiers, inline singleton classes.
     new 84587e3  TIKA-209: Language detection is weak
     new 02519da  - update web site news per grant's comment at: http://www.lucidimagination.com/search/document/e6e888e48060d38c/apachecon_promo
     new a09f5af  TIKA-268: HTMLParser omits necessary space-characters when parsing table-data
     new 00ae4b3  TIKA-267: encrypted pdf files aren't handled properly
     new d34e550  TIKA-217: secure-processing not supported by some JAXP implementations
     new 4abeb2d  TIKA-217: secure-processing not supported by some JAXP implementations
     new 296e279  TIKA-274: CharsetDetector.setDeclaredEncoding has no effect
     new 4307560  TIKA-273: Content encoding in HtmlParser
     new 3922b01  TIKA-275: Parse context
     new 41ca17f  TIKA-275: Parse context
     new f091ceb  TIKA-275: Parse context
     new d785334  TIKA-275: Parse context
     new 1b6a563  TIKA-276: Drop the StringUtils class
     new d75342b  TIKA-269: Ease of use -facade for Tika
     new a6cb65c  TIKA-269: Ease of use -facade for Tika
     new 06d4d21  TIKA-269: Ease of use -facade for Tika
     new 0a60ac0  TIKA-269: Ease of use -facade for Tika
     new 4acfd54  TIKA-275: Parse context
     new 02b1eaa  TIKA-277: Tika stand alone CLI --possibility to specify output encoding (--text)
     new 934fd0e  TIKA-277: Tika stand alone CLI --possibility to specify output encoding (--text)
     new d634072  TIKA-158: Upgrade to Apache PDFBox
     new c98e188  TIKA-158: Upgrade to Apache PDFBox
     new c5abb68  TIKA-280: Fix NOTICE files to match consensus from legal team
     new 231fac1  TIKA-281: Use repository.apache.org to deploy snapshots and releases
     new 95071ec  TIKA-281: Use repository.apache.org to deploy snapshots and releases
     new ee13b1a  TIKA-281: Use repository.apache.org to deploy snapshots and releases
     new 4c04a49  TIKA-158: Upgrade to Apache PDFBox
     new 43bb4c6  TIKA-283: XWPFWordExtractorDecorator does not extract links in tables
     new a68d61c  TIKA-283: XWPFWordExtractorDecorator does not extract links in tables
     new 120c238  TIKA-275: Parse context
     new 4130936  TIKA-275: Parse context
     new f4f7f72  TIKA-285: Update media type registry to the latest httpd mime type database
     new 8f90597  TIKA-285: Update media type registry to the latest httpd mime type database
     new 659945d  TIKA-285: Update media type registry to the latest httpd mime type database
     new 2ad8d40  TIKA-285: Update media type registry to the latest httpd mime type database
     new d77f5cf  TIKA-285: Update media type registry to the latest httpd mime type database
     new a1d21f4  TIKA-285: Update media type registry to the latest httpd mime type database
     new 439f7ed  TIKA-285: Update media type registry to the latest httpd mime type database
     new cce94cb  TIKA-285: Update media type registry to the latest httpd mime type database
     new 0cc002f  TIKA-285: Update media type registry to the latest httpd mime type database
     new 6661e11  TIKA-285: Update media type registry to the latest httpd mime type database
     new e99d737  TIKA-285: Update media type registry to the latest httpd mime type database
     new 2ec2eb1  TIKA-285: Update media type registry to the latest httpd mime type database
     new ead42a4  TIKA-285: Update media type registry to the latest httpd mime type database
     new b19fe7e  TIKA-285: Update media type registry to the latest httpd mime type database
     new 710f9bd  TIKA-285: Update media type registry to the latest httpd mime type database
     new 5649aaa  TIKA-275: Parse context
     new 9cfb4e1  TIKA-269: Ease of use -facade for Tika
     new da2fb0c  TIKA-269: Ease of use -facade for Tika
     new 7990a84  TIKA-269: Ease of use -facade for Tika
     new e59923b  TIKA-285: Update media type registry to the latest httpd mime type database
     new d7a498f  TIKA-285: Update media type registry to the latest httpd mime type database
     new 7240438  TIKA-285: Update media type registry to the latest httpd mime type database
     new d7b1952  TIKA-285: Update media type registry to the latest httpd mime type database
     new 7b0b425  TIKA-291: Adobe InDesign support
     new 18d0767  TIKA-281: Use repository.apache.org to deploy snapshots and releases
     new b6ced30  TIKA-281: Use repository.apache.org to deploy snapshots and releases
     new 6b9a82f  TIKA-292: PDFBox is too verbose
     new a459b77  TIKA-284: Upgrade to POI 3.5-FINAL
     new dc95913  TIKA-299: Update Geronimo dependency in tika-parsers pom.xml to 1.0.1
     new 26293c0  TIKA-297: The HtmlParser ignores <menu> tags, resulting in invalid XHTML
     new a463b9f  TIKA-297: The HtmlParser ignores <menu> tags, resulting in invalid XHTML
     new c5038b8  TIKA-296: Automatically set the supertype for "+xml" mimetypes
     new 1cc319f  TIKA-294: TikaCLI always uses System.in for input
     new c9dfb89  TIKA-290: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@6caf16
     new 0486ffb  TIKA-256: MSWord parser does not extract footnotes and comments
     new 0297f44  TIKA-279: XWPFWordExtractorDecorator does not extract some headers/footers
     new 6c6c17d  TIKA-293: XWPFWordExtractorDecorator does not extract bookmarks
     new cb4e59d  Add svn:eol-style
     new 1e7a874  TIKA-295: Rough cut of mbox parser
     new d509069  TIKA-310: Use TagSoup to parse HTML
     new 99c2dd5  TIKA-311: Broken handling of <a name="..."/> tags
     new 645850a  TIKA-287: HtmlParser should resolve relative paths in <a href="xxx"> elements
     new b42b4b8  TIKA-287: HtmlParser should resolve relative paths in <a href="xxx"> elements
     new 732fadb  TIKA-287: HtmlParser should resolve relative paths in <a href="xxx"> elements
     new 987ec1d  TIKA-309: Mime type application/rdf+xml not correctly detected
     new 16924c3  TIKA-306: patch: OOXMLParserTest uses OpenOfficeParser
     new 03be219  TIKA-305: XHTML href attributes end up in the wrong namespace
     new 29c9a27  TIKA-304: HtmlParser could be easier to subclass
     new 52ac68f  TIKA-302: patch: initial support for ePUB
     new e7524be  TIKA-302: patch: initial support for ePUB
     new 01bf6b5  TIKA-302: patch: initial support for ePUB
     new e7f48bf  TIKA-301: patch: embedded ODF and office:annotation
     new e2049a1  TIKA-312: TikaCLI can't print metadata
     new 5ab52bf  TIKA-300: rename openoffice.. parser classes to odf..
     new 7a3b506  TIKA-314: Initial support for JPEG EXIF metadata extraction
     new bda2bb7  TIKA-314: Initial support for JPEG EXIF metadata extraction
     new f11860e  TIKA-209: Language detection is weak.
     new 3476a72  TIKA-209: Language detection is weak.
     new 7a6089c  TIKA-209: Language detection is weak
     new ef1cd4d  Add change log entries for TIKA-209 and TIKA-275
     new a9e8732  TIKA-269: Ease of use -facade for Tika
     new 6c2d654  TIKA-320: Allow disabling language detection in AutoDetectParser
     new 084dcb8  TIKA-320: Allow disabling language detection in AutoDetectParser
     new 958c208  TIKA-209: Language detection is weak.
     new 9458719  TIKA-319: HtmlParser - use encoding hint only if charset is supported
     new dd1ddf9  TIKA-313: patch: ODF improvements for svg:desc, presentation notes
     new 39469ba  - fix for TIKA-309: Mime type application/rdf+xml not correctly detected
     new ad11aac  TIKA-275: Parse context
     new 995d275  - remove duplicate glob: TIKA-309
     new e5b3736  - increasing the offset to 4k bytes for an appearing <html tag seems to have fixed the unstable build issue introduced by TIKA-309
     new bc54bef  RE: TIKA-309, yes I can't count (4*1024 = 4096).
     new 22c4ea3  - prep for release
     new 2fbe85b  - change back to SNAPSHOT: mvn release:prepare will take care of this
     new 330bd83  test of command line commit (needed by mvn release:prepare)
     new 6f0312f  [maven-release-plugin] prepare release 0.5
     new 62a539e  [maven-release-plugin] prepare for next development iteration
     new d56ccaa  - make CHANGES.txt "release"-ified
     new 4b815bc  - undo the m2 release plugin's magic
     new 6e420b2  [maven-release-plugin] prepare release 0.5
     new 3c28408  [maven-release-plugin] prepare for next development iteration
     new 98a96da  TIKA-309: Mime type application/rdf+xml not correctly detected
     new 246ab61  TIKA-309: Mime type application/rdf+xml not correctly detected
     new 5d8b457  TIKA-321: Optimize type detection speed
     new b93c5d7  TIKA-321: Optimize type detection speed
     new efdda0a  TIKA-321: Optimize type detection speed
     new b4405b7  TIKA-326: Map javax.imageio.IIOException to TikaException
     new 3ee9be7  TIKA-321: Optimize type detection speed
     new ac74696  TIKA-324: Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
     new 0118770  TIKA-325: tika-parent/pom.xml missing <inceptionYear>2007</inceptionYear>
     new 7d5d6c7  TIKA-321: Optimize type detection speed
     new 95163d2  - update for current development
     new c47d81b  TIKA-330: Better HWP (Hangul Word Processor) detection pattern
     new 59d94cb  TIKA-321: Optimize type detection speed
     new 51c6242  - fix for TIKA-336 More issues with RDF mime detection
     new 8cbebbf  TIKA-321: Optimize type detection speed
     new 6886317  TIKA-321: Optimize type detection speed
     new c1f3579  TIKA-334: HtmlParser should use CharsetDetector whenever no charset is specified via meta http-equiv tag
     new 61d64d2  TIKA-329: secure-processing not supported by some JAXP implementations (2)
     new ce27fbd  TIKA-340: Provide full Tika bundle
     new 6a8a48c  TIKA-340: Provide full Tika bundle
     new 496afa9  TIKA-340: Provide full Tika bundle
     new 43e945b  TIKA-340: Provide full Tika bundle
     new 4545895  TIKA-332: Use http-equiv meta tag charset info when processing HTML documents
     new dbc2ef4  TIKA-334: HtmlParser should use CharsetDetector whenever no charset is specified via meta http-equiv tag
     new afa3c1f  TIKA-335: TXTParser should use incoming charset
     new 2354fe6  TIKA-341: Use charset in CONTENT_TYPE metadata when detecting the character encoding
     new 533dd67  Add a change log entry about the character encoding improvements.
     new 0f10bd2  TIKA-345: Add application/vnd.wap.xhtml+xml to list of mimetypes handled by HtmlParser
     new 8bd0ccb  TIKA-347: Make HtmlParser customizable through ParseContext
     new 2f0fa3b  TIKA-343: some parsers produces glued words
     new 7ed838c  TIKA-343: some parsers produces glued words
     new ebe7995  TIKA-343: some parsers produces glued words
     new 5df8f87  TIKA-343: some parsers produces glued words
     new d569761  TIKA-343: some parsers produces glued words
     new 16db668  TIKA-339: HtmlParser & TXTParser should not use language returned by CharsetDetector if language hint has been provided
     new 43c83e7  TIKA-328: Add parser for .flv videos
     new e3c68b4  TIKA-328: Add parser for .flv videos
     new 1ea1cf0  TIKA-342: Improve OSGi bundling
     new bf856a4  TIKA-125: Pass Locale information to parsers
     new cacb6b8  TIKA-282: RTF parser expects a GUI environment
     new cee95ed  TIKA-349: HtmlParser's http-equiv code needs to be more flexible
     new 3f27155  TIKA-350: HtmlParser's content-type handling code needs to be more flexible
     new 8d09544  TIKA-351: MediaType.parse should be more forgiving of broken input
     new 0678c44  TIKA-352: Use MediaType.parse when extracting charset from content-type metadata in parsers
     new 9f406c3  TIKA-352: Use MediaType.parse when extracting charset from content-type metadata in parsers
     new 4c09850  TIKA-352: Use MediaType.parse when extracting charset from content-type metadata in parsers
     new dfb6447  TIKA-353: Upgrade to POI 3.6
     new 0b6a7cd  Update change log, minor readme improvement
     new 7bb3530  TIKA-347: Make HtmlParser customizable through ParseContext
     new 312ec4a  Added my info to project team in pom.xml
     new 895590e  TIKA-103: Addition of POI supported number/date formatting handling within ExcelParser
     new e543c5b  TIKA-103: Addition of POI supported number/date formatting handling within XSSFExcelExtractorDecorator
     new 5bffa84  TIKA-103: Corrected XSSFExcelExtractorDecorator to use document style table.
     new a5e9584  TIKA-103: Corrected XSSFExcelExtractorDecorator to use correct style index.
     new e207c87  TIKA-103: Updated CHANGES.txt with details of new features.
     new 7ece6ff  - fix for TIKA-327
     new 8d76fd9  - fix for TIKA-366 Increase buffer size for mime type sniffing
     new 28327a8  - fix for TIKA-367 Mime type rootXML equality improvement
     new cecaa12  - fix for TIKA-357: Increase buffer size for meta tag sniffing. Patch contributed by Ken Krugler.
     new 6f85dd5  - prep for release
     new f20cdb9  - include contributors (always forget to do this on the first try!)
     new 72f2665  [maven-release-plugin] prepare release 0.6
     new dfcc01d  [maven-release-plugin] prepare for next development iteration
     new 2cd366d  - bump CHANGES.txt
     new 093a62b  TIKA-317: Excel formatting depends on the default locale
     new 8602783  TIKA-368: ID3v2 support for mp3 parser
     new fa286ee  TIKA-365: Extract more OpenDocument metadata
     new 4a8ca9c  TIKA-362: Add publisher support
     new 9544c3a  TIKA-364: [PATCH] Metadata mark for xlsx documents with protected sheets
     new 7c51e89  TIKA-372: Channel and SampleRate information for MP3 files
     new 96a15bc  TIKA-141: Mime Content Type detection of a web document from its URL.
     new 4437342  TIKA-141: Mime Content Type detection of a web document from its URL.
     new 57cfc63  TIKA-374: AutoDetectParser not thread-safe
     new 0fa888e  TIKA-199: Improved audio detection and parsing
     new 3ad79a6  TIKA-375: Improve code quality metrics
     new 4ae3021  TIKA-278: Move Tika site sources outside trunk
     new 530149b  TIKA-278: Move Tika site sources outside trunk
     new a4013fa  - fix for TIKA-376 Typo in parse-rtf spec in tika-config.xml
     new 6a7f39d  TIKA-377: Error parsing HTML partial with AutoDetect parser
     new bfc53af  TIKA-377: Error parsing HTML partial with AutoDetect parser
     new 70bfbe8  TIKA-380: Upgrade to PDFBox 1.0.0
     new 101a5c3  TIKA-370: Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox
     new f721fac  TIKA-370: Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox
     new 33171a5  TIKA-317: Annotation-based Tika configuration
     new 3c807da  TIKA-317: Annotation-based Tika configuration
     new c9d44db  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 090001d  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new e64cad2  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 1cad9b3  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 3caba57  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 6e4a971  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 6ef60c2  TIKA-378: TikaConfig should notify users if it cannot initialize some parser
     new 80c39e9  TIKA-317: Service provider -based Tika configuration
     new fad2211  TIKA-382: No textextraction in tika-app
     new 9a049b4  TIKA-386: Tika relies on X11
     new 567841a  - add Ken to the committers list
     new 55fb7c9  TIKA-388: Don't trust streams that claim mark support
     new 4400e3a  TIKA-261: Ability to limit the amount of extracted text
     new 9c7c2c2  TIKA-282: RTF parser expects a GUI environment
     new 29ab691  TIKA-282: RTF parser expects a GUI environment
     new bbd3cd3  TIKA-392: RTF parser smashes words together in subsequent table cells
     new 939fa08  Updated KEYS to include my new RSA code-signing key
     new cca6e63  TIKA-395: Update to allow OutlookParser to support new format Outlook messages.
     new f60ce11  TIKA-393: Upgrade to PDFBOX 1.1.0
     new f66c262  - prep for release 0.7
     new 8de4115  - consistency with sentence endings
     new aa01b10  [maven-release-plugin] prepare release 0.7
     new 27dc9b5  [maven-release-plugin] prepare for next development iteration
     new c7b924c  - prep for 0.8 development - incorporate comment from gsingers RE: mentioning the major versions of libraries used in tika-parser
     new 229b78f  - patch for TIKA-398 TestParsers fails when classpathh contains special characters like spaces (Uwe Schindler via mattmann)
     new 14a229e  - basic support for netCDF parsing, as specified in TIKA-400 netCDF Tika Parser. Can extend more later, but enough support right now to commit. Includes basic unit tests.
     new d06aed2  TIKA-396: Parse Attachement included within Outlook Message.
     new 648b5bf  TIKA-404: Media-type handling depends on the locale
     new a918dd6  TIKA-404: Media-type handling depends on the locale
     new 550a2dc  TIKA-92: Image metadata extraction
     new 6464286  TIKA-92: Image metadata extraction
     new afc0523  TIKA-396: Parser Attachements from Outlook Messages
     new 9f10da2  TIKA-379: Html elements and attributes not available in XHTML representation
     new ff494ed  TIKA-403: Refactor log library usage in tika-parsers
     new baafe0e  TIKA-403: Refactor log library usage in tika-parsers
     new 1f3c0a6  Use spaces instead of tabs for indentation
     new e8c71a7  TIKA-403: Refactor log library usage in tika-parsers
     new c25430c  TIKA-400: netCDF Tika Parser
     new 5ec7c51  TIKA-409: Missing poi-ooxml-schemas-3.6.jar in tika-bundle
     new bdf52f7  TIKA-153: Allow passing of files or memory buffers to parsers
     new 8658485  TIKA-412: Exclude the xml-apis dependency
     new f962509  TIKA-153: Allow passing of files or memory buffers to parsers
     new cecdcea  TIKA-153: Allow passing of files or memory buffers to parsers
     new 3af35a9  TIKA-153: Allow passing of files or memory buffers to parsers
     new 571fc14  TIKA-153: Allow passing of files or memory buffers to parsers
     new 9d31098  TIKA-404: Media-type handling depends on the locale
     new 7736aaa  TIKA-153: Allow passing of files or memory buffers to parsers
     new af2aabb  TIKA-298: CompositeParser.getParser() should use mimetype hierarchy when falling back
     new 807b024  TIKA-298: CompositeParser.getParser() should use mimetype hierarchy when falling back
     new 8a5cfcb  TIKA-89: Rename MimeType and MimeTypes
     new ccd0340  TIKA-89: Rename MimeType and MimeTypes
     new c760e5d  TIKA-419: Allow parser lookup from a custom class loader
     new ae52cb8  tika now a tlp, moved svn
     new 6f758f8  TIKA-415: Findbugs: XHTMLDowngradeHandler equals() comparing different types
     new c0a9227  TIKA-417: Unable to parse the content for UCS2 Litte Endian encoded file
     new a576fb3  TIKA-153: Allow passing of files or memory buffers to parsers
     new 64ed199  TIKA-402: Support for Keynote and Pages documents
     new 86735b2  TIKA-416: Out-of-process text extraction
     new 2135910  TIKA-400: netCDF Tika Parser
     new a2d2c1a  - fix for TIKA-432 Include NOTICE and LICENSE file updates for NCAR NetCDF parser lib
     new 67748e3  TIKA-416: Out-of-process text extraction
     new b122281  TIKA-425: Exception parsing mp3
     new b2b3685  TIKA-418: RuntimeException while getting content for ppsx, ppsm, pptm, thmx and xps file types
     new 890bcb5  TIKA-424: Avoid ArrayIndexOutOfBoundsException on some mp3 files
     new f1f45a7  TIKA-413: DWG Parser
     new 19439b7  TIKA-413: DWG Parser
     new d4dd400  TIKA-413: DWG Parser
     new 8451402  TIKA-402: Support for Keynote and Pages documents
     new dd6330f  - fix for TIKA-379 Html elements and attributes not available in XHTML representation
     new f4c4123  - consistency, Chris, consistency.
     new eb7aed1  TIKA-402: Support for Keynote and Pages documents
     new a2d3c00  TIKA-402: Support for Keynote and Pages documents
     new a4a6c02  TIKA-402: Support for Keynote and Pages documents
     new d533a2d  TIKA-402: Support for iWork documents
     new d0f703f  TIKA-269: Ease of use -facade for Tika
     new d38860c  - test SVN auth
     new 9f5ae40  - revert SVN auth test
     new ead95c8  TIKA-95: Pluggable magic header detectors
     new 97549f6  TIKA-441: Sometimes, tika not working (crashed) because of null classloader
     new 2846c74  TIKA-440: [Patch] Fetch the composer information in the MP3 Parser
     new c1fe863  TIKA-439: DWGParser (and some others) not used by AutoDetectParser
     new 8d14ba6  TIKA-308: Improve supertype handling in type registry
     new 0bda933  TIKA-308: Improve supertype handling in type registry
     new 72cc380  TIKA-308: Improve supertype handling in type registry
     new 414c592  TIKA-298: CompositeParser.getParser() should use mimetype hierarchy when falling back
     new f809f45  TIKA-89: Rename MimeType and MimeTypes
     new c705865  TIKA-89: Rename MimeType and MimeTypes
     new 157fcc9  TIKA-308: Improve supertype handling in type registry
     new 2b73f6b  TIKA-442: Image extractors use inconsistent metadata keys and formats for common features
     new 11cb267  - fix for TIKA-444 Tika sites refers to incorrect svn repo URL
     new af37f82  Add myself to the committers list, and remove Ken Krugler's duplicate entry
     new ed8497d  Upgrade to POI 3.7 beta 1 (TIKA-373) Includes patch from TIKA-361 to update the Outlook parser to match the new HSMF API + update to the MBox parser to capture equivalent metadata
     new 85a8a76  Apply patch from Maxim Valyanskiy from TIKA-437 - support encrypted OOXML office files which use the default password.
     new c02b152  Use the new TIFF Metadata entries for image width/length/sampling from the TIFF, JPEG and general Image (ImageIO) parsers. Gives a small number of consistent image related metadata entries across all formats. (TIKA-442)
     new 1197065  Apply Jukka's patch from TIKA-371 - now we're on POI 3.7 beta 1, do the locale handling in unit tests better
     new 189557f  MP3 Lyrics text extraction support Updates the MP3 parser to detect a LyricsV3 block before the ID3v1 tags block. If found, the lyrics text will be captured and output.
     new b471e61  Unit test to show that we support pptx, pptm, ppsx and ppsm (TIKA-418) .thmx will need a POI upgrade, but the file format lacks any text! .xps is still unsupported by POI
     new 9677c50  Add geographic metadata namespace (TIKA-445)
     new 59d2e57  Enable extraction of longitude and latitude from JPEG/Tiff files (via the EXIF tags), and HTML (via the ICBM meta tag), to the new geographic metadata namespace
     new b8223f9  TIKA-371: Excel formatting depends on the default locale
     new a4322a1  TIKA-446: Upgrade to PDFBox 1.2.0
     new 423eabd  Fix for TIKA-449 (Update parsers to extract geographic metadata) from Jukka to ensure that the lats and longs are correctly formatted in all locales
     new 61751d2  TIKA-452 - Extract custom pdf metadata
     new ccf1216  Test for TIKA-452 - Extract custom pdf metadata
     new 2f78035  TIKA-442 follow-on - map another Exif/JPEG tag (comment) onto a standard tika metadata key
     new 3cebb67  TIKA-454 Illegal Charset Name crashes HTMLParser
     new 007f2d7  TIKA-402: Support for iWork documents
     new 76afce1  TIKA-375: Improve code quality metrics
     new 035dab3  TIKA-292: PDFBox is too verbose
     new dce07df  TIKA-292: PDFBox is too verbose
     new 66882d3  TIKA-402: Support for iWork documents
     new a87d3e6  TIKA-402: Support for iWork documents
     new ca465e3  TIKA-402: Support for iWork documents
     new 9ba3b58  TIKA-459: Improve handling for invalid charset names.
     new ed34e70  TIKA-453: Fix Estonian language identifier.
     new 7bee75a  TIKA-446: Upgrade to PDFBox 1.2.1
     new 74a432d  TIKA-420: Integration of Boilerpipe.
     new 4658fd7  TIKA-451 - Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED Make CREATION_DATE and LAST_MODIFIED Date property instances, and add support for getting and setting Dates (+getting ints), as discussed in TIKA-451 Unit tests for getting and settings ints and dates are included. Work to update the existing parsers to make use of the new Date setter is still outstanding
     new 18f82c7  When building ISO8601 dates, ensure we're always working in UTC (for TIKA-451)
     new 11bbc01  Update parsers to fix problems with new style Date properties, for TIKA-451
     new 6cd30a3  Update parsers to fix problems with new style Date properties, for TIKA-451 (file was missed from last commit)
     new 8497182  The bundle needs to include boilerplate, as it's a required dependency
     new ad22596  Accept a wider range of ISO8601 date formats when turning a Property from a String into a Date, for parsers which do set(Property,String) - for TIKA-451
     new efdc32a  - update index doc to include 0.7 skeleton
     new ab5353a  - more skeleton
     new 16fe44c  - update for Tika 0.8
     new 28eafc8  - fix for TIKA-464 Contribute a "get Tika parsing up and running in 5 minutes" quick start guide
     new 85aa4d2  - fix for TIKA-466 Feed Parser contributed by jnioche
     new a5aff98  Add the new rome dependency to the bundle (TIKA-466)
     new 5da4b6a  TIKA-470 - New tika-app option to list the supported parsers, and their mime types, via options of --list-parsers and --list-parser-details
     new 3ca5da9  TIKA-447 - Container aware mimetype detection Initial implementation of container aware detection. New ContainerAwareDetector class, which is a Detector, which will open and handle OLE2 and Zip files to detect the mimetype, falling back on a specified default detector for non-container formats. Some work remains - Not all Zip file based things are detected yet, and the Zip based parsers don't yet take advantage of the already open zip stream. (OLE2 ones can)
     new 5fcd987  Add Office Open XML (OOXML) support to the Zip container aware detector (TIKA-447) If an OOXML zip file entry is found, passes this to POI and fetches the content type through that. Also updates the OOXML text extractor to take advantage of the open package if detection was already done.
     new 852dbed  Container aware detection for Jars, and add stub TODOs for iWork files (TIKA-447)
     new bc5f04e  Fix 1.6ism in recent TIKA-447 commit
     new 21b54bf  Slightly improve OLE2 file type matches, for cases where the OLE2 properties stream is in one of the first couple of blocks in the file. Add a note about using the ContainerAwareDetector for better results. (TIKA-447)
     new 6f48f57  Make mime type detection a little bit more stable (TIKA-391) Make the comparison operator work better on Magic types, and ensure that the type is present on the magic to help debugging and sorting. Also add tests to show that we can detect the same file multiple times, and get the same answer each time.
     new 66077e1  Apply patch from TIKA-472 - Extract JPEG title, description and author Also fix a few indents to follow tika standard of space not tab
     new f49c4bd  Excel parsing improvements for files with charts (TIKA-214) Support chart based sheets, outputting chart labels, not over-writing sheet entries with chart ones, and outputting extra sheet text inside the sheet but outside the table. Also adds unit test based on file from TIKA-214, along with a few toString() methods to aid with debugging
     new 9fe414c  TIKA-358: Auto-detection of HTML fails with common auto-generated template
     new 3ab1096  Don't break on MP3 files where the ID3v2.4 tags are broken, and lie about their size (TIKA-424) (The unit test for this will not normally be run, unless you explicitly download the sample file, as we can't re-distribute it as part of Tika)
     new 871bea9  The id3v2.4 spec doc has a bug - the layout section says 4*size to bytes, but the description is just size=bytes. Switch to the latter, which is what the other programs use, and add a unit test based on a mid3v2 generated file. (TIKA-424)
     new c0527e0  TIKA-447: Container aware mimetype detection
     new 955611f  TIKA-473: Prepare Tika site for svnpubsub
     new 1b3de63  TIKA-474 - Do what we can with MP3 files where the ID3 header is truncated
     new 8fa36e1  - fix for TIKA-476 Add page count to metadata
     new a1cd688  TIKA-89: Rename MimeType and MimeTypes
     new c730e0d  TIKA-475: MBoxParser class (inside tika-parsers-0.7-jdk14.jar) calls method getSimpleName from java.lang.Class (jdk1.5)
     new 458753c  TIKA-476: Add page count to metadata
     new 40de6ed  TIKA-476: Add page count to metadata
     new f9e1ddd  TIKA-468: Missing Silde-Count metadata for PPT files
     new 621950e  TIKA-477: Add GUI support for Boilerpipe, and improve output from Boilerpipe content handler.
     new f8a3d0b  TIKA-477: Also commit fix for BP test
     new 486281b  TIKA-478: Fix handling of <head> elements in HTML parser, and improve robustness of XHTMLContentHandler.
     new a1c0503  TIKA-478: Fix up missing end </body> and </html> tags for document with no real content.
     new 31345be  TIKA-463: emit <img> tags with resolved URLs for src attribute.
     new 163d93c  TIKA-457: Fix frameset handling (both general, and for broken HTML)
     new 2a21313  - fix for TIKA-479 Post link to Tika in Action on Tika website
     new 28410f6  - docs for TIKA-447 should be part of trunk site build too, so that when 0.8 (and beyond) docs go up, detection is included.
     new 52664ff  TIKA-460 : A elements never reached with IdentityMapper
     new e347b98  TIKA-463 & TIKA-457: Fixed issue with emitting <meta> elements that had null content values (these are valid in Metadata, but not <meta>.
     new fed8c46  TIKA-480: Don't emit empty attributes, and include full set of standard HTML elements around <p>...</p> output.
     new 4f6a589  TIKA-481: Resolve href in <link> element.
     new ed7a1d1  TIKA-463: Get all URLs (resolved) from XHTML-valid elements.
     new 98eb0b0  TIKA-483: Empty file detected as text/plain
     new 2e6dc43  TIKA-401: Tika hangs on corrupt zip files
     new d32c1d8  TIKA-495: Metadata constructor is slow
     new fc3a086  TIKA-487 - ContainerAwareDetector fallback support for truncated zip files
     new 97cd04c  When building tika-app, embed de.* classes too, along with org and com, as boilerpipe uses that namespace (TIKA-420)
     new 31452f9  - fix for TIKA-498 HTML parser fails on turkish locale
     new 05acb51  TIKA-501: Remove ICU-based language detection from plain text parser (TXTParser)
     new f72c1fb  - fix for TIKA-488 Add alternative search provider on site
     new f878d8e  Add missing svn:eol-style settings
     new 61f707f  TIKA-503: Add a ContentHandler for collecting links from parser output
     new c1f7fd7  Add missing svn:eol-style settings
     new bc3fd07  Add missing svn:eol-style settings
     new 9a2bef7  Apply Staffan Olsson's patch from TIKA-482 (with a few tweaks), which improves how EXIF metadata is processed from TIFF and JPEG files, and moves more of the Date properties to be real ISO8601 dates internally.
     new 729bca5  Add several more common EXIF tags to the TIFF metadata namespace, and have the EXIF parser also output property-typed tags for these (TIKA-504)
     new 806ccd1  Add TIFF/Exif Flash property and support (TIKA-504)
     new 86f9f62  TIKA-416: Out-of-process text extraction
     new 7012400  Add basic extension based detection of corel formats, and works, along with the submitted sample files (TIKA-486)
     new 4bf74e9  Add support for to the ContainerAwareDetector for Corel OLE2 formats, and Microsoft Works (TIKA-486) Also slightly refactor the child container detectors, so we can do common fallback logic when the container detector can't figure it out
     new e33c7d5  Apply (with slight tweaks) Antoni Mylka's container aware detector patch for truncated OLE2 documents - TIKA-485
     new acc172c  TIKA-153: Allow passing of files or memory buffers to parsers
     new 51e72bb  TIKA-153: Allow passing of files or memory buffers to parsers
     new c0b380d  TIKA-153: Allow passing of files or memory buffers to parsers
     new 30d9660  Add various test office files which have images and other office files embeded in them. (Will be used for unit tests for TIKA-509)
     new d2877b7  Initial work on Container Extractors (TIKA-509) Basics of the interfaces and key classes are included, along with a partial POIFS extractor implementation.
     new d98bc51  TIKA-507: Parser for font files
     new a20e0c1  Add missing svn:eol-style setting. Looks like I need to fix the auto-prop configuration on my laptop...
     new 8dd9c86  Refactor how container extraction works - Jukka's patch from TIKA-509 Replace the AutoContainerExtractor with ParserContainerExtractor, and push more of the work to the Parsers
     new c3859d2  Support for container extraction of Images in .xls, and OOXML files embeded in OLE2 documents (TIKA-509) Also rename ContainerEmbededResourceHandler to EmbededResourceHandler as suggested by Jukka, fix ParserContainerExtractor recursion, and remove ContainerExtractor from TikaConfig now we have ParserContainerExtractor.
     new c6505ef  Add support for extracting images embeded in Word .doc files - TIKA-509
     new 14be8fe  Have the ooxml container aware detector use the file, not the input stream, as it's more efficient (TIKA-447)
     new 0c40fd5  We don't need to wrap our stream in a BufferedInputStream for mark/reset to work if it is already one (identified in TIKA-509 work)
     new ca80cad  OOXML support for embedded resource extraction for .docx and .xlsx. (TIKA-509) Also fix a spelling mistake in a class name + comments
     new 6d9cf87  TIKA-509: Container contents extraction
     new 2d822bc  Make the emf/wmf mimetypes returned for the OLE2 office files match that stored in the OOXML files, as well as refactoring the container tests to reduce duplication (TIKA-509)
     new 274acac  TIKA-509: Container contents extraction
     new 90f60e3  - fix for TIKA-512 Print the supported Metadata models and their associated met keys in tika-app
     new cdd1aa5  More Office embedded resource extraction support (TIKA-509) Existing outlook code has been updated to the new style, and tests added XSLF .pptx support has been added with tests POI version bumped from 3.7 beta 1 to 3.7 beta 2, as required for better outlook attachement support
     new 3864c2e  Container extraction tests for package based parsers (TIKA-509)
     new 3620143  Tidy up OOXML unit tests by removing TODOs, and make the sample word document contain a bit more so we can later improve the unit test (TIKA-506)
     new af99a31  TIKA-514: Provide constructor for AutoDetectParser that has explicit list of supported parsers
     new bb7ab51  Enable word6 / word95 support via the new POI Word6Extractor class (TIKA-408)
     new 01a1067  Move the AutoDetectParser tests from TIKA-514 into the existing AutoDetectParserTest class
     new 488e132  Make our sample .doc file more complex, to match the sample .docx, so our tests for TIKA-506 will have more to work on
     new d0cd679  Apply patch from TIKA-506 - Improve the html generated from .doc and .docx to include more things This patch includes an upgrade to POI 3.7 beta 3 For .docx, we now return headers where appropriate, tables, hyperlinks, non standard styles as classes, and images in the correct place For .doc, we also do headers, hyperlinks and non standard styles. Tables only work for 1st level ones, nested tables just come out as paragraphs for now (Lists are not yet supported in either  [...]
     new 1bd3087  Add missing table close tags for .doc (TIKA-506)
     new c442efe  From suggestion in TIKA-506, make Word paragraphs formatted in the style of "HTML Preformatted" use pre tags
     new f9495f1  When processing .doc files, handle matching the embedded images to the character runs better. Somewhat copes with \u0001 real images vs \u0008 floating escher images, and is as good as we can probably get given the current Microsoft docs... Avoids NullPointers though! (TIKA-506)
     new 2d396c8  Not all XWPFParagraphs have the root document, so check for this to avoid a NPE (TIKA-506)
     new 13bd4cd  TIKA-519 - Display embedded images in the GUI Formatted Text pane where they occur in the document. Applies updated patch from TIKA-519 as discussed
     new 0abbfee  TIKA-520 - Apply patch from Sjoerd Smeets for DWG files that lack a header, which avoids ArrayIndexOutOfBoundException
     new c6ab1a9  TIKA-383: new option for TIKA CLI to get only the languages of a document
     new 1eee081  TIKA-383: new option for TIKA CLI to get only the languages of a document
     new 29972fd  TIKA-426: Parsing javascript as XML
     new fbc7bac  TIKA-426: Parsing javascript as XML
     new ba18170  TIKA-411: Generate list of supported and detected types automatically
     new dd178c1  TIKA-411: Generate list of supported and detected types automatically
     new 6d7804f  TIKA-527: Allow override mapping mime<-->parsers through config
     new 36cc24e  Remove extra System.out prints from a test case, clean whitespace
     new d4e7bfd  TIKA-528: Reuse TagSoup HtmlSchema instance across HtmlParsers (performance improvement)
     new 1a564ef  TIKA-533: Mis-detection of zip-within-zip as application/vnd.apple.iwork, with no output by CLI app
     new d60651e  Add iWork support to the Container Aware Detector (TIKA-533) It's a bit icky for now, but it works and it's quick...
     new befb3db  Add --container-aware-detector option to the Tika CLI, which will switch the detector used by the auto parser
     new 3e08e27  TIKA-535: Implement Apache project branding requirements
     new 8a5f288  TIKA-394: Missing spaces on html parsing
     new 1c04a00  TIKA-503: Add a ContentHandler for collecting links from parser output
     new 024bcc4  TIKA-503: Add a ContentHandler for collecting links from parser output
     new e947211  - progress towards TIKA-407 Push NetCDF4 lib dependency to Maven Central and Update Tika POM: upgrade tika-parsers to depend on eventual Maven Central group/artifactId. Also temporarily change M2-forge to Sonatype OSS (will remove when Central sync is loaded)
     new aacb3b4  - progress towards TIKA-407 Push NetCDF4 lib dependency to Maven Central and Update Tika POM: netcdf is now available from Maven Central: see http://repo1.maven.org/maven2/edu/ucar/netcdf/4.2/
     new c496ffa  - fix for TIKA-515 MimeType.getDescription() often returns nothing when "tika-mimetypes.xml" has a useful description already available.
     new b3c8dc8  - fix for TIKA-399 HDF4/5 Tika Parser
     new 7153f6b  - fix for TIKA-399 HDF4/5 Tika Parser
     new 92188c4  TIKA-446: Upgrade to PDFBox 1.3.1
     new f3f1f15  - fix for TIKA-399 HDF4/5 Tika Parser
     new 83a6efa  - fix for TIKA-490 Support for adding language profiles dynamically
     new 195f1d6  - suggestion by jukkaz: make sure we're not using JDK5, before we run the NetCDF and HDF tests
     new ab41dbd  - fix for TIKA-399 HDF4/5 Tika Parser: add he5 file extension for application/x-hdf
     new 4b5ec9a  - don't put another application/ in front of the existing application/x-hdf and application/x-netcdf
     new 3cbfe69  - update export dependency; artifact now named netcdf
     new 5a02577  TIKA-373: Upgrade to POI 3.7
     new 2cdb437  - add NetCDF classes to tika-app bundle: TIKA-400
     new d4a4014  TIKA-462: Get Boilerpipe into Maven.
     new 8ee313a  Fix build problem caused by removing the java.net Maven repository.
     new 2326440  XWPFWordExtractorDecorator: extract text from footnotes
     new e35b48c  TIKA-462: Remove java.net repository from parser pom
     new 6789916  TIKA-543: Remove rome 1.0 dependency on java.net repository
     new 4d5eb38  - fix for TIKA-537 Command line option --list-parsers should list 2nd level parsers below CompositeParsers
     new 86e32d9  - fix for TIKA-523 Add application/ms-tnef as alias to application/vnd.ms-tnef
     new 4162253  - prep for 0.8 RC
     new 26e79a5  [maven-release-plugin] prepare release 0.8
     new 191a3f5  [maven-release-plugin] prepare for next development iteration
     new 7e2de56  TIKA-510: Use POI usermodel API for text extraction from XSLF shape
     new 453ae26  TIKA-511: NPE when POI is configured to prefer event extractors
     new 3bc4231  add unit-test on parsing write-protected xlsx
     new 7cb1b4a  PackageExtractor: javadoc fix
     new 31e60d9  Improved extraction of EXIF and IPTC metadata from JPEG and TIFF Images (TIKA-482) (Applys patch from Staffan Olsson from TIKA-482)
     new cfe575a  Missing new directory from previous TIKA-482 patch commit
     new 1585135  Extract interface for EmbeddedDocumentExtractor
     new 67695eb  Extract embedded Ole10Native files from POIFS
     new 7413cfd  TIKA-549: support for extracting OLE-shapes from PPT
     new bbd3acf  TIKA-549: support for extracting OLE-shapes from PPT
     new 9dbebc4  TIKA-550 - Add stable filenames for extracted embedded files from Office binaries
     new 21c35f7  TIKA-552 - Handle word styles like "heading 4" just like "Heading 4", and in .docx files insert bookmarks as anchor tags, along with relative hyperlinks for the text that references them. (Updates the .doc test file to include bookmarks, but there's no .doc handling of them yet)
     new 3dd1211  TIKA-553: Automatic license header checks
     new 45576d7  - get ready for next dev cycle
     new 3d244de  - add in detail link on contributions for 0.8
     new 98cf861  TikaInputStream: do not wrap ByteArrayInputStream/BufferedInputStream in BufferedInputStream
     new e54acb4  OOXMLExtractor: use EmbeddedDocumentExtractor
     new 7c8ac0b  XSLFPowerPointExtractorDecorator: imports cleanup
     new 8e58142  TIKA-548: PDF content extracted as single line
     new 4939ac6  - progress towards TIKA-556 Problems with the NetCDF jar: update to NetCDF 4.2-min jar, but include temporary repository definition before sync to Central. Once available in central, will remove tika-parent/pom.xml mod.
     new cbdd84e  - progress towards TIKA-556 Problems with the NetCDF jar: and voila, the netcdf-4.2.-min jar is in Central and we're set!
     new 00afd53  If we hit the write limit, give a helpful error message in case you hadn't been expecting it (TIKA-557)
     new eb949a4  When detecting macro enabled OOXML files, return the same format media type as in mimetypes.xml. Adds unit tests for a few of these. (TIKA-560)
     new f409f28  New test files from TIKA-560
     new 96a8b1d  Apply mimetype updates from TIKA-560
     new 1573be6  TIKA-564: Support returning original markup in BoilerpipeContentHandler
     new 2a92008  TIKA-560: Improve detection of .mht, Foxmail, and OOXML files
     new dac8ec8  TIKA-562: In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent
     new d82bca5  TIKA-563: .vor files are Staroffice Templates, not Staroffice Writer documents
     new 98c2833  TIKA-555: image/bmp mime type does not exist
     new 29ef6fd  TIKA-555: image/bmp mime type does not exist
     new adf4620  TIKA-461: RFC822 messages not parsed
     new 070b583  TIKA-555: image/bmp mime type does not exist
     new 9e9a9d9  TIKA-461: RFC822 messages not parsed
     new a7d6cf4  TIKA-461: RFC822 messages not parsed
     new 1613373  TIKA-565: Improved OSGi bundling
     new c58d8ce  TIKA-565: Improved OSGi bundling
     new 0349400  TIKA-565: Improved OSGi bundling
     new 5af3281  TIKA-565: Improved OSGi bundling
     new 6059044  TIKA-461: RFC822 messages not parsed
     new f0033c0  TIKA-566: Better convenience methods for type detection
     new edd122d  TIKA-548: PDF content extracted as single line
     new bf42aac  TIKA-567: Temporary file leak in TikaInputStream
     new 0df83fb  Add missing svn:eol-style
     new 12622b4  TIKA-447: Container aware mimetype detection
     new 0fa3727  TIKA-447: Container aware mimetype detection
     new 201dd27  TIKA-447: Container aware mimetype detection
     new d20903f  Add a PDF file that is protected with the default (Empty) password. Taken from PDFBOX-858 but for work on TIKA-389
     new 5451e28  Fix TIKA-389 - If the PDF is protected (aka encrypted), then always try to decrypt it. Otherwise, we can end up with garbage metadata. Includes a unit test that shows we now get the metadata correctly.
     new d5b8fde  TIKA-569: More fault-tolerant loading of parsers and detectors
     new 5f1e9d0  TIKA-569: More fault-tolerant loading of parsers and detectors
     new a78e298  Apply patch from TIKA-570 from Benson Margulies - stricter BMP detection and unit test
     new 05e6069  TikaCLI: add attachement extraction option
     new ded569d  TIKA-573: add MimeType.getExtension(). Extensions are taken from filename patterns
     new 9f5593e  TIKA-574: Support for IBM866 (CP866) encoding in TXTParser Submitted by: Kostya Gribov, grossws at gmail.com
     new ea4b6e4  tika-parent/pom.xml: added myself to committers list
     new 02f945f  TIKA-574: add missing test file
     new 841ede4  fixed compilation in Java 1.5.0
     new 42458b1  TIKA-569: More fault-tolerant loading of parsers and detectors
     new 6b51782  TIKA-416: Out-of-process text extraction
     new 42eecdb  TIKA-416: Out-of-process text extraction
     new 2680077  TIKA-416: Out-of-process text extraction
     new 1d0ef6b  TIKA-416: Out-of-process text extraction
     new ab0921a  TIKA-416: Out-of-process text extraction
     new 1812a86  TIKA-416: Out-of-process text extraction
     new cb6b98e  TIKA-416: Out-of-process text extraction
     new fa6ee73  Add test access mdb file from TIKA-586
     new c187e0d  Add test true type font file from TIKA-586
     new 34bb845  Access mdb detection and test from Martijn in TIKA-586
     new bb31716  TIKA-416: Out-of-process text extraction
     new ba8a969  TIKA-416: Out-of-process text extraction
     new bdc9e61  TIKA-416: Out-of-process text extraction
     new dba5043  TIKA-567: Temporary file leak in TikaInputStream
     new ff74808  TIKA-567: Temporary file leak in TikaInputStream
     new 05f333c  TIKA-567: Temporary file leak in TikaInputStream
     new b6f45ba  TIKA-567: Temporary file leak in TikaInputStream
     new b65fad1  TIKA-587: NullPointerException in OutlookExtractor on missing chunks
     new 3e79510  TIKA-585: AudioParser Fails with NPE on fileFormat.properties
     new 164302f  TIKA-375: Improve code quality metrics
     new 728a226  TIKA-582: Lithuanian language identification
     new 46dce5b  TIKA-581: Parser fails on files that parsed with v0.7
     new 8133622  TIKA-587: XMLParser ContentHandler: multiple endDocument calls
     new 663b274  TIKA-416: Out-of-process text extraction
     new aec08ff  TIKA-416: Out-of-process text extraction
     new 3b3845a  TIKA-416: Out-of-process text extraction
     new 7391243  TIKA-416: Out-of-process text extraction
     new 4a9b1f4  TIKA-416: Out-of-process text extraction
     new 9ffe396  OutlookExtractor: fix NPE on messages without 'from' field
     new 1466606  TIKA-416: Out-of-process text extraction
     new cb91494  Fix up the iwork mime types with the patch from TIKA-588, and also add a unit test for the detection using the non-container detector (we already had container aware detector iwork tests)
     new f4098f1  TIKA-416: Out-of-process text extraction
     new b1e9764  TIKA-416: Out-of-process text extraction
     new cd4dff4  TIKA-422: Wrong charset conversion in some RTF documents.
     new ca5aa21  TIKA-593: Tika network server
     new 7b15af5  TIKA-372 follow-on: Set two more XMPDM metadata values for MP3, and add unit tests
     new 23aa290  Update copyright year to 2011
     new 6fd7bb8  TIKA-525: Mismatched start and end elements in HtmlParser
     new 568868c  TIKA-594: Upgrade Tika to pdfbox 1.4.0
     new 981a123  TIKA-508: HtmlParser link processing should skip usemap and codebase attributes
     new c139c2d  TIKA-497: HtmlHandler should fix up incorrect capitalization of names in <meta http-equiv="xxx"> attributes before putting into metadata
     new 88a0189  - fix for TIKA-596 NetCDF and HDF files don't parse correctly from the command line via tika-app
     new 71e1271  - 0.9 RC release prep
     new 69ebcc7  Whitespace tickle.
     new fec548c  [maven-release-plugin] prepare release 0.9
     new 30e45df  [maven-release-plugin] prepare for next development iteration
     new c292f1a  [maven-release-plugin] prepare release 0.9
     new 6eaa7cc  [maven-release-plugin] prepare for next development iteration
     new 3272ece  - bumpity
     new 453479f  - bumpity
     new 92573a1  TIKA-593: JAX-RS network server
     new df1e4cb  tika-server: fix compilation under Java 1.5
     new bbc97d8  tika-server: fix compilation under Java 1.5 - remove @Override on interface implementations
     new 2b34017  tika-server: add license header to commons-logging.properties
     new ba5671c  TIKA-597 : Bogus exception handler in org.apache.tika.parser.mail.MailContentHandler
     new 2bf987b  TIKA-606 - MP3 lyrics tags use a 6 digit length for the overall size, but only 5 digits for each tag
     new 4fca144  TIKA-611 : setSortByPosition reverted to the default value (false) in PDFTextStripper so that columns are separated
     new 2129a0f  TIKA-609: IOException from jempbox
     new e15598f  TIKA-597: Bogus exception handler in org.apache.tika.parser.mail.MailContentHandler.body(BodyDescriptor, InputStream)
     new 1bbaa90  TIKA-607: ParseUtils.getStringContent( ) of a text file - parser is null
     new 2c5efd4  TIKA-600: [patch] suspect transferable code
     new 2313022  TIKA-601: [patch] objects that compareTo each other, should also equals each other
     new 7857da1  TIKA-602: [patch] use short-circuiting rel ops
     new d7601cc  TIKA-599: Thread issue with autodetect parser
     new 4d96672  TIKA-593: Tika network server
     new 7da803f  TIKA-594: Upgrade Tika to pdfbox 1.5.0
     new d27c244  TIKA-593: Tika network server
     new 22c94c9  - fix for TIKA-614 Support hdf5 data file with file extension *.h5 contributed by Cynthia L. Wong
     new 9fbce43  Update the OOXML Excel (.xlsx) extractor to be largely SAX based, to reduce the memory use (it now works in a similar-ish way to the .xls one). Bumps the POI dependency up to 3.8 beta 1. (TIKA-521)
     new b13d07e  Fix the mime magic detection of TNEF files, and add a unit test for it. (The rest of the TNEF support will be committed when POI 3.8 beta 2 is out). (TIKA-615)
     new 8b87a38  When parsing an RFC822 file, don't assume that the from address is always in a certain format. Fixes TIKA-618 and adds a unit test for it.
     new cd45bd2  TIKA-534 - When parsing a jpeg file with unhandled tags in it, skip these
     new ca6b5ca  TIKA-592 - Support AutoCad DWG files from AutoCad 2000 (version 1015), and add Custom Properties support across all versions. Adds unit tests for various other "versions" (where the file format doesn't seem to have changed even if the product version has)
     new f45b0f8  Turning an ASCII string into static final bytes without exceptions shouldn't be this hard.... Fix 1.6ism for TIKA-492
     new b206755  update Excel number-format tests for use with latest POI trunk
     new 2ebf678  DOCX: rich text parsing for DOCX headers / footers
     new cf73252  OOXMLParserTest.testWordPicturesInHedaer disabled due to bug in POI - wait for 3.8-beta2 or 3.8-final
     new 8ff0334  Add some more detection tests, which show that for container formats the addition of the filename lets us specialise from eg tika-msoffice to msword
     new 8eb3b88  Fix deprecated warnings
     new 0c0cbf7  TIKA-620 - When trying to identify a parser for a media type in AutoDetect and similar, if the Parser claims to support an alias of the media type but not the canonical one (eg someone changed the mimetype file but not the parser), then have the parser accepted on the alias. Also adds AutoDetectParser tests for images (the bmp one of which didn't work before)
     new c7ece7f  TIKA-620 - When creating a default TikaConfig instance with a DefaultParser, have the newly created parser wired up with the Mime Type Registry we create. This allows the parser to resolve media type aliases and supertypes as it assumes it can.
     new 683ac82  TIKA-555 fallout - While image/bmp isn't the official mimetype, it is what Java thinks it is. So, switch from the official to the un-official one before asking Java to give us image processors
     new dca2ee6  TIKA-620 - Have CompositeParser always use the canonical mimetype internally, via suitable calls to registry.normalise, rather than trying to handle the aliases individually
     new 4cbd467  TIKA-160: Support encryption formats
     new 8b643b6  TIKA-160: Support encryption formats
     new e09dc30  Fix deprecated warnings
     new 58f7914  TIKA-625: Easier XML parser extensibility
     new 3b61cea  TIKA-626: Add an AbstractParser class
     new 5e1afea  TIKA-629 - Add the sample .asf, .wmv and .wma files from Microsoft
     new 46cda3d  TIKA-629 - Add detection for .asf, .wmv and .wma (including tests) Adds support for unicodeLE and unicodeBE strings in the mimetypes reader
     new 1cb70f3  TIKA-631 - Sample Chinese outlook file
     new 0ea4a15  TIKA-631 - Stub out the work for improving the outlook parsing WRT html body content and better encoding detection
     new 7dc399b  TIKA-633: NPE in XWPFWordExtractorDecorator.extractHeaders
     new 553d769  OOXMLParserTest: fix compilation on java 1.5
     new 2821b70  TIKA-634 - Initial work on supporting more flexible ExternalParser loading (via XML, part done), and external parser metadata extraction
     new 20f881f  TIKA-634 - Example external parsers config file
     new 92b0f12  TIKA-634 - Add support for checking if the external command is there, for collecting the output from a file, and a wrapper CompositeParser that loads all available External Parsers
     new b94b086  TIKA-635: Tika GUI improvements
     new 6aa8fc2  TIKA-635: Tika GUI improvements
     new 6acca4a  TIKA-635: Tika GUI improvements
     new 2ffde96  TIKA-615 - Outlook parsing update for POI 3.8 beta 2
     new 336bf67  TIKA-615 - POI powered TNEF parser
     new f6a4a01  TIKA-615 - Update the new parser to use AbstractParser
     new 90b0f8e  TIKA-622 - Switch the POI based parser from the old POIFS to the new, lower memory NPOIFS
     new 5f8737a  TIKA-639: Maximum pool size for ForkParser
     new 51989e7  TIKA-635: Tika GUI improvements
     new 15229b6  TIKA-639: Maximum pool size for ForkParser
     new f182496  TIKA-639: Maximum pool size for ForkParser
     new d029485  OOXMLParserTest: enable testWordPicturesInHeader that was disabled due to bug in POI 3.8beta1
     new d75b366  docx: extract image description in alt attribute
     new 75c5bcf  TIKA-593: Tika network server
     new b6d67d3  TIKA-621: RTF parsing fails with Java 7 early access on 64bit platforms
     new cfb1779  TIKA-461: RFC822 messages not parsed
     new e67859b  TIKA-461: RFC822 messages not parsed
     new 6b711d2  <?xml version="1.0" encoding="UTF-8"?>
     new 9c9d6f8  TIKA-644 - When generating html headings from word, h6 is the highest the xhtml allows, so don't try generating h7 (or higher) even if Word has a 'Heading 7' style
     new 0a7b8ce  TIKA-643 - Now that we're using NPOIFS which takes files, simplify the code as we don't need to use an InputStream
     new a5df877  TIKA-643 - Change TaggedInputStream to work like TikaInputStream for creation, with a static get, to avoid double wrapping. Also adds toString methods on the two
     new 03752ab  TIKA-643 - Add toString() to another of our InputStreams
     new cac8d4c  Remove un-used import
     new 1fe4980  Office: SummaryExtractor: do not fail on files without property stream (original fault file was generated by Java Excel API library)
     new e6e7650  OfficeParser: HWPF: ignore invalid style references
     new 907880f  TIKA-649 Fix for .docx files with no header or footer policy defined
     new ee817f0  TIKA-647 - Fix inverted logic on --list-met-models
     new fc9e8a0  TIKA-213 JSON metadata output support, using the GSON library to do most of the work
     new 3ab6160  TIKA-619 - Apply patch from Alexander Chow to ignore errors from a JRE GIF bug
     new 3e0191d  TIKA-654 - Open the OOXML OPCPackage as read only, and fix serial version warning
     new 5119a98  TIKA-654 - If we have an open container that can be closed, close it when closing the stream
     new 87fd9af  TIKA-655 - Push the iWorks detection logic from ZipContainerDetector to IWorkPackageParser, and make that detect similar to OfficeParser does. Then, put the content handler selection logic into IWorkPackageParser, and remove IWorkParser (which claimed to be a regular parser but in fact only worked when called from IWorkPackageParser) The result is that tika app can then parse iWork files, and unit tests still work
     new 6af91c6  TIKA-656 RFC822 and MBox parsers should output the same date metadata keys
     new 5716665  TIKA-656 Switch two more Office metadata keys that hold dates to being typed date properties
     new 469aa10  TIKA-656 Correct general POI date to metadata handling, plus test
     new 8ef4561  TIKA-656 Update the Outlook parser to handle dates the same way as the other mail parsers
     new 8f4b2ae  TIKA-652 Add a few more external properties to match the internal ones
     new 1835ad3  TIKA-652 Update the POIFS parser to handle custom metadata entries in the same way that the Open Document one already does
     new d18f3f0  INFRA-3583: Add Tika to Sonar
     new c2aed15  TIKA-658 TCPDump pcap mime matching
     new 8c58587  TIKA-659 Merge the ODF parser tests, and put them in the new package
     new ac4b6ef  TIKA-646 Helper class to allow us to avoid calling endDocument until a later time
     new 1fcb10a  TIKA-646 Avoid calling endDocument for OOXML and ODF parsers until after we have extracted the metadata
     new ddceb68  TIKA-655: IWorkPackageParser / IWorkParser not registering properly
     new c69ff98  TIKA-640: RFC822Parser should configure Mime4j not to fail reading mails containing more than 1000 chars in one headers text (even if folded)
     new 77c7847  TIKA-650: Missing required alt attribute on img tag
     new c772872  TIKA-645: Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
     new 0b7fa58  tika-parsers/pom.xml: remove unused commons-httpclient dependency
     new 08859fc  TIKA-213 Remove leading zeros from integers when outputting JSON
     new 35fc876  TIKA-660: Remove logging of duplicate parser definitions
     new 49a9c4f  TIKA-661: MimeType class does contain a String with accessor named Extension. This should be a List<String> Extensions due to several reasons.
     new c2f7fe0  TIKA-645: Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
     new 49a3991  TIKA-662: OOXMLExtractorFactory: use file when stream is TikaInputStream and it .hasFile()
     new 9f57e91  TIKA-416: Out-of-process text extraction
     new 5943753  TIKA-416: Out-of-process text extraction
     new 329ebc5  TIKA-642: Few of RTF files not extracting properly
     new 3b10c70  TIKA-660 Merge the two CompositeParserTests and PatternsTests into one each in core
     new 406cca3  TIKA-645: Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
     new c5f2c92  TIKA-572: Update plugin versions in the POM structure
     new 40fd8c1  TIKA-628: Binary distribution for releases
     new 15bd1d4  TIKA-664 Add mime entries for Adobe Premiere (PPJ) and Adobe SoundBooth (ASND), plus a common PhotoShop alias
     new 9b54a1c  TIKA-375: Improve code quality metrics
     new 2ff952a  TIKA-375: Improve code quality metrics
     new 355bfe2  TIKA-160: Support encryption formats
     new 7fe73eb  TIKA-259: Safe parsing of droste.zip
     new d64aa92  TIKA-527: Allow override mapping mime<-->parsers through config
     new 3f618f6  TIKA-527: Allow override mapping mime<-->parsers through config
     new 4d89ea2  TIKA-346: ZipParser throws "invalid compression method" error for some archives
     new 49692de  TIKA-665: NullPointerException from com.sun.org.apache.xml.internal.serializer.ToStream.writeAttrString on some excel files from the CLI
     new deeff20  TIKA-668: Better handling of XML parse errors
     new fb732d6  TIKA-671 - initial support for FictionBook document (fb2) format
     new 33e5529  added 3 chm files, testChm, testChm2, testChm3
     new e152e26  added chm patch included tests, changed tika-bundle-it pom.xml, version 0.9 --> 1.0
     new ab666f2  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new cbba597  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new 82a4e49  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new 8c6010b  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new 5619f19  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new 1d3fe57  - progress towards TIKA-245 Support of CHM Format (Oleg's patch, in parts, as suggested by Jukka)
     new 75e5f97  TIKA-631 Apply Outlook extraction enhancement to better extract html and rtf versions using POI 3.8 beta 3
     new 69f49a9  support of Java 5
     new 42d7de9  support of Java 5
     new f2e35d0  added missing Apache license header
     new 3dee434  added missed Apache license headers to the chm tests
     new cf8cb44  (TIKA-672) Proper error handling in the CHM parser
     new 6a62fe0  (TIKA-672) added to the chm tests more sophisticated error handling
     new 608a608  TIKA-466: Feed Parser
     new 9ffaf87  OfficeParser: choose correct Decryptor for document
     new 7b5f83a  List the commons compress dependency in the bundle too (TIKA-671)
     new 31ce68f  TIKA-434 - Pushback buffer overflow in TagSoup
     new 77c6b73  TIKA-679 Add detector support for CADKey PRT files
     new f5a9438  TIKA-679 CADKey PRT parser
     new afb429c  TIKA-679 Remove un-used import, and fix warning
     new 5ad2798  TIKA-679 Add missing license header
     new 96f592a  TIKA-679 Update the CADKEY PRT parser to get the description, and tweak the text encoding based on work by Troy
     new fc13845  TIKA-678 Add unit test using supplied test file that shows the problem with option headers no longer exists
     new 92b2619  TIKA-683 Rename TikaTest to something more specific, so we can use that name for a parent superclass of our tests
     new f7607ec  TIKA-683 New TikaTest parent class for tests (which RTF test will shortly use)
     new ca890e4  TIKA-683 Create a dedicate RTF parser test, based on the existing checks in TestParsers
     new f5f1ef1  TIKA-683 Unit test for Japanese RTF text
     new 2780869  TIKA-507 Split the mime type entries for AFM and PFM (font metrics) out from the fonts themselves, and add magic detection patterns for them
     new d27bad8  TIKA-507 Add byte based detection tests for .pfa/.pfb/.pfm (which we currently lack free sample files for)
     new 61c1637  added ngram profiler and its tests, also added an option to the TikaCLI.java for lang.profile creation and its test
     new 5bbd7ab  changed verification point of the testListParsers()
     new 24b8cfb  commented an assert temporarily
     new b303c6a  Add quick test to validate that RSS feeds will be processed by the appropriate parser (see https://issues.apache.org/jira/browse/NUTCH-1053).
     new 90b1063  added license headers to alice.cli.test & welsh_corpus files
     new 30897dc  added license header to welsh_corpus file
     new 25ca130  TIKA-593: Update tika-server
     new 1dc0a2e  TIKA-527: Allow override mapping mime<-->parsers through config
     new d7b8d74  TIKA-565: Improved OSGi bundling
     new 2af24f4  - patch for TIKA-422 contributed by Mike McCandless.
     new e6defb6  - accidentally committed in progress geo folder, removing it.
     new 62abba6  - commit unit test patch for TIKA-683 from Mike McCandless.
     new 8efad87  TIKA-692: TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag
     new b16c898  TIKA-692: TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag
     new 332e498  TIKA-692: TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag
     new c3318a1  TIKA-447: Container aware mimetype detection
     new 1be0e52  TIKA-667: Changes to RFC822Parser to support turning off strict parsing
     new e7b51f0  TIKA-693 - Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser
     new 5aa2fc4  XSLFPowerPointExtractorDecorator: remove unnecessary call to slide._getCTSlide()
     new 3c3e6d1  TIKA-434: Bug in TagSoup causes IOException
     new ea2cfb0  ZipContainerDetector: fix file descriptor leak
     new f74ff48  TIKA-697 Sample files in the AR archive format
     new 4316d8d  TIKA-700 Upgrade the POI dependency to 3.8 Beta 4
     new 600e69d  Fix TIKA-700 related 1.6ism
     new d64fb55  TIKA-392: add 3 RTF test cases
     new 728a218  TIKA-392: use unicode escapes for non-ascii chars
     new 3359566  TIKA-701: Fix problems with TemporaryFiles
     new 8ef85fc  TIKA-701: Fix problems with TemporaryFiles
     new 3e832b0  TIKA-701: Fix problems with TemporaryFiles
     new 6bc0e05  TIKA-701: Fix problems with TemporaryFiles
     new 6bd95ee  TIKA-701: Fix problems with TemporaryFiles
     new ddf56ce  TIKA-701: Fix problems with TemporaryFiles
     new 6862cc1  typo
     new 0ac8f50  TIKA-687: Temporary file not removed after detection
     new 411d5d8  TIKA-207: MS word doc containing tracked changes produces incorrect text
     new 9d73a10  TIKA-704: PDF and Outlook docs embedded in MS Word documents not parsed
     new 9d5cf32  TIKA-702: Cannot compile Tika with Java 7 (ImageMetadataExtractor.java)
     new 9c9763a  TIKA-698: "Invalid UTF-16 surrogate detected:" parsing PowerPoint 97-2003
     new e25a454  Embedded file extraction is broken for some OOXML files (bug introduced few commits ago)
     new b4bfb99  TIKA-704: PDF and Outlook docs embedded in MS Word documents not parsed
     new 07f6f63  add several test cases, derived from test case coming in TIKA-683
     new 04d1fb7  TIKA-704 Tweak detection of embedded non-office documents in OLE2 streams
     new 4c5599b  TIKA-698: use the unicode replacement char (U+fffd) when replacing invalid XML chars in SafeContentHandler
     new 4639ee2  TIKA-710: Expose the Parser and Detector instances within the Tika facade
     new 9cb9439  TIKA-704: PDF and Outlook docs embedded in MS Word documents not parsed
     new e924f82  TIKA-704: PDF and Outlook docs embedded in MS Word documents not parsed
     new 505a835  PPT: avoid NPE when OLEShape.getObjectData() is null (see POI bug#51771). Patch by Yegor Kozlov
     new 5c22595  TIKA-683: new RTF parser that performs its own direct shallow parse (instead of using RTFEditorKit from javax.swing)
     new 1121dfe  TIKA-594: Upgrade Tika to pdfbox 1.6.0
     new 24ef103  TIKA-692: TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag
     new ad30687  TIKA-692: TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag
     new d30f1ff  TIKA-598: Update HDF parser and NetCDF parser to emit minimal XHTML
     new 717b173  TIKA-688: Enhance content-type detector to recognize almost plain text
     new 0fdbd74  TIKA-698: explain that Unicode repl. char is now used for invalid chars
     new 2f7241d  TIKA-717: add testComment test
     new fdb902b  TIKA-717: fix RTF parser to extract annotations (comments)
     new 9dc084c  TIKA-688: Enhance content-type detector to recognize almost plain text
     new c4c8b22  TIKA-565: Improved OSGi bundling
     new de35385  Drop svn:executable properties.
     new 3f9f0fe  TIKA-700 Upgrade the POI dependency to 3.8 Beta 4
     new 7b36616  TIKA-565: Improved OSGi bundling
     new 8089184  Add missing svn:eol-style settings
     new e679c0e  Some CHANGES.txt updates
     new 5e9164d  Add a more specific mime magic pattern for detecting single stream Ogg Vorbis files
     new ea21418  TIKA-705 Temporary workaround for the relative links issue, pending upgrade to POI 3.8 beta 5
     new 6187e03  TIKA-705: re-enable test case
     new ab1aea6  TIKA-725: Empty title element makes Tika-generated HTML documents not open in Chromium
     new ebb0338  TIKA-716: Upgrade apache-Mime4J to Version 0.7
     new 39be977  TIKA-726: add EncryptedDocumentException for situation when extraction cannot be done due to unknown or wrong password
     new 111fcd5  TIKA-726: use EncryptedDocumentException in OfficeParser and CryptoParser
     new e39bff0  TIKA-726: add Apache license header
     new a5429c5  TIKA-726: throw EncryptedDocumentException in ExcelExtractor
     new c9e37f6  TIKA-716 Fix tika-bundle dependency list following apache-Mime4J upgrade
     new acbefdf  TIKA-712 Fetch Master Slide text for PPT and PPTX text extraction
     new 1de7bfe  TIKA-712: master slide's text is now extracted
     new ec9d7d2  TIKA-709: Tika network server does not print anything in response to, for example, Word documents
     new 0d2676c  TIKA-727 Improve the HSLF PPT parser by using HSLF usermodel classes to generate more specific XHTML events
     new 1109b5a  TIKA-508: HtmlParser link processing should skip usemap and codebase attributes
     new 2bbf5ca  TIKA-720 Sample EBCDIC (IBM-500/CP500) text file
     new eb73a4b  TIKA-720 Add Charset Detector for the IBM500 (EBCDIC) charset
     new b62ce33  Make the error message more helpful when this test assert fails
     new aa944e8  TIKA-720 Add documentation for some of CharsetRecog_sbcs, and tweak the EBCDIC bit to avoid false matches for short snippets of HTML
     new 7aeed0b  Add a disabled Outlook RTF related test, pending a fix for TIKA-632. (We're nearly there with the recent RTF improvements, but not quite....)
     new 683c9a8  TIKA-508: HtmlParser link processing should skip usemap and codebase attributes
     new a682c4c  TIKA-712: temporarily turn off pulling text from master slide until we can figure out how not to pull out the boilerplate text
     new 9836592  TIKA-712: add (disabled) test cases showing the bug
     new fcde81c  TIKA-651: Unescaped attribute value generated
     new 16de4aa  Use spaces for indentation
     new 23109a8  Avoid closing stdout in TikaCLITest
     new 77ebfcd  TIKA-651: Unescaped attribute value generated
     new cbd8a31  Replace 1.0 with 0.10 in @since statements
     new 525ded2  - prep for 0.10 RC1
     new 4ef2b04  - prep for RC 0.10 #1
     new e4ffb45  [maven-release-plugin] prepare release 0.10
     new 15f13bd  [maven-release-plugin] prepare for next development iteration
     new fb8a540  TIKA-731: NPE in WordExtractor.handleParagraph()
     new 9c609e6  TIKA-732: Upgrade to Commons Codec 1.5
     new 71a0b7e  - update CHANGES.txt changelog with contribution report for 0.10 and with dependency tree.
     new 3b9fbf9  HSLF Extractor improvements from Pablo from TIKA-727
     new c80eee3  TIKA-632: extract hyperlinks from RTF docs
     new 0406ad0  TIKA-632: temporarily disable RTF hyperlink test
     new add1168  TIKA-632: enable test case
     new 70d59d1  TIKA-711: add test for optional hyphen across doc types; leave .doc turned off until we can fix it
     new 65a5b20  typo in comment
     new f60f184  TIKA-717: PPT is extracting comments correctly
     new 0396c4f  TIKA-738: move (ignored) test case to PDFParserTest
     new dc2b9a1  TIKA-733: try to be robust when RTF doc has too many closing }'s vs opening {'s
     new af7df6b  TIKA-711: correctly handle optional hyphen from Word docs (.doc)
     new 4e0b3e6  TIKA-742: extract paragraphs inside PDF pages
     new da8cc2a  TIKA-742: fix Java 1.6 only code
     new 3a70dc8  TIKA-743: Upgrade to Apache parent POM version 10
     new 609cfe7  TIKA-739: For certain DWG files, the Tika content parser outputs garbage
     new 84d1ee3  TIKA-741: "Zip bomb" (XML nesting) detection is too strict
     new 03fa932  TIKA-699: Automatic checks against backwards-incompatible API changes
     new 4e22d4b  TIKA-744: Drop support for Java 1.4
     new 8f3097f  Tweak the MP3 parser to put a class on the lyrics paragraph, so it could be filtered for if required
     new b5d5065  TIKA-745 If we find a ID3v2 Genre that isn't one of the ones in v1, use it as-is
     new 7d873ab  TIKA-746 Allow MimeTypesFactory to take more than one resource to load, and update the default to be to load tika-mimetypes.xml followed by any custom-mimetypes.xml files found
     new aea844e  TIKA-448: Tika FLVParser hangs
     new a2f30df  TIKA-682 Add mime magic detection for PSD files
     new f5bc2a7  TIKA-749 Add EndianUtils, which provides a way to read small and big endian numbers from streams, based on the version in POI
     new 01bbb05  TIKA-682 Add a basic PSD metadata extracting Parser
     new bec14db  TIKA-749 Convert the DWG and PRT parsers to use the Tika endian util, rather than the POI one
     new a5cc41a  TIKA-682 Fix 1.6ism
     new c256022  TIKA-748: ignore \* if it's not right after group start {
     new 578050c  TIKA-750: JavaDoc of Tika XPathParser should mention descendant:node()
     new bc49afc  TIKA-681: eight new n-gram language profiles
     new 66396ca  TIKA-681: eight new n-gram language profiles
     new e2f5b4c  TIKA-751: some initial improvements to embedded office doc handling in AbstractPOIFSExtractor
     new d2ac303  Add a common alias for the WordPerfect mimetype
     new c3f9910  TIKA-752: Typo in timezone used in Metadata.iso8601Format
     new 02ec12f  TIKA-657: Email parser gets into trouble on malformed html in enron corpus
     new 1aaac7f  TIKA-657: Email parser gets into trouble on malformed html in enron corpus
     new 773b46e  TIKA-753: speed up processing of embedded office docs
     new 30a22be  TIKA-755 Have TikaConfig create a DefaultDetector instance based on the supplied MimeTypes and/or ClassLoader, and switch Tika+AutoDetectParser to get their detector from there, rather than create their own DefaultDetector instance
     new 5cc8b40  TIKA-756: XMP output from Tika CLI
     new 9769f86  TIKA-738: optionally extract PDF annotations
     new 94356cb  TIKA-724: add option to PDFParser to control auto-space behavior
     new f22152d  TIKA-761: Provide version number by CLI argument -V
     new eab9318  TIKA-565: Improved OSGi bundling
     new 82e053d  TIKA-746 Allow MimeTypesFactory to take more than one resource to load, and update the default to be to load tika-mimetypes.xml followed by any custom-mimetypes.xml files found
     new 70c5417  TIKA-582: remove extra quotes from Lithuanian 3gram tables
     new c05a585  TIKA-565: Improved OSGi bundling
     new 1076bb7  Summarize changelog entries by feature rather than by issue
     new 6df67a8  Add a few more 1.0 changelog entries based on notable issues in Jira
     new 9adbcd0  Normalize CHANGES.txt to use UTF-8
     new dcc6e5b  Include only compile-scope tika-parsers dependencies in CHANGES.txt. The other dependencies aren't really of interest to normal users.
     new c25ac97  Uniform formatting of the CHANGES.txt file
     new 25f7a4d  TIKA-703: Drop deprecated methods/classes/interfaces
     new 1d55d3a  TIKA-703: Drop deprecated methods/classes/interfaces
     new 6603aa6  TIKA-761: Provide version number by CLI argument -V
     new 08af8cd  TIKA-763: Update license metadata
     new 8e434ae  TIKA-763: Update license metadata
     new 3083e91  TIKA-736: extract header/footer text for OpenOffice docs
     new 693d3c6  TIKA-764 Update OpenDocumentMetaParser to use the common Metadata keys for document statistics, and remove use of a deprecated class in fetching the stats
     new 9ba4191  TIKA-565: Improved OSGi bundling
     new 309a9a5  Remove unused import
     new 2e976ca  TIKA-565: Improved OSGi bundling
     new 8f39338  TIKA-565: Improved OSGi bundling
     new 1ef02db  TIKA-565: Improved OSGi bundling
     new e910698  TIKA-565: Improved OSGi bundling
     new 5fc2030  TIKA-699: Automatic checks against backwards-incompatible API changes
     new 6673d51  TIKA-565: Improved OSGi bundling
     new 3d12327  TIKA-565: Improved OSGi bundling
     new 1d7259d  TIKA-565: Improved OSGi bundling
     new d482bd0  TIKA-565: Improved OSGi bundling
     new 43bcd67  TIKA-565: Improved OSGi bundling
     new 52d6efd  TIKA-565: Improved OSGi bundling
     new bc3f524  massage CHANGES.txt: inline issue numbers so we can match to the right description
     new eea0229  TIKA-763: Update license metadata
     new a0aec54  TIKA-763: Update license metadata
     new 044cd54  TIKA-769: Upgrade to Commons Compress 1.3
     new 2a6eca0  TIKA-761: Provide version number by CLI argument -V
     new 8797f12  TIKA-703: Drop deprecated methods/classes/interfaces
     new 7d30b7a  - prep for release
     new 15f252d  [maven-release-plugin] prepare release 1.0
     new 0121288  [maven-release-plugin] prepare for next development iteration
     new 2c128e8  - add release date: updated on RC vote area, and will push to dist.apache.org on release (if successful).
     new 7f970ef  TIKA-767: allow controlling whether PDFBox should try to remove overlapped duplicated text; default to disabled
     new 1a7dfa1  TIKA-712: strengthen the test cases here to not only validate the text came through but also to make sure boilerplate text did not
     new a39d582  TIKA-714: add test case for PPTX to extract text from word art
     new a157a6f  TIKA-529: don't allocate byte[] for each byte when detecting IBM420 charset
     new 2c3dbde  TIKA-777: process buffered bytes/text on font change
     new 79150b7  TIKA-773: .NET version of Tika
     new 6d7b5d4  TIKA-780: Optimize loading of the media type registry
     new d57de4c  TIKA-780: Optimize loading of the media type registry
     new 72646ef  Adjust clirr checks to use Tika 1.0 as the baseline.
     new c57d94d  TIKA-780: Optimize loading of the media type registry
     new 2e5cd68  TIKA-780: Optimize loading of the media type registry
     new 85878f6  fix typo
     new 9f8c762  TIKA-780: Optimize loading of the media type registry
     new b7a58ef  TIKA-780: Optimize loading of the media type registry
     new 350d1be  Exclude the enum types from clirr checks.
     new ad78ecd  TIKA-781: don't output whitespace when we are in an ignored GroupState
     new 5e91e71  TIKA-663 Mimetype entry for JSP with magic
     new 1752947  TIKA-779 Works 2000 container aware detection, plus test
     new 7b80961  TIKA-612: enable controlling PDFBox's setSortByPosition from PDFParser
     new c631563  TIKA-782: properly handle \bin control word
     new a7e1524  TIKA-784 Mimetype entry and glob for DITA
     new 0edc3ad  TIKA-784 Sample DITA task, concept and map files. (Based on some Alfresco documentation, with content replaced with Tika info)
     new 9ae596c  TIKA-784 DITA mimetype entries for the 3 subtypes, plus tests
     new 1cbd62a  TIKA-785 Add a --list-detectors method to TikaCLI, along the lines of the existing --list-parsers one
     new e3f9af7  TIKA-784 Switch the DITA types to be format specialisations, rather than their own dedicated mimetypes, to match the OASIS recommendation
     new 78f6eac  Expand container detection tests, and added disabled (failing) tests for TIKA-786
     new ba101bb  A few more TIKA-786 related tests
     new fa077bf  Add basic JavaDoc for a few MediaType methods that lacked it
     new 2b39c75  TIKA-786 Control the ordering of detectors in DefaultDetector, so that user supplied detectors come first, then Tika ones, and finally MimeTypes. This ensures that more specific detectors get to try first
     new 482cf37  Add a note about TIKA-786 to Changes
     new f8e3364  TIKA-787: Improve charset detection for UTF-8 HTML fragment
     new 4073be5  TIKA-789 Sample Microsoft Project (MPP) files
     new 25fcdf9  TIKA-789 More consistent naming for sample MPP files
     new a141b27  TIKA-789 Microsoft Project (MPP) is OLE2 based
     new 1b01e4f  TIKA-789 POIFS Container Detection support for MPP files
     new 39a1bac  TIKA-789 Improve MPP detection based on info from Alex Ott
     new f05edfe  TIKA-789 Add (metadata only) Project support to OfficeParser, and add a unit test that checks we correctly get Project metadata back from our sample files
     new 8e449f9  Add CHANGES entry for TIKA-789
     new 00801cb  TIKA-789 Add the project type to the OfficeParser mimetype list, and add a note on why Works is missing from the list
     new 6ef64dd  TIKA-778: fix cases where PDFParser produced too many </p> tags
     new 31f96ed  TIKA-697 Test CPIO file
     new e72a9c4  TIKA-697 Archive formats mimetype tests (not all of which work yet)
     new 4885ea7  TIKA-697 Correct mime match for .ar unix archives, add the suggested extra filetypes and aliases, and list .deb as being ar based
     new 2d42f07  TIKA-697 Add mime magic for .deb files, which are based on .ar but have a specific first entry
     new 96c92f5  TIKA-794 Correct Little16 mime magic logic, and enable the CPIO test now that the detection is correct
     new 9a93719  TIKA-791 Sample protected Microsoft Office documents
     new e735a65  TIKA-791 POIFS Container Detector support for encrypted OOXML files, plus tests and new (tika specific) mimetype
     new a78309d  TIKA-790 Remove the duplicated detection code between OfficeParser and POIFSContainerDetector, by following the pattern from TIKA-791 and adding a type for OLE10Native, then pushing the rest of the detection work to POIFSContainerDetector
     new 90d327c  Patch+Test from Antoni from TIKA-797 - Correct the default PPT extension
     new 513dbdb  TIKA-798 - EMF and WMF metafiles aren't the same, so split the mimetypes, plus add magic+tests for them (patch from Antoni)
     new 82ddc53  TIKA-410 Word Parser support for extracting textbox content (Patch from John Mastarone)
     new 374538b  TIKA-800 Wrap the ArchiveInputStream in PackageExtractor so that it can be used with Detectors
     new 852f148  TIKA-565: Improved OSGi bundling
     new 7e2326a  TIKA-800: mark/reset not supported from POIFSContainerDetector
     new 7e2393f  TIKA-567: Temporary file leak in TikaInputStream
     new 707b8f2  Add TikaCLI help for the -f/--fork option previously added for TIKA-416
     new b73dccb  TIKA-801: fixed NPE when filtering Outlook docs with RTF or HTML content
     new f8e572b  Add disabled tests for TIKA-808 (parser needs fixing so that tests can pass)
     new eec596c  TIKA-809 Handle embedded files with no file extension
     new 50d6e58  TIKA-808 Remove un-used fork test imports
     new 4287c50  TIKA-803 Wrap the outlook message body in a special div
     new 50437f3  TIKA-812 Support for detection of MS Works 7.0 Spreadsheet files
     new 4bd426e  TIKA-813 Support for detection of Apple "bplist" files (Binary Property List) and webarchive files - a special case of bplists.
     new 853fea0  TIKA-814 MimeTypes detects plain text based on a larger sample of bytes.
     new 60711a9  TIKA-700 Upgrade to POI 3.8 beta 5
     new ce78a81  TIKA-757 Tidy Excel extractor code after POI upgrade
     new ac13e8e  TIKA-757 Tidy the Word Extractor picture locating code
     new b21fa6a  TIKA-757 Tidy the OLE10Native extractor code now that POI has been upgraded
     new 03e6113  TIKA-705 / TIKA-757 - Simplify the OOXML related parts code, following POI upgrade
     new 3d6dbb3  TIKA-816 The Excel (XLS) Parser should format numeric formula cell values, and handle string formula cell values
     new 75d487d  TIKA-821 Added support for detection of old MS Works Word Processor files
     new e3a1831  TIKA-812 Clarified the javadoc of the test method I introduced with ver2 of my patch. In ver2 I added a magic which allows pure MimeTypes to detect works-spreadsheet files, but forgot to update the javadoc for the test method. It was untrue. Now it's OK.
     new 48b28a6  TIKA-822 - Handle quoted parameters on media types
     new 57a99d0  TIKA-823 support for detecting StarOffice types, both in MimeTypes and POIFSContainerDetector
     new edb6775  TIKA-828: TaggedIOException can be passed non Serializable objects
     new 486d767  TIKA-808: Fork Parser doesn't work for PDF files
     new 0ee5935  TIKA-808: Fork Parser doesn't work for PDF files
     new dd8480f  TIKA-829 Validate inputs to the ForkParser constructor (must not be another ForkParser) and TikaInputStream get (must not be null) - patch from Jerome Lacoste
     new 915be4f  TIKA-831 Fix the data type when comparing errors from the forked server, and add some more Forked unit tests (one disabled) - patch originally from Jerome Lacoste
     new a90b827  TIKA-827 Handle sending non serializable exceptions back from the ForkServer
     new e2720e4  TIKA-831 Fix test warnings, and enable the last test (needs to not use the Tika facade)
     new b93fc6f  TIKA-793 Correct the null termination stripping in the ID3 tag code, when dealing with double byte encoded strings
     new a739ddd  TIKA-793 Sample MP3 with UTF-16 text in the comments (others are all ISO8859-1 text)
     new b20beb6  TIKA-833 Mark some more Excel formatting tests as passing (with tweaks to match what actually gets stored)
     new bc07570  TIKA-830 Improved error message in the ForkParser if we are unable to serialize the parse objects
     new 5a1ebc7  TIKA-831 Start on a test for the ForkParser with a parser exception that isn't serializable (currently not working so disabled)
     new 554605b  TIKA-793 Unit test for i18n MP3 tags (excluding comments)
     new 3bdf69a  TIKA-793 Correctly handle the COM/COMM tag in MP3s, which is in a different form - encoding+language+desc+text
     new f6001f7  Add TIKA-793 to the changelog
     new 52d51b9  - apply patch from TIKA-824 contributed by Markus Jelsma
     new 9e22f46  TIKA-826 We don't currently support .xps or .xlsb files (which are OOXML based), so ensure we don't explicitly claim them, and have the OOXML parser decline if it gets them on the basis of the parent type
     new 83a8a97  TIKA-826 For an OOXML type we can't handle, use EmptyParser for valid, empty xml
     new e303705  Patch from Fabian Lange from TIKA-837 - Make inner classes static where possible
     new fa076c5  TIKA-375: EmptyParser Singleton should be final
     new 0f54339  TIKA-793 follow-on: MP3 files can have more than one comment, as long as the language+description pair is unique, so support capturing multiple comments
     new ed4a055  TIKA-695 Sample office files with custom properties
     new 022f18c  TIKA-695 Add unit tests for XLS/DOC/PPT custom properties extraction
     new 4b51bea  TIKA-695 Support for Custom OOXML properties, plus start on tests for it
     new 0c52b66  TIKA-695 Support some more OOXML custom property types, and expand the unit test coverage
     new f28f380  TIKA-840 Expose the logic for detecting the type of an OOXML file from an open container
     new 3698626  TIKA-840 Update the OOXML parsers, so that rather than hard coding the content type, the file specific one is fetched and set
     new 36bbcfc  TIKA-805 Improved .pptx XSLF extraction, patch from Yegor Kozlov
     new 80d36eb  TIKA-841 User supplied parsers should be preferred over built in ones in DefaultParser
     new a2555c6  TIKA-507 FontBox powered .afm font metrics parser, patch from Fernando Arreola
     new 83eb5b8  TIKA-843 Metadata support for dates without times (treated as midnight UTC)
     new f6adfbd  TIKA-843 Switch dates without times to noon UTC
     new c657ecd  TIKA-846 Patch from Ray Gauss to parse RDF Bag Elements to multi-valued metadata
     new 45ac50a  TIKA-846 Fix indent
     new 92f4446  TIKA-844 Add an internal TagBag property type constructor, patch from Ray Gauss
     new c95bbb3  TIKA-845 Correct the conversion of XML tags to multi-valued metadata values, and avoid duplicating existing values
     new 7c43bff  TIKA-849 Add a sample iBooks epub file from Andrew Jackson, and add a unit test for the Zip Container Detector of epub zip formats
     new 1118426  TIKA-849 iBooks epub mimetype entry, and fix a few comments
     new c877ecc  TIKA-849 Initial ibooks epub support and test, from Andrew Jackson. Metadata only for now though, text isn't coming through as it's within <object> tags
     new 84e8fc3  TIKA-849 EPub (and iBooks) files typically have multiple xhtml documents making up the whole, so avoid repeatedly starting/ending the document for each part
     new 8f05b4e  TIKA-839 Update the .potm test file to match the others, and enable testing of it - file+patch from John Mastarone
     new f52da71  TIKA-760 Avoid NPE in XHTMLContentHandler if a null string is passed to the characters method
     new 85ae815  TIKA-770 Convert the remaining ODF document statistics to be defined properties, and update all of the Office Count statistics to be integer typed properties
     new b754b45  TIKA-770 Set all document statistics with Properties rather than Strings, now they are all typed
     new 8584631  TIKA-851 More specific quicktime/mp4 matches, for the common subtypes, based on the ftyp atom
     new b802cc1  TIKA-851 Another mp4 audio alias
     new 34c0293  TIKA-842 IPTC Metadata Properties, including full descriptions of all the properties taken from the Specification, along with appropriate License/Notice information for this.
     new 7901a55  TIKA-842 Avoid property name clash with IPTC and the old-style values from DublinCore
     new bd764cc  TIKA-851 Add another MP4 audio extension
     new 20dfa5a  TIKA-852 Sample MP4 Audio (M4A) file
     new fa10b69  TIKA-852 Initial MP4 Parser, powered by MP4Parser from Google Code
     new d53216a  Update the bundling definitions to include MP4Parser and dependencies
     new fe9897c  TIKA-852 MP4 files can be very large, so avoid trying to buffer them in memory
     new 24bc3dc  TIKA-853 Close the stream in the MP4 Parser, and use a cleaner way to get two of the metadata boxes
     new ab54c21  TIKA-757 Remove POI related TODO, now that we have upgraded to a version with the fix
     new a170fbe  TIKA-850 Add a new interface, PasswordProvider, which can be set on the ParseContext to provide a way to supply document passwords. Updates PDFParser to use this in preference
     new 5498c0e  TIKA-854: No text extraction for Word macroenabled template
     new 53ef1b9  TIKA-852 Avoid NPE on missing metadata boxes
     new 6454566  TIKA-852 Support setting the channel type from a channel count in mp4, via a couple of different possible routes (see dev@tika discussions)
     new 5a42442  TIKA-747 Add Vorbis and FLAC test files, for integration tests
     new 5e104cc  TIKA-747 Add the Vorbis and FLAC parsers, along with a simple integration test
     new 4cf7614  Re-enable the iWorks tests disabled in r1023712 as part of TIKA-533, as they work properly again now
     new 0ec2c05  Update CHANGES with recent new parsers
     new 59cab5f  TIKA-818 Use a temp file for PDFBox resource processing, if the input is a file based TikaInputStream
     new 72564dd  Mark text/javascript as an alias of the official application/javascript mimetype, rather than being separate, and add the application/x-javascript which is sometimes used in older things too
     new f68f1a4  Add the older audio/x-mpeg alias for audio/mpeg
     new dc9ffcc  TIKA-850 Update OfficeParser to support the new style password fetching via PasswordProvider
     new eb62078  TIKA-865 Reduce the amount of MimeTypes.forName that needs to be synchronized
     new 1f16bf8  TIKA-866: Invalid configuration file causes OutOfMemoryException
     new 84a2a27  TIKA-865 Tweak what we lock on
     new 929f753  TIKA-864: Metadata.formatDate causes blocking in concurrent use
     new a547dde  TIKA-866: Invalid configuration file causes OutOfMemoryException
     new fbd2d6d  TIKA-866: Invalid configuration file causes OutOfMemoryException
     new 8cdd3dc  - upgrade to 1.1-SNAPSHOT dependency.
     new 8c111b9  update CHANGES.txt in prep for release.
     new 963aa1b  [maven-release-plugin] prepare release 1.1
     new bbd71ac  [maven-release-plugin] prepare for next development iteration
     new 6a8b23b  TIKA-870: allow setting maxStringLength per-call to Tika.parseToString
     new f1f4b03  - patch for TIKA-874 Identify FITS (Flexible Image Transport System) files contributed by Peter May
     new b11403d  TIKA-875: fix file handle leak in ImageParser
     new e91ba0c  TIKA-877 - fix extraction for OLE-attachments in TikaCli
     new e488767  TIKA-882 - ignore incorrect part references in OOXML Extractor
     new 49613a1  TikaResource: remove obsolete Parser.parse() implementation
     new 8618f64  TikaResource: force UTF-8 output
     new 2c25c2a  TikaResource: improve exception logging/processing
     new 830f702  TikaResource: extract anonymous class
     new 44d14a5  New rewritten UnpackerResource for TIKA-593:
     new 6d5a229  tika-server: configure surefire plugin
     new 6153280  TIKA-883 - Extract embedded images in PPT
     new fcfdec7  - update template for 1.1
     new f1a9287  TIKA-593 - enable jax-rs network server module
     new 2e21373  tika-server: remove java.net repository
     new a2bf1f8  tika-server: update java version because jersey-core requires java6
     new 66bdd0b  TIKA-866: Invalid configuration file causes OutOfMemoryException
     new 16c92d0  TIKA-884: Dynamic loading of Parser and Detector services
     new 93ab990  - progress towards TIKA-593: replace Jersey with CXF. Checking in to reduce the need to review patches. Disabled 3 tests for now that aren't passing. Will work with Max to make them pass.
     new 6d7753e  - ignore
     new 27163d3  - set Content-Type field: that's what the test is actually doing. TIKA-593
     new 892f9f9  - TIKA-593: try with 1.5
     new abb7e35  - TIKA-593: improvement: use CXF client for test harnesses, remove all extraneous pom.xml dependencies and remove dep on commons-httpclient
     new 71772d2  - update and configure logging.properties
     new f906d34  - accept should be */* since by default CXF client sets to an XML accept (yikes), thanks to pramirez for identification, TIKA-593, see: http://cxf.547215.n5.nabble.com/Why-is-the-default-accept-for-WebClient-text-xml-td5013707.html
     new a1e4fed  - TIKA-593: forgot X method
     new 232cc5d  TIKA-886 If we open the OPCPackage from a File on a TikaInputStream, have it tracked (+closed) by the TikaInputStream the same way that ZipContainerDetector opened ones are
     new d442f36  Bump the Apache James Mime4J version from 0.7 to 0.7.2, for recent bugfixes
     new 0b8f59f  TIKA-593: add ExceptionMapper for TikaException for providers list (this fixes test415)
     new 46f295f  TIKA-593 - trying to fix .jar build
     new 7b42120  - TIKA-593: remove FIXME and uncomment @Test, per max's comments.
     new 4f5c6b5  - TIKA-593 note.
     new a9e1304  TIKA-700 Upgrade to POI 3.8 Final
     new 5caebac  TIKA-593: share/bundle plugin configuration
     new b63d7e6  TIKA-593: fix java5 compatibility
     new 1c777da  TIKA-890 Sample APK file, along with sample EAR and WAR files (related)
     new 83cee2a  TIKA-890 Update the APK mimetype entry to mark it as JAR derived, and add entries for WAR and EAR (also JAR derived)
     new d8e91b6  TIKA-890 Container Aware detection of JAR derived types such as WAR, EAR and APK, with tests
     new f433242  TIKA-896: OSGi deployment without declarative services
     new 07e105b  TIKA-896: OSGi deployment without declarative services
     new 0eaf0f5  TIKA-896: OSGi deployment without declarative services
     new 67f1be5  Ignore Eclipse project settings and other hidden files.
     new 1f5a278  tika-server/pom.xml: add svn:eol-style, remove duplicate license header
     new aeeb366  TIKA-897 Detect XML files that start with the UTF-8 BOM, plus test
     new d53f02f  - apply patch from TIKA-901: Provide version number in tika-server contributed by Ingo Renner
     new 35539ab  TIKA-861 Patch from Ryan Quam to enable extracting PDF Links. (Links are extracted for now at the end of the page, further work will be needed to match them to the text they apply to)
     new 0f729e6  - disable until shade plugin is fixed.
     new 27fbb0a  TIKA-903 Avoid breaking on Password Protected iWorks files. We can't parse them yet though, as we don't know how the encryption works
     new aa901df  TIKA-906 Support extracting Headers, Footers and Footnotes in iWorks Pages files. As part of this, make the parser a little more aware of where in the file it is, and start tracking some of the earlier parts of the file ready for when we hit the main text
     new 7d8b3ea  Magic for PKCS7 in PEM format, and DER format (probably...)
     new c66a8ba  TIKA-876 Slight PKCS7 der magic tweak
     new c98a421  TIKA-907 Comments in iWorks Pages files
     new 7ef3548  TIKA-852 Upgrade the MP4 parser to 1.0 RC1, which allows us to enable the MP4 unit test (patch from Sebastian Annies)
     new de70921  TIKA-858 Patch from Craig Stires to add support for parsing IPTC ANPA News Wire Feeds
     new f010744  TIKA-858 Fix Java 1.6isms
     new 9543dfa  TIKA-858 Fix Java 1.6isms
     new 0204adc  TIKA-858 Fix Java 1.6isms
     new 824a2b4  Add a .gitignore file for people using the git mirrors
     new 9b1fcfd  Add Adobe AfterEffects mimetypes, fix up the Adobe Premiere detection, and give .AEP to AfterEffects as it seems much more common now than AudioGraph
     new fdec3f3  remove stale nocommit
     new b90d72d  Patch from Ray Gauss II from TIKA-915 - add a disabled unit and a small sample file for the geo rounding problem
     new e2f1ef5  Whoops, properly disable the test for TIKA-915 this time...
     new 28ae612  TIKA-913 Mime Magic for PE, PE32 and PE64 executables
     new e59e664  TIKA-915 related - add mime magic for the elf format too, based on the mimetypes in the httpd magic file
     new 47a66ff  TIKA-917 A few sample files for Linux-ELF, and a PE32 one, plus the C file
     new cd6533b  TIKA-917 Start on a parser for PE and ELF executables, to output metadata
     new a42f88a  add iWork test case
     new 4a06417  remove leftover sop
     new 22e9376  TIKA-917 Pull the property definitions out to their own class, add more machine types, and define the platform
     new 291e167  TIKA-917 Expand platform and architecture parsing
     new 385c993  TIKA-925 - Patch from Ray Gauss to start on improving how the common metadata is stored/fetched
     new 5c32a67  TIKA-917 Some more sample elf files
     new c0d60f0  TIKA-926 Patch from Ray Gauss - Data Typed Metadata.set(...) Value Methods Should Call Metadata.set(Property...)
     new 9e79399  Update JavaDoc following TIKA-864 change
     new 49229f7  TIKA-927 - Patch from Ray Gauss to support Composite Properties (useful for backwards compatibility, and mapping between application and core properties)
     new 912357d  TIKA-917 Get the elf OS, if that bit of the header is set (but it often gets left as null....)
     new 00a14bb  TIKA-916 Correctly bail out early for .xps and .thmx files, which are an unsupported variant of PPTX, plus tests
     new 5796aa6  TIKA-928 Patch from Ray Gauss (plus extra JavaDocs) - start to define the set of common consistent metadata that all parsers will try to provide, no matter what their individual file format may term things
     new 08ef9c1  TIKA-928 Include the Geographic details to the common set of properties, and group slightly
     new 7488dcc  Add some simple JavaDoc descriptions of the property types, to help people who don't natively speak xmp! (TIKA-926 related)
     new ffe9d51  TIKA-926 Patch from Ray Gauss to allow set(Property,String[]) and add(Property,String), to mirror the string key based methods but with type safety
     new 362aa5e  TIKA-876 Another pkcs7 magic pattern
     new ce9ffb3  TIKA-929 Start to replace the old non-prefixed, largely non-property MSOffice metadata definitions with new style ones
     new a8755b0  TIKA-929 Bring some of the key parts of the Office metadata into TikaCoreProperties, with composites to support the previous (now deprecated) ones in MSOffice
     new d22c2ea  TIKA-929 Update the ODF Parser to use the new style Office properties
     new e2c5e08  TIKA-929 Use the preferred constant rather than the IPTC imported one
     new 9b94789  TIKA-928 Patch from Ray Gauss to improve metadata properties setting/getting
     new 34dff7b  Make the composite test more explicit in what it does, fix up some deprecated warnings, and fix the typed getters for composites
     new 70c85b4  Fix setter to be by property not name for add(Property)
     new e269e9c  TIKA-928 Fix up the DWG parser and tests to use the new style properties
     new 89dc4d7  TIKA-928 Epic patch from Ray Gauss - Update parsers and unit tests to use the new style TikaCoreProperties for setting (which supports aliasing for backwards compatibility), rather than old string based ones
     new a63b299  TIKA-842 Patch from Ray Gauss to split out the Photoshop and XMP Rights namespaces, and updates IPTC to use the new DublinCore properties (plus fix inconsistent indents)
     new a0e2a5e  TIKA-929 Bring across MSOffice.AUTHOR in the same way as initial and last authors
     new 923ef5d  TIKA-929 Fix up parsers to use the new style TikaCoreProperties.AUTHOR, along with fixing a few other deprecated bits in the process
     new c508447  TIKA-929 Ensure backwards compatibility on the Office document statistics
     new 32033de  TIKA-842 Patch from Ray Gauss to tidy up a few property names
     new 4a01b48  TIKA-903, TIKA-906, TIKA-907: add some CHANGES.txt entries
     new edf26d4  TIKA-923: add test case
     new 5eb3dc1  TIKA-910: fix text in Keynote text boxes and bullet points to not run together
     new fb7f940  TIKA-904: handle iWork Pages documents created in layout mode
     new 527f3d7  TIKA-924: extract table names from iWork Numbers docs
     new ddf2c2a  TIKA-931: Tika's PDFParser fails to parse documents embedded in a PDF Package
     new 2e6b1bc  PDFBOX-1320: fix NPE when visiting embedded files
     new 8fd2a52  TIKA-923: extract items from Keynote master slides too
     new 914cafd  - fix for TIKA-935 TikaException thrown when trying to parse archive (*.ar) files contributed by Josh Mastarone
     new 6e8f9ea  TIKA-939 Another WMV codec to look out for, to specialise an ASF to WMV
     new ffe67b3  Fix the case of the .ar files in the unit tests (TIKA-935) - case must match that stored in SVN or tests will fail on case-sensitive file systems
     new e58890a  TIKA-940 Sample 7zip (7z) file, based on the zip example
     new 46f742f  TIKA-940 Mime Magic and unit test for 7zip
     new 12c4ad2  TIKA-935: TikaException thrown when trying to parse archive (*.ar) files
     new 689a717  TIKA-932: Upgrade to Commons Compress 1.4.1
     new b279121  TIKA-932: Upgrade to Commons Compress 1.4.1
     new 1dfc65f  TIKA-932: Upgrade to Commons Compress 1.4.1
     new f15a7fc  TIKA-941: Detecting KML / KMZ files
     new e99be72  TIKA-941: Detecting KML / KMZ files
     new 328b935  TIKA-929: Consistent, namespaced definitions for office file related metadata
     new 14113de  TIKA-943: Add parameter to tika-app to supply password for decryption
     new 4c58d09  TIKA-934: Tika in server mode stops responding and reports NPE over and over in logs
     new bb535a0  TIKA-876: Signed pdf parsing
     new 1d1f292  TIKA-908: Adding XMP specification part one namespaces and properties
     new 19a8688  TIKA-908: Adding XMP specification part one namespaces and properties
     new c181ed9  TIKA-900: Tika fails to detect ISO9660 disk images
     new 3725dbf  TIKA-747: Ogg Vorbis and FLAC Parsers
     new 78b1b5a  TIKA-810: Upgrade to PDFbox 1.7.0 as available
     new d22a1fd  TIKA-747: Ogg Vorbis and FLAC Parsers
     new 3ffc69f  TIKA-847: Add regular expression support to the MagicDetector
     new 833b5e1  TIKA-832: ForkParser is unfriendly to code that prints things to its output
     new 411e942  TIKA-941 Sample KML and KMZ files, KML sample file from Google from the file format documentation
     new 5c51b48  TIKA-941 Mark KMZ as being Zip based, so data only detection works properly
     new 9e38af6  TIKA-941 KML/KMZ detection unit tests
     new f502178  TIKA-788 Some DWG files have an implausible header offset. Avoid problems and just skip over them, pending a better understanding of the file format
     new 5c29442  TIKA-863 Avoid creating a new AutoDetectParser (and implicit TikaConfig) for each part in a RFC822 message. Instead, check for one on the ParseContext, otherwise cache the TikaConfig for the lifetime of the message being parsed
     new fad14f3  TIKA-593: Tika network server
     new 89d98a2  TIKA-593: Tika network server
     new 59e93d7  TIKA-593: Tika network server
     new 1051da0  TIKA-773: .NET version of Tika
     new b8fe97e  Added rgauss as developer to tika-parent/pom.xml First commit
     new e154de9  TIKA-773: .NET version of Tika
     new ce112b2  TIKA-773: .NET version of Tika
     new aa6c45e  TIKA-756: XMP output from Tika CLI
     new a63b9b8  TIKA-756: XMP output from Tika CLI
     new 87f40d2  TIKA-756: XMP output from Tika CLI
     new 6bc6c88  TIKA-756: XMP output from Tika CLI
     new dfd8dc5  TIKA-756: XMP output from Tika CLI
     new 23eb923  TIKA-947: AbstractMetadataHandler addMetadata Does not Check Property.isMultiValuePermitted    - Added check for isMultiValuePermitted, if false call metadata.set instead of metadata.add
     new 75585bd  TIKA-773: .NET version of Tika
     new 63c2c69  TIKA-756: XMP output from Tika CLI
     new 352210b  TIKA-773: .NET version of Tika
     new 07a1590  TIKA-930: Consolidation of Some Tika Core Properties    - Added the Dublin Core Terms namespace and prefix
     new 196a61f  TIKA-773: .NET version of Tika
     new 0b9fc00  TIKA-756: XMP output from Tika CLI
     new 5696d7b  TIKA-756: XMP output from Tika CLI
     new 50f56b6  TIKA-949 Mimetype entries for some zip-based process/mapping formats
     new 843569a  Test file from TIKA-948
     new 1543cb0  TIKA-948 There is more than one way to embed things in OLE2, so add subtypes for both
     new 218352a  TIKA-948 Start to be able to correctly detect different things embedded in CompObj, such as PDF files, and also to be able to extract the contents
     new 03fd5d5  Fix the extraction test for the file type, and check for one additional file
     new 0ce2171  TIKA-951: Bundle activation policy for Eclipse
     new 270853e  TIKA-951: Bundle activation policy for Eclipse
     new f28f04a  TIKA-948 Look up the file extension for the mimetype detected for embedded resources, and fix unit tests for this
     new 3e2a652  TIKA-948 Add mime magic for ChemDraw .cdx files, then fix the Cli extraction test so it has the correct extension
     new 0a472d2  TIKA-561: Support EMLX file detection
     new 6bc0537  TIKA-322: Improve encoding detection speed and accuracy
     new 8756776  TIKA-322: Improve encoding detection speed and accuracy
     new 929c897  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new 6621297  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new 9ead13f  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new 12a9747  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new 89941a2  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new b8beefd  TIKA-471: Avoid Charset name bottleneck when multiple threads are using HtmlParser
     new ddb997a  TIKA-502: Add programming language mime-types
     new c6bcd32  Add an entry for TIKA-948
     new 95a1cf9  TIKA-906: Added basic support for AutoPageNumbers and their formats
     new 7d89a5e  TIKA-431: Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
     new ee57f95  set svn:eol-style to avoid test failures on Windows
     new 09c6122  TIKA-431: Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
     new b7ada46  TIKA-431: Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
     new 2ecd434  TIKA-892: Tika does not use the HTML5 meta charset tag when determining charset
     new dcd7050  Fix for TIKA-945 Upgrade tika-server to CXF 2.6.1
     new 8d7a5b7  Prep for 1.2 RC #1
     new a0444db  [maven-release-plugin] prepare release 1.2
     new e09b77f  [maven-release-plugin] prepare for next development iteration
     new 0f0b041  Cleanup of javadoc
     new 5615ba6  TIKA-957 NTIF mime entry and magic
     new 8630e56  TIKA-811: Upgrade metadatExtractor version for OpenJDK 7 support    - Upgraded metadata-extractor to 2.6.2    - Refactored calls to metadata-extractor library methods and tags for new API    - Simplified use of JpegMetadataReader to use readMetadata method    - Updated TIFF parsing to utilize a temp File since metadata-extractor method accepting InputStream is now deprecated TIKA-915: Image geodata being rounded to integers    - Refactored GeotagHandler to use metadata-e [...]
     new 99ad680  TIKA-915: Image geodata being rounded to integers    - Added decimal formatting to GeotagHandler rather than test since the metadata-extractor is adding false precision
     new efdb467  TIKA-962: Backwards Compatibility for Metadata.LAST_AUTHOR is Broken    - Added tests for backwards compatibility of Metadata.LAST_AUTHOR    - Changed TikaCoreProperties.MODIFIER to be a composite property containing Metadata.LAST_AUTHOR
     new c36e5a0  TIKA-963: Backwards Compatibility for Metadata.DATE is Incorrect    - Added tests for backwards compatibility for Metadata.DATE and Metadata.CREATION_DATE    - Moved Metadata.DATE to be part of the TikaCoreProperties.MODIFIED composite property    - Added setting of Metadata.DATE to PRTParser
     new f58064e  TIKA-963: Backwards Compatibility for Metadata.DATE is Incorrect    - Added a few more tests for backwards compatibility for Metadata.DATE and Metadata.CREATION_DATE
     new 49e18e0  TIKA-906: Added missing licence header in AutoPageNumberUtilsTest.java
     new dfcf3d4  TIKA-965: Text Detection Fails on Mostly Non-ASCII UTF-8 Files    - Added looksLikeUTF8 method to TextStatistics    - Added check to TextDetector.detect for looksLikeUTF8    - Added testTextNonASCIIUTF8 to AutoDetectParserTest and testTextNonASCIIUTF8.txt test resource
     new c70b12f  TIKA-969: TikaException Thrown When Handling Unknown Fields for Some JPEGs    - Added check for null tag description
     new afcde8c  add 1.3 section to CHANGES
     new eda4abd  TIKA-970: Full identification of the JPEG 2000 family of formats
     new 8e23098  TIKA-966: org.apache.tika.Tika missing from tika-bundle-1.2.jar
     new 199368c  TIKA-968: tika-bundle missing org.apache.commons.logging.LogFactory
     new 609efd4  TIKA-956: show where embedded docs occurred when extracting processing Word (.doc) documents
     new 56237e3  TIKA-869: IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name
     new 08d52ec  TIKA-889: XHTMLContentHandler wont emit newline when html element matches ENDLINE set
     new 069305b  TIKA-771: "Hello, World!" in UTF-8/ASCII gets detected as IBM500
     new df53713  TIKA-975: LinkBuilder to optionally collapse anchor whitespace
     new 0f1934c  TIKA-983: HTML parser should add Open Graph meta tag data to Metadata returned by parser
     new 4cdcefd  TIKA-981: also extract from PDF pop-up annotations
     new 0ce025c  TIKA-982: handle Wordpad/RTF docs embedded in Word doc
     new 4f920c7  TIKA-986: don't throw NullPointerException on detached PKCS7 signature
     new bf24bc0  TIKA-918: extract chart name for charts embedded in Numbers documents
     new 8d38e36  TIKA-920: handle multi-valued metadata keys
     new 9e21496  TIKA-989: leave placeholder where embedded document appears in .docx files
     new 6630079  remove system.out.println
     new d3989e6  Add a test Opus audio file (ogg based, should eventually be supportable similar to Vorbis via TIKA-747)
     new a06209b  TIKA-999: extract page, word, character count metadata from RTF docs
     new 22b1235  TIKA-997: leave placeholder at end of slide where embedded document appears in .pptx documents
     new 9c0f4ca  TIKA-999: also extract CREATION_DATE from RTF
     new 8b68138  TIKA-999: fix false test failure
     new a846f7d  TIKA-997: also leave placeholder for embedded images
     new 433f21c  TIKA-1006: don't NPE if style is null
     new 4e09c63  TIKA-1005: also extract text from text boxes in .docx documents
     new 3a76e71  Add test CSS and JS files taken from the Tika website, and use these to add additional detection unit tests for these two formats
     new a64d33c  TIKA-1011: fix NPE when charset isn't recognized in .mhtml files
     new 7b76a7f  TIKA-984: JpegParserTest fails for some locales    - Changed GEO_DECIMAL_FORMAT to simple String GEO_DECIMAL_FORMAT_STRING    - Changed GeotagHandler to create a new, Locale-specific DecimalFormat object using the GEO_DECIMAL_FORMAT_STRING
     new 7d61675  TIKA-984: JpegParserTest fails for some locales    - Changed from DecimalFormatSymbols.getInstance to constructor for Java 5 support
     new 60e5d2c  TIKA-775: Embed Capabilities    - Added an Embedder interface, similar to Parser, which defines getSupportedEmbedTypes and an embed method    - Added a base ExternalEmbedder implementation of the Embedder interface, similar to ExternalParser, which can call a command line executable, the default being sed, to perform embedding    - Added a base ExternalEmbedderTest which 'embeds' lines in a text file then uses a TXTParser to verify the expected embedded metadata exists
     new acdb2b0  TIKA-1015: include rel id in Metadata when parsing embedded documents inside Word (.doc)
     new 6aba74b  TIKA-799: ForkParser does not populate metadata object after completing a parse
     new 71629f0  TIKA-1009: Expose TextDocument in BoilerpipeContentHandler
     new a2ddd8c  TIKA-1019: also leave placeholder for links inside .doc
     new 1d637c8  TIKA-1019: revert for now: the test file is too large
     new 355eba9  TIKA-1022: DWG Custom properties not extracted    - Added testDWG2010_custom_props.dwg    - Added CUSTOM_PROPERTIES_ALT_PADDING_VALUES constant for values found in test file    - Added check for alternate padding values in skipToCustomProperties    - Added testDWG2010CustomPropertiesParser unit test
     new 363ee3b  TIKA-1019: also leave placeholder for links inside .doc
     new a3f156b  TIKA-1024: don't return naked BOM for MP3 ID3 tag values
     new 963100e  TIKA-1025: leave placeholder where embedded docs appear in .ppt extraction
     new 60ca6ad  TIKA-1026: ServiceLoader should respect OSGi service ranking
     new 02e550a  TIKA-1027: Allow null values when setting metadata
     new 2e9d1ef  TIKA-775: Embed Capabilities
     new d84781a  remove stale comment
     new e10c23b  TIKA-1031: create parent dirs when extracting embedded files
     new 9ec7153  TIKA-1032: dedup relID by slideN_ for embedded files in .pptx
     new 51a7c9c  TIKA-712: extract master text, except for title/body
     new 771e368  TIKA-1035: extract text from PDF bookmarks
     new 12f4084  TIKA-1036: leave placeholders when we extract embedded archive members
     new 1781506  TIKA-1031: TikaCLI doesn't create sub-dirs when extracting Zip files
     new 9db581a  create temp files under tika-app/target for this test
     new ad9512d  TIKA-1036: also set EMBEDDED_RELATIONSHIP_ID in the Metadata when extracting the embedded document
     new 01c02e6  TIKA-1035: move bookmarks before </body>, use <ul>,<li>
     new 95b8975  TIKA-1042: Lotus Notes .eml Files Not Always Detected Properly    - Added testLotusEml.eml which demonstrates the problem (with some info redacted)    - Added testDetectLotusNotesEml method to TestContainerAwareDetector    - Added new match to the message/rfc822 mime-type which looks for X-Notes-Item and Message-ID
     new 7260050  TIKA-1041: Tika 1.2 universalcharset errors
     new 978cc18  Added .xmp extension to application/rdf+xml mime-type for better detection and parsing    - This mime type is indicated in the XMP spec part 3, page 7: http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/cs6/XMPSpecificationPart3.pdf
     new 3e7e7c9  TIKA-990: Mp3Parser extracts wrong number of channels    - Changed AudioFrame to grab the correct bits (7,6)    - Updated Mp3ParserTest for the correct channels in the files
     new ab9f799  TIKA-1044 Fix issue for Word extractors on text that lacks any styling, plus tests based on files from Jonas Wilhelmsson
     new 2bf25c2  TIKA-775: Embed Capabilities    - Removed logging in ExternalEmbedderTest    - Minor formatting changes in ExternalEmbedder for better readability
     new 2052fc9  TIKA-976 Excel95 files should be correctly detected, but as POI HSSF does not support them they should not generate exceptions if you try to parse one
     new 39a63ca  TIKA-725: Empty title element makes Tika-generated HTML documents not open in Chromium    - Added an assert to TikaCLITest which verifies the issue    - Added ExpandedTitleContentHandler    - Changed TikaCLI to use ExpandedTitleContentHandler for html output
     new 1e78515  TIKA-725: Empty title element makes Tika-generated HTML documents not open in Chromium    - Added license header
     new 7e3cc3a  TIKA-1049: Upgrade PDFBox to 1.7.1
     new 71166de  TIKA-1048: add space after each extracted XML element
     new b928475  TIKA-1013: Added ability to check if a mime-type is already registered from Ryan McKinley
     new fb41a40  Tika-1055 patch from Bernhard Berger to add mimetypes for a number of programming languages
     new ed5591c  Remove three duplicated mimetype entries (keeping the one with more information in the definition each time), from Karel Zacek in TIKA-1052, and add a change entry for TIKA-1055
     new 7171855  Patch from Emmanuel Hugonnet from TIKA-1021 - PSD data lengths are even padded
     new e4e1f7d  Add a unit test for HDF4 files, which shows that TIKA-958 was already fixed
     new c8dea65  TIKA-1056: unify ImageMetadataExtractor interface    - Made parseTiff public
     new d1291b6  message/rfc822 pattern from Marco Quaranta from TIKA-1058
     new 908791f  Update CHANGES.txt for 1.3 release date
     new daf615e  [maven-release-plugin] prepare release tika-1.3
     new 320617c  Revert failed release:prepare command
     new 7a0048c  [maven-release-plugin] prepare release tika-1.3
     new 527e4de  [maven-release-plugin] prepare for next development iteration
     new 03f7625  TIKA-1060: Degrade gracefully when juniversalchardet not present
     new 3df2f59  TIKA-1062: parse lists from RTF documents
     new f5bcf3b  TIKA-852: Quicktime / MP4 Metadata Parser
     new 562dd39  Apply patch from Raimund Merkert and Chris Mattmann for TIKA-1047: Provide a JAX-RS to detect only mediatype.
     new 1b7871d  Apply patch from Raimund Merkert and Chris Mattmann for TIKA-1047: Provide a JAX-RS to detect only mediatype.
     new 3e18897  Apply patch from Raimund Merkert and Chris Mattmann for TIKA-1047: Provide a JAX-RS to detect only mediatype.
     new 1489edc  TIKA-1065 Mimetype entries for SAS file types
     new 86a6536  TIKA-1065 SAS subtype and mime magic
     new 452b6e6  TIKA-1076 Upgrade to Apache POI 3.9. Commit disables some HSLF related unit test checks, they need re-enabling along with a fix soon
     new 5016a83  Support tika:link and tika:uti mimetype extensions, along with unit tests. Modified version of the patch from TIKA-1012
     new d45386e  FileMaker Pro mime entry from Marco Quaranta from TIKA-1061
     new 00522e3  TIKA-991 Enable the DURATION property
     new 68e5671  Apply patch from Oliver Heger from TIKA-991 - Re-work MP3 parser to capture audio duration by processing more of the audio frames
     new d25f52d  Add missing license header
     new 956e640  TIKA-1053: upgrade to ASM 4.1
     new a6d2100  comment out @Overrides
     new 36f5051  fix for TIKA-1081 Error in specification of glob pattern for awk files identified by Giuseppe Totaro
     new 88a0702  change Tika to accept Java 1.6 source and write Java 1.6 bytecode
     new 46d0baf  TIKA-1084 Merge image/x-icon (old) with the newer standard image/vnd.microsoft.icon
     new 5456ae2  Patch from Ryan McKinley from TIKA-1083 - Add Link and UTI information for a number of common mimetypes
     new 5e50085  ChangeLog entry for TIKA-1012 and TIKA-1083
     new a6f9d9d  TIKA-1087 PICT mime magic and unit test
     new d23cd38  TIKA-1074: log certain exceptions and continue
     new 6d8ac09  TIKA-1074: catch Exception not Throwable, and restore interrupt bit for InterruptedExc
     new 2b8fdb1  TIKA-1074: remove future proofing for InterruptedException
     new 5a82c4f  patch for TIKA-1090 contributed by Lewis John McGibbney.
     new ef226f4  Patch for TIKA-1096 CompressorParser: Add support for handling concatenated InputStreams contributed by Gregory Canan.
     new 6e9e7cc  Patch for TIKA-1096 CompressorParser: Add support for handling concatenated InputStreams contributed by Gregory Canan.
     new 6a41a3a  Word2 and Word5 mimetype magic, from investigations into TIKA-1092
     new 0e4ccf6  Mimetype entries with magic for the arj and uc2 archive formats TIKA-1099
     new 00ab3e0  TIKA-1104 - Upgraded PDFBox to 1.8.1
     new 5f62566  Patch from Ryan McKinley from TIKA-1014 - Allow custom MimeTypesReader (with tests)
     new 5ffe387  TIKA-1115: ExifHandler throws NullPointerException    - Added check for null datetime    - Added file exhibiting problem datetime field    - Added unit test
     new aa18a79  Patch by Markus Jelsma for TIKA-992 to allow OpenGraph meta tags to have multiple values.
     new c4581fb  New code to help for TIKA-1118, currently disabled pending a POI upgrade
     new ade21f4  TIKA-1123 - Added mimetypes for additional programming languages
     new f50544d  TIKA-1126 - Patch by Ali Mosavian to allow Tika Server to produce text/html output
     new 9d81bdd  Patch for TIKA-1127 provided by Ali Mosavian.
     new 390d8c3  TIKA-1102: detect fragment that starts with <div> or <DIV> as HTML.
     new 14aa5bf  TIKA-1128: replace line tabulation with line break
     new c1dfba2  TIKA-1133: Ability to Allow Empty and Duplicate Tika Values for XML Elements    - Added constructors in ElementMetadataHandler to specify allowing duplicates and empty values    - Added a unit test and test data which confirms the default and override behaviors
     new 7b30eb8  Fixed previous whitespace issues in separate commit for better readability of diffs.
     new ec32a29  TIKA-1135: Incorrect Cardinality and Case in IPTC Metadata Definition    - Fixes to cardinality    - Fixes to key name case to match specification
     new 281bcf5  TIKA-1130: .docx text extract leaves out some portions of text    - Added test file    - Added disabled unit test
     new d546aa1  Fix for TIKA-1129: Test HTML file has poorly chosen GPL text in it
     new ca9b4b7  Fix for TIKA-1129: Test HTML file has poorly chosen GPL text in it
     new 7c2ae59  Prep for 1.4 RC #1.
     new 7ca70eb  [maven-release-plugin] prepare release 1.4
     new 72c7c50  [maven-release-plugin] prepare for next development iteration
     new 8b82be8  TIKA-1128: normalize newlines before assert
     new 21a6eba  Updated patch for TIKA-991 contributed by Oliver Heger
     new 0df62cc  Test file from Paul Brinich from TIKA-1136
     new 76cfbc1  Mimetype, Zip container detector and unit test for the Apple IPA format. Original logic from Paul Brinich from TIKA-1136
     new 32431aa  The Office Parser has a default password it can use, so if the PasswordProvider can't provide one (i.e. if it returns null for the password), keep going with the default rather than passing a null password through to POI (which doesn't like that)
     new 2db3812  Patch from Dietmar Glachs from TIKA-1070 - avoid stackoverflow in ToXMLContentHandler by resetting the parent state after the end of an element
     new 5dd29c4  Patch from Daniel Bonniot from TIKA-1109 - Fetch OOXML metadata earlier, to tidy code and make it available if required during parsing
     new 8f492eb  TIKA-1130 Upgrade to POI 3.10 beta 1
     new 970bafe  Patch from Tim Allison from TIKA-1130 - Extract from .docx SDT runs as well
     new 066f587  Helper class for unit testing TIKA-1145 - Test classloader that logs the resources it loads
     new c7d7f66  Fix TIKA-1145 - If a specific ClassLoader was given to TikaConfig, have that used for loading the mimetypes too
     new a852c4f  TIKA-1147: File-Based TikaInputStreams are Deleted by ExternalEmbedder.embed    - Restructured tests to be able to accept different input streams    - Added test for passing in a TikaInputStream    - Changed ExternalEmbedder to close the input stream rather than delete its file
     new 22fc879  TIKA-1146 Support for case-insensitive string matching on magic patterns (for ASCII text only - works at a byte level). Also adds more magic detection tests covering several of the string formats
     new c743609  Patch from Kai-Uwe Schmidt from TIKA-1146 - Handle rfc822 message detection with unusual (but standards ok) cases of the header strings, with test
     new 18a89e9  TIKA-1156 AMR glob, subtypes, magic and unit test
     new 042ec60  TIKA-1156 AMR-WB mime magic and unit test
     new 9ec2aef  Tika 1139 update to 1129
     new ba80052  TIKA-1124, process attachments within an embedded PDF
     new 7f080de  Tika 1124 not 1142...sorry
     new 34f7c18  TIKA-1159 Mime Magic for Adobe InDesign from Kabron Kline, plus sample file and unit test
     new 62d5ad3  TIKA 1001 more flexible html meta-header encoding detector
     new 51531a1  TIKA-1153 upgrade PDFBox to 1.8.2
     new 2d61e41  SolidWorks mimetype entry from  gunter rombauts from TIKA-1160
     new a7e0353  TIKA-961: No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)
     new 1e8e3c8  TIKA-1166: FLVParser NullPointerException    - Added check for null entry value
     new 26b2e60  Test OSX jnilib file from Apache ActiveMQ 5.8.0 (TIKA-1169)
     new 5ce1b58  Mimetype for jnilib files, which share some magic with Java classes but are actually native OSX code, plus test (TIKA-1169)
     new 4c6313f  TIKA-1170: Insufficiently specific magic for binary image/cgm files    - Applied patch from Andrew Jackson which…    - Added additional matches to image/cgm magic    - Added example cgm file    - Added test of image/cgm to MimeDetectionTest
     new b2fcf77  TIKA-1170: Insufficiently specific magic for binary image/cgm files    - Fix for incorrect application of patch    - Additional test and resource from Andrew Jackson for false positive cgm matches on malformed HTML files
     new 009a45a  bumped poi to 3.10-beta2
     new 2cf36cc  tika-1100 textboxes in xlsx; modified XSSFExcelExtractorDecorator and added test in OOXMLParserTest
     new e9c0166  TIKA-792 fixed by POI-3.10-beta2; added test for missing ooxml bean
     new ed60a56  commented out TIKA-792 test for now
     new 145bf2f  second attempt to add test for detecting missing ooxml bean. Builds successfully locally.  Jenkins failed last time.  Stack traces didn't point to this test; but redirecting stderr may be the culprit.
     new 146dcda  TIKA-1076 extract text from tables in ppt.
     new 2ac245f  TIKA-817 -- autodates in ppt and pptx. Already fixed by TIKA-805. Added files and tests to confirm behavior specified in POI-52367 and POI-52368
     new 20ab62d  TIKA-1171 -- extra asterisks from master slide in PPT; added tests to TIKA-712 test files to show 1171 was fixed.  Borrowed extraction code from POI PowerPointExtractor
     new 29e9b15  updated CHANGES.txt to cover recent activity
     new aaa547f  added 1130 to CHANGES.txt
     new 5e320cd  TIKA-1177: Add Matroska (mkv, mka) format detection    - Added Matroska video and audio mime-types and extensions    - Added WebM video mime-type    - Added mkv and webm test files (converted from existing testFLV.flv)    - Added name detection unit tests
     new d80fcf1  TIKA-1177 Add a common parent type for the Matroska container, then add data+name tests for WebM and Matroska Video which use them. (For full detection, we need TIKA-1180)
     new 38b1509  When the Tika App is in Server Mode, wrap the raw Socket InputStream as a TikaInputStream so that detectors can use mark/reset on it (TIKA-1183)
     new c183694  Include a workaround for PDFBOX-1749 by trying to use AWT (if we have a TikaInputStream) to check the TTF is valid. Should mostly solve TIKA-1182 until the upstream fix is done
     new ac538c9  TIKA-1188 mpx mimetype
     new 23a7f5c  TIKA-1188 Other parsers may be working on OLE2 files, and hence may want to be able to call the common OLE2 metadata extraction code, so make it public not package
     new dfee810  Pull the Date -> ISO8601 logic out of Metadata to a common utils class, so that other bits of Tika (eg TIKA-1188) can use it
     new 9227e9f  TIKA-817: (PPT/PPTX) Missing date/time in text content
     new ceccd0b  TIKA-1192: RTF: fix AIOOBE in handling list override
     new 80f4692  TIKA-1200 upgrade pdfbox to 1.8.3
     new 0332d28  TIKA-1201 enable parameter for NonSequentialPDFParser
     new 6f7177f  [TIKA-1196] Adding a host option currently set to localhost by default, thanks to Rian Stockbower for driving the discussion and patches, also doing few minor updates to minimise a number of warnings
     new 75c8f1f  [TIKA-1197] Switching to CXF 2.7.8
     new f708b84  [TIKA-1198] Support for multipart payloads
     new 8a04063  TIKA-1202 added PDFParserConfig and refactored PDFParserTest and TikaTest to reduce boilerplate
     new 4a59ade  TIKA-1202 -- small bug in using default or context config; added in-memory option for nonsequential parser; added more constraints to tests
     new dcbe0d2  TIKA-973 added basic extraction of pdf AcroForm content.  Many thanks to Ben Litchfield for org.apache.pdfbox.examples.fdf.PrintFields, on which this patch relies.
     new ffc3b3c  TIKA-973 reopened. Would prefer test docs unequivocally consistent with Apache License 2.0.  Deleted initial test docs from trunk and commented out test case.  Also added extractAcroFormContent to parameter file (should have been done in initial check in).
     new 2a36fcd  TIKA-1209: Upgrade Tika tests to JUnit 4.X
     new 30523c0  JIRA-1211: OpenDocument (ODF) parser produces multiple startDocument() events
     new 9385906  TIKA-1110: Incorrect declared SUPPORTED_TYPES in ChmParser
     new 0eb87d7  TIKA-1110: Incorrect declared SUPPORTED_TYPES in ChmParser
     new 8d794b5  TIKA-1110: Incorrect declared SUPPORTED_TYPES in ChmParser
     new 3dec312  TIKA-1152: Process loops infinitely on parsing of a CHM file
     new 84cc8dc  TIKA-1210: Address tika-parsers o.a.t.mime.TestMimeTypes TODO: Need a test flash file
     new 2608c79  TIKA-672: Proper error handling in the CHM parser
     new bc8f90a  TIKA-672: Proper error handling in the CHM parser
     new 2942530  TIKA-672: Proper error handling in the CHM parser
     new f1c5c69  TIKA-1193: Allow access to HtmlParser's HtmlSchema
     new f570a3b  TIKA-1160: Add support for SolidWorks files
     new ad853f4  TIKA-1086: Added import for org.w3c.dom in tika-bundle
     new 8fd76a4  TIKA-820: Added setDocumentLocator delegate call in TextContentHandler
     new c44e747  TIKA-1217: Integrate with Java-7 FileTypeDetector API
     new 94ff74c  TIKA-1078: TikaCLI escapes invalid filename characters as hex codes
     new 57583e8  Adding a server profile to Tika Server
     new 1d07545  misc: remove version of junit on dependency
     new e085bf2  [TIKA-1198] Updating JAX-RS server to accept multipart/form-data payloads at a dedicated path
     new 7d3194a  TIKA-1226: PDF TextStripper fails when it encounters PDSignature Field.
     new 33280e0  TIKA-1226, removed println...doh.
     new da992b2  Updating CHANGES.txt
     new 78a7ceb  TIKA-1224 - Adding a generic SourceCode parser for Java, Groovy, C++ with HTML render
     new 1ca0efd  TIKA-1228: Look for attachments under Kids node if embeddedFiles.getNames() returns null
     new 77d57d5  TIKA-1230: update PDFBox to v1.8.4 and updated CHANGES.txt
     new 17ef0de  Add the text file from TIKA-1229, and stub out a unit test for it
     new 25ffe53  Updated KEYS with revocation details
     new 2aaec12  TIKA-1229 - Parse Word Headers and Footers as proper ranges, not simple text strings, and thus be able to handle hyperlinks etc in them
     new 0605d6c  Updated KEYS with new code signing key
     new eac3711  Updated CHANGES.txt for 1.5 release
     new f607a4d  [maven-release-plugin] prepare release 1.5-rc1
     new 6f84307  [maven-release-plugin] prepare for next development iteration
     new 8282ef2  prepare for next development iteration
     new e327976  prepare for next development iteration
     new 0e3c51e  temporary fix to TIKA-1233. Added extra catch clause to catch PDFBOX-1803 related StringIndexOutOfBoundsException.  When PDFBOX-1803 is fixed, we should be able to remove these catches
     new c8ab7c4  TIKA-1237 upgrade to poi-3.10-FINAL
     new 2ae3e90  TIKA-1223 - Extract thumbnail of OOXML Office as attachment
     new 9f869d2  got rid of brittle requirement for specific number of pdfs to be tested in PDFParserTest
     new 86d1aed  Test 7zip file, based on the other test archive files, for TIKA-1243
     new 8316bb2  TIKA-1243 - Upgrade to Commons Compress 1.7, and add a disabled unit test for 7z support. 7z support is not enabled yet, pending a commons compress fix
     new aae0f4e  Remove debug line accidentally committed
     new ffb52a3  Create a spanned zip test file for TIKA-1241
     new d188a62  TIKA-1241 Mime Magic for empty and spanned zip files, plus spanned zip file detection unit test
     new eed505e  Patch from TIKA-1225 from Marco Quaranta - MDI mime magic
     new a360ec5  TIKA-1248: handle empty/null declaredEncoding with call to CharsetDetector.getReader
     new b6f8f9e  TIKA-1249 vcard mime magic from Marco Quaranta
     new 91ec256  Switch to use FileZip instead of ZipArchiveInputStream on UnpackerResourceTest to fix build.
     new 7b2c4e8  Add more test of OfficeParser
     new 98c6957  TIKA-623 - Integrate PST Parser with java-libpst-0.7
     new beff394  TIKA-1257 - Filter control characters in output
     new 416b57e  TIKA-1257 - Add missing test doc
     new 099c53c  TIKA-1232: add fine-grained pdf version extraction
     new 3cfe5ad  TIKA-1252: handle multiple authors in PDF xmp metadata
     new cb6c9b8  TIKA-1252 small clean up
     new bb4b8f1  clean up whitespace in PDFParser components
     new a504a1e  cleanup whitespace in OutlookPSTParser
     new bbf9de4  Add a mimetype entry for Ogg Opus audio files
     new 38974e6  TIKA-1023 Upgrade the Ogg parser to 0.3, which fixes a maven issue, and adds Ogg Opus support
     new e503b24  TIKA-1243 Upgrade to Commons Compress 1.8
     new 5e9a8a0  TIKA-1243 Fix the test 7z file to match the order of the tar one, to simplify testing
     new a8e46e9  TIKA-1243 Add 7z support now that we have upgraded to Commons Compress 1.8, but it is a little nasty until COMPRESS-269 is resolved
     new 310b0bb  TIKA-241 Test rar archive, based on the same contents as our other test-document archive files
     new 52ca7b3  TIKA-1259 Dedicated mimetype entry for Ogg Speex
     new f81efb5  TIKA-1259 Some more Ogg based mime entries, for FLAC-in-Ogg, Ogg Theora and OGM video
     new ccc0770  TIKA-1259 Add Ogg mimetypes for the various uncompressed formats which can be stored in Ogg
     new 41c6749  TIKA-1259 More Ogg mimetypes and fix-ups - Add dirac, tweak the default theora type, disable the FLAC-in-Ogg test pending an update, and add a common parent to all Ogg Audio types
     new 348c92f  TIKA-1259 Complete adding the well known Ogg related mimetypes
     new d145ee6  Upgrade the Ogg plugin to 0.4, re-enable the FLAC test, and add an Opus one. Solves TIKA-1259 and TIKA-1113
     new 2c34c60  Changes update for recent Ogg updates
     new ef61057  TIKA-1263 An additional Atom feed xml namespace, and Atom parsing+detection unit tests
     new 5b743bd  TIKA-1151: Maven Build Should Automatically Produce test-jar Artifacts    - Added maven-jar-plugin to relevant poms
     new cc3c35e  Add my signature to David's GPG key, as used in the 1.5 release
     new 39e87f3  TIKA-1264 Updated Outlook PST mimetype from Luis Filipe Nassif
     new 556ee2a  TIKA-1196 Options keys must be unique, so leave host with -h and push help to -?
     new fe90fb4  TIKA-1244 - Extract mails as attached elements. Integrate code of Luis Filipe Nassif
     new fb1d4b6  Re-add the test that failed because of compress-1.7
     new 8a7965a  TIKA-623 - Extract each mail as attachment
     new 2b7cd2e  TIKA - 623 - Update CHANGES.txt
     new 626625c  TIKA-1268: Extract images from PDF documents
     new 398f50f  TIKA-1271: trivial refactoring of classes useful for testing embedded document handling
     new c4efd41  TIKA-1270 Prep for new endpoints by having the existing ones use a common TikaConfig object
     new 0d2fe29  TIKA-1270 Prep for new endpoints - refactor server unit tests to reduce duplication
     new 3d60db6  TIKA-1270 Prep for new endpoints - fix eclipse warnings
     new 026aad2  If a mimetype is handled by a composite parser, report the underlying parser against the type (TIKA-1270)
     new 702b819  TIKA-1270 Start on support for reporting the mimetypes that are known, still partly WIP
     new 3b1050e  The custom assert "assertContains" can be static, to allow use from elsewhere in the codebase
     new 55310ec  TIKA-1010 extract embedded documents from RTF
     new b17a97e  TIKA-1270 Move to a common set of logic to decide what to display, so the output type bit just deals with formatting it only, and add a browser friendly html view too
     new 2ddb759  TIKA-936: encoding of ZipArchiveInputStream
     new 3468cb2  TIKA-1277: Magic bytes from Wikipedia
     new a570bfb  [TIKA-1279] Missing return lines at output of SourceCodeParser
     new 5d29d84  [TIKA-1276] Add missing embedded dependencies in tika-bundle from patch of [rwesten]
     new e0db416  TIKA-1278: Expose PDF Avg Char and Spacing Tolerance Config Params    - Added averageCharTolerance and spacingTolerance fields to PDFParserConfig    - Moved configuration of PDF2XHTML params from PDF2XHTML.process to new PDFParserConfig.configure method
     new 3a9afb5  TIKA-1279 - Use System.getProperty() compatible with Java6 + tests
     new 0a32931  TIKA-1279 trivial fix caps in testJAVA.java in test cases so that tests pass in *nix
     new b4c8568  TIKA-1280 GZip now has an official mimetype
     new 01726d3  TIKA-1281 Add an alias application/x-xml for the XML mimetype (canonical application/xml)
     new f23511c  TIKA-1270 Provide a Tika Server endpoint to report on Detectors, modelled on the Tika App --list-detectors method
     new 5954a1d  TIKA-1270 Unit test for the Detectors server endpoint
     new 61617ab  TIKA-1270 Provide a slightly better way to handle the user-facing HTML output, we may well want to replace it with something either better or more JAXRS like shortly!
     new 1e2af4e  TIKA-1269 Stub out a human readable welcome page for the Tika Server
     new 306cf13  TIKA-1269 Have the human-facing welcome page tell you roughly what the different endpoints are
     new dd20615  Add a note for why TIKA-1269 is not yet working properly
     new 1428d63  TIKA-1290 - Upgrade to PDFBOX 1.8.5
     new 7d524cb  TIKA-1269 Sort welcome output, and add test coverage of it
     new fafcf75  Patch from Annie Bryant Burgess contributed to address TIKA-1265.
     new 90f5d76  Changes record for TIKA-1265.
     new f1aee6f  TIKA-1270 WIP parser details endpoint, similar to --list-parsers and --list-parser-details from the Tika CLI
     new 14991ad  TIKA-1270 Complete parser details endpoint, and tests
     new c9530d0  Changelog update
     new 5c3070c  TIKA-1175 Magic for MS-Money files from Boris Naguet
     new e652afa  Some more KML namespace definitions, from Marco Quaranta from TIKA-941
     new 54afc7a  Mimetype for the OPC based DWFX format, and detector support for it. TIKA-1204
     new 359aae5  TIKA-1221 Add Zip Container Detector support for the OPC-based XPS format, along with more mimetype details of it and a unit test
     new d02d033  TIKA-1259 One more Ogg mimetype
     new be6e237  Bump Vorbis Java version to 0.6, to solve TIKA-1112
     new 969804d  TIKA-1282 Some more common gzip aliases
     new 6b7db99  TIKA-1233: removed catch blocks after upgrade to PDFBOX-1.8.5; see PDFBOX-1803
     new f033bcf  TIKA-1231: added more null checks after underlying fix was made in PDFBox-1.8.5
     new 821e197  Ignore a test until TIKA-1298 is fixed
     new c2da98e  temporary bug fix until TIKA-1295 is resolved
     new 3d146b4  test doc actually added for r1594957 temporary bug fix until TIKA-1295 is resolved
     new 2f4fbc6  More OSX Mach-O file magic, for TIKA-1169, from Matthias Kruegger. This closes #8 github pull request
     new ef71795  TIKA-1292 Sample (Apache v2 Licensed) Jar with HTML in it
     new ee8bc66  Allow the DefaultDetector unit test to also check how MimeTypes would detect this, and add a commented-out check that uses that for TIKA-1292 (currently failing)
     new aacc74d  [TIKA-1272] Removing redundant Tika server properties file, patch from Lewis John McGibbney applied
     new 2d51ad2  Add some notes on entries, to help people maintaining the file know what to do, related to TIKA-1292
     new 9eaa4fc  Container formats with specific, low-false-positive magic matches need a slightly higher priority, so that they don't accidentally end up being matched based on the contents of the container near the start of the file. Partly solves TIKA-1292. This closes #6 github pull request
     new 8e70572  Add a disabled unit test for TIKA-1292, which when working will ensure that if we have two matching magics at the same priority, the name is used to specialise if possible, first defined if not
     new 3902226  Set an explicit priority on the OLE2 match, remove two MS Word matches which were OLE2 ones in disguise, and add an intermediate staroffice parent on the staroffice types. Helps with TIKA-1292 testing
     new f689fa7  TIKA-1292 If there is more than one mime magic which matches at the highest priority, keep track and then try to pick based on filename or type hint later
     new 8c0119b  add license header to RTFObjDataParser and clean up whitespace in RTFEmbObjHandler
     new 0e7ea0a  TIKA-1294 add ability to turn off image extraction from PDFs
     new 0b57b5c  TIKA-1291/TIKA-1310 fix bug in JSON output from CLI
     new 08a4029  fix to TIKA-1294, uppercase enum
     new 051bf86  TIKA-1312 and TIKA-1313 - FDF and XSL-FO mimetypes from Marco Quaranta
     new d0ced96  TIKA-1305: make RTF list handling slightly more robust against corrupt list metadata
     new 6d7a36b  TIKA-1305: test file added to svn...argh.
     new 65748ed  Patch from Lewis from TIKA-1258 - Bump the NetCDF dependency from 4.2 to 4.2.20
     new 4fa1d2a  TIKA-1311 centralize serialization
     new 8f27da6  contribution for TIKA-1319: Translation module contributed by Tyler Palsulich.
     new 01c9cb5  TIKA-1324 As discussed on the mailing lists, use a common url prefix for the unpacker resources
     new 55d95c5  TIKA-1269 Some endpoints may lack a produces annotation
     new d14fa52  fix for TIKA-1316 identified by Tyler Palsulich.
     new 375fde6  Patch from Matthias Krueger from TIKA-1322 - XMLParser opens a p tag at the start, so always close it (not just on valid files), to avoid triggering the SecureContentHandler depth check on multiple xml errors. This closes #9 from github
     new 841c587  Remove the temporary PDFBox workaround for TIKA-1182, now that we have upgraded to a version with the fix
     new c1a9be0  TIKA-1325 Temp fix - pull out some of the font metadata keys to string constants, and rename the test
     new d70cbe0  TIKA-1326 MSI files are, rather improbably, based on OLE2 documents not Windows PE files. Patch from Luis Filipe Nassif plus test updates
     new 7cd4050  TIKA-1325 Have the TTF parser pull out a little bit more, and have it do so similar to the AFM one does, plus add some TTF tests
     new 68f513f  fix potential null pointer exception in PDFParser; found while working on TIKA-1302
     new 15bc1b3  Provide explicit DateUtil support for formatting Dates in an unknown timezone, matching what TestMetadata checks detail, and allow for setting a Metadata date value from a Calendar. Finally, use this for the TTF dates, to hopefully solve the TIKA-1325 test problem
     new 02b5424  To support OSGi testing, allow for a test class to find out what class names ServiceLoader.loadStaticServiceProviders will try, helps TIKA-1276
     new b97ef32  TIKA-1303 skip bogus second title tag
     new 02afb8b  Start trying to get the Tika OSGi tests to run, by splitting them out by area, explicitly using an OSGi test running, and beginning to check that in-bundle and non-bundle have the same parsers and detectors. Many tests disabled though as broken... TIKA-1276
     new 1e443c4  Patch from Michal Hlavac from TIKA-1258 - Correct Tika Bundle OSGi build list for newer NetCDF
     new d6307b7  Add a test that ensures that the Tika Bundles are found + started, and clarify a bit why the detectors test should work but doesn't
     new d943c10  Comment out the date related tests for TIKA-1325, to avoid problems with timezone matching in dates while we await a PDFBox fix
     new 4a1cbdd  Another dependency for netcdf for TIKA-1258
     new 468698d  TIKA-1325: small workaround until we can integrate PDFBOX-2122. Default timezone is now set and then unset for ttf test in FontParsers test.
     new 1646005  Fix for TIKA-1327: New parser for Matlab .mat files contributed by Annie Burgess.
     new c9ad965  partially revert r1601805: didn't need to change the slf4j dep. extraneous commit.
     new ccaa9a3  Add in the Java code to go along with r1601805: TIKA-1327: Matlab parser from Annie Burgess.
     new c18b83a  Partial revert of r1601805 - reset NetCDF dep to 4.2.20
     new df7b18e  Add a test CSV file, based on the Excel one, and a unit test shows it gets detected correctly. Also mark CSV as explicitly being a child of text/plain, rather than the previous implicit definition, to avoid confusion. TIKA-1335
     new 583af26  - fix for TIKA-1336 This closes #10
     new af6788a  Update docs for TIKA-1335 TIKA-1336.
     new 4357682  fix for TIKA-1337: LanguageProfile for Persian/Farsi contributed by Omid Pourhadi This closes #11.
     new a3c1c9e  fix for TIKA-1338 Converted README to Markdown contributed by Kyle Maxwell (krmaxwell@gmail.com) This closes #1.
     new 43dde08  Fix for TIKA-1339: Upgrade rome dependency to 1.0 contributed by Pradeep Singh (Github user: pksinghus) This closes #2.
     new 8477468  convert README to markdown TIKA-1338.
     new 6a306fb  TIKA-1341: fix double endDocument in PDFParser
     new 4a7cdda  - ignores
     new 7fa9772  - missing license headers
     new 3a50244  TIKA-1352 upgrade to PDFBox 1.8.6
     new 23adca3  TIKA-1352 update CHANGES.txt
     new a9a5c03  TIKA-1353 If a File is available, parse ODF documents with it, so that the metadata can always be processed first
     new 8c0667b  Bump the java-libpst version to 0.8.1 for TIKA-1350. This closes #12 from github
     new 5f27b97  - apply patch for TIKA-1274 ENVI Header parser contributed by Ann Burgess
     new 8dcf13f  - this should be in the translate package
     new 22feb27  - getters and setters.
     new 631711b  - remove extraneous print
     new 59c7e23  - use path prefix to load the properties file.
     new 4a75eef  - fix for TIKA-1362: Add GoogleTranslate implementation of Translation API
     new ba74309  [TIKA-1351] Updating AutoDetect, Composite and PDF parsers to guard against null content handlers
     new 9d189e6  Patch from Tyler Palsulich from TIKA-1327 - More enhancements to the Matlab parser
     new 3aa7738  updated patch for TIKA-1363 from Annie Burgess: enables Mat parser in META-INF and fixes unit test to use AutoDetectParser to validate it.
     new d3d2629  Fix for JIRA issue TIKA-411, generate list of supported types automatically.
     new 9860b74  Fix for JIRA issue TIKA-1105, pass along ParseContext in CompositeParser.
     new 070b30c  Add check for possible NPE in CompositeParser.
     new 15521f1  Fix for JIRA issue TIKA-1370, adding a CachedTranslator.
     new efd7147  Fix for TIKA-1357: Use BufferedReader to parse ENVI files (from Ann Burgess).
     new adeab55  Fix whitespace in EnviHeaderParser.
     new 8e0c444  Fix for TIKA-1251: RuntimeException with certain word docs (contributed by Vadim Roizman).
     new 904223a  Fix TIKA-411 entry in CHANGES.txt. Was in 1.5 section.
     new 7ce1b27  Remove extraneous print statement from MatParserTest.
     new f086cdf  Remove redundant array initialization.
     new b9e5ee1  Remove more redundant array initialization.
     new 424c0eb  Remove several redundant type casts.
     new 50177e6  Remove caught and immediately rethrown IOException.
     new 628842d  Remove various unused imports.
     new 79211cc  Use foreach loops instead of for/while, when possible.
     new 51377f1  Use String.contains instead of String.indexOf > -1
     new 3b74476  Remove unnecessary boxing and unboxing.
     new d4fc8dd  Remove extraneous print statement from EnviHeaderParser.
     new f1d00db  TIKA-1373 - Send html content to SAX events by using TagSoup
     new 1c7f1bc  Fix potential NPE and fix javadoc refs for PDFParser
     new 4fefb49  Patch from Matthias Krueger from TIKA-1361 - Upgrade MP4Parser to 1.0.2, add a custom Data Source and use that for explicit temp handling. This closes #14 from Github
     new cc4898b  Update imports following TIKA-1361 changes, to match our current preference for explicit (not wildcard) imports
     new 25867f6  I don't know what the android.util package is, nor why we would/wouldn't want it, but without marking it as optional the bundle tests cry...
     new aeebd70  TIKA-1375: decrease memory consumption when extracting images in PDFs
     new e00b441  TIKA-1376: improve embedded file name extraction in PDFParser
     new 5c0089a  TIKA-1374: Try to extract OS-specific embedded files within PDFs
     new 081afb0  Update change log for 1.6 rc1
     new 301460e  [maven-release-plugin] prepare release 1.6
     new 3cbfb59  [maven-release-plugin] prepare for next development iteration
     new c28e3b0  Remove unused src directory for TIKA-1316.
     new b5a860d  Remove old build id-less build config from root pom.xml.
     new cb9cbd5  Remove unused imports and redundant throws declarations in tika-server.
     new b7e764d  Use assertTrue instead of assertEquals(true...
     new ed355dd  Chain together StringBuilder.append calls instead of using String concatenation.
     new 65aea2b  - TIKA-1378: MicrosoftTranslator setClient and setId NPE (thanks to tpalsulich for the review!)
     new 2045326  Partial TIKA-1377 patch from Dan Becker, with changes - add more XMPDM keys (in order), and add ID3v1 stubs for new tags which ID3v1 does not contain (in the same way as others)
     new b488880  Partial TIKA-1377 patch from Dan Becker, with changes - ID3v2 support for more keys, MP3 Parser support to use that, and tests
     new d0dfe79  Partial TIKA-1377 patch from Dan Becker, with changes - Extract more XMPDM data from MP4, with tests
     new 8c1ff56  Include the tool used to create the MP4 in the XMP output, fixes a TODO spotted while working on TIKA-1377
     new 2336b0c  The MP4 parser has extracted the channel count for some time, so enable the test for that
     new 73e942a  TIKA-1381 - Added Lingo24Translator implementation
     new 4689c26  TIKA-1381 - Added Lingo24Translator implementation to CHANGES.txt
     new 9b475d3  Make the Tika CLI Extraction test more robust, with better failure messages
     new 9b57be7  TIKA-1380: staging an updated test file for the actual patch once POI 3.11-beta-1 is released
     new 4153045  added test and test docs for comments in xls and xlsx; lack of tests detected during work on TIKA-1380
     new 2cf27a7  TIKA-1380 Upgrade to Apache POI 3.11 beta 1
     new 85137c5  Found existing comments test in TestParsers; clean up earlier tests for comments in xls and xlsx
     new cf4ee8f  Upgrade the Commons Codec version to match that in Apache POI, upgraded in TIKA-1380
     new 7446992  Update svn:ignore on newer modules to match that on the existing ones
     new 2936d55  Enable the check for TIKA-1118, now that we have upgraded POI
     new be7476e  Convert from assertTrue(contains) to assertContains, to make addressing TODOs and failures much easier, for TIKA-1380
     new 62979fa  Enable the POI fraction test, as it now passes with the latest POI release (TIKA-1380)
     new a13fdac  TIKA-1317 extract contents from SDTs within cells in tables in XWPF (docx) files
     new e07922d  Switch from assertTrue(contains) to assertContains, to give better failure messages, and enable one now-passing test following TIKA-1380
     new bf9d5c4  Another assertTrue(contains) to assertContains change
     new 2624321  Enable more tests / TODOs now that we have upgraded POI with TIKA-1380
     new e27454a  Address the remainder of the test TODOs that we can following the POI upgrade in TIKA-1380
     new e6a6083  Use the tika-parent version of junit via Maven dependency management.
     new 90c7245  TIKA-1275 upgrade commons compress to 1.8.1; updated CHANGES.txt, too
     new 4e1ce48  TIKA-1383: Minor simplification in the way Tika server is set up
     new 3de8dbf  TIKA-1383 Fixing TikaWelcome issues
     new aa2024a  Restore the HTML test of the Tika Welcome page, accidentally zapped in the TIKA-1383 changes
     new 080b036  TIKA-1380; fix cases where ole.getLabel() == null for ole attachments
     new 95051f2  [TIKA-1371] Optional registration of new TikaLoggingFilter
     new 3b00450  [TIKA-1371] Minor update to TikaLoggingFilter
     new 6cc0e11  Fix for TIKA-1387 (thanks Uwe Schindler). Adding the Maven forbidden-apis plugin and fixing identified errors.
     new 7e64483  Fix for TIKA-1389, remove wildcard imports from project.
     new e285888  Create tika-example module for TIKA-1390.
     new add88fd  Fix scm links in the example and translate pom.xml files.
     new fa97e9a  Add in Apache license header to tika-example pom.xml.
     new eb9495e  Fix for TIKA-1385, creating an ExternalTranslator. Also creates a MosesTranslator.
     new 12c6b36  Update CHANGES.txt with TIKA-1385 and TIKA-1390 entries.
     new 0832bcd  Disabling the forbidden-apis plugin until TIKA-1387 is resolved.
     new a4cf450  Update 7z related comments
     new 61eccc1  Re-enabling forbidden-apis and working around identified errors in External and Moses Translators.
     new 50dcb7b  For places formatting numbers in fixed formats, or case-insensitive comparing Ascii strings, use Locale.ROOT not Locale.getDefault() to ensure predictable behaviour, and avoid issues in locales like Turkish. TIKA-1387
     new 91d1f00  For places formatting numbers in fixed formats, or case-insensitive comparing Ascii strings, use Locale.ROOT not Locale.getDefault() to ensure predictable behaviour, and avoid issues in locales like Turkish. TIKA-1387
     new 1c660bc  Review SimpleDateFormat use, adding comments where OK or potentially an issue, for TIKA-1387
     new a07a70a  Finish thread safety fix for TIKA-1387
     new 7f8c2d5  Fix typo in name of tika-example module.
     new 1036f5a  Initial commit for TIKA-1391, parsing examples.
     new 23eb15c  Second initial commit for TIKA-1391, parsing examples.
     new 979c7d8  LanguageIdentifierExample for TIKA-1392.
     new e5717ac  Microsoft Translator example for TIKA-1393.
     new e011883  AxCrypt mimetype, test file and test TIKA-1399
     new 0d87ac9  Bump the POI dependency to 3.11-beta2, and remove the Geronimo stax one which is no longer required by anything now we are on Java 1.6 TIKA-1380
     new b15e67f  Start on examples of using different Content Handlers to get differing output
     new e6821da  TIKA-1259 Add the Ogg Daala video mimetype, and remove an incorrect vorbis magic (it was actually a general Ogg one, which is already on the parent)
     new 25dc052  More content handler examples
     new 295658d  ContentHandler example showing how to break the resulting text up by size
     new 9a8b11e  Pull in the Tika Core tests as a dependency for the examples, some of the examples tests rely on asserts defined in the Core tests
     new 82c3744  Patch from Uwe - disable the forbidden API check on the Tika Bundle, which has no java code of its own, as the way we unpack classes before bundling confuses the checker. TIKA-1387
     new 4e36e66  correct examples pom to pull test-jar from tika-parsers
     new 32ebc7a  Fix for TIKA-674: CompositeParser should indicate which parser was actually selected for parsing contributed by Andrzej Bialecki.
     new 79db92b  TIKA-1404 The tika-app in server mode needs to close the TikaInputStream when done with it, to avoid leaking temp files
     new 5d3ad0e  Prep for 1.6 RC #2.
     new 2d08e52  [maven-release-plugin] prepare release 1.6-rc2
     new b64bd85  [maven-release-plugin] prepare for next development iteration
     new befbc6c  prep changes for next development iteration.
     new 1bafbb8  If we open a new NPOIFS object from a TikaInputStream, attach the opened container to the stream so it gets auto-closed when parsing is complete TIKA-1410
     new 2fedbba  Fix warnings in Eclipse about un-handled types in the switch statement
     new a9406a5  Patch from TIKA-1412 from Andrzej Bialecki - Handle ODF setup case of TikaInputStream from stream with no open container
     new d0f46bc  Have PackageParser include the last-modified date from the archive in the metadata, when handling embedded entries TIKA-1246
     new 783594d  Fix inconsistent whitespace/indents, spotted while working on TIKA-1246
     new cc922f4  Patch from Luis Filipe Nassif from TIKA-1411 - Avoid 7z file leak through use of TemporaryResources
     new 0365ed2  TIKA-1413 - Remove embedded thumbnail from body
     new d7f6323  surround in plugin management to resolve http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
     new 1bdd91b  Fix a TODO by adding in the PowerPoint .ppt embedded resources extraction unit tests
     new fa93128  TIKA-1418 add example for how to dump tika config; and add --config to CLI
     new 85b96e0  TIKA-1418 add files
     new d26f2a3  TIKA-1418 remove println...the horror.
     new 464a908  TIKA-93, create a new Tesseract OCR Parser.
     new 853df89  TIKA-1329 add RecursiveParserWrapper
     new 3fc408e  Add TesseractOCRParser to the META-INF services list.
     new 718776d  Simple JavaDoc fix for AutoDetectParser.
     new 190e17a  Fix Tika Mime Type post TIKA-93.
     new ec40cb6  TIKA-1412 - Add UnitTest
     new c206307  Fix for TIKA-1421 Check if Tesseract is installed before attempting OCR Contributed by tpalsulich,mattmann.
     new b32b061  TIKA-1424: clear PDFont's resources after each document
     new 6fa4168  TIKA-1419: upgrade to PDFBox 1.8.7 and update CHANGES.txt for this and a few recent changes
     new fdd4969  TIKA-1420, create an example of a PhoneNumberContentExtractor.
     new a8fee2d  Add license headers to PhoneExtractingContentHandler and its test.
     new 5976d31  TIKA-1420, refactor the phone number extraction to use a custom method of de-obfuscating numbers.
     new b871795  TIKA-1420, move the PhoneExtractingContentHandler to tika-core. Tests in tika-parsers.
     new 5be919d  Use TikaTest.assertContains in PhoneExtractorContentHandlerTest.
     new af4bca2  TIKA-1433 : extract documents embedded within annotations in PDFs
     new 4f42198  TIKA-1427: add markup for documents embedded in pdfs
     new e8fac5b  TIKA-1427 cleanup. Handle inline images with same markup as Word parser
     new ff8e253  TIKA-1427, small clean up to ensure that inline image number tracks with extracted file
     new c53ad42  Fix for TIKA-1435: Upgrade Rome to 1.5 contributed by Johannes Mockenhaupt <gi...@jotomo.de>. This closes #16.
     new d600793  Fix for TIKA-1354 Register ForkParser Service in Activator. Contributed by Michal Hlavac <hl...@hlavki.eu>. This closes #13.
     new ccb4bee  This closes #3. Looks like it's already merged, so tickling to get ASF PR to close at Github.
     new 2335991  Fix for TIKA-1369: Resolve thread safety issue in ImageMetadataExtractor. Contributed by Vilmos Papp <pa...@gmail.com>. This closes #15.
     new bac4ce4  Revert TIKA-1435 until we figure out the Rome/JDOM/HDFParser issue merge 1629338:1629337
     new ace2ad9  - fix for TIKA-1441 ExternalParsers should allow dynamic keys to be specified for Regexs
     new 3c68cf0  TIKA-1441 change log.
     new a331b5e  - fix for TIKA-605: GDAL Parser
     new 8f7d739  Update for TIKA-605
     new 63a2a6b  - TIKA-605: fix remainder of tpalsulich comments from https://reviews.apache.org/r/26542
     new fa4d2bf  - TIKA-605: deal with heading boundaries; add associated unit tests to expose and prove fixed for regression
     new eb7202d  Fix for TIKA-1422 contributed by tpalsulich and mattmann.
     new 1056aa2  TIKA-1444 Virtual PC Virtual Hard Disk mimetype
     new eb1e098  [TIKA-1242] Update to CXF 3.0.2
     new ab0065e  [TIKA-1242] Moving the CXF rt/rs/service/description dep into a test scope
     new 3d1a87e  WEBP mimetype from Nelson Monterroso TIKA-1450
     new f3718fa  WEBP sample file from Nelson Monterroso, and associated unit test for TIKA-1450
     new cf55425  TIKA-1422 - Apply fix of [~olegt] in Windows
     new fe06af6  TIKA-1422 - Fixing build & minor refactory of naming test class
     new 8f57b93  clean up from TIKA-1311
     new d42e013  TIKA-1451 add RecursiveParserWrapper output to CLI and GUI
     new eb7e348  move pretty print metadata key sorter into standalone class
     new 45da28c  move pretty print metadata key sorter into standalone class, with added PrettyMetadataKeyComparator...argh
     new 669565c  upgrade gson to 2.2.4
     new ccd5946  - getParsers is never called.
     new 42cebed  TIKA-1422. Skip checking the number of some handler invocations in the RFC822ParserTest if Tesseract is installed.
     new e2293d3  TIKA-1459 fix write limit bug in BasicContentHandlerFactory when creating a BodyContentHandler
     new 2cad861  cleanup tika-app pom, remove unnecessary gson dependency
     new 63fb75d  Very small Windows exe for TIKA-1461, generated with Visual Studio 2008 with advice from http://www.phreedom.org/research/tinype/
     new 2377a7c  TIKA-1461 PE files must also have the MZ header at the start, so tweak magic and add positive and negative mime magic detection tests for it
     new 4f4ce4c  If this test fails at all, have it report which test file it failed on to assist debugging
     new eacb983  TIKA-1463 - Fix tesseractPath in Windows
     new 0aa8bca  TIKA-1467: in PDFParser, move metadata set isEncrypted() to before decryption step.
     new ce51005  Fix for TIKA-1472 Warning on Tika Server startup - Failed to load class org.slf4j.impl.StaticLoggerBinder contributed by Konstantin Gribov <gr...@gmail.com> this closes #22.
     new 78c8213  TIKA-1475. Reformat pom.xml files.
     new 3b2fba4  TIKA-1476 - Updated TesseractOCRConfig to read from property file if present on classpath
     new 069ac2d  TIKA-1476 - Added tests for TesseractOCRConfig external configuration through properties files
     new 994d7f5  Add .svn to .gitignore for people using git/svn
     new 2d05a8a  TIKA-1476 - Fix test on Windows env.
     new 65977bb  TIKA-1446 - Integration of [binhawking]'s work on CHM parser
     new b6fda2a  TIKA-1446 - Revert CRLF on profile language files
     new ccc1fc0  TIKA-1476: Added default configuration file
     new 1cc2725  TIKA-1446: Updated test so it loads the test documents from the classpath
     new 746de4d  Reverting incorrect commit whilst fixing test on TIKA-1446
     new a0d5a4f  TIKA-595: Adding Julien Nioche's patch to enable Multivalue Metadata for Html
     new 9005c67  TIKA-1477: Added new custom header to Tika resource override Tesseract OCR language
     new 74c231a  TIKA-1477: Updated Tika resource to dynamically set TesseractOCRConfig and PDFParserConfig files from custom headers
     new 739ac45  TIKA-1486 Remove duplicated mimetype definitions
     new e500f6a  TIKA-1486 Make DITA Task be a subclass of DITA, and not of itself
     new bb9380c  TIKA-1488 add X-Tika as namespace
     new f64c9e5  TIKA-1487 Test Excel 4 file from govdocs, and an AOO generated Excel 5 file
     new a495a10  TIKA-1487 Based on the file format docs from OpenOffice, add detection and mime types for the older Excel 2, 3 and 4 pre-ole2 formats
     new 64419f7  Add a TODO for TIKA-1490
     new ccd0c4b  Add back return, while we're pending TIKA-1490
     new ed77de4  TIKA-1218, prevent negative array size exceptions for corrupted mp3s.
     new 9f1b421  TIKA-1491 BPG magic from  Johan van der Knijff
     new 49a6266  Some (but not all) test BPG files from Johan van der Knijff from TIKA-1491
     new 9565b06  TIKA-1491 BPG detection test
     new d66f8d7  TIKA-1384 and TIKA-1496. Upgrade slf4j-log4j12 to version 1.7.7 and manage it with tika-parent dependency management.
     new d3bea08  Fix tika-bundle slf4j dependency issue.
     new f233237  TIKA-1495 Decoder for UE7 integers, which work a bit like UTF8 does for strings, used in BPG
     new 0de5e6f  TIKA-1495 Start on a BPG parser, so far just covering width and height (rest to follow later)
     new a4be4be  Update the link to the XMP Spec, which seems to have moved on the Adobe site
     new 1f32c70  TIKA-1495 Fetch the BPG colour information, and have a rough go at storing that as metadata for both BPG and PSD
     new 04253d4  Prepare for TIKA-1494 - also provide the ParseContext to TikaResource.fillMetadata
     new 8b06655  Fix indents / whitespace
     new 2129abf  TIKA-1494 Support fetching the password for Excel .xls files from the ParseContext where given
     new 8b433c7  TIKA-1494 Allow supplying a password on a per-request basis via the Password header
     new 3f89b48  Update the change list
     new 1a317d1  Start processing extension data for BPG files
     new d6d1d22  TIKA-1495 Start on BPG Exif and XMP handling, but for some reason the drewnoakes Exif code gives silent errors
     new 2109402  TIKA-1442, upgrade to PDFBox 1.8.8
     new 57db5ef  TIKA-1498 add recursive parser wrapper output to tika-server
     new 84c5d24  Update CHANGES.txt with recently resolved issues.
     new 05eef80  TIKA-1498: now actually add providers to cli...argh
     new f776fc0  TIKA-1497: add JSON and XMP output to tika-server's /meta
     new 26b66a9  TIKA-1497: update changes.txt
     new 1233a93  Temporary workaround for TIKA-1445 for Tika 1.7 - always pass the image to the regular parser to get the metadata set. Will be replaced in 1.8 with composite parsers + user selected config with strategy
     new a8edc72  Fix some warnings
     new 7dc73ce  TIKA-1445 - Allow you to exclude certain mimetypes from a parser that would otherwise handle them, in your Tika Config xml
     new f13e537  TIKA-1494. Test that decrypting with the wrong password returns a 500 error in MetadataResourceTest.
     new e3fbbfb  TIKA-1499: fold MetadataEP in tika-server into MetadataResource
     new 8f6b48a  Additional OSGi bundle definitions from Tim Allison from TIKA-1469
     new a6b6ad4  Upgrade to POI 3.11 final, patch from TIKA-1469
     new 75230ce  Update the BPG parser following spec updates TIKA-1495
     new 46b321f  TIKA-1490 Parser for old Excel 2-4 files
     new d398c71  TIKA-1490 Unit tests for Excel 2-4 parser
     new e76299d  TIKA-1490 Use the Old Excel parser for older OLE2 based formats too, like Excel 5 and 95
     new e967f1d  Some test database files for TIKA-1502
     new dbc8f86  TIKA-1502 MySQL and SQLite3 mime types, with magic where possible
     new 554922a  More test database files for TIKA-1502
     new f7e2453  Start on magic for subtypes of Berkeley DB TIKA-1502
     new 52ccbef  Split the Berkeley DB mimetypes into three levels, and add a detection test (passes) and a hierarchy test (disabled as fails) TIKA-1502
     new ef0e28d  Fix test for TIKA-1502 - re-order the MediaTypeRegistry logic for getting the super type, so that if an explicit inheritance has been defined between one parametered type and another, that inheritance is used in preference to "drop all parameters"
     new e51064f  One more media type with parameters test, for unknown parameters
     new 79bb58b  TIKA-879 Add a new parent mime type, for the text based message formats, of text/x-tika-text-based-message, which allows Thunderbird messages to be correctly detected as they now show up as being text based not binary based in the hierarchy
     new 400ed89  Upgrade the Maven Shade plugin - slightly faster, and avoids spurious warnings about duplicate xmlbeans classes
     new db03e43  Missing test file from TIKA-879
     new f3df5ae  TIKA-1500. Strip tags from content in FeedParser.
     new 5a7216e  TIKA-1503. Don't run the GDAL FITS test if FITS files aren't supported by the installed version of gdalinfo.
     new 78265ea  Pure whitespace change. Reformat the GDALParser and its test.
     new 9277ce6  TIKA-1465. Reformat XHTML generation from NetCDFParser.
     new 02f4720  Pure whitespace change. Fix formatting of NetCDFParser and its test.
     new a97f811  Update CHANGES.txt for 1.7 release.
     new 716c50f  Add Tyler Palsulich's key to KEYS.
     new 001406d  [maven-release-plugin] prepare release 1.7-rc1
     new cbe900f  [maven-release-plugin] prepare for next development iteration
     new f5ad49d  TIKA-1506: close PSTFile's file handle after parsing
     new 5c694fd  [maven-release-plugin] prepare release 1.7-rc2
     new 0a6bd7a  [maven-release-plugin] prepare for next development iteration
     new bddf67d  Shorten the ParseContext fetching of the TesseractOCRConfig
     new 5f68452  TIKA-1445 If Tesseract isn't available, don't offer any supported mime types, so the parser avoids being picked by DefaultParser or similar
     new 895ebb7  Cleaner workaround parser call from Tim Allison from TIKA-1445
     new 1843bb5  TIKA-1445 Unit test to show that when an invalid tesseract config is given, and tesseract cannot be found, TesseractOCRParser will return no types and will not be selected by DefaultParser
     new 5cf7785  TIKA-1445 Unit test to check a JPEG via Tesseract gets both OCR text and normal JPEG metadata
     new 0a176a5  TIKA-1445 Use assertContains, and fix a problem with the ForkParser integration tests
     new d7f253f  Shorter supportedTypes initialisation
     new 192c0d0  TIKA-1445 Cache if Tesseract is present at a given path or not
     new 4533e06  Temporary workaround for the TIKA-1507 ForkParser / OGI issue
     new 9f8fc77  Disabled exif related bpg tests for TIKA-1495
     new f433700  TIKA-1445: need to fix TikaMimeTypesTest in tika-server to accommodate two options for parser
     new 41c09d7  TIKA-1445: add tests to TesseractOCRParserTest to ensure metadata is extracted
     new 9ba366d  Fix indenting in TesseractOCRParser.
     new 5d3fc7f  Remove unused variables from TesseractOCRParser.
     new c9887c1  TIKA-1445. Split TesseractOCRParser#offersNoTypesIfNotFound in two. Small import and comment changes.
     new 31fd0dc  Add Tyler Palsulich to parent pom developers list.
     new 9192474  Add tika-server to maven-release-plugin configuration.
     new 484fe4e  TIKA-1412: Fixed test issue on Windows build
     new 539d511  Update release date in CHANGES.txt.
     new 6e18dde  Remove redundant release config in root pom.xml.
     new dcc2511  [maven-release-plugin] prepare release 1.7-rc3
     new 221f10d  [maven-release-plugin] prepare for next development iteration
     new 9b4a088  Add missing svn:ignore on new-ish sub-projects
     new e09cb44  TIKA-241 Unrar parser from Luis Filipe Nassif
     new b9b8537  TIKA-241 Refactor to use common logic between PackageParser and RarParser for populating xhtml+metadata of embedded resources
     new eebeab1  Use assertContains(needle,haystack) rather than assertTrue(haystack.contains(needle)), to get more helpful failure error messages
     new a729485  Fix indents/whitespace, and use assertContains
     new 1b462d3  Tweak comments/whitespace, and use assertContains
     new a7bfac6  Use the common assertContains method, and use it more widely
     new 89eea5f  Use assertContains(needle,haystack) rather than assertTrue(haystack.contains(needle)), to get more helpful failure error messages
     new 7ef8091  Add the unrar license to the main and parsers license files (junrar uses the same, category B, license)
     new 389d935  Update vote.txt template.
     new c2f825f  Add 1.8 current development section to CHANGES.txt.
     new 7cd31ee  TIKA-1028 Have PackageParser report encrypted zips via EncryptedDocumentException rather than commons compress UnsupportedZipFeatureException
     new 0c9237e  Test rfc822 file with an encrypted zip file attached from TIKA-1028
     new a9f559d  TIKA-1028 If an encrypted attachment is found in a RFC822 email, silently skip it and carry on, so the rest of the email can be processed (may need more work!)
     new 32985ed  Partial unit test for TIKA-1028
     new 7b7ad95  TIKA-1222 For RFC822 mails, start to prefer a EmbeddedDocumentExtractor to a Parser for handling embedded resources, but retain the Parser use if not for backwards compatibility
     new 3b1d249  TIKA-1222 Move further towards EmbeddedDocumentExtractor, but keeping backwards compatibility
     new 8abbe99  TIKA-1222 Unit test for rfc822 embedded resources
     new 24faaa9  TIKA-1028 New rfc822 mail with encrypted zip of known password, from Juha Haaga, and matching unit test changes, but I'm not sure it's quite right just yet... (See TODOs)
     new 58b7c17  TIKA-1028 Refactor the RFC822 parser to setup recursion once per file, not once per attachment, and get it so that a non-encrypted zip attachment is correctly extracted. (Commons Compress currently lacks password protected zip support)
     new ab00944  TIKA-1521 Support password protected 7zip files
     new 2f5148a  Add references to the Tika issue for upgrading for the fix
     new 9398ed2  TIKA-1526: initial fix for jvm bug that can affect users with a default Locale of tr running on MACOSX or BSD.  We still need to confirm that this fixes the problem and/or add a unit test.
     new 83b8f79  TIKA-1529: step 1...get rid of toLowerCase in BasicContentHandlerFactoryTest
     new 892f3a8  TIKA-1529: turn forbidden-apis back on and clean up all mentions of UTF-8
     new ff34714  fix for TIKA-1530: Include parsed mp4 duration in metadata contributed by Oskar Wickström <os...@live.com> This closes #25.
     new bb0822a  Use a locale-consistent DecimalFormat to set the mp4 duration, avoiding rounding issues TIKA-1530
     new 07892ac  TIKA-1521: follow commons-compress and require installation of jce before testing password on 7z file
     new 493567d  TIKA-1534: Upgrade to Commons Compress 1.9
     new 6778d5d  TIKA-1329, added examples for the RecursiveParserWrapper
     new c4b1393  TIKA-1423 Build a parser to extract data from GRIB formats
     new 1c9b450  TIKA-1518: Added Dockerfile to support building a Tika Server image
     new 418fdcd  Fix for TIKA-1537 Installation on OSX 10.10.2 generates OutOfMemory Error during parser tests contributed by Andrew Hwang. This closes #26
     new 1493ca7  Update readme with correct Java and Maven version requirements.
     new c2b4463  TIKA-1423: Added exclusion to avoid duplicate JCL dependency
     new 8d9ddf9  TIKA-1542 substitute Apache friendly TTF test file for our current copyrighted file
     new 0462e2d  Patch for TIKA-936 Fix for RarParser for handling Chinese characters contributed by kongxianghe1234 <ko...@gmail.com>. This closes #27.
     new 1916f12  TIKA-1542 substitute Apache friendly TTF test file for our current copyrighted file, take 2.  See PDFBOX-2383
     new 12bdd1c  Fix for TIKA-1539 GRB file magic bytes and extension matching contributed by Luke Sh LukeLiush <ha...@gmail.com>.
     new 6f788cb  Fix for TIKA-1539 GRB file magic bytes and extension matching contributed by Luke Sh LukeLiush <ha...@gmail.com>. This closes #28
     new 14386dc  TIKA-1539 Fix indent, and move the GRIB and XQuery mime entries to the right place in the sorted list
     new 415e35c  ParserDecorator unit tests
     new 1cdc9ae  TIKA-1509 Provide a possible "parser with fallback" implementation, with lots of questions!
     new a39fc18  TIKA-1547. Use POST for HTML form input.
     new 169ecc8  Fix for TIKA-1541: StringsParser: a simple strings-based parser for Tika Contributed by Giuseppe Totaro.
     new 73da44c  TIKA-1547. Update CHANGES.txt.
     new ad6a794  TIKA-1269. Add Miredot documentation for tika-server.
     new 8dbed10  TIKA-1269. Add link to tika.apache.org Miredot documentation on the tika-server index.
     new e12bb7a  TIKA-1544 consecutive new lines not preserved in rtf
     new 8fc3a09  TIKA-1548 improve handling of encrypted pdfs when wrong password is offered
     new 961a8f2  TIKA-1511 add parser for sqlite3
     new 8ecbab9  TIKA-1511, with new files added...doh
     new e66ec97  TIKA-1511, third time is the charm...many apologies
     new dfb59c3  TIKA-1511 try to revert to earlier version of sqlite-jdbc to avoid UnsatisfiedLinkError on ubuntu
     new 8494d93  Fix for TIKA-1549 Increased the speed of language identification by a factor of two. Fix contributed by Toke Eskildsen <te...@ekot.dk>. This closes #29.
     new a6708c1  Bump up the core maven plugin versions to the latest available
     new c19701b  TIKA-1553: add an EvilParser for testing purposes
     new dadd869  TIKA-1323: allow tika-server to return stack traces from parse exceptions for easier analysis of parser exceptions via tika-server.
     new 538de86  TIKA-1556 clean up whitespace in tika-server
     new 99b53ce  TIKA-1558. Enable blacklisting of Parsers and other services with a servicename.blacklist META-INF file.
     new 11787f5  Apply rollback patch for TIKA-1354 contributed by Michal Hlavac <hl...@hlavki.eu> this closes #30.
     new bab243e  Updated tests for TIKA-1541 simple strings parser from Giuseppe Totaro.
     new de5df17  Updated tests for TIKA-1541 simple strings parser from Giuseppe Totaro.
     new 4f562f3  instructions for how to contribute via Github. This closes #31.
     new a39e891  Fix for TIKA-1483 Create a Latin1 charset raw string parser contributed by Luis Filipe Nassif.
     new 960cd68  Fix for TIKA-1483 Create a Latin1 charset raw string parser contributed by Luis Filipe Nassif.
     new 547166e  More assert containing methods
     new 8b02f6a  Start on unit testing for the new TIKA-1558 style parser blacklisting
     new 43d7a8b  Start to prepare for child parser definitions within a composite parser
     new 9ee3a34  TIKA-1558 Support excluding (blacklisting) parsers from config, so you can use DefaultParser for all except certain parsers. Also supports child parsers of a composite parser from config, towards TIKA-1509
     new c56b275  Fix for TIKA-1561 GCMD Directory Interchange Format (.dif) identification contributed by LukeLiush <ha...@gmail.com>. This closes #32.
     new deefc83  Add the config blacklisting to the Changelog
     new 07597af  Sereal, CBOR and WinInf mime magic from file(1)
     new 1eb9f97  A few more mimetype updates inspired by file(1)
     new fcfec8b  Add a Tika CLI option for comparing with the File(1) magic directory, to report types to consider adding, and types we may be able to get magic for TIKA-289
     new d3217c7  Add getChildTypes(MediaType) support to MediaTypeRegistry, to allow you to navigate the hierarchy the other way too
     new 114b481  When looking at the file(1) magic dir, check children for magic too, as sometimes they have it, and update the changelog
     new d7156b1  TIKA-1563 Put the more common gzip file extension (.gz) first in the glob list
     new c63ae23  Use ${project.version} for tika-core dependency in tika-translate.
     new ecd7ce8  TIKA-758. Remove PDFBOX workarounds in PDF2XHTML.
     new aca3dfa  TIKA-758 clean up after remembering PDFBOX-1130
     new 2697354  TIKA-995. Properly output XHTML body attributes, contributed by Markus Jelsma.
     new bd19155  Update CHANGES.txt for TIKA-995.
     new d1e8d71  TIKA-1489 add optional accessibility checking to PDF files
     new fe76f57  TIKA-1000. Ignore an invalid SAXNotRecognizedException.
     new 15303fe  TIKA-1038. Fix possible infinite recursion while parsing some PDFs.
     new 0c81e7a  TIKA-1553 change EvilParser to MockParser and move to core
     new 0a92ecf  turn off pdfbox logging in PDFParserTest
     new 4aab1c1  Fix for TIKA-1567 WelcomeResource in TikaServer doesn't print PathParam prefix. This closes #33.
     new db07978  TIKA-1553, add action types for printing to stdout and stderr
     new fa5eb09  Remove println from XHTMLContentHandlerTest.
     new 86b897d  TIKA-1571 Upgrade UCAR dependencies to 4.5.5
     new e559b33  TIKA-1286 Sample Visio OOXML VSDX files from  Pascal Essiembre
     new 44032de  TIKA-1286 Visio OOXML mimetypes, and non-container detection unit tests
     new 3f6d06f  TIKA-1286 Bring the overall file mime types into line with the other OOXML formats, and add container aware detection + tests for the visio ooxml types
     new 257a53b  Support detection of OOXML-Strict files, and add a disabled unit test for OOXML-Strict xlsx parsing (not yet supported by POI)
     new 4112484  TIKA-1564. Move tika-server resources and writers to their own packages.
     new 63445bd  TIKA-1564. Fix package visibility compiling issue in tika-server.
     new 794595c  TIKA-1063. Add basic ODF style support, contributed by Axel Dörfler.
     new ce710eb  TIKA-1063. Add ODF style test resource file.
     new 54ea090  TIKA-1564. Move TarWriter to the server writers package.
     new c46f545  TIKA-1117. Don't let iWorkPackageParser close the given InputStream.
     new 19c87d9  TIKA-1137. Break early when possible in ForkParserIntegrationTest, contributed by Adrian Nistor.
     new 4cf0ab4  TIKA-1576. Upgrade metadata-extractor to version 2.7.2.
     new bd4798a  Pure whitespace change. Reformat ImageMetadataExtractor.
     new d3be3ea  Fix for TIKA-1365  Lower priority for XML starting with comment, allow HTML starting with comment to be detected as text/html contributed by Matthias Krueger <mk...@mkr.io> this closes #35.
     new 355da43  Adding EMF magic as per Microsoft's EMF specification, and thanks to Luis Filipe Nassif contributed by Matthias Krueger <mk...@mkr.io> this closes #34.
     new 05ad16e  Update tika-dotnet version to 1.8-SNAPSHOT.
     new fc87e4a  TIKA-1416. Refactor Translator Exception handling.
     new e5ca726  Style changes for GDALParser.
     new 832ba70  Log Tesseract messages.
     new fe4cd58  initial commit of TIKA-1330
     new 5a6626c  updated patch for TIKA-1579: outputs NetCDF file type in metadata
     new 375274a  updated patch for TIKA-1578: outputs file type in metadata
     new bc34c83  TIKA-1531 upgrade to POI 3.12-beta1
     new d524ba6  update to CHANGES.txt
     new dbc5eb7  TIKA-1330 clean up logging and some dependencies. Still some log4j dependencies for now
     new de00e27  TIKA-1581 - Switch using Jhighlight on CDDL/LGPL dual-licensed and update notices
     new 8a94581  TIKA-1581 - Typo & CHANGES.txt
     new b765db2  TIKA-1583. Convert tika-server README to markdown.
     new a4decc5  TIKA-1583. Small formatting changes for tika-server README.
     new 849eb31  TIKA-1583. Remove old README.
     new a606c94  TIKA-1586. Enable CORS requests on Tika server.
     new 7b979a8  Fix for TIKA-1580: Support IsaTab MIME identification and parsing. Thanks to Giuseppe Totaro for all the great work!
     new 62de8ee  Update pdfbox to 1.8.9
     new 6e5056a  TIKA-1511 include xerial and native libs; some cleanup of README in preparation for 1.8 release
     new 990c0c2  TIKA-1512 temporary workaround.  Currently not including test docs or tests that derive from govdocs1
     new 90bc595  TIKA-1584: fixed regression in Tika 1.7 that prevents processing of embedded docs with /tika service
     new 6ce4a16  ForkParser.setJavaCommand takes List<String> now
     new 4c1834e  Fix broken ForkParser APIs
     new 383208c  TIKA-1581 - Mention @kkrugler thanks in CHANGES.txt
     new 8054ddd  TIKA-1330, trivial fixes to avoid NPE with consumersManagerMaxMillis parameter
     new 4ae33b7  TIKA-1330: add integration tests to TikaCLITest
     new 3134278  TIKA-1423: exclude pdfs and readme.txt files from tika-app and tika-server jars.  Anything else we can exclude?
     new 6648ed8  TIKA-1511: add public domain license notice for Sqlite to main License.txt
     new 671b314  TIKA-1589 - Patch from Max Daniline to extract MP3 duration from files with no ID3 tags. This closes #38 from github
     new 5f506c1  TIKA-1558. Refactor Parser blacklisting.
     new b32a233  TIKA-1558. Better error message and fix typo.
     new 42d0111  TIKA-1586. Change CORS short option to -C.
     new d888256  Reformat 1.8 section of CHANGES.txt in preparation of 1.8 release.
     new 6a013f5  TIKA-1330 clean up logging in tika-batch and tika-app integration of tika-batch
     new b45a092  TIKA-1330 clean up logging in tika-batch and tika-app integration of tika-batch, take 2
     new a19c06f  TIKA-1330 flush stacktrace writers
     new 18d0536  Remove blacklist custom mimetypes.
     new 3fff336  Updated bouncycastle to 1.52
     new fe31952  TIKA-1330 fix logging in TikaCLI to avoid adding multiple appenders
     new d2dde3e  TIKA-1323: flush writer when printing stack trace
     new 6194e68  TIKA-1519 - don't allow potentially erroneous http-equiv Content-Type to overwrite Content-Type in HtmlParser
     new 43086f4  TIKA-1519 change underscore to dash
     new de74bb3  tika-batch cosmetics
     new 6d184fc  tika-parent: added myself to committer list
     new 39644a9  tika-parent: slf4j updated and adapters added
     new c88345a  tika-app: pass all logging through slf4j
     new 6906f35  Update release date for Tika 1.8 in CHANGES.txt.
     new 1c07108  [maven-release-plugin] prepare release 1.8-rc1
     new d176c6c  [maven-release-plugin] prepare for next development iteration
     new aecf60c  TIKA-1594 upgrade metadata-extractor to 2.8.0 and add parser for webp metadata
     new 1e33ed3  Fix for TIKA-1597
     new 13a280b  Cosmetics: remove trailing whitespace (RTFParser)
     new 7d231d0  TIKA-1519: add charset information for the non-html formats, too: XHTML(s) and x-asp
     new ece254a  TIKA-6000 - Fixing NPE when having style in footnote
     new f2c263d  TIKA-1600. Reformat ODF Parser files and move OpenDocumentParserTest tests to ODFParserTest.
     new 7af403b  Update CHANGES.txt in preparation for Tika 1.8-RC2.
     new a9a967f  [maven-release-plugin] prepare release 1.8-rc2
     new 6df7f40  [maven-release-plugin] prepare for next development iteration
     new 3e9db56  TIKA-1605
     new 35090b3  another npe in PDFParser
     new b46164b  TIKA-1606: update Guava version to something slightly more recent
     new 52a2eaf  TIKA-1511, move xerial dependency to 'provided'
     new 6d28756  Add 1.9 section to CHANGES.txt.
     new 88a5a85  Update version of tika-dotnet.
     new f25385e  fix documentation of when we moved the sqlite dependencies to provided
     new 3c5905e  TIKA-1501: Fix disabled OSGi related unit tests. Fixes from Bob Paulin.
     new 7ada216  TIKA-1611 -- allow RecursiveParserWrapper to catch exceptions caused by embedded documents
     new d9f730c  WIP Fix for TIKA-1610: Support MIME extension for CBOR files contributed by LukeLiush <ha...@gmail.com> this closes #42
     new b40b92e  TIKA-1580: Fix to allow test to run on Windows with space in folder
     new 60059ae  TIKA-1610 Bump the CBOR mime magic priority to 60, to be more specific than (x)html, which is what CBOR often contains, and add a detection unit test
     new c6ed613  tickle to close #44.
     new 54e6717  Revert r1676165 as it included local changes.
     new b9617fb  Patch from Bob Paulin from TIKA-1617 - Change OSGi Detection test to use OSGi Service
     new 40e0757  Spotted when looking at TIKA-1617 - DefaultParser should override getAllComponentParsers to mirror getParsers behaviour when a dynamic service loader exists
     new f192ed7  Add a parsers equivalent OSGi test to mirror the detectors one, spotted while working on TIKA-1617
     new 7ca5ec0  Fix for TIKA-1532 DIF Parser contributed by HyperDunk <aa...@gmail.com>. This closes #46.
     new 2a26a5d  Fix for TIKA-1535 Inheritance modification for the class MIMETypes contributed by LukeLiush <ha...@gmail.com> this closes #45.
     new 79c07df  Fix for TIKA-443 Geographic Information Parser contributed by unknown <ga...@gmail.com> this closes #47.
     new 281fedf  Fix for TIKA-1571 add probabilistic mime selection contributed by LukeLiush <ha...@gmail.com> this closes #41.
     new 39352e9  Fix for TIKA-1582 Mime Detection based on neural networks with Byte-frequency-histogram contributed by  LukeLiush <ha...@gmail.com>. This closes #36
     new 3210d79  - fix for TIKA-1621: TikaResource should log errors determining ContentDisposition
     new 7f9d4c2  Update CHANGES entry for TIKA-1621.
     new cd91ee4  Add some javadocs explaining what this does, and use a proper version UID to avoid serialisation problems (eg with Forked mode
     new 3784cac  Add some javadocs explaining what this does, and use a proper version UID to avoid serialisation problems (eg with Forked mode TIKA-1517
     new 66d39b7  Update whitespace to match coding conventions
     new c3d82a8  TIKA-1517 Pull the ordering logic for loaded classes (non-Tika first etc) out into a util class
     new ca0676f  - DIFParser has been moved to o.a.tika.parser.dif
     new 761467a  - DIFParser has been moved to o.a.tika.parser.dif
     new f54cb9e  TIKA-1562: Add examples from the Tika in Action book
     new 2867f73  TIKA-1562: Add examples from the Tika in Action book
     new d3c776c  - fix for TIKA-1622 Expose Tika LanguageIdentifier via Tika Server
     new d4f437c  fix for TIKA-1622 Expose Tika LanguageIdentifier via Tika Server
     new a54b37a  - fix for TIKA-1622 Expose Tika LanguageIdentifier via Tika Server
     new 3982066  - fix for TIKA-1622 Expose Tika LanguageIdentifier via Tika Server
     new 18902db  - typo
     new e533a22  - provide no arg constructor for service loading - this will get selected by default per its ordering by name - TODO: fix this later.
     new 9f255bc  - fix for TIKA-1620: OUTPUT_FILE_TOKEN not being replaced in ExternalParser contributed by Pascal Essiembre
     new 497445e   OUTPUT_FILE_TOKEN not being replaced in ExternalParser contributed by Pascal Essiembre TIKA-1620
     new fa8b7b6  properties file doesn't require package prefix (will default to this package)
     new 2280fd6  - no need for class prefix
     new 3241830  - use class classloader
     new 6d21945  - don't need the package prefix for Lingo24 translator
     new 072980c  - no need for pkg prefix for properties file
     new 6481195  - use class classloader to load config
     new a850812  TIKA-1623 Expose Translation Interface from Tika Server
     new 0a1ffa9  Fix for TIKA-1625 Add support to Tika Server for parsing remote file URLs and for providing language detection contributed by junwei1229 <ll...@tradeshift.com> this closes #48.
     new 73741e7  - be consistent and set language in /rmeta as well per TIKA-1625
     new 24c6004  Updated Apache POI to 3.12
     new 90ff7e3  TIKA-1628: ExternalParser.check now returns false if SecurityException is thrown
     new 88f140f  TIKA-1629 fix eol-style to LF in *.java *.properties and select *.xml
     new b5a79c8  Update to TIKA-1622 with corrected French language example contributed by Thomas Ledoux.
     new 345d33f  TIKA-1085 Treat a PDF with a leading Byte Order Mark the same for detection, and add low-priority matches for the PDF magic coming in 1-1024 bytes of the start (may give false positives if too high), plus tests
     new 03203f1  TIKA-1632 zlib mime magic from Pavel Micka
     new 77a5e12  TIKA-1632 Add some test zlib compressed files, another magic for it, and detection unit tests
     new 8df88b0  TIKA-1635 Disabled zlib parser support, not yet enabled pending a fix for a commons compress bug
     new 2259e44  Add an alternate zlib mimetype found in some places
     new 13c38d7  TIKA-1634 Add some sample matlab files
     new fbd9142  TIKA-1634 Two more kinds of matlab magic, and tests
     new 84b7ca2  fix for TIKA-1614 Geo Topic Parser contributed by aranyali <ar...@gmail.com> and modified and updated by Chris Mattmann this closes #43.
     new 3ab3c34  - formatting
     new c385761  - clean up imports/formatting
     new 1b23ca5  - missing Apache header
     new 2aa8b31  - fix for TIKA-1636: Toggle loading error warn logs in Tika Service Loading from the Command Line
     new 7728826  Update CHANGES.txt for TIKA-1636.
     new 3097527  - fix for TIKA-1638: Make ExternalParser actually work
     new 5c664b9  Update changes for TIKA-1638.
     new ba78bef  - fix for TIKA-1510: FFMpeg installed but not parsing video files
     new 5a1f1f3  Update changes for TIKA-1510.
     new 227ef01  TIKA-1510: fix videoColorSpace
     new 65127bf  - fix unit tests associated with TIKA-1638
     new 7ec0a0b  - fix for TIKA-1639: Add EXIFTool as an ExternalParser
     new be0e09c  CHANGES update for TIKA-1639.
     new a7e0a56  - fix unit test if CompositeExternalParser isn't available b/c exiftool and/or ffmpeg aren't installed.
     new a7b50ca  TIKA-1315 -- basic list support for WordExtractor; still need to add in override behavior once we add a class to ooxml via POI
     new 5239ee7  TIKA-1643: clean up code in tika-parsers -- changed all newlines to lf and autocorrected code for most parsers that I've mis-styled.
     new 7212465  TIKA-1643: reverted tika-server's pom to re-include qmino. i still need to delete this locally for every build because i can't figure out how to get to qmino through a proxy
     new bea6f06  upgrade release plugin to get around [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin: error.
     new 48d76bd  [maven-release-plugin] prepare release 1.9-rc1
     new 741be83  [maven-release-plugin] prepare for next development iteration
     new e549ec8  Try to make the low-priority padded PDF magic match more specific, as it looks to have incorrectly triggered on a few of the govdocs text files
     new fa1c631  Bibtex entries are case insensitive, and might start with a comment, so tweak magic and add a test file. (Spotted in govdocs1)
     new db9ac56  TIKA-1634 Few more matlab and other code related tests
     new cc32605  TIKA-1646 fix RecursiveParserWrapper to add Metadata object even if an exception is hit while parsing the container
     new 9413f45  TIKA-1646 small cleanup
     new 826467e  TIKA-1315 cleanup after run against govdocs1
     new a1b849f  Fix for TIKA-1634 Detecting problem with Matlab source code contributed by Jihyun Oh <ma...@gmail.com> this closes #49.
     new 04be328  TIKA-1233 reopened
     new 482c7f8  Mark the Tex formats as subtypes of text, so that if there isn't a dedicated parser for them, then they still get some basic text extracted via the text parser. Improves govdocs1 coverage
     new 209b648  Fix for TIKA-1652, TIKA-1426: Tika Server should allow config file override from the command line like Tika App
     new 6199820  Fix for TIKA-1645 & TIKA-1642: Extraction of biomedical information using CTAKESParser contributed by Selina Chu, Giuseppe Totaro and mattmann.
     new c68c280  CTAKESParser: don't enable via SPI since enabled via config.
     new d170261  [maven-release-plugin] prepare release 1.9-rc2
     new 4fab75b  [maven-release-plugin] prepare for next development iteration
     new fa5e666  Fix indents to match http://tika.apache.org/contribute.html#Code_Formatting TIKA-1642
     new 952ba27  Improve how the Tika CLI reports decorated parsers in --list-parsers
     new 5132638  Include parser decoration details in the Tika Server parser listings as well
     new b93a7d2  TIKA-1653 Re-do the XML parsing in the Tika Config, so that a parser tag with another inside it doesn't get accidentally duplicated at the top level
     new 342dd8f  Make the nesting more visually obvious in the Server HTML parsers listing
     new 086501e  Allow Tika Config xml to have a ParserDecorator with child parsers, and note about how this can work in the javadocs
     new 826bae2  cTAKES config xml example and code example in JavaDocs TIKA-1642
     new 80a1126  TIKA-1654 Reset cTAKES CAS into CTAKESParser (Fix for TIKA-1645)
     new fdaedd0  Adjusted indentation in pom.xml file to match rest of file
     new 84964ec  Reformat tika-parsers pom
     new e846610  Reformatted POMs
     new 78ac030  Add a mime type definition for Java properties files, after a discussion on stackoverflow showed we didn't have one
     new 64a268a  TIKA-1660 Java Properties sample file and detection test, follows on from r1686199
     new 5cdc797  TIKA-1654: Reset cTAKES CAS into CTAKESParser
     new cc9fbcf  add test to ensure that the list reader for tika-batch properly creates subdirectories
     new 1a3749f  Fix for TIKA-1659 ZipContainerDetector does not detect all IPA files contributed by Rami Shomali <ra...@lookout.com> this closes #51.
     new 90a2202  TIKA-1663 add a DigestingParser
     new 444dadd  Fix for TIKA-1664: GDALParser now correctly sets nitf as a supported media type contributed by Joseph North <jo...@gmail.com> this closes #53.
     new 761273f  Fix for TIKA-1669: xpath node test ./node() should match all contained nodes contributed by WulfB <wu...@inacta.ch> this closes #52
     new fd8514c  Rollback r1688087 as it seems to cause some tests to fail.
     new 2a47d9a  TIKA-1601: integrate Jackcess to parse MSAccess files
     new 06cfbaa  Fix for TIKA-1602: Detecting standards-non-compliant emails as message/rfc822 contributed by Jeremy B. Merrill <je...@nytimes.com> this closes #40.
     new 425506e  TIKA-1536. Upgrade to Java 1.7.
     new 4695df5  TIKA-1536. Update CHANGES.txt with upgrade to Java 7.
     new de5a2de  Remove change comment, TIKA-1602
     new 2764fb8  TIKA-1673 drop source file name from embedded file path; made a few java 7 updates; added timing for embedded docs
     new f2218da  TIKA-1673 -- doh, add back dropped qmino in server's pom
     new 9688c77  TIKA-1400
     new 165eebc  TIKA-1674: initial commit to add example of how to extract embedded files
     new 6029c0d  TIKA-1676
     new 320b289  TIKA-1681
     new 898f300  TIKA-1684
     new 4dcfb74  TIKA-1685 clean up easily cleaned up deprecations
     new 4a20585  TIKA-1687 upgrade xerial.org's sqlite-jdbc to 3.8.10.1
     new c11eee5  TIKA-1238: Update OutlookExtractor's codepoint detection algorithm
     new 9a8798b  TIKA-1678 -- initial commit.  Need to wait for fix to PDFBOX-2896 to generate test file.
     new 9c04fa6  TIKA-1683 -- add encryption support for Jackcess
     new 0658ee6  TIKA-1683 -- add encryption support for Jackcess, this time with test document
     new d2e68d4  TIKA-1692 : allow MimeTypes to look for a registered mime type that may or may not have parameters.
     new 28149a6  Tweak the getRegisteredMimeType javadocs a little bit, to try to make it clearer
     new 5f3acd2  Fix some javadoc warnings
     new 7cdf08c  Fix some javadoc warnings
     new a4baebe  TIKA-1588 upgrade to PDFBox 1.8.10
     new 98672cd  TIKA-1690: revert changes made in r1678515 that added fileUrl capability in tika-server
     new 194a301  TIKA-1667: upgrade to POI 3.13-beta1
     new 1ecb5f9  TIKA-1678: clean up and add test file and unit test
     new 58a757e  TIKA-1689: revert mistakenly flipped sort order of parsers from r1677328
     new 1ff04b1  TIKA-1689: with mention in Changes.txt
     new 797b0e8  Exclude junit compile-time dep from json-simple
     new 70dbd04  Remove junit from OSGi bundle deps
     new d8d05e0  Start on detector config tests for TIKA-1702
     new f018a43  Set missing svn:ignore
     new 0eeab40  TIKA-1702 Refactor some of the config parser loading to be more re-usable for detectors, and bring the method signature in line WRT Composite vs not (must always be composite)
     new 1db02ed  More TIKA-1702 refactoring to bring detectors in line with parsers
     new cb89a82  TIKA-1702 Start moving to a loader class pattern for common Detector and Parser (+later others)
     new 62428f2  TIKA-1702 CompositeDetector support for excludes, along the lines of the CompositeParser support
     new e3f48bd  TIKA-1702 Move the parser and detector creation logic to the config loader classes
     new b1091b0  Allow Detectors to be defined as excluded in Tika Config XML TIKA-1702
     new 79f20f6  If DefaultTranslator has multiple translators loaded, use the first available, not just blindly the first
     new 06fefb3  Empty Translator, similar to the ones for Parser and Detector, for use in testing etc
     new 727528c  Convert Translator config to the new pattern for TIKA-1702, and add unit tests for Translator xml config
     new 2cb29c2  Changelog
     new 9489bfd  Move AbstractTikaConfigTest to Core, and use that to shorten TikaConfigTest TIKA-1700
     new cd129b5  TIKA-1700 Add TikaConfig constructors that take a ServiceLoader, and add a unit test that shows we (now) use the LoadErrorHandler on that properly for reporting problems with listed class names
     new 3533866  Fix up the Probabilistic Mime Detection Test
     new ff0e5c3  Update CHANGES.txt for 1.10 release
     new abd6b87  Patch from Bob Paulin from TIKA-1700 - Allow setting the Service Loader dynamic flag and load error handler from the tika config xml
     new a57f21e  [maven-release-plugin] prepare release 1.10-rc1
     new 1569b3f  [maven-release-plugin] prepare for next development iteration
     new 38697cb  Fix for TIKA-1703: Can't Specify Tesseract Data Folder Distinct from Tesseract Executable Path Contributed by Christian Wolfe <ta...@gmail.com> this closes #56.
     new 3857bd5  Move to the most recent org.apache parent pom
     new 4023120  More Tika Core rat excludes
     new 8045b05  License headers and Apache Rat excludes
     new 347d7d2  License headers and Apache Rat excludes
     new 89d4674  Updated tika-dotnet POM for Apache 1.10 release
     new ee94feb  Updated tika-dotnet POM for Apache 1.10 release
     new 532de24  Fix indents/whitespace
     new 650d01c  Several people on StackOverflow are getting confused by this example, show how to use AutoDetectParser first, all the components second
     new d5796b9  Replace deprecated method use and outdated practice from the example
     new f67c93c  One more improvement
     new ef88dd9  TIKA-1705: Upgraded ASM to 5.0.4. Patch from Uwe Schindler.
     new c8edc83  TIKA-1705: Changed dependency from asm-debug-all to asm
     new c7c166a  - fix for TIKA-1699: Integrate the GROBID PDF extractor in Tika contributed by Sujen Shah <su...@gmail.com> this closes #55.
     new 1c67639  Changes.txt for TIKA-1699.
     new 8bb18b1  - statically initialize the context once (so in Tika server it can be reused)
     new 99aaf11  Back out r1695816, so the build can pass again, pending a fix of the broken grobid poms. Fix being tracked in TIKA-1699
     new 7f5ef18  Use a consistent version of Commons IO everywhere, enable the Forbidden APIs check for it, and fix problems it found TIKA-1706
     new 8e03be7  Move the parent test class of many Tika tests to core/test, so core tests can use it too
     new 98b088d  Outlook detection with custom config tests, based on work by Justin Palmer TIKA-1708
     new 3f784c6  TIKA-1708 If the Tika Config detector entry calls for MimeTypes, use the already created one, avoid creating a new empty one
     new 4e7851d  Tweak text to avoid a false match from the tika-core test dummy mimetype, and try to make constants use clearer
     new 5e486df  - ignore *.log.*
     new 1139986  TIKA-1699: refactored GROBID parser to use GROBID rest API. Only introduced 2 deps, CXF client, and also org.json. very small and works great. Thanks to Sujen Shah for his initial work on the GROBID patch.
     new b4f1c29  - fix typo: TIKA-1699
     new a47881f  - further guards
     new 3e9ccb1  - TIKA-1699: statically load the rest URL properties inside of GROBIDRESTParser
     new 31c5d2d  - fix for TIKA-1712: GROBID parser fails in tika-app thanks to Sergey Beryozkin and Daniel Kulp for the idea for the fix.
     new 4753ec6  Update changes for TIKA-1712
     new d783608  TIKA-1699: fix bundle for GROBID parser deps.
     new feda994  Changelog update
     new 2861eb3  TIKA-1714 Allow --host=* to easily trigger listening on all addresses for the Tika Server
     new f4bdbbe  TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets
     new f71910e  TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets
     new 19eb444  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new d5981c7  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new a15819d  TIKA-1710 patch from Yaniv Kunda - Use java.nio.charset.StandardCharsets
     new 9b82372  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new 8f598d7  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new b3c9e3f  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new 7695a01  TIKA-1710 patch from Yaniv Kunda - Use Commons IO instead of the Tika Core IO copies, and java.nio.charset.StandardCharsets
     new 613a391  Import fix
     new 3e31179  TIKA-1711 As Tika needs 1.7, remove 1.6 specific bits of the bundle build. Patch from Yaniv Kunda
     new d9c469f  TIKA-1718 Enforce a consistent commons compress version between components
     new 2d55f1d  TIKA-1718 Upgrade to Commons Compress 1.10, and fix various TODOs that this permits
     new e349b4f  TIKA-1718 Add more Commons Compress supported formats
     new a406151  One more format to add support for
     new 2bf1790  TIKA-1710 Guava is no longer required, we have StandardCharsets instead now
     new 93f8d19  Changelog update
     new fe421fd  Bring in line with other parsers with special InputStream requirements, by using TikaInputStream TIKA-1710
     new 9e19740  TIKA-1722: Tika methods that accept a File needlessly convert it to a URL
     new be5f57d  TIKA-1721: Replace IOExceptionWithCause in ForkClient
     new 6c2abfe  TIKA-1720: Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()
     new 24b8c2a  TIKA-1716 change default /rmeta content handler to xml and allow users to specify which content handler to use for content
     new 8938fdf  TIKA-1719: Utilize try-with-resources where it is trivial
     new 3497ea6  Test HWP files from Mungeol Heo from TIKA-1728
     new e7a3a49  TIKA-1728 HWP v5(+?) detection
     new 193b5bb  TIKA-1728 Fix the HWP v5 mime type hierarchy
     new 03af7fb  Fix license headers and reformat in tika-example
     new dba2033  Migrate phone numbers example to file walk API
     new b3665cd  Add interruptable parsing example
     new c31fe55  tika-example: explicit locale in String#toLowerCase
     new 0ae1f64  TIKA-1734 via Yaniv Kunda -- use java.nio.file.Path in TemporaryResources
     new ffb46d6  TIKA-1657 Update the example of dumping a Tika Config to support different output modes, for Translators and Detectors
     new d6d180c  Parser updates for config dumping
     new e607d65  Patch from Yaniv Kunda from TIKA-1750 - avoid NPE in CachedTranslator if no underlying translator is available
     new 83a3bb9  Expand the Tika Config dumping support for parsers
     new f3cd6ad  Expose the ServiceLoader used by TikaConfig, and use that to support serialising the service loader config xml section
     new cec1530  TIKA-1744 via Yaniv Kunda -- upgrade TikaInputStream to use Path.  Thank you, Yaniv.
     new 71d72df  TIKA-1747: migrate to Path from File in tika-batch
     new e07bbf8  TIKA-1752: use j.n.f.Path in o.a.tika.detect
     new 44462fb  Reformat to avoid tabs and use JUL for logging
     new 3f99f45  - Files isn't always present (just found test case on older version of GDAL)
     new 285a462  TIKA-1707: upgrade to POI 3.13
     new 191c03d  TIKA-1742 prevent infinite recursion while processing inline images in PDFs by limiting extraction to unique images per page...following Tilman Hausherr's solution on PDFBox
     new 36c2ea4  can't have assumeTrue in a try{}catch{} block: http://stackoverflow.com/questions/8736506/custom-junit-runner-doesnt-ignore-tests-with-assume-assumetruesomefalseconditi
     new 5d50c2e  clean up from TIKA-1742 and TIKA-1748
     new 91422cb  TIKA-1757 and TIKA-1758. Mea culpa.  Thank you Uwe Schindler and Yaniv Kunda
     new bbbfaac  fix two unchecked operations
     new e51adae  TIKA-1756: update forbiddenapis to 2.0 via Uwe Schindler
     new 5bf930e  TIKA-1744 tidying up via Yaniv Kunda
     new 0c86583  TIKA-1741 Include CTAKESConfig.properties within tika-parsers resources by default
     new 8d8823a  TIKA-1765
     new 2a560eb  TIKA-1755 make div and other formatting more consistent btwn PPT and PPTX
     new 2aa5b9e  TIKA-1736 upgrade jackcess-encrypt
     new 3d9454b  TIKA-1762 - Create a Configurable ExecutorService in TikaConfig
     new 1996cdf  TIKA-1772 WebVTT mime entry from Alexander Widera
     new ae80661  Test JP2 (JPEG2000) file from Andreas Hirtzel from TIKA-1773
     new 96980ec  JPEG2000 (jp2) detection tests
     new f949ddf  TIKA-1772 Test WebVTT file from Alexander Widera, mime magic for it, and detection tests
     new 4bb76d3  Fix for TIKA-1771 lower xhtml magic priority to ensure emails detected as message/rfc822 contributed by Jeremy B. Merrill <je...@nytimes.com> this closes #58.
     new 6e6359e  Fix for TIKA-1772: Mimetype of VTT files contributed by Alexander Widera <wi...@chemmedia.de> this closes #59.
     new 6f6a764  Fix for TIKA-1745 Add methods accepting java.nio.file.Path to org.apache.tika.Tika and org.apache.tika.parser.ParsingReader contributed by  Yaniv Kunda.
     new 29f0bf0  Fix for TIKA-1746: modify TikaFileTypeDetector to use new detect method accepting java.nio.file.Path contributed by Yaniv Kunda.
     new 0a1a884  Fix for TIKA-1751: Use java.nio.file.Path in TikaConfig contributed by Yaniv Kunda.
     new 424da1d  Prep CHANGES.txt for 1.11 rc 1
     new b66fca8  [maven-release-plugin] prepare release 1.11-rc1
     new 986d9f9  [maven-release-plugin] prepare for next development iteration
     new 5a7027a  TIKA-1777 fix regression in ppt spacing, patch from Andreas Beeker
     new ff31d82  Bump changes.
     new 63351d1  TIKA-1777 wasn't fixed in time for 1.11.  Fixing CHANGES.txt to reflect my earlier entry error
     new f43de5a  TIKA-1782 allow XHTMLContentHandler to pass attributes of html element via Markus Jelsma
     new 9f680b0  Add Tika Facade parse methods for Path and File which take a Metadata object, to mirror the existing InputStream one. This closes #60 from GitHub
     new 2e60efb  TIKA-1507 - Moved tika-external-parsers.xml to tika-core to prevent OSGi split package issue.
     new a960709  TIKA-1786 -- clean up logging in tika-batch
     new 9e9ea27  TIKA-1792 ASiC E and S mimetypes, detection and tests. Files and mimetype from Roberto Benedetti
     new 9c6f81c  Tweak ASiC comment and priority based on feedback from the spec
     new f73655a  TIKA-1793 Add rfc822 email detection for common thunderbird message first headers
     new c2895d5  TIKA-1791 GeoParser fix for models in a jar file, from Thamme Gowda N. This closes #63 from GitHub
     new 58df156  Fix inconsistent whitespace
     new 02bcf20  TIKA-1791 Comments and logging
     new 6750a05  Fix inconsistent whitespace
     new 74665c0  Changelog update
     new f682eee  TIKA-1795 RTFParser should set, not add, mime type
     new 702b6c4  Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62
     new 9de407a  Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62
     new a147d7c  Fix for TIKA-1798 Parser for Video Similarity using PooledTimeSeries metric contributed by Aditya Dhulipala and Chris Mattmann this closes #64.
     new 88b0581  Add an ignore for the on-demand downloaded NER files
     new ded5294  Fix for TIKA-1803 Use lucene-geo-gazetteer REST API in GeoTopicParser contributed by Madhav Sharan msharan@usc.edu this closes #65
     new 3574c9c  Fix for TIKA-1815 Text content from parser is empty when NamedEntityParser is enabled contributed by Thamme Gowda <tg...@gmail.com> this closes #67
     new caa2b5a  Fix for TIKA-1816: Lenient testing for NamedEntityParser contributed by Thamme Gowda <tg...@gmail.com> this closes #68
     new efaf482  TIKA-1817 Mime magic for AutoCAD DXF in Ascii and Binary, plus the related DXB
     new e9718f9  TIKA-1817 Test DXF ASCII file, and detection unit test
     new 61671f5  TIKA-1820 Upgrade rome to 1.5.1 && TIKA-1516 Downgrade Rome dependency to 0.9 to avoid nasty NPE
     new 784c0a4  TIKA-1821 Support for 1 and 3 byte length PKCS7 DER encoded magic (needs neater way, to follow)
     new f35351e  Try to make the common parts clearer for the DER-encoded PKCS7 signature (length comes between 0x308. and the pkcs7 object)
     new 3e7e335  TIKA-1826: should use getValues instead of just get in Tika gui
     new 40b4a6d  Fix for TIKA-1816: Lenient testing for NamedEntityParser contributed by Thamme Gowda <tg...@gmail.com>
     new 489ab93  Fix for TIKA-1834: Fix for GeoTopic parser holding state while running Tika server contributed by smadha <ms...@usc.edu> this closes #71.
     new fe841bc  TIKA-1835: LinkContentHandler skips iframe and rel tags
     new 52b82bd  fix for TIKA-1840 contributed by zetisam
     new 7d43bd7  fix for TIKA-1840 contributed by zetisam -- fixed indentation
     new c42b5ad  The testGetJSON() method had a strange cast to (Object) that I removed to improve readability and maintainability. This was identified by findbugs rule BC_IMPOSSIBLE_CAST.
     new efb645e  Update to record change for GH #73 contributed by Marc Breslow <ma...@devfactory.com> this closes #73.
     new b4b5316  Merge branch 'TIKA-1840' of https://github.com/zetisam/tika into TIKA-1840
     new 1bc6176  Fix for TIKA-1840 contributed by Sam Heijens <sa...@zeticon.com> this closes #72
     new 9fa7a4d  Changes.txt for 1.12 release.
     new d0d9013  Prep pom.xmls for release - remove all SCM tags except for tika-parent. Update scm tags to Git. Prep for 1.12 release.
     new 4b4246c  Update SCM connection.
     new 39b9c1c  Rollback release.
     new 2eb6715  Update SCM url.
     new 809370e  Upgrade Git SCM provider and Maven release plugin to 2.4.2 and 1.8.1 respectively to get around http://stackoverflow.com/questions/15166781/mvn-releaseprepare-not-committing-changes-to-pom-xml
     new c0d2b4f  [maven-release-plugin] prepare release 1.12-rc1
     new 5c0ef63  [maven-release-plugin] prepare for next development iteration
     new 38fbc50  TIKA-1823 Sample AutoCAD 2010 DWF file
     new 6a09233  TIKA-1823 AutoCAD DWF mime magic and subtypes
     new 6ef9c94  TIKA-1830 upgrade pdfbox to 1.8.11
     new 256209a  TIKA-1830 upgrade pdfbox to 1.8.11...updated CHANGES.txt file.  doh!
     new d685742  Added NLTK NER
     new 2b99eea  Merge remote-tracking branch 'upstream/master' Integrated NLTK into Tika Parsers by using endpoint as NLTKRest
     new db2b475  Update NLTKNERecogniser.java
     new 59ddcaa  Update NLTKNERecogniserTest.java
     new 892beca  Update NLTKNERecogniserTest.java
     new 25cee54  TIKA-1799: upgrade to POI 3.14-beta1
     new 6ac99bf  TIKA-1799: upgrade to POI 3.14-beta1, cleanup
     new 57ae2c5  Test PKCS7 Signature files produced by CADES, from Alessandro De Angelis TIKA-1821
     new 046e43f  PKCS7 signature detection tests, using test files from TIKA-1821
     new fabeac9  minimal cleanup while working on TIKA-1849: turn test back on.
     new 1e0159b  TIKA-1845: fix npe created by combination of error in RTFEmbObjHandler and failure to handle null in TikaResource
     new d8a2fc0  Test JS file that includes <html in it, based on JS from the ComDev website TIKA-1141
     new d740f5d  Lower the priority of <html later in the file header
     new 557b370  Unit test for detecting JS files
     new 6c0b790  Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tika
     new 559557a  TIKA-1854: add handling for embeddedStorageClassId in MSOffice docs (patch from Daniel Bonniot de Ruisselet)
     new c5b9cb7  clean up tests in server that used to rely on EvilParser (early name for MockParser)...discovered while working on 2.x branch TIKA-1 8 5 1.
     new 14ca320  Used Apache CXF WebClient
     new 542bebc  Record change for TIKA-1835.
     new 2eb49a7  TIKA-1856 Upgrade the Ogg dependency for the truncated files fix
     new fc801d1  upgrade sqlite-jdbc to 3.8.11.2
     new 28b9a66  Briefly describe the parser, and link to the wiki for more details
     new 1b14b39  nltk modification
     new f054bcd  Merge remote-tracking branch 'upstream/master'
     new 13d772a  TIKA-1869 update Jackson to latest version 2.7.1
     new 0bd05ce  TIKA-1870 refactor RichTextContentHandler into tika-core from tika-server so users if needing it don't need to depend upon tika-server
     new 08e38bb  Update changelog for Jackson upgrade from John Patrick from TIKA-1869. This closes #75 from github
     new 3b7922d  TIKA-1870 JavaDoc and Test coverage for RichTextContentHandler that lived in tika-server
     new ac4c0b2  created NLTK host server properties
     new 6c595fb  Merge remote-tracking branch 'upstream/master'
     new 0c03008  TIKA-1874 fix potential npe
     new 1882def  Merge branch 'bugfix/TIKA-1870' of https://github.com/nhojpatrick/tika
     new ed762b7  TIKA-1870 Move RichTextContentHandler from Server to Core, contributed by John Patrick. This closes #77 from Github
     new a13369b  fix for TIKA-1876 contributed by manalishah
     new 7ebe007  fix for TIKA-1876 contributed by manalishah
     new c809690  fix for TIKA-1876 contributed by manalishah
     new 3a7e24c  fix for TIKA-1876 contributed by manalishah
     new 114d0ff  fix for TIKA-1876 contributed by manalishah
     new cdb684d  fix for TIKA-1876 contributed by manalishah
     new 602d237  fix for TIKA-1877 contributed by prasadns14
     new 7801007  Update tika-mimetypes.xml
     new 0dbd69c  updated with changes
     new e147de3  resolved conflicts
     new dbefe98  TIKA-1857: add basic XFA extraction support via Pascal Essiembre.
     new 7c245fa  TIKA-1857: add basic XFA extraction support via Pascal Essiembre.
     new 3fbc03c  Fix for TIKA-1876 Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition contributed by Manali Shah <ma...@gmail.com> this closes #80
     new 9056894  Fix merge conflict.
     new 3aa1dca  TIKA-1657 move xmlification of TikaConfig to tika-core.  Thank you, Nick!
     new 5a34107  TIKA-1657 move xmlification of TikaConfig to tika-core.  Thank you, Nick!
     new 9a1ba94  Fix for side effect of TIKA-1857-- javax.xml.stream is no longer optional.  Thank you, Bob Paulin, for diagnosing this!
     new 46deb4d  Added .hfa mime type mime-type.xml
     new 86579ec  updated mime magic for cab, quicktime, fits and netcdf based on fht analysis on polar-data dump
     new e5d348d  Fix for TIKA-1886 provided by nandan-pc
     new 355a7d1  Patch from prasadns14 from TIKA-1875: NetCDF mime magic, and detection unit test. This closes #78 from github
     new e105088  Merge branch 'TIKA-1877' of https://github.com/prasadns14/tika
     new 1299c9e  Rename the test file for TIKA-1877 to better match our test file naming pattern. This closes #81 from github
     new 963a916  TIKA-1878 Make the Apache SIS version a property, to allow for easier upgrades
     new 1b7009d  TIKA-1878 - Upgrade Apache SIS to 0.6. This closes #79 from github
     new bee1a87  Better express the MP4/QuickTime relationship in our mime type hierarchy
     new b878281  Test CAB file and CAB file descriptor, based on the existing archive test file set TIKA-1890
     new f7d3097  TIKA-1890 Mime magic for CAB files, and unit tests for detection
     new 74e71eb  Magic for Mobipocket Ebook and ESRI Shapefiles from TIKA-1892 from Suman Kashyap
     new c5d4ec6  TIKA-1894: Add XMPMM support to PDFParser and JpegParser via Jempbox
     new b9d5c22  Add missing dependency on tika-test-resources
     new 973204e  Roll in new lang detect support in new module
     new e9f5f42  Add project.build.sourceEncoding to properties
     new e38512e  Add tika-langdetect dependency in other modules
     new 3a7a94c  Remove built-in lang detector
     new f9113be  Move base lang detect classes to core
     new 3bee1d9  Make detector "discoverable", use that everywhere
     new 1caa4fb  TIKA-1853: upgrade to POI 3.14-final
     new 68225a9  fix for TIKA-1872 contributed by trevorlewis
     new 260d77b  Update tika-mimetypes.xml
     new b2cf231  Add uniformity to parser parameter configuration.
     new ae51417  remove unwanted TODO:
     new db1c0e6  Rearranged the order of mime-type x-erdas-hfa in tika-mimetypes.xml, changed the test file name and reduced size of test file
     new bf2d405  TIKA-1899 -- didn't add a test because triggering file was larger than the fix, metaphorically.
     new 64db961  Added a TikaConfigException, params getter
     new 0d69ca7  Test Case updated with newer exception and getter
     new a1243c7  TIKA-1900 - Max Pool size must be set before core pool size in java 9.
     new a991394  Add missing license headers
     new eafe280  Added missing license headers
     new 4a40cf5  TIKA-1905 - Fix JavaDoc Failures on Java 8
     new 73aaa1b  TIKA-1906: ExternalParser No Longer Supports Commands in Array Format   - Added check for command length and reintroduced copying all arguments for arrays
     new 6a8bb0d  resource leaks discovered while working on 2.x (TIKA-1855).
     new 1924c3f  Merge remote-tracking branch 'origin/master'
     new 98eb56e  TIKA-1285 -- upgrade to PDFBox 2.0.0
     new 9ebf066  TIKA-1285 -- upgrade to PDFBox 2.0.0 -- for now turn off tests with different results in different OS.  TODO: figure out the cause and turn back on!
     new 959c9ad  Test vnd.mif file from Steffen Netz from TIKA-1898
     new c94236a  Remove erroneous backslashes before already-escaped < entries in vnd.mif mime magic, plus unit tests. Thanks to Steffen Netz in TIKA-1898 for help with this
     new 9cdfc4a  Merge branch 'TIKA-1872' of https://github.com/trevorlewis/tika into TIKA-1872
     new 3279a11  Depend on 1.13-SNAPSHOT, not 2.0.
     new 34db935  TIKA-1918: make outputSuffix optional in tika-batch
     new 01109c8  Merge remote-tracking branch 'origin/master'
     new 67fac45  TIKA-1919
     new 404d420  TIKA-1920
     new 30e4e61  TIKA-1921 -- note: need to set default timezone to UTC both programmatically and in surefire plugin at least with Java 8.
     new 6950dcf  TIKA-1922 -- upgrade jackcess to 2.1.3
     new 8580fc2  TIKA-1923 -- upgrade bouncy castle to 1.54
     new 9a71fa7  TIKA-1924 - upgrade iso parser to 1.1.18
     new fc725b9  Upgrade changes.txt to reflect upgrades.
     new 53f29e0  Upgrade changes.txt to reflect upgrades -- fix spacing and small clean up
     new 63bb154  fix for TIKA-1926 contributed by hasanayesha
     new 5e170d4  TIKA-1916 -- Thanks to fxfixer (Nick C) for opening the issue and submitting a patch and test file.  This closes #94.
     new 4f22b08  TIKA-1927 -- Thanks to fxfixer (Nick C) for opening the issue and submitting a patch.  This closes #98.  xerial's sqlite now stores timestamps differently -- varchar.  I got rid of the hack to handle that.  I also added more robust handling of nulls throughout.  The parser should return no value/empty string for null values of all types.
     new 48d17b6  TIKA-1927 -- update changes . This closes #98
     new 5a74add  TIKA-1927 -- add catch for UnsupportedOperationException, which SerialBlob can throw in some versions/OS of java.
     new 6a3aa1a  TIKA-1929 -- ensure deletion of temp file no matter which type of InputStream is used.  Clean up resources correctly in sqlite unit tests.
     new 1c85418  TIKA-1929 insignificant cleanup
     new e0005b3  TIKA-1929 cleanup more tests
     new 7a4301e  TIKA-1914 -- need to startDocument in ExecutableParser.  Thanks to Nick C (fsfixer) for this. This closes #93
     new e920647  TIKA-1033 -- add detection for embedded MSGraph.Chart objects. Also add two convenience methods to TikaTest.
     new 27f1b8f  TIKA-1932
     new e2ef2e9  TIKA-1932 - with correct pattern
     new ca1c265  TIKA-1932 - with correct pattern, third time's the charm...argh.  This will avoid incorrectly closing a TikaInputStream if a TIS is passed in.
     new cffda85  TIKA-1934
     new d692390  TIKA-1934
     new f89a19f  TIKA-1935
     new 71f8423  TIKA-1936 -- clean up parsers and tests that aren't cleaning up tmp files, with heavy refactoring of PDFParser tests.
     new 270b8a9  TIKA-1936 -- small cleanup in pdfparser test
     new 61d8ec7  TIKA-1933
     new d184e9b  TIKA-1944 Magic for apple single/double files from Nick C
     new 7e2c089  Grobid NER
     new 5f859fb  mitie ner parser added
     new 2210c81  runtime binding to mitie
     new ab09b0c  read all entities from NLTKRest
     new f9a716a  remove starred imports
     new 26081ca  TIKA-1949 -- upgrade commons compress to 1.11
     new 81bc3cd  Merge remote-tracking branch 'origin/master'
     new 08e932b  fix for TIKA-1943 contributed by Mark Duske
     new f509917  fix for TIKA-1943 contributed by Mark Duske
     new 86145d9  fix for TIKA-1943 contributed by Mark Duske
     new b5e246f  code cleanup
     new b4404c3  TIKA-1948 -- handle per page IOExceptions more robustly in PDFParser
     new e032ac6  TIKA-1948...not sure why these weren't committed..argh.
     new 3d49131  Add DOM and Stax parsers to ParseContext
     new 172c584  clean up to "Add DOM and Stax parsers to ParseContext"
     new dfea473  TIKA-1931 -- revert isoparser to 1.1.7 because of rare permanent hangs on some files starting with 1.1.9
     new 8eac1c5  TIKA-1931 -- revert isoparser to 1.1.7 because of rare permanent hangs on some files starting with 1.1.9, with update to CHANGES.txt
     new 1c5e96c  TIKA-1937 LinkContentHandler wasn't extracting links from script tags via Joseph Naegele
     new 52851a4  TIKA-1895 upgrade to POI 3.15-beta1
     new da1fe24  Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tika into TIKA-1872
     new b47c364  Merge branch 'master' of https://github.com/NamithaGS/tika into TIKA-1881
     new e4dc21c  Record entry for TIKA-1881.
     new d40f8d4  Update changes.txt and record #85 and fix conflicts.
     new 2e0c9bd  Skip PooledTimeSeriesParser if it's not available
     new 1e2bd89  Added CompositeParser workaround
     new 3f1d6ae  Resolve TIKA-1883: add SFDU MIME magic and type.
     new 9b92734  Record TIKA-1883.
     new acad4cb  This closes #87.
     new 9730a36  Merge branch 'TIKA-1886' of https://github.com/nandan-pc/tika into TIKA-1886
     new f61a4ed  Record change for TIKA-1886
     new 1f96a0e  Fix for TIKA-1882: .cab, .xar, .mobi and .mov files from the TREC-DD-Polar dataset. This closes #82.
     new 3d59471  Record change for TIKA-1882 this closes #82.
     new dab1039  TIKA-1956 -- prevent NPE when trying to get embedded image offset in WordParser
     new 483c162  EPub mimes can end in new lines; let's trim the mime for easier comparability.  Too small for jira or test...imho.
     new 9f3a32a  Added imports
     new 443ecd3  Merge branch 'TIKA-1844' of https://github.com/cafed00d4j/tika
     new 72d76f8  TIKA-1844 pass through POT if it isn't available -- via Aditya Dhulipala. This closes #107.
     new 2ec36ff  TIKA-1844 clean up indentation, clean up streams in case of exceptions, make isAvailable check happen only once. This closes #107.
     new 8487fa7  TIKA-1844 clean up indentation, clean up streams in case of exceptions, make isAvailable check happen only once. This closes #107.
     new 0dc29d0  TIKA-1950 -- clean up jdom version conflicts
     new f39c087  removed logs
     new 80b27e6  Merge remote-tracking branch 'upstream/master' into TIKA-1913
     new 2cce66d  TIKA-1924 -- upgrade to isoparser 1.1.18. Add workaround to prevent infinite loop.  Remove aspectjrt license because isoparser 1.1.18 no longer relies on it.
     new c9d508d  TIKA-1924 -- add back qmino from last commit and clean up earlier work on ParseContext.
     new ea0e68b  Updated TextLangDetector and fixed build errors
     new e558f5d  fix: changed to post request from get request
     new 055bbd4  Merge branch 'TIKA-1872' of https://github.com/trevorlewis/tika into TIKA-1872
     new a484e5e  fix: remove test and handle null quantities
     new 20c1cfe  Merge branch 'TIKA-1872'
     new 2c72c42  Update with information about TIKA-1872, TIKA-1696 and TIKA-1723.
     new 2caf3da  Resolve conflicts in CHANGES.txt
     new b5fe00e  removed starred imports
     new cd06762  model path flaw
     new 5a8e269  Merge pull request #1 from yashtanna93/master
     new a30b143  Merge branch 'TIKA-1926' of https://github.com/hasanayesha/tika into TIKA-1926
     new e0ca3b5  Update changes for TIKA-1926.
     new eb51d9b  Merge branch 'master' of https://github.com/AravindRam/tika into grobid-quantities
     new a353200  Update changes.txt to reflect Grobid Quantities NER merge.
     new 9ecb183  Merge branch 'TIKA-1917' of https://github.com/manalishah/tika into TIKA-1917
     new 2d06bc2  Update changes for TIKA-1917.
     new aebaee6  Merge branch 'master' of https://github.com/reevapp/tika into TIKA-1943
     new e2fdcaa  record changes for TIKA-1943.
     new d711ac9  Merge branch 'TIKA-1913' of https://github.com/manalishah/tika into TIKA-1913
     new f827026  record changes for MITIE and TIKA-1913.
     new d0c5259  Fix compile error.
     new 792c7cf  use assumeTrue to pass tests if not connected to Yandex.
     new ab7c325  clean up earlier work on ParseContext.
     new 0cdf17d  Adding parser for ICNS files
     new 7a543c8  TIKA-1924 - needed to add sanity check in map()
     new 92a4835  TIKA-1894 -- fix potential NPE in XMPMM extraction
     new ee60bc6  TIKA-1894 -- fix potential NPE in XMPMM extraction
     new 64b7ded  TIKA-1960 -- put legacy language detection code back in so that we don't break the API without some warning.  Deprecate it thoroughly.
     new 19ed261  TIKA-1960: Added Welsh Corpus back into test resources to support deprecation of legacy language code.
     new b6d23c1  fix for TIKA-1938 contributed by naegelejd
     new d4fb28f  TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder
     new 6a19918  whitespace cleanup of SourceCodeParser
     new 9f09a55  TIKA-1964 -- clean up incorrect use of BigInteger.add()
     new e2e10f9  Grobid Quantities parser extracts types
     new eb05d04  fixed small whitespace issue
     new 585ab9b  Merge github pull #110 for TIKA-1893
     new 567f7f7  Fix whitespace and keep ordering
     new fba811b  Whitespace
     new 7f486e6  Comment updates
     new c93ff3e  TIKA-1966 Converted versions of test iWorks files from latest iWorks for iPad
     new 4aff483  Merge branch 'master' into TIKA-1343
     new 8e4c3ff  Merge branch 'master' of https://github.com/cmenekse/tika into TIKA-1965
     new c9bc910  TIKA-1965: Added types to Grobid quantities parser. Pull Request by Can Menekse.
     new 91313e3  Reverting incorrect addition of : to header
     new d447193  TIKA-1885: Addition of ZeroSizeFileDetector based on Pull Request from Adesh Gupta.
     new 5f0e930  Added CHANGE information for TIKA-1885 and TIKA-1965
     new eede044  TIKA-1885: Updated test to specify charset in getBytes()
     new c991452  Update CHANGES.txt for 1.13 release.
     new ff066dd  Added ASL2.0 Headers
     new cc5f841  [maven-release-plugin] prepare release 1.13-rc1
     new be66dac  [maven-release-plugin] prepare for next development iteration
     new a91d083  TIKA-1955: Updated check to read from stream to avoid misreporting due to blocking
     new 1caba4d  [maven-release-plugin] rollback the release of 1.13-rc1
     new 114b604  TIKA-1885: Updated MimeType to application/x-empty to match Unix file command
     new 54c5909  Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tika
     new 4b2abf6  [maven-release-plugin] prepare release 1.13-rc1
     new 6ed2129  [maven-release-plugin] prepare for next development iteration
     new 1180dc3  Revert "[maven-release-plugin] prepare for next development iteration"
     new e7e0886  Revert "[maven-release-plugin] prepare release 1.13-rc1"
     new 75fc12b  TIKA-1955: Updated to use mark() and reset()
     new 58c82ca  [maven-release-plugin] prepare release 1.13-rc1
     new 079c25b  [maven-release-plugin] prepare for next development iteration
     new 84619e0  Revert "[maven-release-plugin] prepare for next development iteration"
     new b216b87  Revert "[maven-release-plugin] prepare release 1.13-rc1"
     new 386b68b  [maven-release-plugin] prepare release 1.13-rc1
     new da5bbbe  [maven-release-plugin] prepare for next development iteration
     new 46d5775  fix for TIKA-1938 contributed by naegelejd
     new 69852e4  TIKA-1454 -- added initial hyperlink extraction for ppt, pptx, xlsx.  Areas for improvement: limit links to external links for ppt and pptx. Fix NPE in cell.getHyperlink within ppt in POI
     new c1a3ce0  Merge remote-tracking branch 'origin/master'
     new bb78082  TIKA-1454 -- clean up and add entry to CHANGES.txt
     new bc0b1f7  TIKA-1958 - add mime detection and parsers for 2003 MSWord XML (wordml) and MSExcel XML (spreadsheetML)
     new b2821d9  TIKA-1928 Fix detection for filenames containing a #, avoid mis-detecting that part as a page anchor
     new e08d006  TIKA-1971 - add another magic for rfc822
     new 09cc658  TIKA-1970 - Mac Mail date of interesting format not parsed by james mime4j
     new de6dbd0  Merge remote-tracking branch 'origin/master'
     new 7408531  TIKA-1971 - failed to revert earlier work on dbf - this is a clean up from the TIKA-1971.  Thank you, Nick.
     new 89881ce  TIKA-1977 set title vs add title -- also clean up of javadoc href and whitespace from TIKA-1970
     new 4a324ff  TIKA-1977 avoid use of forbiddenapi
     new 534347d  TIKA-1976 improve date parsing in rfc822 parser
     new 6e42062  TIKA-1976 fix forbiddenapis
     new bb46c0e  TIKA-1979 -- add message on stdout when tika-app's server.  Add deprecation message on stderr for tika-app's server
     new e780d56  merged upstream changes and resolved conflicts
     new b64612d  Update javadoc with @since
     new fc4f13d  TIKA-1978 Invocation of java.net.URL.equals(Object), which blocks to do domain name resolution, in org.apache.tika.parser.geo.topic.GeoParser.initialize(URL)
     new e74f663  TIKA-1513 -- add mime detection and parsing for dbf files. Thanks to Nick C for the mime definition and Luis Filipe Nassif for collaboration.
     new 167966e  Merge remote-tracking branch 'origin/master'
     new cb492f4  TIKA-1513 -- add mime detection and parsing for dbf files. Thanks to Nick C for the mime definition and Luis Filipe Nassif for collaboration.
     new 608fbf5  TIKA-1985 -- add charset handling to field names; add datetime processing; rework calculation of number of columns to handle extra zero-padding at end of header.  Waiting on permission for test file.
     new b47f162  TIKA-1985 -- remove errant printstacktrace
     new 0186992  Added support for type for runtime parameters
     new 9e08a6b  Updated test case with type checking
     new dcaeccb  TIKA-1513 -- update mime type according to Nick Burch's recommendation, other small import clean up
     new 1a04f80  TIKA-1985: fix auto-import mis-clean up
     new 16290d8  TIKA-1837 -- strip comments before trying to find encoding in HTMLEncodingDetector
     new c6eefbd  FIX: return value typo
     new 6ad18f4  TIKA-1990 -- make sure to include JPEG filters when exporting jpegs embedded in PDFs
     new aad23d9  Added @Field Annotation to support auto initialize params from config
     new 67941a6  Using TikaConfigException instead of RuntimeException
     new 40f8ec9  TIKA-1992 -- check for duplicate images via COSStream not object name.
     new 3ca9214  TIKA-1992 -- check for duplicate images via COSStream not object name.
     new fbd766e  TIKA-1992 -- check for duplicate images via COSStream not object name....sorry.
     new a20c46c  TIKA-1992 -- check for duplicate images via COSStream not object name...fourth time is the charm...ugh.
     new ea47b71  Merge branch 'TIKA-1508' of https://git-wip-us.apache.org/repos/asf/tika into TIKA-1508
     new 7aeb95d  TIKA-1994 -- integrate OCR with PDFParser
     new 27d9290  Remove duplicate code
     new 1202f45  Merge branch 'TIKA-1986' into TIKA-1508
     new 1af1078  TIKA-1994 -- integrate OCR with PDFParser, update CHANGES.txt
     new 49ddf6e  Merge remote-tracking branch 'origin/TIKA-1508' into TIKA-1508
     new 18ab8f9  TIKA-1508 add some unit tests for ParameterizedParserTest
     new 853750d  TIKA-1508 proof of concept with one parameter on PDFParser
     new 3e14505  TIKA-1999 add limit to number of events extracted from the XMPMM section by the JempboxExtractor
     new 99aa587  TIKA-1999 small fix and update CHANGES.txt
     new 71726bc  Fix configure issue with decorated parsers
     new e48d191  Add test case config file
     new 9b5dc7f  External Parser now has a consumer for ignored lines. Fix TIKA-2002
     new eccc153  Added a utility to load and instantiate classes
     new 2184e2c  Object recognition parser, tensorflow based implementation, and test cases for these
     new 0305cfb  Explicit Locale
     new f3e9d82  Feedback incorporated - longer keynote + confidence output to metadata
     new 6b56ad1  Added the script file for Tensorflow classify images
     new 2a61205  Script copied from class path rather than HTTP GET  and Code conventions correction for static final constants
     new 06633cc  TIKA-1996 -- upgrade to PDFBox 2.0.2
     new ecdc403  Merge remote-tracking branch 'origin/master' into TIKA-1508
     new 338db90  Start factoring out "configurable"; change signature of ParseContext's setParam to (Class, Param); add check for illegal field being specified in TikaConfig.
     new 2140858  Merge remote-tracking branch 'origin/TIKA-1508' into TIKA-1508
     new ef1f7b9  fix conflict
     new 03d3824  Remove Configurable entirely; update PDFParser example for one field.
     new 4d308fd  TIKA-2006 -- add mime definitions for iCal and vCalendar
     new d405172  Add mime definition for Windows Media Metafile (TIKA-2004).
     new 01a9b6d  Add mime detection and parser for Microsoft Owner File (TIKA-2008).
     new f7fe685  TIKA-2008 -- change owner metadata key from TikaCoreProperties.CREATOR to TikaCoreProperties.MODIFIER
     new 592ae6a  TIKA-2008 -- change owner metadata key from TikaCoreProperties.CREATOR to TikaCoreProperties.MODIFIER
     new acf031a  TIKA-2009 -- add mime magic for djvu
     new 6291648  make sure to test magic for vcs/ics/asx
     new ade60ed  TIKA-2011 -- add mime detection for Endnote Import file
     new d9dcd59  Merge remote-tracking branch 'origin/master' into TIKA-1508
     new 0132037  TIKA-1986 -- add Initializable, strip out handling of params passed via ParseContext in PDFParser
     new 30b0f66  TIKA-1986 -- revert parsecontext to ab7c325 and update PDFParser to handle non-primitive parameter setting
     new f8fe50a  TIKA-1986 -- allow params to be passed into initializable, delete configurableparser
     new 4c7481e  Added Documented Annotation
     new 027625d  Pulled upstream changes (TIKA-1508) and resolved merge Conflicts
     new 31cf12d  updated with the new changes of TIKA-1508
     new 5101023  Tesseract may see the t in haystack as a ! some times...
     new a46ffac  Upgrade Commons Compress to 1.12 (supports progress on TIKA-1358)
     new d6981ad  TIKA-1358 add preliminary detection for iWorks 2013 file types
     new 52ea9ba  Detection magic for POI-generated OOXML files, which have _rels before content type, plus test
     new 7ae760e  TIKA-2019 -- parsers for 2003 MS xml files fail to add spaces/tabs correctly when using the ToTextHandler
     new 81279a1  Merge remote-tracking branch 'origin/master'
     new 2031de7  TIKA-2019 -- clean up -- move state variables to inner classes, convert protected to package private, add @Override on parse
     new 48b27d2  fix for TIKA-2021 contributed by Zarana Parekh
     new de84d71  fix for TIKA-2021 contributed by Zarana Parekh
     new be8b433  creation of TIKA-2016 contributed by amensiko
     new 47221b9  TIKA-2022 -- add applefile parser
     new 0f3b0bd  TIKA-2022 -- add applefile parser
     new e6c2839  TIKA-2022 -- clean up test, change dependency on CloseShieldInputStream to commons.io
     new c1cea20  TIKA-2023 -- clean up RTFParser to use EndianUtils
     new 7db0ab6  TIKA-2023 -- clean up newlines and indenting
     new b10f250  optional processing enabled
     new 37695d4  added validation tests for new processing features
     new 2c4670e  TIKA-2022 -- clean up AppleSingleFileParser to use EndianUtils, shorten test file, make field types private
     new b6d55ae  TIKA-2024 -- extract original file name/location, initial patch: rtf, applefile, word2003, word, pdf
     new 69d8250  TIKA-2024 -- remove debugging test
     new 7cc610e  TIKA-2026 -- improve extraction of embedded files from ppt, pptx and xlsx
     new 52f04be  TIKA-2026 --fix caps on test files
     new a57d836  TIKA-2024 add location extraction for OLE1.0 embedded files
     new 23a11ef  fix getRecursiveJson -> getRecursiveMetadata in TikaTest, no json is involved here...
     new bc6667c  updated property name, removed orthogonal changes
     new 6773d42  updated Javadoc for Tesseract config and parser
     new fa30edd  updated scope in pom.xml
     new fe559b8  Merge master into TIKA-1343
     new 27e999d  updated config file
     new c2a8ac1  formatting changes
     new 12b0aee  rebasing pom.xml for tika-bundle
     new 6809282  formatting changes
     new 1a46c59  added check for non-UNIX OS
     new 95b2cd1  TIKA-2029: add some content for links so that we don't generate bad html <a href="http://tika.apache.org/"/>
     new b3c09b4  formatting changes
     new 4fd3e68  fix orthogonal changes
     new 1c6cff8  Merge branch 'TIKA-2021' of https://github.com/Zarana-Parekh/tika
     new 6f16480  Fix to work if ImageMagick isn't present. Fix forbidden APIs.
     new 636060e  Record TIKA-2021 change.
     new c0320f1  TIKA-2030 - add processing for <text:s/> element in odt, thanks to David Pilato for identifying this.
     new 8d29f7a  Merge remote-tracking branch 'origin/master'
     new ff187a0  TIKA-2030 -- fix test document so that it is correctly detected.
     new 3ecdc0c  Email with attachment for testing extraction issues
     new 952fb54  TIKA-2037 RFC822Parser should wrap the James InputStream of embedded resources to avoid problems with downstream detection or extraction
     new a383567  TIKA-2025 -- override general format in excel to extract 15 digit integers
     new f00ab04  TIKA-2029 -- upgrade jackcess to 2.1.4
     new 9e0a87e  TIKA-1993 -- High throughput Tensorflow  Inception based image classifier via: (1) GRPC and (2) REST API
     new 3deea1b  TIKA-1993 -- improve usability of docker container
     new 72d2d88  TIKA-2042 MBOX magic and detection unit test
     new 53c461a  Changelog update
     new d698d49  TIKA-2041, step 1, upgrade icu4j components; add back ebcdic and bump bytesRead back to 12000 from the "modern" 8000
     new 7dc5c67  TIKA-2041 -- add unit test in HTMLParserTest
     new f5b04b6  Merge remote-tracking branch 'origin/master'
     new 71cb936  TIKA-2040 - prevent permanent hang/oom on corrupt chm file
     new e539802  removed error print statements, static changes
     new 85e5385  TIKA-2048
     new 8a68b5d  clean up MatParser
     new bd9a9b9  TIKA-2041 - add important diffs between new copy/paste from ICU4J and legacy code which may have included Tika-specific mods.
     new 9024f12  fix for TIKA-1980 contributed by naegelejd
     new 5495ffc  TIKA-1980 - via Joseph Naegele. This closes #121
     new f77eb2b  Removed GRPC implementation as it is redundant over REST
     new a1d1a81  TIKA-1986 Fixed the outdated Bad parameter test case and removed dead code in comments
     new 0096dd7  Log warn when confidence is not equal to size of objects.
     new 52be425  1. use start/End document in handler; 2. populate metadata before handler is called. 3. make topN 2 in both REST and script configs.
     new afb7e36  Merge branch 'TIKA-1508'
     new da82df5  Update changes for TIKA-1508, TIKA-1993, TIKA-1986.
     new c3fc92f  Tickle to close Github issues. This closes all GROBID recogniser's - they have already been committed. This closes #116 this closes #117 this closes #118 this closes #119
     new fde6717  Tickle to close. This closes #96.
     new 6213cc1  This closes #115. Empty MimeType subsumes application/zerosize.
     new 33dc408  This closes #109. No need for PersonaParser. Going to handle this in USC IR repo, no need for a parser.
     new a5add3e  Tickle this closes #120. Orthogonal changes that I don't understand to Lingo24 API.
     new bf072bb  Merge branch 'master' of https://github.com/Zarana-Parekh/tika into TIKA-2031
     new 830685e  record change for TIKA-2031.
     new 5e4678e  remove dependency on edu.usc.sentiment
     new bec2f9b  remove sysout
     new c71e0b2  TIKA-2007 -- upgrade jackson, clean up unused dependency in tika-parsers
     new 173ff59  TIKA-2065 -- upgrade forbiddenapis
     new 069fa86  TIKA-2066 -- upgrade commons-io
     new 52be682  TIKA-2067 -- upgrade maven plugins
     new 80efc84  TIKA-1255 -- fix hyperlinks in doc/docx if there is formatting TIKA-2078 -- handle multiple runs within a hyperlink (docx)
     new 3c0abc8  TIKA-2064 Mime types, with magic, for mostly-xml Stata DTA files. (Awaiting suitably licensed file for testing)
     new 2222fe0  TIKA-2064 Test Stata DTA files from Michael Stepner, plus detection unit test
     new 9130bbc  Changelog update
     new 27b9cf5  TIKA-2055 catch exception when totalTime out of unsigned int range in ooxml
     new 1c0e600  Merge remote-tracking branch 'origin/master'
     new 6ebbd40  clean up triplicate commons-exec defs...not sure how these got in here.
     new 084379c  Remove unused variable
     new 07aea36  TIKA-2051 -- upgrade to PDFBox 2.0.3
     new a1250ff  Improve logging and trivial code conventions
     new d50a693  Merge branch 'master' into TIKA-1343
     new cc6f6dc  TIKA-2013 -- upgrade to POI 3.15-final, make sure to add new close() throughout for MAPIMessage and NPOIFS
     new 4153812  TIKA-2047 -- maintain mime info for mimes that are subtype of text/plain handled by TXTParser
     new 2ae7206  TIKA-2069 -- extract macros from MSOffice docs
     new 8a45f67  TIKA-2069 -- extract macros from MSOffice docs, fix tests to find target metadata object in any order
     new 10507d0  add hOCR output format to TesseractParser
     new 3a5431e  TIKA-2093 -- add option for Tesseract's hOCR output, thanks to Eric Pugh! This closes #133.
     new d612aea  TIKA-2081 -- add fileUrl back into tika-server
     new b58368f  TIKA-2081 -- add fileUrl back into tika-server, update changes.txt
     new e9e8d3b  TIKA-2081 -- add fileUrl back into tika-server -- fix commandline options not to include '-'
     new 98d75f6  TIKA-2095 -- include Tika version in GREETING
     new ce07d8a  TIKA-2057 - maintain DocInfo metadata in PDFs
     new 5466468  typo in changes.txt
     new 308d26f  TIKA-2097 fix npe in MboxParser
     new c33ac04  fix for TIKA-2098 contributed by alexshadow007
     new 0a4b0e8  Merge branch 'TIKA-2098' of https://github.com/alexshadow007/tika
     new 9b497d1  TIKA-2098 small clean up.  Test for writelimitreached for each catchable IOException.  Many thanks to Alexander Kazakov for finding this and submitting https://github.com/apache/tika/pull/134
     new feac58b  TIKA-2101 -- Don't call MAPIMessage.close()
     new 5af482e  TIKA-2106 -- need to lowercase hocr/txt suffix; thanks to Eric Pugh.  This closes #136
     new 1b72a38  TIKA-2110 -- include exception along with message
     new bfd1d91    * Upgrade metadata extractor to 2.9.1 (TIKA-2113).
     new 8e819c3  TIKA-2122 : add all headers from MSG and RFC822 files
     new bf08ba9  TIKA-2122 : add all headers from MSG and RFC822 files, update changes file
     new 02425b2  TIKA-2123 : digester fails when multiple digest values on large files; add more robust tests
     new 88058a0  Prep for release of 1.14 RC #1.
     new dbb6baa  [maven-release-plugin] prepare release 1.14-rc1
     new 687d770  [maven-release-plugin] prepare for next development iteration
     new b3f1497  TIKA-2127 : npe if there is no notes master
     new f19be22  Merge remote-tracking branch 'origin/master'
     new a6b8e04  ugh...remove println statement from AppleSingleFileParser
     new bc7216f  TIKA-2133
     new 7ca105e  TIKA-2130
     new 7fbf0f3  TIKA-2090 -- first draft
     new 5657ae6  Merge branch 'TIKA-1343' of https://github.com/lewismc/tika into TIKA-1343
     new dadbf55  TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder
     new 8d5eaaa  TIKA-1343 Stabilize check for JoshuaNetworkTranslator availability.
     new 4dd6fd1  TIKA-2090 -- add more areas where javascript might live and add ability to turn action extraction on/off
     new 01163e2  TIKA-2144 - avoid npe if styles doesn't exist (odd, indeed, but if MSWord can handle it, we should, too).
     new e215b9d  Merge remote-tracking branch 'origin/master'
     new b67373f  TIKA-2056 Make ExternalParser.LineConsumer Serializable
     new 011f338  TIKA-2056 Fixup (forgotten import)
     new 15a9230  TIKA-2111 - set instead of add "Content-Type" in the ExecutableParser
     new 2df68c8  improve test for TIKA-2098
     new 75fa138  TIKA-2157 -- handle zip exception in embedded stream
     new aadccbf  TIKA-1933 -- one more place where we weren't properly closing the ForkParser and were leaving behind a tmp ForkParser jar
     new 7b45c7c  TIKA-1896 -- add test files and unit tests, no fix yet
     new ab53bdc  TIKA-2171 -- upgrade SQLite to 3.15.1
     new 7dda921  TIKA-2173 add other setters to PDFParser so that they can be configured
     new c17d1b8  TIKA-2174 add jp2 and jpx to file formats handled by TesseractOCRParser
     new 47ba703  TIKA-2159 -- first step
     new 91cdce4  TIKA-2175 -- add extraction for inline jp2/jpx from PDFParser
     new 1aff638  TIKA-2174 -- add .ppm to tesseract
     new 98de288  TIKA-2174 -- fix jp2
     new b97045a  TIKA-2174/TIKA-2175 -- clean up
     new e8bf985  TIKA-2170 -- allow users to configure timeout for ForkServer
     new 2e325cb  TIKA-2170 -- fix unit test to allow for different exceptions to be thrown for losing connection to server
     new d647a23  Mimetype for SAS Xport (XPT) files
     new a9a9e08  TIKA-2116 upgrade to POI 3.16-beta1
     new 81fad8c  TIKA-2179 -- add detection and parsing for word2006ml files
     new 2df8567  TIKA-2169 -- fix xhtml markup caused by bug in OCR parser
     new 361ffa4  TIKA-2096 -- automatically add AutoDetectParser for embedded documents if the user forgets
     new 1cfd250  TIKA-2096 -- update CHANGES.txt
     new 7b4f6fa  TIKA-2096 -- fix example of not including embedded docs
     new d19e472  TIKA-1321 -- add SAX based docx parser and integrate it with the recent 2006ml parser work -- initial commit
     new fe20ecd  TIKA-2187 -- change default behavior in experimental .docx parser to ignore deleted text.  Allow configuration of including or not including text from .doc files.
     new 09931fe  TIKA-2187 -- fixed test
     new 0e0f30d  Merge branch 'pdf_javascript'
     new 99b5924  TIKA-2090 -- add ability to extract PDActions from PDF files
     new 40401e5  [TIKA-2189] fix for Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java
     new 8943013  TIKA-2191 -- step1 -- add other docx tests and comment/ignore where appropriate
     new f93d4e1  TIKA-2191 -- step2 -- add handling for docm files...extract macros
     new 1aca10a  TIKA-2191 -- step 3 -- clean up <b> and <i> tag handling
     new 806eaf8  TIKA-2191 -- step 4 -- add markup for embedded pics
     new 4469ca2  TIKA-2191 -- step 5 actually extract images embedded in areas besides the body of docx/m
     new 615bf75  TIKA-2192 - add extraction of embedded objects in DOM docx parser from more than just main document
     new 5425d02  update changes for TIKA-2191 and TIKA-2192
     new 3ee9fd5  TIKA-2191 - step 6(?) add list numbering, bookmarks and styles
     new 192e3ca  remove println...the horror...ugh
     new faf6c2b  TIKA-2191: fixes after regression testing on TIKA_1302 corpus: 1) add 'cr' and 'br' and 2) add 'template' to potential main story body parts
     new 0f3fe38  TIKA-2191: fixes after regression testing on TIKA_1302 corpus: 1) add 'cr' and 'br' and 2) add 'template' to potential main story body parts -- git add test file.
     new 0f78a31  TIKA-2191: convert Styles reader to SAX and store only styleId->styleName map.
     new 533572b  TIKA-2195: refactor MockParser to consolidate service loading and custom mime type into tika-core/src/test
     new 85c3457  TIKA-2173: improve configuration of PDFParser via @Field
     new 653b980  TIKA-2191 -- optimize branching in start and endElement based on corpus statistics
     new 1d9445b  Update to PDFBox 2.0.4
     new 90cdf1f  TIKA-2210 -- add experimental SAX parser for pptx -- this is a first cut.  More refactoring is in order.
     new ca37313  TIKA-2218 -- add a few more places where .pptx can include embedded objects
     new 376318f  TIKA-2220 - refactor new sax pptx and docx to reduce code duplication.
     new c624104  TIKA-2221 -- correctly catch and convert encrypted document exception to EncryptedDocumentException in WordParser via Matthew Caruana Galizia
     new bf12e5a  remove printlns in ZeroSizeFileDetectorTest
     new 2dbd651  TIKA-2219 - make sure to transmit encoding name in detectAll() via Pascal Essiembre
     new 8e12ebe  Merge branch 'bug_TIKA-2189' of https://github.com/dasbipulkumar/tika
     new c83f87b  clean up tabs
     new 87c2ef3  New WordPerfect and QuattroPro parsers for TIKA-1946 contributed by pascal.essiembre
     new ae44b9e  TIKA-2190 -- add configurability for preserve interword spacing
     new aa2407a  add comment on outputType and trigger close of TIKA-2189. This closes #139.
     new 0de63a1  Now throwing TikaException on unsupported QuattroPro format instead of logging a message, as suggested by Luis Filipe Nassif.
     new 202f137  TIKA-2211 -- make sure that <head> content doesn't appear as content in epub
     new 7a5b983  Merge branch 'TIKA-1946' of https://github.com/essiembre/tika
     new d011d70  TIKA-1946 -- initial commit of QuattroPro and WordPerfect parsers.  Many thanks to Pascal Essiembre for contributing these!!!
     new df14f78  TIKA-2224 Mime magic for OneNote
     new 009c143  TIKA-2224 Mime sub-entry for .onepkg, a cab file holding other onenote files
     new 135f326  Changelog
     new 9546bd3  TIKA-2224 We now differ from HTTPD on onenote formats, as we have subtypes they lack
     new aa448a3  Merge
     new 2013d33  TIKA-2226 -- add UnsupportedFormatException
     new 84a3720  TIKA-1946 -- updates, add detection for wp 5.0 and 5.1, and quattropro 7-8 vs quattropro 9
     new 6c3e5db  Merge remote-tracking branch 'origin/master'
     new ef1d907  TIKA-2224 Test OneNote file from Krishnan Narayan plus unit test
     new 6dc442d  TIKA-2228 - WordPerfect parser update to handle 5.x from Pascal Essiembre. This closes #142.
     new 940e6f4  TIKA-2230 -- add paragraph markup to WordPerfect parsers
     new e6cbaa0  Fix for TIKA-2232 contributed by pascal.essiembre
     new 09632ef  Now failing on any type of X-TIKA:EXCEPTION
     new 11fe4ba  Now properly decoding JBIG2 when inline in PDF (as opposed to pretend it is PNG).
     new 9fdf9a8  Merge branch 'TIKA-2232' of https://github.com/essiembre/tika
     new 86dbde4  TIKA-2232 -- shorten one unit test and update changes.
     new 66f6310  TIKA-2234 -- get rid of ThreadLocal
     new d1b1ad3  TIKA-2235 - set default dpi for OCR to 300 via Matthew Caruana Galizia
     new ba26f6e  TIKA-2232 add unit test for OCR of jbig2 embedded in PDF.
     new a38a2b0  TIKA-2237
     new 8eb7d35  TIKA-2159 handle preparse exceptions uniformly
     new 526fc08  TIKA-2134 -- handle missing parts more robustly
     new c9639bd  TIKA-2238 -- add mime detection for embedded MSEquation files
     new b9474f1  TIKA-2238 -- add mime detection for embedded MSEquation files
     new 25aa2be  TIKA-2232 -- ImageParser shouldn't allege that it can handle jbig2 when jbig2 library is not on class path
     new 320a1f1  TIKA-2241 Add new config dumping option STATIC_FULL which lists all supported+active mime types for parsers
     new 5c51534  TIKA-2231: Improved param validation of TesseractOCRConfig.setLanguage() and added more tests
     new 8a04f20  TIKA-2242 -- fix style markup in ODT
     new 02eae6d  Merge branch 'TIKA-2231' of https://github.com/ham1/tika
     new c978a11  TIKA-2231 -- update changes.txt. This closes #147
     new 00bb6f4  TIKA-2240 -- improve mime detection for .wri files
     new 9d97e16  TIKA-2232 -- log/warn if jbig2 parser is not on classpath
     new 896c46a  be more parsimonious wrapping streams
     new 9477d03  Merge remote-tracking branch 'apache/master'
     new 4cc15e2  TIKA-2244 -- be more parsimonious with BufferedInputStream. This closes #148.
     new 847156a  TIKA-2250 As of RFC7903, the official mime type for BMP is now the one without the x- prefix
     new e6c0082  TIKA-2250 As of RFC7903, the official mime type for WMF is now an image one and without the x- prefix
     new 90bf4f6  TIKA-2250 As of RFC7903, the official mime type for EMF is now an image one and without the x- prefix
     new 836e2d9  TIKA-2244 -- be more parsimonious with BufferedInputStream -- AutoDetectReader
     new 7afcfc7  Merge remote-tracking branch 'origin/master'
     new fe94908  TIKA-2249 -- update javadocs to alert devs that tables are not "maintained" by the PDFParser
     new 280ab87  TIKA-2251 --  make catch blocks as small as possible and improve "logging" with malformed files in new experimental SAX docx/pptx parsers.
     new c09422a  TIKA-2253 Obtain new Miredot license key and upgrade plugin version in tika-server
     new 8213e53  Merge branch 'TIKA-2253'
     new 2735942  TIKA-2255 Test SAS files
     new c5130ec  TIKA-2255 Magic for older sas data files
     new 73a37a4  TIKA-2255 Mime detection unit tests for SAS files
     new 3c0cd64  TIKA-2025 -- fix xls/x testBigIntegersWGeneralFormat to work in multiple locales. This closes #151
     new da8363f  Merge remote-tracking branch 'origin/master'
     new 7555b13  TIKA-2259 -- improve url extraction from PDFs = copy Tilman Hausherr's code from PDFBOX-3644
     new 0d54f07  TIKA-2181 - upgrade to POI 3.16.beta2
     new bc3b263  TIKA-2198 - add null check to Tika after upgrade to POI 3.16.beta2
     new 27e026e  TIKA-2134 -- remove npe catch after upgrade to POI 3.16.beta2
     new b9befb4  TIKA-2247 and TIKA-2246 -- add parsers for EMF/WMF
     new aa7a0c3  TIKA-1332 -- initial commit for tika-eval module. More work remains.
     new 506b572  TIKA-1332 -- fix one report for eval profiler and clean up whitespace
     new d194ba4  TIKA-1332 -- downgrade Lucene to 5.x to allow for Java 7
     new 6c6b77b  TIKA-1332 -- clean up commons-io version mgmt
     new a2d214c  TIKA-1332 -- fix analyzer chain for common tokens, clean up UTF-8 references
     new dc2dcd4  TIKA-1332 -- add English/Spanish common tokens, fix logging
     new 9cf8258  TIKA-2267 -- add common tokens for some languages into tika-eval
     new 3366bc6  Fix for TIKA-2269
     new 0b85460  TIKA-2269 -- Fix potential NPE in FeedParser via Julien Nioche. This closes #153
     new 94fd3f6  change tika-eval default logging to INFO
     new b3837a4  TIKA-2275
     new 166ebb2  TIKA-2276 -- pass through TikaConfig via ParseContext in AutoDetectParser
     new 1c87339  TIKA-2276 -- pass through ParseContext to prevent needless creation of TikaConfig
     new e86f2d8  TIKA-2277 -- remove ParseContext field from AbstractParser
     new bce5c79  TIKA-2277
     new 10ca360  TIKA-2276 -- rollback...sorry.
     new 6e4116b  TIKA-2276 -- Have AutoDetectParser pass itself to the ParseContext for embedded documents if the user hasn't specified a parser _instead_ of keeping around a TikaConfig and passing that in.
     new e3a50ba  TIKA-2276 -- Try to reuse parsers from ParseContext for custom embedded handling, instead of creating a new HtmlParser/RTFParser.
     new 579a92b  TIKA-2276 -- Have AutoDetectParser pass itself to the ParseContext for embedded documents if the user hasn't specified a parser _instead_ of keeping around a TikaConfig and passing that in.
     new 6697dcd  Check for HTMLParser/create a new one just once.
     new 6d022be  TIKA-2273 -- two tests turned off temporarily in bundle.  First draft of adding configurability to EncodingDetectors
     new 500e15d  TIKA-2278 -- clean up extract exception handling
     new 4a4e89a  TIKA-2278 -- clean up extract exception handling, add license header
     new e7a0c3e  TIKA-2273 -- fix configuration of encoding detectors when parsers are loaded statically.
     new 5e0f926  TIKA-2273 -- cleanup, update CHANGES.txt
     new 7a7887b  TIKA-2279 -- simplify token counting
     new efc67b8  TIKA-1857 -- fix extraction of field contents to handle hierarchical structure in <data> section
     new d492657  tika-eval fix bug that stores parent file extension instead of embedded doc file extension
     new 3c3e8e1  TIKA-2286
     new b8fd7ee  TIKA-2285 -- triggering document did not trigger a string out of bounds exception, but a corrupt/very short stylename could.
     new 87745df  TIKA-2285 -- don't trim before check
     new c3383b0  TIKA-2281 -- extract mapi message class
     new 745f13c  TIKA-1865 - step 1, split out sender name from sender email/exchange info where possible in MSG files.
     new 4f10801  clean up whitespace
     new 2234f33  TIKA-1865 -- step 2, the other parsers.
     new fecb19a  TIKA-2281 applied to PSTParser
     new 3bfe830  TIKA-1865 bug fix
     new 24fec4d  TIKA-2287 -- add general jdbc.
     new 409e905  TIKA-2287 --   bug fix, improve handling when ref tables already exist
     new b2f3eaf  TIKA-1879 -- improve recipient email address extraction; revert the X400/500/Exchange processing for the "from" field from TIKA-1865
     new 67cd6c3  TIKA-2290 - fix bug that prevented setting ocr strategy on PDFParserConfig
     new 51e8320  Update mailing list archive links
     new 49d6fd7  Bumped junit and slf4j versions
     new abfc826  TIKA-2242 -- fix annotations with <p> elements inside of <p> elements
     new 65182ee  TIKA-2295 -- extract images from ODT
     new 6465282  TIKA-2297: Added initial Lingo24 Language Detector
     new 79b6c15  TIKA-2292: Updated CXF version to 3.0.12
     new 6f9ca9d  TIKA-2292: Updated CXF version to 3.0.12
     new 0173a2f  Merge branch 'TIKA-2297' of https://github.com/dameikle/tika
     new 1bdc1a3  TIKA-2297: Added Lingo24 Language Detector
     new e9ff4c0  Fix for TIKA-2303 contributed by ppalazon
     new 585316d  TIKA-2287 -- bug fixes
     new 3d64e60  Merge remote-tracking branch 'origin/master'
     new 1725007  TIKA-2236 upgrade PDFBox to 2.0.5 and JempBox to 1.8.13
     new 679e460  Merge branch 'TIKA-2303' of https://github.com/ppalazon/tika
     new 22f6ccf  TIKA-2303 -- allow users to configure whether or not to extract bookmarks via Pablo Palazon. This closes #157
     new 7894819  clean up unit tests, and modify two ODF tests based on feedback of broken build on user list
     new a9145d8  modify two ODF tests based on feedback of broken build on user list -- remove parsing of embedded files.
     new f55b87f  TIKA-2300 include exception for streams that can't be read in pkg parser via Aeham Abushwashi
     new c4660b4  TIKA-2307 avoid swallowing unsupported stream exception in PackageParser
     new 58e1846  TIKA-2307 avoid swallowing unsupported stream exception, wrap in TikaException
     new 256a281  TIKA-2212 ooxml parser should use finer-grained media types so that they can be filtered by users with includes/excludes
     new bb82205  TIKA-1772 More WebVTT magic - for cases with no header, and with custom headers
     new 3c02c4b  TIKA-1772 More test WebVTT files - no text header, and a custom one
     new 40647ea  TIKA-1772 More WebVTT unit tests
     new 8d31ab6  Changelog update
     new 315a0d6  add initial support for xlsb
     new adde012  Merge remote-tracking branch 'origin/master'
     new fbd2a4e  undo super bone-headed commit that was intended for personal fork.  Wait until poi-3.16-beta3.
     new a00ced9  undo super boneheaded commit, add binary back to list of unparseables in OOXMLParser
     new 268a168  Added logging deps to unify it in parsers
     new 5b6d997  Reformat POMs a little
     new e8ee4ce  TIKA-2245 Logging unification
     new 6ff825c  Cosmetics: extra spaces and diamond operator
     new e1cc5a6  Added explicit test scope for junit
     new 4bb2f70  Fixed tika-bundle integration test
     new 465491f  Bumped pax-exam version to 4.10.0
     new 0d8f030  Merge branch 'logging-refactored'
     new 7ce58d6  TIKA-2245 Updated CHANGES.txt with info about logging
     new a5cd6f4  Added dependencies for DL4JImageRecogniser parser
     new f777f21  Imported VGG16 model via deeplearning4j
     new 236db96  fix for TIKA-2306 contributed by kranthigv
     new 0c0bd4b  fix for TIKA-2306 contributed by kranthigv
     new cb8f8f5  fix the image
     new c7f27b5  inceptionapi.py file added for REST API feature
     new 1fc82e8  fix the destination directory
     new 900e4cf  fix no variables to save
     new 0341a5d  unexpected argument
     new b9f496c  undefined variable
     new f8c51ba  undefined variable
     new d199692  undefined variable
     new 0eedec8  Working inceptionapi.py without comments
     new 19c0e91  TIKA-2302 -- make extraction of macros optional in OfficeParsers and set default to false
     new 5877c4c  TIKA-2302 -- make extraction of macros optional in OfficeParsers and set default to false
     new 09cb2df  fix for TIKA-2306 contributed by kranthigv
     new f92809a  fix for TIKA-2306 contributed by kranthigv
     new be773ca  fix for TIKA-2306 contributed by kranthigv
     new 75a2ae1  Changed models repo to a forked repo for future compatibility
     new cc34967  Update python code styling
     new 653abaa  Updated dockerfile to launch the service
     new f1aad6f  Fix typo in Javadoc
     new 7b2b27a  removed some of the dead code
     new 8ebe758  Merge pull request #1 from asmehra95/asmehra95-patch-1-1
     new 82f069b  removed unused dependencies
     new 1472a4e  [TIKA-DL] Added tika-dl module to the build system
     new ce28a6f  Fix scheme value for file URIs
     new 3cbf368  [TIKA-DL] build jar with dependencies by default
     new d1c9513  [TIKA-DL] add license headers
     new 81b3f32  Fix typos and unnecessary spaces
     new 5834afe  Fix XML format
     new 1ea20c6  bump maximum tokens to 1000000
     new 246133a  TIKA-2317 -- warn when content string is truncated, allow easier parameterization of other limits via commandline.
     new e5b0d54  Merge remote-tracking branch 'origin/master'
     new 6b45621  TIKA-2318 fix exception/common count comparisons to include both mime_type_a and mime_type_b
     new 3b33da2  TIKA-2319
     new 8c9e02e  Fixed formatting issues
     new 0fb1458  fixed all formatting issues and added new customization
     new e187d82  Enabled snapshots repo and upgraded DL4J to 0.8.1-SNAPSHOT
     new 2a2e631  TIKA-2323
     new f3db573  TIKA-2325 -- allow configuration of default language code for "common words" metric
     new 3aab15f  TIKA-2311 -- maintain mime information for truncated ooxml
     new 6205742  Merge branch 'TIKA-2307' of https://github.com/KranthiGV/tika into KranthiGV-TIKA-2306
     new dbdead5  Merge remote-tracking branch 'origin/master' into TIKA-2306
     new 84fb6fe  Updated model repo link to official tensorflow's
     new a405fc4  Fixed SentimentParser, upgraded, using params, added test
     new f1caef1  TIKA-2016 Added license header
     new 84ffe8d  TIKA-2016 Undo orthogonal changes
     new ae06bae  TIKA-2016 fix classpath URL for model
     new db8c814  Reduced disk I/O
     new 09698c6  Remove redundancy. Not updating classify_image.py since it has no effect on runtime performance
     new d6b3ca4  Merge branch 'TIKA-2306' of https://github.com/KranthiGV/tika into KranthiGV-TIKA-2306
     new a970303  mock parser's uninterruptible sleep can happen to pause for exactly 3000 millis
     new 67612b8  TIKA-1195 and TIKA-2329
     new 0f1034a  update javadoc for Latin1StringsParser
     new 75eea6e  TIKA-2330 -- prevent preventable ooms in both detecting and parsing corrupt files or files that are misidentified as compressed streams.
     new 9e89b44  TIKA-2331 -- Upgrade RTFParser to use new TikaMemoryLimitException
     new 37d0f05  Merge remote-tracking branch 'origin/master'
     new 80e6a8c  TIKA-2334 -- Upgrade "provided" sqlite parser to 3.16.1
     new 941d61a  update CHANGES.txt in prep for release. reorder changes to most significant first...changes in default behavior  then new parsers...Completely subjective, and I'm open to reordering!
     new a31ed0d  TIKA-2331 -- more opportunities to check the alleged length of a byte[]
     new 77d5745  TIKA-2024 -- another location where the original source path might be recorded
     new 834920e  Updated URL to point to ASF repo
     new dd51591  Merge branch 'KranthiGV-TIKA-2307'
     new 80e6991  change scope of jai-imageio-core (TIKA-2338)
     new 4321f77  TIKA-2339 -- remove test file flagged by one antivirus as potentially problematic.  We assess that the av software had a false positive.  However, to make it easier for the reporter and for others facing this issue, let's remove the offending file
     new 34b630b  Revert "change scope of jai-imageio-core (TIKA-2338)"
     new 87a1b4a  Update util file for snomeds and polarity
     new 347e601  increase token counts to long from int
     new 03035d6  TIKA-2309 Time Stamped Data Envelope parser
     new e568bbc  Merge branch 'Shinobi75-TIKA-2309'
     new a194bc4  TIKA-2099 -- temporarily copy/paste commons-compress' ArchiveStreamFactory to benefit from updates that enable detection of magic-less .tar files.
     new 11ad0fd  TIKA-2039 -- extra unit test... ensure standard handling of exception in embedded file
     new 9b5662d  Refactor utils update
     new 6c903f2  fix for TIKA-2322 contributed by msharan@usc.edu
     new 1efe2e9  Minor edits in java comments
     new 1fa7fc4  Removed local code
     new 70343df  Moving video imports inside classify_video method
     new 10529eb  Fix URL to inceptionapi.py
     new 7c431b3  Adding opencv support in Inception File
     new 92a90c7  Update to v4 inception.
     new 91d18a6  Merge branch 'TIKA-2322' of https://github.com/smadha/tika into TIKA-2322
     new 4d3a43c  TIKA-2345 Tika Config Serialisation of EncodingDetector details
     new 86e821b  TIKA-2345 Test for Tika Config Serialisation of EncodingDetector
     new 27f3b3d  ExecutorService serialisation TODOs
     new d77fb59  Merge branch 'master' of https://github.com/apache/tika
     new bbd4647  Changing v3 to v4, moving import at the top
     new 0cb3a19  Fixed changes as per v4 PR
     new aa4954f  TIKA-2346 Add OfficeParserConfig support to control extraction from shapes from non-shape-based formats
     new 0876aa9  TIKA-2346 OfficeParserConfig control extraction from shapes from DOCX
     new 6a32d49  V3 to v4 in documentation
     new ba00902  Updating v4 in Java default value
     new 932a4a8  Supporting both opencv 2 and 3
     new 434736b  Adding opencv with ffmpeg
     new 58a116c  Removed personal repository
     new 310fa54  Installing ffmpeg from opt and cone from bash
     new 562e4fa  TIKA-2346 -- add unit tests and configurability for doc, xls and SAX docx parser.
     new e141640  Update Dockerfile for InceptionVideoRest to depend on ubuntu 16.04 (get ffmpeg via apt-get); build OpenCV+Python from scratch and bind to apt-get ffmpeg. Contributed by ThejanW.
     new 49bb469  Merge pull request #168 from smadha/TIKA-2322
     new b19b9c3  Record change for TIKA-2322.
     new 27f7b24  TIKA-2322: update dockerfile
     new 04f150d  Update github links to apache/tika
     new 3462c9d  Keeping the number of parallel threads as 4 for OpenCV build process
     new 104ca3e  include "caused by" exceptions when catching/rethrowing in emf/wmf
     new b0a4b95  Merge remote-tracking branch 'origin/master'
     new 197b9ab  Remove orthogonal line changes
     new 1612028  TIKA-2349 -- try to use digest info to link embedded documents in tika-eval's "Compare" mode
     new 0ee2fe6  Remove unneeded line change
     new 5df8780  TIKA-2350 -- catch malformed open actions.
     new 0b37895  TIKA-2311 -- to handle truncated files more robustly, in ZipContainerDetector, try OPCContainer before ZipFile
     new dbe8a03  Merge https://github.com/ThejanW/tika into TIKA-pr-175
     new 2217a8f  Updating Tika Bundle POM for sentiment analysis - still getting errors.
     new b26aa05  Fix Tika Bundle for TIKA-2016.
     new e7b0cad  Fix for TIKA-2352 contributed by pascal.essiembre
     new c5da6bb  Add multi-class/categorical sentiment config test file.
     new 1934881  TIKA-2352 -- via Pascal Essiembre.  This closes #176
     new b46f20d  Add categorical test.
     new f073660  record change for TIKA-2016.
     new 70d2455  Merge branch 'master' into TIKA-2016
     new 0d3eb1f  Merge pull request #169 from thammegowda/TIKA-2016
     new ea19d62  TIKA-2343 -- add boilerpipe option (tika-app's "text-main") to tika-server
     new 3cc4bc2  Merge remote-tracking branch 'origin/master'
     new 4375a8e  TIKA-2343 -- change put to post for multi-part forms
     new 90cbe00  TIKA-2354 -- incorrectly skipping many images
     new e56e2b2  Merge remote-tracking branch 'apache/master'
     new 01ae987  [TIKA-DL] Updated model path, fixed issue with HTTP URL from XML
     new 414a429  fix for TIKA-2355 contributed by msharan@usc.edu
     new 46334b8  TIKA-2356 -- temporary workaround for bug I added to POI (Bug 61034) <face_palm/>
     new 1e436f5  ignore inception tmp model.
     new d06c521  pin to 0.8.0-1 release.
     new 9dc2360  Factor out the DL4J model version.
     new 0aaa121  TIKA-2357: Increased support for Tesseract PSM up to 13 from Rafael Ferreira
     new 704a039  Pin to DL4J model 0.8.0-2.
     new e477480  Integrate Tika-DL into Tika-Server and Tika-App
     new 3207f12  Merge pull request #165 from thammegowda/tika-dl
     new 5d3f36a  record change for TIKA-DL.
     new e31c933  TIKA-2318 -- include container file length in reports that mention file path, and add a report that compares page count.
     new 2ab94fe  Merge remote-tracking branch 'origin/master'
     new 82fd2ff  TIKA-2358 -- remove tika-dl as a dependency in tika-app and tika-server
     new 9068584  Tika 2262: Create dummy classes for the client
     new f73a117  Merge remote-tracking branch 'upstream/master'
     new c7719fb  Tika 2262: Implement image captioning server
     new 806bdc9  Tika 2262: Relocating image captioning server
     new 25fde02  Tika 2262 #create initial version of im2txtRESTDockerfile #minor fixes for model_info.xml & im2txtapi.py
     new f68d33a  Tika 2262: minor fix for model_info.xml
     new 76815ea  Tika 2262: fix minor error in the "/" route
     new 6b174ec  Merge branch 'master' of https://github.com/apache/tika into HEAD
     new c95c66d  TIKA-2361 upgrade to PDFBox 2.0.6
     new 464fb97  TIKA-2363 skip image recognition test if network call fails
     new c020e48  TIKA-2360 -- require users to turn on SentimentParser; remove glob detection for .sent; skip unit tests if network call fails.
     new f78b7d0  clean up white space
     new ac1791a  clean up indentation
     new d51227f  Tika 2262: Change directories of models & .py files
     new ed57e6e  TIKA-2364 -- convert printstacktrace to log
     new d873147  TIKA-2360 -- fix "fix" for SentimentParserTest
     new 48580b0  TIKA-2367 -- avoid npe in WMF
     new 477ce4c  Tika 2262: # Reformat dockerfile # Update directories in model_info.xml
     new cba1cb2  Tika 2262: # Remove unneeded lines # Add symbolic link to im2txtapi.py
     new d8d2374  Tika 2262: Change route "/getcaptions" into "/captions"
     new 614d951  Tika 2262: Change method name "get_captions()" to "gen_captions()"
     new de87206  Tika 2262: Remove optional metadata to speedup beam search
     new 5e19c01  Tika 2262: Add informative log messages  (for advanced users to troubleshoot errors when modifying model_info.xml)
     new b5b4eb6  TrueTypeParser Close Open Fonts
     new 9d34cbb  Merge pull request #181 from icirellik/close-fonts
     new 9f4bb56  TIKA-2370: avoid potential resource leak by closing TrueTypeFont via Cameron Rollheiser.  This closes #181.  I removed the unit test based on Cameron's advice.
     new 993382c  TIKA-2368: Clean up dependencies of SentimentParser.  At a bare minimum for the release of 1.15, add tika-translate to the exclusion list.
     new ebc87ae  TIKA-2359: Alert user that tesseract is available and will be used.
     new 5a964e5  TIKA-2372 Test DMG file
     new d992c5e  TIKA-2318: fix typo (add comma) in common tokens by mime type report
     new a9883eb  add key
     new b2fc478  TIKA-2373 -- fix licenses via rat in prep for 1.15 release
     new a3b2ab2  Update CHANGES.txt for 1.15 release.
     new 8a68c13  update scm for 1.15 release
     new 8d3140b  Merge remote-tracking branch 'origin/master'
     new 3ba922f  update scm for 1.15 release
     new f604694  [maven-release-plugin] prepare release 1.15-rc1
     new d806b99  [maven-release-plugin] prepare for next development iteration
     new e4bfb16  undo release version update in order to fix tika-dl's parent definition and respin
     new a29ae4d  fix tika-dl's pom's parent definition
     new 1761530  [maven-release-plugin] prepare release 1.15-rc1
     new 05ccbdf  [maven-release-plugin] prepare for next development iteration
     new 674f67d  Update changes for 1.16
     new 75accba  Merge remote-tracking branch 'origin/master'
     new 6485b8b  update pointer for sources to github in email template.
     new 977314f  prep for 1.15-rc2.  Thanks to Oleg Tikhonov for catching my gaffes in -rc1.
     new 6740a97  prep for 1.15-rc2.  Change 1.15-rc1 back to HEAD in scm <tag>
     new d831efb  [maven-release-plugin] prepare release 1.15-rc2
     new 95292dd  [maven-release-plugin] prepare for next development iteration
     new 49f3530  fix indents/whitespace
     new f718cb9  fix indents/whitespace
     new b290cd7  TIKA-1804 -- convert json parsing to SAX in TEIParser, step 1: test current output.
     new 9ea855b  Added the vgg16 model
     new a99a8d0  added test case for vgg16 model integration
     new 7ddb47f  added default configuration and image to test
     new 938acbd  removed the earlier required dependencies
     new 2d5e5b9  added the vgg16 class
     new cead1d0  TIKA 2262 : Implement initial version of java client
     new c35e257  TIKA 2262 : Update return type to List<? extends RecognisedObject>
     new 4f06e2c  TIKA 2262 : Remove unneeded return type change to List<? extends RecognisedObject>
     new 8979094  TIKA 2262 : Minor fixes # Fix minor error in apiUri # add toString() to CaptionObject class
     new 6be752c  TIKA 2262 : Minor fixes
     new 6e9f796  TIKA 2262 : Change routes to "/inception/v3/.."
     new d53fa48  TIKA 2262 : Allowing to set post params through config.xml
     new 34ae621  update key with signatures
     new 99a6db5  Merge remote-tracking branch 'origin/master'
     new dab7788  TIKA 2262 : Modify ObjectRecognitionParser to support im2txt
     new 92fb768  TIKA 2262 : Add tests & test images
     new 9a4f576  TIKA 2262 : Remove wildcard import
     new 8ffd67c  TIKA 2262 : Remove unneeded initialization of TensorflowRESTRecognizer
     new f534fce  TIKA 2262 : Minor fixes & code reformatting
     new a963a32  TIKA-2381 - include tika-eval artifact in release
     new 574fcec  TIKA-2360 -- broaden allowable exceptions to IOException
     new fa1d411  TIKA-2379 - Make logging in bundle optional.  Fix test.
     new c3cf30f  TIKA-2379 - Add test scope back in for JUnit.
     new cd7aa37  Merge remote-tracking branch 'origin/master'
     new b3ad71b  TIKA-2383 -- upgrade forbiddenapis to 2.3 and update internalRuntimeForbidden->nonPortableRuntimeForbidden
     new 77900ab  TIKA-2341 -- upgrade commons-compress to 1.14, added capabilities for snappy and lz4-framed
     new 6d710da  TIKA-2384 - TikaResource can close an InputStream twice; this revealed a bug in CommonsDigester, which this also fixes. (via Haris Osmanagic).
     new a8aa2fc  fix for TIKA-2835 contributed by pmweiss5
     new 5410928  TIKA-2386 -- enable more options for DigestingParser
     new f74b089  TIKA-2384 -- went too far in earlier commit.  We should close the inputstream in parse.
     new 933ae1c  TIKA-2384 -- update changes to reflect fix.
     new 630d1fe  add files
     new 7acf48d  TIKA-2387 -- parameterize scale for image rendering of pages in PDFs for OCR
     new 916e5ed  TIKA-2388 OpenOffice database files have application/vnd.oasis.opendocument.base as their embedded mimetype, so make that the canonical one
     new 7842600  TIKA-1945 -- extract text from diagrams in ooxml files.
     new f8f407a  Merge remote-tracking branch 'origin/master'
     new d2820ce  TIKA-2254 -- extract text from charts in ooxml.
     new 5cbaed8  TIKA-2362 -- Allow users to turn off extraction of headers and footers from .doc, .docx, .xls, .xlsx, .xlsb
     new 79f2740  TIKA-2383 -- remove nonPortableRuntimeForbidden and add new signatures
     new dbafffa  TIKA-2383 -- remove nonPortableRuntimeForbidden and add new signatures
     new d700aa4  fix for TIKA-1988 contributed by msharan@usc.edu
     new ed2ceeb  TIKA-2397 -- remove circular dependencies with conflicting versions brought in by the SentimentParser.
     new 132d3e7  TIKA-2391 -- extract <script> elements as embedded documents
     new c9031bf  Reverting committed work dir
     new 8410067  TIKA 2262 : Add apiBaseURI + reformat ObjectRecognitionParser
     new c53af43  TIKA 2262 : Swap json imports to json simple
     new 1760249  TIKA-1804 -- remove dependency on org.json
     new 00de530  Merge branch 'tika-dl-me' of https://github.com/asmehra95/tika into TIKA-2298
     new c476ec1  TIKA-2298: DL4J-VGG16 simplify conf, implementation
     new 134cb38  Merge pull request #2 from thammegowda/TIKA-2298
     new ab21426  turn dl4j test back on.
     new 2bdf783  Add all image format support to the captioning server
     new 5478311  Tika 2262 : Add all image format support to the captioning client + clean imports
     new 7b47239  TIKA 2262 : Minor fixes in TensorflowRESTCaptioner & Tests
     new ebb4bc6  Merge remote-tracking branch 'upstream/master'
     new 7e51e2b  ignore snaps
     new 4abd944  ignore snap build dir
     new 06eb659  add snap change
     new 223c006  remove old files
     new 2369149  TIKA 2262 : # cleanup imports in ObjectRecognitionParser # reformat im2txtapi.py # add pillow python package install to Im2txtRestDockerfile # add gif image & gif image test
     new 6a60a79  TIKA 2262 : Minor reformatting + Change log level into info to see captioning time
     new 1aacfa0  Normalized the confidence range b/w 0 and 1 and fixed topN return issue
     new 8ef1eb9  TIKA 2262 : Remove unused imports + spaces
     new b9825c6  TIKA-2336: Upgrade to POI 3.17-beta1
     new bada130  As per RFC2361, the official mimetype for WAV is audio/vnd.wav
     new 443bc7d  Merge remote-tracking branch 'origin/master'
     new 6e3fb26  TIKA-2380: Upgrade to Jackcess 2.1.8
     new 93f941e  TIKA-2389 and fix CHANGES.txt file
     new 4161f22  TIKA-2389 -- allow users to configure warnings for problems during initialization
     new 2a43a69  move test scope image processing dependencies together in the pom
     new 2deadf4  TIKA-2374 -- tika-app cli should extract inline images by default
     new 5cb985e  remove tika-serialization from tika-bundle
     new b409ff6  TIKA-2368 -- rename SentimentParser to avoid conflict with dependency
     new 91d0af2  TIKA-2410 -- turn off bold/italic on \plain
     new 7fd5bd7  TIKA-2335 -- extract x15ac:absPath metadata from xlsx and xlsb when available
     new 95c515d  TIKA-2089 -- extract macros from ppt
     new 05e334a  TIKA-2411 -- clean up unneeded dependencies
     new bfedea8  TIKA-2368 -- move to different package to avoid split package warning
     new 15d6078  TIKA-2312 upgrade provided xerial version
     new 04cc72c  TIKA-2313 upgrade mime4j to 0.8.1
     new d192004  Upgrade gson
     new 87d8f55  Upgrade gson and libpst (TIKA-2414 and TIKA-2415).
     new a7b5705  Upgrade dependencies in tika-eval TIKA-2416
     new dd2149a  Revert upgrade of libpst to 0.9.3 back to 0.8.1
     new 9a312f2  Revert upgrade of libpst to 0.9.3 back to 0.8.1
     new b618984  Merge remote-tracking branch 'upstream/master'
     new 0815b21  TIKA-2418 Make the QuickTime start-of-file Atom matches a bit more specific where possible, to reduce false positives
     new 8eef056  Merge branch 'master' of https://github.com/apache/tika
     new d98bec0  TIKA-2419 Do all 4 html doctype variants for the same text range
     new 3830152  TIKA-2419 If we detect XML but the XML is broken, try the HTML magics before declaring it to be broken xml = plain text. Needed because, to avoid false positives on html-like formats such as email, XML has a higher magic priority than HTML
     new ab4ea47  Merge branch 'tika-dl-me' of https://github.com/asmehra95/tika into TIKA-2298
     new 0b92a77  Fix Tika Bundle error RE: tika serialization.
     new 82ab03c  Implement needed object recogniser method
     new cb7b84a  TIKA-2089 - bug fix, check for nulls
     new 24b54af  Merge remote-tracking branch 'origin/master'
     new 621ded8  TIKA-2089 -- clean up try/catch with autocloseable
     new 4ed69a8  TIKA-2420 -- protect against unsupportedoperationexception with query.toSQLString() on unknown query types.
     new 215c262  Up max memory for surefire to 3GB.
     new 9dc8e21  small cleanups to sql
     new b58cfcf  Record change for TIKA-2298: Very Deep Convolutional Networks for Large-Scale Image Recognition
     new d65c2a1  Merge branch 'TIKA-2298'
     new 94f8b9f  Use ${commons.compress.version} per tballison.
     new 158675d  TIKA-2298 -- skip test if no network connectivity.  Should rework for more elegant solution at some point.
     new be02273  set config equal to the new Config object.
     new a00d112  Test GraphViz files
     new 8d8e818  TIKA-2422 -- improve detection of Graphviz *.dot format
     new da7ade6  TIKA-2422 -- improve detection of Graphviz *.dot format - allow leading C++-style comments - add unit test, incl. comments and graphs with name/ID
     new dd7acbf  Merge pull request #190 from sebastian-nagel/TIKA-2422-detect-graphviz
     new 3891b87  Add a few recent mime/magic changes to the changelog
     new 92b7b68  fix merge of tika-parsers/pom.xml for age predictor.
     new 4e24a14  Implement initializable interface.
     new 048dd8e  Fix import statement.
     new a549ec6  Get needed OpenNLP models for age detection.
     new 2d48094  Configure age detector based on classpath.
     new 5a653d0  formatting in ModelGetter.
     new f8cfa4b  Fix typo.
     new ce4fdda  Use static class.getResource() instead of getClass().getClassLoader()
     new 6ff1147  Fix pom to work with Age Recogniser.
     new 3c2595d  Automatically copy the Age models to the model dir so that you can have them available for classpath tests.
     new 9ad510c  Use mock class for testing.
     new 327ae0b  Record change for TIKA-1988 contributed by msharan@usc.edu
     new 9be1785  Fix Felix bundle rules for Age Prediction Parser OSGi bundle. TIKA-1988.
     new 05f8f89  TIKA-2389 -- add static checks to PDFParser, Tesseract, SQLLite to make sure that potential warnings only happen once.  Rework TikaCLI to build parser only once based on tikaConfig so that initialization warning settings actually work.
     new 0f6449f  Merge remote-tracking branch 'origin/master'
     new 58a602f  TIKA-1988 -- allow for failure to copy age recognition models
     new f776c24  TIKA-2399 exclude jj2000 because of potential license problems with ASL 2.0
     new 632f52d  TIKA-1988 -- allow for errors downloading models
     new e07d9e1  - add Tika-NLP module - move AgeRecogniser out of tika-parsers TIKA-1988
     new f94616a  add Tika-NLP to build
     new 3040849  Update CHANGES.txt for 1.16 release.
     new d353914  [maven-release-plugin] prepare release 1.16-rc1
     new 23c98cd  [maven-release-plugin] prepare for next development iteration
     new e9c8794  roll back to 1.16-SNAPSHOT
     new f7fe12e  exclude models from src.zip
     new 1963322  [maven-release-plugin] prepare release 1.16-rc1
     new 0c6e1ad  not sure why pom.xml.releaseBackup files are now included after last commit.
     new 75944e5  Merge remote-tracking branch 'origin/master'
     new 64b59fe  revert to 1.16-SNAPSHOT; third time is the charm
     new 2af8fb1  revert to 1.16-SNAPSHOT; third time is the charm
     new b99f344  [maven-release-plugin] prepare release 1.16-rc1
     new 6aa4bc6  [maven-release-plugin] prepare for next development iteration
     new 44082d3  TIKA 2262 : Adopt changes in TIKA-2389
     new 36d742e  Merge pull request #189 from ThejanW/master
     new 2e7a4f5  Merge https://github.com/apache/tika into gsoc17
     new b36a5b7  - explicitly set Locale in String.format
     new e9793d3  - ignore pydevproject
     new 2eabced  Make sure tests get run if Tensorflow is available via Docker or script
     new 56ab7b2  - handle exceptions
     new 65ef6d8  record change for TIKA-2262
     new 9f0144c  TIKA-2426 -- fix locale-sensitive test for xlsb
     new bea5b9d  WARC and ARC magic from Andy Jackson from https://github.com/ukwa/tika/
     new c23c648  update tag
     new 068c87c  Merge remote-tracking branch 'origin/master'
     new 0277fbb  TIKA-2042 Add a few more mbox patterns, based on file supplied by mcaruanagalizia Matthew Caruana Galizia
     new 55caab7  POIFSContainerDetector ASCII-encoded magic number
     new 08d09a5  Update WordMLParser.java
     new 9687f08  SUPPORTED_TYPES is an immutable singleton set
     new 9869851  TIKA-2430 -- add a capability to allow devs to easily run parsers against randomly corrupted files.
     new 898946e  Merge branch 'patch-1' of https://github.com/onealj/tika
     new 71a80c9  Merge branch 'patch-2' of https://github.com/onealj/tika into onealj-patch2
     new 12cce58  Merge branch 'onealj-patch2'
     new 13ebacf  Fix conflicts.  This closes #193
     new f3acaed  Fix conflicts.  This closes #193.  Thank you, Javen!
     new f53a2f2  SUPPORTED_TYPES is immutable
     new 3de4c4f  TIKA-2430 -- allow devs to fuzz embedded files individually
     new 3f86a6b  TIKA-2430 -- remove dev ignore emf
     new 268a815  TIKA 2262 : Update links
     new d57a621  Fix a typo in log message, and adjust code indentation
     new 8af9c96  Merge pull request #195 from kinow/fix-typo-and-indentation
     new 30b27ab  Merge pull request #194 from onealj/patch-5
     new 0d5cab9  TIKA 2262 : Minor changes to dockerfile
     new ef12f7d  Merge remote-tracking branch 'upstream/master'
     new 0579efe  Two more EML header magics from Matthew Caruana Galizia from TIKA-2042
     new c92730c  Minor changes
     new 00221ad  TIKA-2042 -- fix typo.
     new 1bf3a7e  TIKA-2431 -- upgrade to PDFBox 2.0.7
     new 63ae47a  Merge pull request #196 from ThejanW/master
     new f31b7f1  TIKA-2433 All non-pipe modes need configuring, otherwise the Tika Server fails
     new a51add2  Forbidden APIs fix - Use a specified encoding when turning Strings into Bytes
     new 4455a6f  TIKA-2436 Add a mime type for EMZ, subclass of gzip, much as we have for the related WMZ
     new e2f5b17  TIKA 2402 : Include pillow  + include logic to convert non jpeg image
     new 523f5d6  TIKA 2402 : Minor changes in client & config + Include png, gif tests
     new fe7533d  TIKA 2402 : Minor fix in dockerfile
     new 57ee179  TIKA-2439 throw IllegalStateException in OptimaizeLangDetector.detectAll if no module has been loaded and the detector is therefore null, which avoids an unhelpful NullPointerException
     new 4e03f90  TIKA-2438 -- ooxml locale should be set via POI's LocaleUtil.  Fix unit tests to be robust in different locales.  Many thanks to Karl Richter for raising this issue.
     new dada2b7  Merge branch 'npe' of https://github.com/krichter722/tika into krichter722-npe
     new e5526b5  Merge branch 'krichter722-npe'
     new 1941a29  Record changes related to Image Captioning.
     new b422eda  Merge branch 'master' into gsoc17
     new 2bcc0a7  TIKA-2268 -- add report for common_tokens/alphabetic tokens
     new a288bce  Merge branch 'master' into gsoc17
     new 0e1abd2  Merge branch 'TIKA-2355' of https://github.com/smadha/tika into gsoc17
     new 0f57ea8  Merge branch 'TIKA-2355' of https://github.com/smadha/tika
     new c6b6b17  record changes for TIKA-2265.
     new 3621ee4  fix  merge
     new e2af4bd  Update snapcraft.yaml
     new 10baddc  TIKA-2374 and TIKA-2434 - roll back extracting inline images for pdfs in tika-app to just -z option
     new 20a0cd7  add alternate reason message to skip test
     new 79c52ed  TIKA-2434 add headless mode to tika-batch
     new 1d1119e  Add a sample Windows batch file
     new 67e2c5a  TIKA-2445 Windows Batch .bat / .cmd need their own type, as they are text-based, with some common-ish magic, plus unit tests
     new 2207cd9  HTTPD magic is once more wrong, disable one check and explain why
     new 2c54f93  Changes update
     new 587e4ae  TIKA-2447 Inspired by the patch from Jan Burkhardt, do not bother fetching+keeping data from PSD sections we ignore
     new 930e677  Changelog
     new cf1c283  directly compare stderr to empty string in testRedirectionOfStreams to obtain more meaningful messages if test fails
     new bf03dd4  Convert EncryptedPowerPointFileException to EncryptedDocumentException
     new 921949c  Merge branch 'mattch-patch-1'
     new 87033d6  Add unit test for PR-202 submitted by Matthew Caruana Galizia: https://github.com/apache/tika/pull/202
     new 74574e3  TIKA-2440 -- extract phonetic runs from xls and allow users to turn off extraction of phonetic runs in both xls and xlsx.
     new 72772c5  bump read timeout for downloading file to 60s (1min) in tika-dl
     new 6564bc8  Merge pull request #203 from boegel/bump_timeout
     new aa12fc1  Merge pull request #174 from q-centrix/TIKA-2332
     new 516bbaa  Merge pull request #187 from buggtb/master
     new e763021  Improvement for TIKA-2449 contributed by Giuseppe Totaro
     new 5638ebc  TIKA-2448: Extract phonetic runs in docx with experimental SAX parser
     new 70de289  Merge remote-tracking branch 'origin/master'
     new a153397  TIKA-2450 -- AutoDetectParser should throw a ZeroByteFileException for zero-byte files after detection on the file extension.
     new 83f1afa  TIKA-2454: add OverrideDetector and allow PSTParser to specify body content type as text or html -- to avoid incorrect auto-detection of rfc/mbox, etc.
     new e0ff3eb  TIKA-2454: don't process the htmlbody.  There could be encoding conflicts.  Fallback to what we were doing...just process text.
     new 3eff3ac  TIKA-2454: cleanup unused TEXT_PARSER thanks to Matthew Caruana Galizia
     new 87e483a  Better to fix the .mv.db than to complain to the user.
     new 560e91a  TIKA-2456: fix detection of emails inside mbox
     new 9e6a91c  Merge branch 'master' of https://github.com/apache/tika.git
     new 4de0c66  TIKA-2455: flag the containing multipart type
     new 8000cfe  TIKA-2451 - Extract number of tiffs in a multi-page tiff (TIKA-2451); many thanks to Mike Cantrell for supplying a test file.
     new 82ac81b  Merge pull request #201 from boegel/stderr_vs_empty_string
     new 083f7b8  Further refinement on PR-201
     new e41c129  Duh...further refinement on PR-201
     new 79f4b4e  fix conflicts in CHANGES.txt during merge
     new 5b57ae4  Merge branch 'gsoc17'
     new 7b869c0  Added a regular expression to match standard word within a pattern for TIKA-2449 contributed by Giuseppe Totaro
     new 21c0f37  PicturesSource has been copied to Apache POI, mark the class to remove once we have upgraded to a version with it in
     new 31625a2  Used the alphabetical order for the list of the standard organizations by relying on TreeMap. Thanks to Lewis McGibbney for this insightful suggestion (TIKA-2449 contributed by Giuseppe Totaro).
     new 70ca280  TIKA-2460: load custom mimetypes XML from sys prop
     new 26d6e0d  further refinement on PR-201
     new 582083e  Merge remote-tracking branch 'origin/master'
     new d1a8bff  TIKA-2459 -- fix special character handling
     new b987125  Update InceptionVideoRestDockerfile
     new 7dd38d5  Merge branch 'master' of https://github.com/apache/tika
     new db89ab3  TIKA-2449: Enabling extraction of standard references from text
     new f311188  TIKA-2465 -- add explicit unit tests for xxe vulnerabilities
     new 99abe4e  TIKA-2465 -- scope catch/fail more finely
     new 62d5665  Merge pull request #207 from armathur/patch-1
     new ed0574b  TIKA-2465 -- scope catch/fail more finely
     new 92fe9b8  Merge remote-tracking branch 'origin/master'
     new 1b951f2  improve docs for scope of these tests
     new c0c2eaf  TIKA-2467 refactor creation/configuration of XML parsers/factories/readers to be static methods.
     new af4ea8a  TIKA-2465 -- make sure to include slides for SAX PPTX parser
     new 2e8d45a  TIKA-2465 -- add epub
     new a78717c  TIKA 2400 : # Define apiBaseUri for inceptionREST # Link wiki pages
     new 4f784fc  TIKA 2400 : # Fix formatting issues of inceptionapi.py # include logic for checking minimum confidence
     new 6b31053  TIKA 2400 : Minor changes to im2txtapi.py
     new 8a875cd  TIKA 2400 : Changes to video_util.py # Fix formatting issues # Remove unused imports
     new 712b697  TIKA 2400 : Minor change to im2txtapi.py
     new 9b94e17  TIKA 2400 : # Adjust the Object Recognition REST clients to work with changed servers
     new 07abb31  TIKA 2400 : # Few refactoring to Object Recognition REST clients
     new dead956  TIKA 2400 : Update dockerfiles
     new 92c65e0  TIKA 2400 : Update dockerfiles with namespace 'thejanw'
     new f16bd0e  TIKA-2429 -- upgrade to POI 3.17, last version of POI that runs on Java < 1.8
     new 015c695  TIKA-2429 -- upgrade to POI 3.17, and get it right in tika-eval
     new 2a81e97  TIKA 2400 : Update dockerfiles with namespace 'uscdatascience'
     new 384e971  make strawman app driver actually work.  Add ability to specify a list of files.
     new ac25932  TIKA-2466 Remove JAXB for easier use with Java 9 via Robert Munteanu.
     new 0e38f94  TIKA-2470 -- modernize DocumentBuilderFactory security for Java 9
     new c54efd8  TIKA-2470 -- fix...add back namespace aware
     new 5d41096  prevent div by 0 exception in profile-reports.xml
     new 2748538  TIKA-2268 -- add more reports and fix div by 0 bug
     new 33da38e  TIKA-2455: test for feature; only store multipart subtype in metadata
     new 766cdac  [TIKA-2472] Implementing Metadata#hashCode
     new abfca01  Add test PCX and DCX files, generated by ImageMagick from the Test PNG file TIKA-2473
     new 450ab4b  TIKA-2473 PCX and DCX mime magic and detection unit tests
     new 03e7d12  TIKA 2400 : Change requests
     new b1d80bd  [TIKA-2472] Reimplementing Metadata.hashCode using the AbstractMap code but with Arrays.hashCode as suggested by Ken
     new 2966cab  [TIKA-2472] Removing a null value check as Arrays.hashCode does it
     new 1f38be3  fix for TIKA-2475 contributed by seanstory
     new 369a04e  [TIKA-2476] Making sure the trailing space is not added
     new f444fd7  add tests for xml vulnerabilities.  More work remains on entity expansion...
     new 40e99f9  Merge branch 'TIKA-2475' of https://github.com/seanstory/tika into seanstory-TIKA-2475
     new d5d739c  Merge branch 'seanstory-TIKA-2475'
     new 94850f2  TIKA-2475 mods and some new tests/cleanup for CharsetDetector. This closes #210.
     new ad23d84  TIKA-2469 -- narrow mime detection for ms-owner files and add detection for nls files.
     new 18aa69a  [TIKA-1788] RFC822Parser: provide email attachment filenames when available
     new 9653e77  [TIKA-1788] Provide Content-Disposition metadata in embedded files
     new 9fb3461  TIKA 2400 : # Minor reformatting # Include inception v4 definition script # Remove unwanted classify_image.py file # Get rid of the need to have tensorflow models with PYTHONPATH
     new a043cef  TIKA 2400 : Remove redundant functions + Minor refactoring
     new f6beced  TIKA 2400 : Update environment of python scripts to python3
     new 17e4b66  A dummy parser unit test for iWorks 13
     new 5c7547b  Have the iWorks 13 parser set the content type on the metadata if possible, otherwise remains no-op
     new 0d92bc8  Add notes on why we can't get the Numbers or Pages type just yet - need to call out to another library or decode the Document.iwa snappy stream ourselves
     new 1f28f46  Merge branch 'TIKA-1788' of https://github.com/AarjavP/tika into aarjavp-TIKA-1788
     new 6fc2b7e  Merge branch 'aarjavp-TIKA-1788'
     new 96a3502  update some unit tests to use the RecursiveParserWrapper
     new b5f5403  Merge branch 'master' into patch-2
     new a01163d  Merge branch 'mattcg-patch-2'
     new 877d621  update a unit tests to use the RecursiveParserWrapper. This closes 205.
     new 1047f64  TIKA 2400 : Changing environment of python scripts to python2
     new ff481b2  TIKA-2478 -- rfc822 parser should handle alternative parts as the Outlook parser does.  Added parameter to allow for legacy behavior in RFC822Parser and a parameter to "include all alternatives" to the OutlookParser.
     new c009dc7  TIKA-2485 -- Allow configuration of markLimit in EncodingDetectors via tika-config.xml
     new 93411f4  TIKA-2489 -- upgrade to PDFBox 2.0.8
     new eae1002  allow for greater leniency in failure to load resources from the network
     new 88a5e51  TIKA-2490 and TIKA-2491 -- turn off initializable problem stderr warnings in tika-app, confirm that configuration of initializable problems works from an input file and allow for a tika-config.xml file without specifying a classloader
     new 690c744  TIKA 2400 : Changing environment of python scripts back to python3 for docker testing
     new d8caba1  TIKA 2400 : finalize dockerfiles + scripts
     new cc08d39  TIKA 2400 : Fix minor error in im2txt dockerfile
     new dfb7187  TIKA-2492 -- exclude pdfbox debugger
     new 66be8e7  TIKA-2492 -- exclude pdfbox debugger, but get it right this time.
     new 04b0837  TIKA-2492 -- exclude pdfbox debugger from tika-bundle
     new 9c2e1b9  TIKA-2488 -- catch potential npe in getting attachment's inputstream
     new 18deefa  Upgrade to Jackson 2.9.2 (TIKA-2501).
     new 780ab0c    * Upgrade to OpenNLP 1.8.3 (TIKA-2502).
     new f0b6a17  TIKA-2503.  Need to confirm this doesn't break anything
     new 1b48d73  TIKA-2486 upgrade metadata-extractor to avoid CVE in xmp-core to 2.10.1
     new be434b1  remove unused dependency
     new b19c2d7  TIKA-2502 -- rollback until we can figure out how to get the upgrade working with our OSGi bundle.
     new 06486c8  TIKA-2483 -- revert loading of mime repository in PackageParser from TIKA-2311 to avoid NPE in ForkParser
     new ff5d065  TIKA-2034 upgrade xmpcore
     new 7d83b86  TIKA-2504 exclude dependency on old vfs2 to remove vulnerability from plexus-utils
     new b6bdb67  TIKA-2502 - Upgrade opennlp-tools to 1.8.3  maven-bundle-plugin to 3.3.0
     new 1e8008c  TIKA-2506 - Check config for null during DL4J Test.
     new 91ef9a9  Merge pull request #208 from ThejanW/master
     new 3ee0aff  Remove docker files now present in https://github.com/USCDataScience/tika-dockers
     new 946614b  Update changes with TIKA-2400 / GH-208
     new d64a32c  Fix for TIKA-2347 Adds underline extraction from word documents
     new 93cbed6  TIKA-2347 - Added extraction of <strike> element in DOCX files
     new 639f3bf  TIKA-2347 - Add underline extraction from Word documents (doc/docx) from Stuart Hendren as well as strikethrough extraction in docx.
     new beedc42  TIKA-2347 - Add underline extraction from Word documents (doc/docx) from Stuart Hendren as well as strikethrough extraction in docx.
     new d4fd659  TIKA-2510 -- Extract media files from ooxml
     new ce4d948  TIKA-2511 Cache TikaConfig in EmbeddedDocumentUtil for faster processing of files with lots of embedded files.
     new 6fa83ff  clean up imports, update unit tests to use assertContains, and confirm that <strike> in xhtml doesn't add spaces in extracted text.
     new fb93ab1  Fix OOM when parsing very large PDFs
     new ef3fc7b  TIKA-2512 add underline/strikethrough extraction for docx and pptx in SAX-based parsers
     new 33bf39f  Merge branch 'fix-oom-when-parsing-large-pdfs' of https://github.com/shrike/tika into shrike-fix-oom-when-parsing-large-pdfs
     new 2e27414  Merge branch 'shrike-fix-oom-when-parsing-large-pdfs'
     new 72c4e33  Update test and add note in release notes.  Many thanks, shrike! This closes 213.
     new a047fa9  TIKA-2510, correct fix. Only add to seen/handledTarget _after_ processing.
     new 7ca6597  Merge branch 'TIKA-2835' of https://github.com/pmweiss/tika into TIKA-2385
     new 537cfca  Merge branch 'TIKA-2835' of https://github.com/pmweiss/tika into TIKA-2385
     new b3434cd  Merge branch 'TIKA-2385'
     new f3842ea  TIKA-2385: Corrected Tesseract OCR rotation.py script and made it a configurable option via Peter Weiss
     new 7a9411f  TIKA-2385: Added check for Python dependencies
     new aff782d  TIKA-2385: Updated check for Python dependencies to use temporary script instead of -c switch
     new f3acc8f  TIKA-2385: Fixed typo in dependency checker script
     new 50295be  TIKA-2385: Removed deprecated call
     new 88b93c1  TIKA-2516 upgrade to cxf 3.0.13
     new ac9f24e  Merge remote-tracking branch 'origin/master'
     new e83844c  TIKA-2516 upgrade to cxf 3.0.16
     new 6b2b626  TIKA-2503 upgrade to httpcomponents 4.5.4
     new 2169cae  Fix thread-safety in ChmExtractor (TIKA-2519).
     new 95baca2  TIKA-2519 clean up, fix bug in MultiThreadedTikaTest files that failed to prevent files that caused exceptions; revert new ChmBlockInfo() to private
     new 90d6245  TIKA-2483 -- add in all children of zip and tar to prevent overwriting of child file types by the PackageParser.  Ensure that our semi-manual list is updated when there are changes to TikaConfig.
     new f57e0e7  TIKA-2521
     new f983eb4  Update CHANGES.txt for 1.17 release.
     new b071ab1  add missing license headers.  THANK YOU RAT!
     new 94777e3  [maven-release-plugin] prepare release 1.17-rc1
     new c069ad5  [maven-release-plugin] prepare for next development iteration
     new af3d017  update changes for next release cycle
     new 6f33bae  roll back to start rc2
     new b054df2  [maven-release-plugin] prepare release 1.17-rc2
     new 6087955  [maven-release-plugin] prepare for next development iteration
     new 78c8d74  TIKA-2524 -- add an XPS parser
     new d0c315c  Update Changes for branch_1x
     new 2922511  TIKA-2509: Updated TesseractOCRParser to use configured ImageMagick path
     new 9e1cd30  upgrade geo-apis and sis (TIKA-2535).
     new fe97b16  TIKA-2535 -- but get it right... in tika-bundle and tika-java7
     new 54bce08  TIKA-2556: Swap out com.tdunning:json for com.github.openjson:openjson to avoid jar conflicts.
     new 6829643  TIKA-2547: RFC822 with multipart/mixed, first text element should be treated as the main body of the email, not an attachment.
     new 32cbe38  TIKA-2561 upgrade jsoup to avoid potential xss vuln in grib
     new 504ba00  TIKA-2564 -- wrap embedded stream in a stream that supports mark/reset in --extract option in tika-app
     new 0e5fded  TIKA-2559: Extract language metadata item from PDF files via Matt Sheppard.
     new 5314bc4  update changes for TIKA-2559
     new 856a90d  TIKA-2571 -- rethrow SecurityException
     new d9be32c  TIKA-2569 -- Extract text from grouped text boxes in PPT.
     new fd7ec73  Remove java 8 String.join
     new aded888  TIKA-2563 -- Extract files embedded in HTML and javascript inside HTML that are stored in the Data URI scheme.
     new 287f5d1  TIKA-2563 -- mods for 1.x branch
     new 69a85d2  TIKA-2570 - upgrade to more recent version of jackson to avoid CVE-2017-17485 via Ewan Mellor.
     new 72e6f70  fix documentation via David Pilato on twitter.
     new 35d87d1  TIKA-2580 via Ewan Mellor.
     new 520e73f  TIKA-2580
     new 71cf654  TIKA-2588 -- extract xlsx stored within ole objects in ppt/x via Brian McColgan
     new 99f4852  Merge branch 'branch_1x' of https://github.com/apache/tika into branch_1x
     new 5e3e910  TIKA-2578 and TIKA-2587 -- Allow for RFC822 detection for files starting with "dkim-" and/or "x-" via Andreas Meier
     new 2cbca1c  TIKA-1518: Add local docker build based on dockerfile-maven-plugin
     new d810fba  TIKA-1518: Updated the README and changed image name to tika-server for clarity
     new 2e48245  TIKA-2598 -- add enforcerplugin to fail on dependency convergence problems, and fix dependency conflicts where possible.
     new 4eb8ae1  Merge branch 'branch_1x' of https://github.com/apache/tika into branch_1x
     new be6e95d  TIKA-2576 -- Upgrade commons compress and add detection and parsing of zstd (if user provides com.github.luben:zstd-jni... via Andreas Meier
     new cf0348d  TIKA-2598 -- unbreak the build (sorry!), fix problems after tika-app
     new 8163b59  TIKA-2598 -- unbreak the build (sorry, again!), fix missing javacpp dependency.
     new d9f63a0  turn off debug in powerpointparsertest
     new 32c19de  TIKA-2600 -- remove md5 checksum, and switch sha-1 to sha-512 for release artifacts
     new b9e9e5b  TIKA-2594 -- improve eml detection for those starting with Subject: and containing html
     new 164c928  TIKA-2592 -- ignore charsets not supported by IANA in html meta-headers via Andreas Meier.
     new b4047eb  TIKA-2591 -- Add workaround to identify TIFFs that might confuse commons-compress's tar detection via Daniel Schmidt
     new a9b4b36  TIKA-2590 -- revert listenForAllRecords = false thanks to Grigoriy Alekseev
     new c566cc4  TIKA-2590 update Changes.txt
     new 33f756f  TIKA-2527 -- Various new mimes and typo fixes in tika-mimetypes.xml via Andreas Meier.
     new e12117c  TIKA-2594 improve eml detection via Luis Filipe Nassif
     new 42aa774  TIKA-1518: Detach docker file build from build phase in Maven execution
     new c996d01  TIKA-2568: detection of full encrypted 7z files
     new 2e1a810  TIKA-2338 support for tif in pdfs
     new eb35173  TIKA-2338 -- fix imageio version conflict in tika-dl
     new ceee42a  TIKA-2530 -- temporary workaround -- check for zero length byte array in rtf body to avoid buffer underflow from POI, via Pascal Essiembre.
     new 4d75a32  TIKA-2591 -- prevent AIOOBE when haystack shorter than needle
     new 17d8fe4  TIKA-2604 -- properly escape (or not) class path in windows and linux environments.
     new 029715d  fix cherry-pick conflict
     new 3ad2274  TIKA-2614 -- treat simple body inline, not as an attachment
     new 1cd565c  TIKA-2616 -- preserve message/news
     new c5cf55f  TIKA-2617 -- handle new IOOBE on streams now parsed as npoifs in ppt embedded streams as any other IOException on an embedded stream
     new e44a38d  Update forbiddenapis to version 2.5 and remove commons-io hack from pom.xml
     new ca9c2f5  TIKA-2618 -- avoid overwriting labels
     new f9910e2  update CHANGES.txt because of conflict in cherry-pick
     new d1526d0  Fix for TIKA-2582 contributed by ewanmellor.
     new b2ca378  Fix for TIKA-2584 contributed by ewanmellor.
     new 2efe3f9  Fix for TIKA-2613 contributed by ewanmellor.
     new 04225d2  TIKA-2621 -- add support for brotli
     new cfefc4c  TIKA-2621 -- add support for brotli - update CHANGES.txt
     new fc718f4  TIKA-2620 allow configuration of setting KCMS
     new cecce45  fix cherry-picked version clash for TIKA-2621
     new b928453  TIKA-2625
     new d1a7cab  TIKA-2626
     new b85d2f8  Update CHANGES.txt for 1.18 release.
     new 72db7c5  fix license issues identified by rat during prep for 1.18 release
     new 0afdf50  [maven-release-plugin] prepare release 1.18-rc1
     new 2362a00  [maven-release-plugin] prepare for next development iteration
     new 7fb331d  rollback 1.18-rc1 attempt -- error transferring data to nexus
     new c551a15  Update CHANGES.txt for 1.18 release
     new c44e8b6  [maven-release-plugin] prepare release 1.18-rc1
     new ef85fa8  forgot to delete existing tag -- revert to 1.18-SNAPSHOT
     new 5cff40f  [maven-release-plugin] prepare release 1.18-rc1
     new 5b12e7f  [maven-release-plugin] prepare for next development iteration
     new e82c2ef  fix potential resource leak
     new b2d3932  fix potential resource leak
     new 4fdc51a  followup fix
     new ffb48dd  fix chm parser
     new 302f22a  fix readUE7
     new 5d983aa  fix chm; remove println
     new 3b6682e  rollback to 1.18-SNAPSHOT in prep for rc2
     new d1bc093  fix potential resource leak, continued
     new c9c1844  Update CHANGES.txt for 1.18-rc2 release
     new 1203862  [maven-release-plugin] prepare release 1.18-rc2
     new a39b325  [maven-release-plugin] prepare for next development iteration
     new bb7adac  TIKA-2634 upgrade Jackson to 2.9.5
     new a8b41d3  TIKA-2634 upgrade Jackson to 2.9.5
     new 85b2504  Merge remote-tracking branch 'origin/branch_1x' into branch_1x
     new c68994f  fix broken build on *nix caused by recent fixes; improve documentation; ensure trailing slash behavior on all OS
     new e84d0d5  TIKA-2635 -- require that user specify path for imagemagick on windows to avoid conflict with system util "convert.exe"
     new 15410ed  roll back to 1.18-SNAPSHOT in prep for RC3
     new 24cd176  update CHANGES.txt in prep for RC3
     new 38ff2a9  [maven-release-plugin] prepare release 1.18-rc3
     new 4d2753c  [maven-release-plugin] prepare for next development iteration
     new 0c1909a  For now, if there's a network problem grabbing dl4j's model, skip the test silently. Do the same thing in both tests.
     new 426e92d  Merge branch 'branch_1x' of https://github.com/apache/tika into branch_1x
     new f9722b4  TIKA-2644 improve api for recursiveparserwrapper
     new 64fef4e  TIKA-2644 improve api for recursiveparserwrapper -- deconflicted
     new c203ef3  TIKA-2645 for tika-core
     new 0101164  TIKA-2645 - use a pool for SAXParsers -- tika-parsers package
     new aa1a749  TIKA-2645 - use a pool for SAXParsers -- improve comments and avoid permanent hangs if a parser has forgotten to release its SAXParser.
     new 017096f  TIKA-2645 - remove commons math from dependencies
     new c40045a  update changes
     new cdca0f7  TIKA-2645 -- make pool methods private for better encapsulation and add a pool for DOM building
     new 124a06d  TIKA-2520 optimize OptimaizeLangDetector default loadModel()
     new 7e3e34c  Merge branch 'TIKA-2520' of https://github.com/mbaechler/tika into branch_1x
     new ac73693  allow more flexibility for OCR variations in a PDFParser test
     new 62926ca  fix logic in iptc parser
     new c9294bf  make ocr test more flexible to allow for different versions/settings of tesseract
     new 8d26096  TIKA-2100 extract content language from html lang attribute
     new 3aba5c4  revert changes to imports
     new 70662cd  improve audioparser
     new e9d807d  TIKA-2100 -- fix unit test
     new b9cf2f3  TIKA-2655 - Allow the RecursiveParserWrapper to work with the ForkParser
     new acd92ac  TIKA-2655 - Allow the RecursiveParserWrapper to work with the ForkParser --merge conflicts in test with 1x
     new 6c747d1  TIKA-2655 - allow handlers to be proxied back or not when the handler is an AbstractRecursiveParserWrapperHandler
     new 0105869  TIKA-2655 - failed to merge changelists before last commit.  Sorry!
     new 4a7bf9a  merge conflicts
     new 0e1f4e7  merge conflicts
     new 4afd8f0  TIKA-2653 -- fix debugger on ForkParser test
     new 5ee06ca  merge conflicts
     new 85cc113  TIKA-2653 fix merge conflictx
     new 12f455c  ForkParser -- update to master; handful of fixes
     new 00ff640  TIKA-2656 -- allow absolute timeout for ForkParser
     new 12884fd  TIKA-2656 -- allow absolute timeout for ForkParser -- update CHANGES.txt
     new aa6a63f  TIKA-2657 -- add system exit, thread interrupt and gc-triggering new Date() in MockParser
     new e350488  TIKA-2446 -- prevent oom during detection of corrupt zip
     new 88fe62c  TIKA-2446 -- prevent oom during detection of corrupt zip -- catch POI's "could be odt exception" and POI's RuntimeException
     new f2b0e5a  TIKA-2659 -- add parameters for max files processed in forkclient, and improve some of the offline smoke testing infrastructure.
     new 90e3387  TIKA-2662 add streaming json serializer
     new 5d6e09a  TIKA-2661 upgrade commons-compress to 1.17
     new 2f6933f  TIKA-2661 upgrade junrar to 1.0.1
     new 39c85da  clean up dev mess
     new 92aaf22  avoid npe when lang code not found on classpath
     new ecbd316  undo idiocy
     new 0b62157  avoid npe when lang code not found on classpath
     new ebb22dd  undo idiocy
     new f309935  TIKA-2668 -- fix TaggedSAXException for Java 11-ea
     new 2512051  TIKA-2660 -- enable building with Java 10
     new 45468aa  TIKA-2670 add try/catch block into ModelGetter
     new 9a56aa4  TIKA-2660 -- enable building with Java 10 -- revert tika-dl until full fix is available.
     new def58f6  TIKA-2675 OpenDocumentParser should fail on invalid zip files - throw IOException if ZipInputStream is invalid or does not contain any entries
     new b4cdfcf  TIKA-2677 -- fix multithreaded updating/access to MediaTypeRegistry, via Yuriy Koval
     new df9ed82  TIKA-2673 -- unit tests for stricter adherence to spec via Gerard Bouchar
     new c6f7b45  TIKA-2673 -- unit tests for stricter adherence to spec via Gerard Bouchar -- fix illegal getBytes()...mea culpa...
     new 729d29e  TIKA-2679 -- bump 1.x to 1.8
     new bae509c  Bumped PDFBox to 2.0.11
     new ad8765d  TIKA-2682 -- update jempbox to 1.8.175
     new 6933efd  TIKA-2669 -- pdf and tesseract config set in a tika-config.xml file on server start up are always overwritten to DefaultConfig in tika-server
     new a333a4a  Merge pull request #240 from sebastian-nagel/TIKA-2675-OpenDocumentParser-fail-invalid-zip
     new db1301d  improve htmlparser
     new 525889a  TIKA-2673 -- add StrictHtmlEncodingDetector, contributed by Gerard Bouchar
     new a09d853  TIKA-2687
     new 5c78eb7  TIKA-2687
     new b2973e3  TIKA-2691 -- upgrade jai-imageio-core and pdfbox's jbig2-imageio while we're at it.
     new 19364b8  TIKA-2690 via Hans Brende
     new fc23648  TIKA-2688 via Yury Kats
     new fe2b3ae  TIKA-2692 -- minimal upgrades to pass ossindex-maven module -- except for tika-nlp module, which requires significant work. fix conflicts
     new 6b37754  TIKA-2692 -- minimal upgrades to allow building w Java 11-ea
     new 1438d8a  TIKA-2692 -- general upgrades in prep for 1.19
     new 8f61126  TIKA-2692 -- general upgrades in prep for 1.19
     new 6afdf19  Depend on Parso for SAS7BDAT support
     new 4c5bbae  Add parso to the OSGi bundle
     new fa5f282  Test Columnar files - SAS7BDAT and CSV (other spreadsheet+DB formats still required)
     new 2d19fe0  TIKA-2462 Initial parser for SAS7BDAT files powered by Parso (now ASLv2). Still to do: Metadata, Unit Tests, Consistency with similar format tests
     new f3508f2  XHTML improvements
     new 284965e  Some SAS7BDAT metadata and unit testing
     new 39e1194  More SAS7BDAT metadata
     new c31d40f  SAS7BDAT html tests
     new 02bef03  Clean up imports
     new aaa78a3  Stub a unit test for TIKA-2641
     new b6399c6  Handle .epub files using .htm rather than .html extensions for the embedded contents (TIKA-1288)
     new 95a247c  Add a test .sas7bdat file with labels, and generate the columnar/tabular test file in a few more formats
     new 7f68ebb  Add a time column to the test columnar files
     new b92f752  CSV assert as best we can (no dedicated parser), start on XLS and SAS7BDAT consistency tests
     new 65af2d9  Check header contents, check data rows count, add XLSX test
     new 3f2b7a5  Remaining values to check
     new 5d3dd69  Ensure that empty cells are still output
     new 507f59f  Not all formats know about %s, dates not completely consistent either...
     new 81caa71  Use patterns to handle the date format variations
     new d871b1f  Add disabled, currently failing ODS test
     new de53df9  Mime magic for DPX and ACES, thanks to Andreas Meier (TIKA-2628 and TIKA-2629)
     new 6880127  TIKA-2479 Option to request missing rows where possible in Excel-like formats
     new dcfbe5a  TIKA-2479 Output missing left/mid cells in XLSX and XLSB, and optionally also missing rows
     new b336360  Updated Columnar output from SAS with better formats
     new 65cf9f2  Formatted columns in the columnar test Excel files
     new 8ea6b22  TIKA-2479 Update XLS missing cell/row handling to match XLSX and XLSB, add unit test for missing rows, and enable the Columnar tests for the Excel formats
     new 060bfa5  Move some fixes that didn't make it into 1.18 into 1.19
     new 08a767a  Changelog update
     new 3da39b8  Add the other jackcess jar to the bundle
     new d811a3a  Move some fixes that didn't make it into 1.18 into 1.19, clean up
     new 2cf0a96  TIKA-2703 make sure to process shape's parent drawing only once.
     new 2745cfd  TIKA-2703 -- related...simplify XSSFB to use more of XSSF rather copy/paste
     new d66dcbb  TIKA-2701 -- via Grigoriy Alekseev
     new 6badaea  TIKA-2673 -- add StandardHtmlEncodingDetector via Gerard Bouchar
     new 4475b72  TIKA-2673 -- fix forbidden-apis failure and retro-fit for branch_1x
     new f5a2fae  TIKA-2648 detect interpreted server-side script languages
     new bd9d75d  improve xml reading
     new 36fa58f  TIKA-2704
     new 375e3d7  TIKA-2705 -- allow parameter configuration for tesseract via tika-config.xml
     new 5346cbb  TIKA-2706 -- store exceptions from macroreader in child metadata
     new 2cdf627  TIKA-2695 -- upgrade Lucene to 7.4.0
     new b717ca6  TIKA-2667 upgrade jmatio
     new 719826a  fix doubled junit dependency in tika-nlp
     new f44e109  TIKA-2672 -- upgrade deeplearning4j to 1.0.0-beta2 via Thejan Wijesinghe.  Thank you, Thejan!!!
     new b542f9b  TIKA-2672 -- remove hard coded input dimensions
     new ed0d3d1  TIKA-2707 -- upgrade to commons-compress 1.18
     new 1f5669d  TIKA-2710 - Change Tika OSGi Execution Environment to 1.8.
     new 3c76b3a  TIKA-2710 - Change Tika OSGi Execution Environment to 1.8.  Format fix.
     new 0dbf67d  TIKA-2721: removed spring-* from tika-parsers deps
     new 0951bf9  TIKA-2722 -- remove dead code and prevent potentially bad date.toString() call.
     new 8a1392b  Merge remote-tracking branch 'origin/branch_1x' into branch_1x
     new 8d70109  TIKA-2722 -- clean up setting calendar values
     new 2fd54ff  TIKA-2722 -- clean up setting calendar values, take2
     new 1ff63b0  improve xml parsing
     new 39f69ef  Mime magic for "MIME Encapsulation of Aggregate HTML Documents" (MHTML), pulled out from rfc822 (may not be fully correct long-term...)
     new 4f85418  Changes update
     new bb10dc2  TIKA-2552 -- upgrade to POI 4.0.0
     new 49ed309  TIKA-2552 -- upgrade to POI 4.0.0 -- fix merge conflicts
     new 92e488b  TIKA-2719 -- add automatic module names
     new 58dadac  TIKA-2725 -- checkpoint commit ... basic child process is started...need to integrate actual statuswatcher, etc.
     new e7cef35  TIKA-2725 -- first working draft...ready for commit and future cleanups
     new 3af35f1  TIKA-2725 -- first working draft...include commit with conflicts resolved. :(
     new 5211fc7  TIKA-2725 -- add synchronization to avoid potential NPE in watcher thread
     new 153c394  update changes.txt in prep for 1.19 rc1
     new 4aef777  TIKA-2692 -- upgrade a few other dependencies
     new 0db4724  Update CHANGES for 1.19 release
     new 10c75a1  Fix missing license headers; h/t rat!
     new af04995  fix conflict
     new 90285dc  [maven-release-plugin] prepare release 1.19-rc1
     new 7aba1c5  [maven-release-plugin] prepare for next development iteration
     new 03e0942  roll back to 1.19.SNAPSHOT for second attempt of RC1
     new 48e76da  [maven-release-plugin] prepare release 1.19-rc1
     new 82146ad  roll back to 1.19.SNAPSHOT for second attempt of RC1, take 2
     new 199112b  [maven-release-plugin] prepare release 1.19-rc1
     new 7259325  [maven-release-plugin] prepare for next development iteration
     new 231fbb0  Fixed javadocs
     new a24976a  Cosmetics
     new a366813  Removed #getDetector from ImportContextImpl
     new d66c04a  update changes after 1.19 release
     new 49d1e82  Fixed javadocs
     new e36fafe  Fixed javadocs
     new 8053e31  Removed #getDetector from ImportContextImpl
     new b213fb3  Merge remote-tracking branch 'origin/branch_1x' into branch_1x
     new ed1e2f3  TIKA-2729 -- child process should run in headless mode.
     new 962c015  upgrade to forbiddenapis 2.6 https://github.com/apache/tika/pull/249 via Uwe Schindler
     new 80cfd6d  TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF
     new f6c38ef  TIKA-2731 via jkakvas.  This closes 250.
     new c25671c  TIKA-2731 via jkakavas.  This closes 250.
     new b29e11f  Merge branch 'branch_1x' of https://github.com/apache/tika into branch_1x
     new 6c2c8ad  update test corrupted files
     new 932ff38  TIKA-2638 -- allow multiple languages in config for OCR parser
     new 88bb6ab  TIKA-2732 -- allow configuration of XMLReaderUtils via TikaConfig
     new f75ba63  TIKA-2733 -- improve oom unit test and error/logging when the child process can't start in tika-server
     new a1f48b0  TIKA-2727
     new 55742a4  TIKA-2736 -- improve reports for comparisons
     new d712103  TIKA-2738 -- ForkParser option isn't working in tika-app. Make PasswordProvider serializable.
     new e75554a  TIKA-2739 -- ForkParser's child process should be headless
     new a1d4e55  update changes for 1.19.1 release
     new 60bc5d8  [maven-release-plugin] prepare release 1.19.1-rc1
     new dc02c59  [maven-release-plugin] prepare for next development iteration
     new 46dee91  reset to 1.19.1-SNAPSHOT after timed-out rc1 attempt
     new b8d28ef  [maven-release-plugin] prepare release 1.19.1-rc1
     new 92a5a62  [maven-release-plugin] prepare for next development iteration
     new aa9aeb7  reset to 1.19.1-SNAPSHOT after timed-out rc1 attempt, take 2
     new a5162fb  [maven-release-plugin] prepare release 1.19.1-rc1
     new b78ed12  [maven-release-plugin] prepare for next development iteration
     new 481481a  reset to 1.19.1-SNAPSHOT after timed-out rc1 attempt, take 3
     new 70823ad  [maven-release-plugin] prepare release 1.19.1-rc1
     new 5c6bbff  [maven-release-plugin] prepare for next development iteration
     new 019bd30  reset to 1.19.1-SNAPSHOT after timed-out rc1 attempt, take 4
     new 628d34f  [maven-release-plugin] prepare release 1.19.1-rc1
     new 9ee3b48  [maven-release-plugin] prepare for next development iteration
     new a279490  rolling back, again
     new 341e359  [maven-release-plugin] prepare release 1.19.1-rc1
     new 4392731  [maven-release-plugin] prepare for next development iteration
     new 1d68362  TIKA-2740: Added TkInter module into Python dependency check
     new 9ab9485  TIKA-2740: Updated Changes File
     new ea0eb90  TIKA-2742 -- upgrade jmatio to 1.5 to avoid bringing in slf4j-log4j12
     new 033758a  TIKA-2473 - Replace com.sun.xml.bind:jaxb-impl and jaxb-core with org.glassfish.jaxb:jaxb-runtime and jaxb-core
     new 4bb4ad6  TIKA-2478 -- maxFiles should take an argument...duh
     new ad0f41c  TIKA-2478 -- add preliminary pseudo test for -maxFiles
     new 336c351  TIKA-2745 -- update PDFBox, jempbox and jbig2
     new c6ad906  Update CHANGES for 1.19.1 release
     new 3c2c410  [maven-release-plugin] prepare release 1.19.1-rc2
     new b5596cb  [maven-release-plugin] prepare for next development iteration
     new fb849e6  Update changes file for 1.20
     new c61ed55  Add logging for OOM.
     new 307a8bd  TIKA-2735 -- allow user to avoid extracting "master" sections and notes sections from ppt[xm]?
     new 0f7e86c  TIKA-2753 -- use -javaHome or $JAVA_HOME when starting child process w -spawnChild mode in tika-server
     new c6cacec  TIKA-2754 -- include filename in logging in tika-server
     new 65d18af  TIKA-2756 -- factor out code that relies on the old commons-lang... once Jackcess migrates to lang3, we'll be good to go.
     new 889c2c9  TIKA-2757 -- add versions plugin
     new 4076991  RSS test file is RSS v0.91, so name appropriately
     new d31f568  Add a test RSS 2.0 file
     new 416f996  Use the new RSS 2.0 file in tests too, alongside the current 0.91 one
     new 9054732  TIKA-2764 parameterize inclusion/exclusion of deleted text, and fix '-' while you're at it.
     new 6103161  TIKA-2761 -- write as much metadata as possible before writing to xhtml.
     new 7a34b58  TIKA-2759 -- don't extract data uri if inside a <script/> element when not extracting <script/> content.
     new eb53077  TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by Ronan O'Sullivan.
     new 50a2a8f  TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by Ronan O'Sullivan.
     new 324cbd2  Merge pull request #253 from dameikle/TIKA-2599
     new 6b2cdc9  TIKA-2762 Capture short fields (<150 chars) in EnviParserHeader Metadata
     new 33d960c  TIKA-2762 Capture short fields (<150 chars) in EnviParserHeader Metadata
     new 0c49c85  TIKA-2773 upgrade sqlite version
     new 4c564bd  TIKA-2777 -- improve inefficient regex performance in Optimaize in tika-eval
     new e0991f4  Corrected file name to match test
     new 41608d5  Move glassfish warning from license to notice file.
     new 589ee7f  TIKA-2775 -- bulk upgrades
     new 2ead2bb  TIKA-2775 - bulk upgrade dependencies
     new ccb96cd  TIKA-2775 - bulk upgrade dependencies -- backoff minimum maven dependency to 3.1; clean up whitespace in tika-eval's pom
     new 41dda34  prefer System.currentTimeMillis to creating a new Date object, throughout...
     new 2309974  TIKA-2778 -- the shutdown method for tika-batch mode should not be typing anything on stdin of the parent process.  Rather, require an interrupt and/or kill signal and then make sure the children are stopped as well.
     new fe4f41b  TIKA-2780 -- the shutdown method for tika-batch mode should not be typing anything on stdin of the parent process.  Rather, require an interrupt and/or kill signal and then make sure the children are stopped as well.
     new f9eff6f  Merge branch 'branch_1x' of https://github.com/apache/tika into branch_1x
     new 9fd50ed  TIKA-2780 -- fix changes.txt
     new c0b594e  TIKA-2782 -- confirm child streams are redirected.  Add workaround (shameless hack) if logger writes before streams are redirected.
     new f9eec83  TIKA-2778 -- Upgrade jaxb-runtime and javax.activation for use in Java > 8
     new 22f5707  TIKA-2776 -- tika-server in legacy mode should ignore oom.
     new 9e2a9bb  TIKA-2776 -- update CHANGES.txt
     new d6938de  fix for TIKA-2770 contributed by kristencheung
     new 4538f1d  TIKA-2770 fix merge conflicts from 81430d51c kristencheung
     new f4c0b5a  removing extra test
     new f136f97  fix for TIKA-2770 conversion for UTM only
     new 0a3a8be  TIKA-2776 -- update CHANGES.txt
     new 9137249  TIKA-2784 -- MockParser should allow us to simulate a parser grabbing stdout/stderr during static initialization.
     new c7bb0c9  TIKA-2785 -- switch communication from child to parent to a shared memory-mapped file in -spawnChild mode in tika-server.
     new 3e7e89a  TIKA-2785 -- clean up logging in tika-server; redirect child stdout to parent stderr to avoid maven complaining about corrupting stdout in forked process; convert oom to fake oom
     new e31e8ce  TIKA-2785 -- try to fix test that is failing on Linux, but not Windows
     new 5590e0a  TIKA-2785 -- fix unit test that is failing in Linux but not Windows, take 3
     new eec08ba  cleanup accidental println
     new 4141411  TIKA-2776 -- improve documentation for -maxFiles
     new 690586f  fix whitespace
     new 4d6bc01  TIKA-2550 -- prevent content from script/style elements to be written in ToTextContentHandler
     new d837e1b  Upgrade to PDFBox 2.0.13 (TIKA-2788)
     new 6b56ed2  TIKA-2779: Integrate/parameterize new rotated text handling in PDFBox
     new 6322421  TIKA-2751 -- Upgrade to POI 4.0.1
     new 44165a3  TIKA-2550 -- make sure that ToTextHandler's new behavior of ignoring script/style contents doesn't harm macro extraction in HTML parser
     new 2439927  Upgrade MP4Parser to newer dependency coordinates org.mp4parser:isoparser (TIKA-2792).
     new 6c122d1  put the overridden processTextPositions within the inner class -- bug fix for TIKA-2779.
     new 582a1d4  TIKA-2795 -- catch IOException if child deletes shared file
     new 8475ddb  TIKA-2795 -- swapped memorymapped buffer for traditional open, write close of a temp file because of cross-platform challenges.
     new df5792e  TIKA-2637 -- ParsingReader should return -1 for a zero byte file
     new 40b0427  handful of dependency upgrades
     new 7696e38  TIKA-2792 -- revert mp4 parser based on large scale regression test results
     new 8c88966  TIKA-2798 -- improve reporting for attachment diffs
     new 4c9e38e  TIKA-2791 -- add tags/structure to tika-eval
     new 6a6c82a  TIKA-2775 -- more updates (these were made locally before the first major regression run pre-1.20-rc-1)
     new ad39610  TIKA-2798 -- revert junrar
     new b2680df  TIKA-2800 -- add num unique alphabetic tokens and num unique common tokens
     new 1a1f980  TIKA-2799 - revert jackcess based on regression results
     new 6f62b95  update CHANGES.txt and KEYS for 1.20 release.
     new fc0e1a3  license fixes for rat
     new adf4a1b  [maven-release-plugin] prepare release 1.20-rc1
     new 37407ad  [maven-release-plugin] prepare for next development iteration
     new 750bbfc  TIKA-2801 -- add ossindex-maven-plugin and upgrade vulnerable dependencies (skipping tika-nlp for now).
     new 3aa311c  TIKA-2804 -- upgrade Lucene and Jackcess
     new f2eb5ac  TIKA-2802 -- try to clear the XMLReader's resources to avoid OOM
     new 8fc1ed1  TIKA-2765
     new c9ecf1c  TIKA-2765 -- fix capitalization of test file.
     new 35601db  TIKA-2807 -- extract sdt content from within textbox in docx
     new eaaf1e3  TIKA-2808 -- exclude h2 from ossindex-maven-plugin
     new 73d009a  TIKA-2809 -- add reports for tags; and add "b" tag.
     new 2983c36  TIKA-2810 -- handle bad tags more robustly
     new df99549  TIKA-2816 -- allow OCR parameter header setting in tika-server to include parameters of type long/Long
     new e82382a  TIKA-2816 -- fix unit test
     new ea23d25  TIKA-2822 -- update common tokens lists with 7.x Lucene.
     new 4a0c26b  TIKA-2823
     new a32234e  TIKA-2822 -- remove common >=4 letter html markup entities
     new 40ab7f8  rm println
     new c566e65  TIKA-2717 -- upgrade jackson
     new c7438e6  TIKA-2819 -- upgrade jaxb via Hans Brende -- many thanks for your patience and figuring this out!
     new a1b83d2  TIKA-2802 -- bundle xerces2 with tika-parsers and upgrade CHANGES.txt
     new 92c8575  TIKA-2824 -- general dependency upgrades
     new 07d8277  TIKA-2824 -- general dependency/plugin upgrades and plugin cleanup
     new acdc4fc  TIKA-2825
     new d7a5c20  TIKA-2819 -- remove activation-api from dependencies
     new 2cd927a  TIKA-2756 -- upgrade Jackcess and remove dependencies on commons-lang
     new cfa524e  TIKA-2824 update Lucene to 7.0.0
     new 150d4bc  TIKA-2828 -- initial CSVParser commit
     new 3ad6edc  TIKA-2826 - mea culpa and my apologies...fixed master vs branch_1x incompatibilities.
     new fab1954  TIKA-2827 -- include both mime_a and mime_b more often in comparison diff reports
     new 0252277  TIKA-2824 - general upgrades: h2
     new d3317f9  TIKA-2833 -- initial commit with csv detection and swapping out the TXTParser in favor of the CSVParser

The 4366 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.