Posted to commits@tika.apache.org by ma...@apache.org on 2016/10/19 18:47:33 UTC
svn commit: r16589 - /dev/tika/

Author: mattmann
Date: Wed Oct 19 18:47:33 2016
New Revision: 16589

Log:
Apache Tika 1.14 RC #1

Added:
    dev/tika/CHANGES-1.14.txt
    dev/tika/tika-1.14-src.zip   (with props)
    dev/tika/tika-1.14-src.zip.asc
    dev/tika/tika-1.14-src.zip.md5
    dev/tika/tika-1.14-src.zip.sha
    dev/tika/tika-app-1.14.jar   (with props)
    dev/tika/tika-app-1.14.jar.asc
    dev/tika/tika-app-1.14.jar.md5
    dev/tika/tika-app-1.14.jar.sha
    dev/tika/tika-server-1.14.jar   (with props)
    dev/tika/tika-server-1.14.jar.asc
    dev/tika/tika-server-1.14.jar.md5
    dev/tika/tika-server-1.14.jar.sha
Removed:
    dev/tika/CHANGES-1.13.txt
    dev/tika/tika-1.13-src.zip
    dev/tika/tika-1.13-src.zip.asc
    dev/tika/tika-1.13-src.zip.md5
    dev/tika/tika-1.13-src.zip.sha
    dev/tika/tika-app-1.13.jar
    dev/tika/tika-app-1.13.jar.asc
    dev/tika/tika-app-1.13.jar.md5
    dev/tika/tika-app-1.13.jar.sha
    dev/tika/tika-server-1.13.jar
    dev/tika/tika-server-1.13.jar.asc
    dev/tika/tika-server-1.13.jar.md5
    dev/tika/tika-server-1.13.jar.sha

Added: dev/tika/CHANGES-1.14.txt
==============================================================================
--- dev/tika/CHANGES-1.14.txt (added)
+++ dev/tika/CHANGES-1.14.txt Wed Oct 19 18:47:33 2016
@@ -0,0 +1,1956 @@
+Release 1.14 - 10/19/2016
+
+  * Extract all headers from MSG/RFC822 (TIKA-2122).
+
+  * Upgrade metadata-extractor to 2.9.1 (TIKA-2113).
+
+  * Extract PDF DocInfo metadata into separate keys to prevent
+    overwriting by XMP metadata (TIKA-2057).
+
+  * Re-enable fileUrl for tika-server (TIKA-2081).  If you choose
+    to use this feature, beware of the security vulnerabilities!
+    See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271
+
+  * Add Tesseract's hOCR output format as an option, via Eric Pugh
+    (TIKA-2093)
+
+  * Extract macros from MSOffice files (TIKA-2069).
+
+  * Maintain passed-in mime in TXTParser (TIKA-2047).
+
+  * Upgrade to POI 3.15 (TIKA-2013).
+
+  * Upgrade to PDFBox 2.0.3 (TIKA-2051).
+
+  * Fix hyperlinks with formatting in DOC and DOCX (TIKA-1255
+    and TIKA-2078)
+
+  * Tika now is integrated with the Tensorflow library from Google 
+    and it can use its Inception v3 image classification model to 
+    identify objects in images (TIKA-1993).
+
+  * Parser configuration is now type-safe and parameters for parsers
+    can have assigned types (TIKA-1508, TIKA-1986).
+
+  * Prevent OOM/permanent hang on some corrupt CHM files (TIKA-2040).
+
+  * Upgrade ICU4J charset detection components to fix multithreading
+    bug (TIKA-2041).
+
+  * Upgrade to Jackcess 2.1.4 (TIKA-2039).
+
+  * Maintain more significant digits in cells of "General" format
+    in XLS and XLSX (TIKA-2025).
+
+  * Avoid mark/reset issues when extracting or detecting embedded resources
+    in RFC822 emails (TIKA-2037).
+
+  * Improve accuracy of Tesseract for better extraction of numeric
+    and alphanumeric text from images (TIKA-2021, TIKA-2031).
+
+  * Improve extraction of embedded documents from PPT, PPTX and XLSX
+    (TIKA-2026).
+
+  * Add parser for applefile (AppleSingle) (TIKA-2022).
+
+  * Add mime types, mime magic and/or globs for:
+     * Endnote Import File (TIKA-2011)
+     * DJVU files (TIKA-2009)
+     * MS Owner File (TIKA-2008)
+     * Windows Media Metafile (TIKA-2004)
+     * iCal and vCalendar (TIKA-2006)
+     * MBOX (TIKA-2042)
+     * Stata DTA (TIKA-2064)
+
+  * Add configurable maximum threshold for number of events extracted
+    from the XMP Media Management Schema in JempboxExtractor (TIKA-1999).
+
+  * Integrate TesseractOCR with full page image rendering for PDFs (TIKA-1994).
+
+  * Add mime detection via Nick C and parser for DBF files (TIKA-1513).
+  
+  * Add mime detection and parsers for MSOffice 2003 XML Word
+    and Excel formats (TIKA-1958).
+
+  * Extract hyperlinks from PPT, PPTX, XLSX (TIKA-1454).
+
+  * Upgrade to Commons Compress 1.12 (supports progress on TIKA-1358)
+
+Release 1.13 - 05/08/2016
+
+  * Upgrade to PDFBox 2.0.1 (TIKA-1285/TIKA-1959).
+    MAJOR CHANGES in PDFParser:
+    * The classic sequential parser is no longer available.
+    * Tiff files are no longer extracted by default.  See
+      https://pdfbox.apache.org/2.0/dependencies.html#optional-components
+      for optional components to process Tiff files.
+    * Some truncated/corrupted files that had some content extracted
+      with 1.8.x may have no content extracted in 2.0.x (see TIKA-1912).
+
+  * The MIT-NLP Information Extraction (MITIE) Named Entity
+    Recognition (NER) system is now supported in Tika
+    (TIKA-1913, GitHub-108).
+
+  * Tika now supports the use of the Yandex translation 
+    service (TIKA-1943, GitHub-106).
+
+  * Tika now uses NER to extract scientific measurements
+    from text using either GROBID Quantities, which uses
+    conditional random fields, or NLTK, which uses regular
+    expressions (TIKA-1917, GitHub-104).
+
+  * Fixed JournalParser to handle null responses from 
+    GROBID and to log a message (TIKA-1925).
+
+  * Refactored Language Detector into tika-langdetect module,
+    added default N-Gram implementation, Optimaize Lang
+    Detector and MIT Text.jl implementation 
+    (TIKA-1872, TIKA-1696, TIKA-1723).
+ 
+  * Extract metadata from MP4 videos whether or not the
+    PooledTimeSeries parser is available via Aditya Dhulipala
+    (TIKA-1844).
+
+  * Fix NPE when trying to get embedded image identifier in
+    WordParser (TIKA-1956).
+
+  * Improvements to MIME database for detection of Scientific
+    and other formats present in the TREC-DD-Polar dataset
+    (TIKA-1881, GitHub-85, TIKA-1883, TIKA-1884, TIKA-1886,
+     TIKA-1882).
+
+  * LinkContentHandler now extracts links from script tags
+    via Joseph Naegele (TIKA-1937).
+
+  * Handle per page IOExceptions more robustly in PDFParser (TIKA-1948).
+
+  * Upgrade commons-compress to 1.11 (TIKA-1949).
+
+  * Add detection for embedded MSChart.Graph files (TIKA-1033).
+
+  * Fix NPE in Sqlite parser from Nick C (TIKA-1927).
+
+  * Fix NPE in Open Document parser from Nick C (TIKA-1916).
+
+  * Upgrade mp4parser's isoparser to 1.1.7 (TIKA-1924 and TIKA-1931).
+
+  * Upgrade BouncyCastle to 1.54 (TIKA-1923).
+
+  * Upgrade Jackcess to 2.1.3 (TIKA-1922).
+
+  * Upgrade Drew Noakes' metadata-extractor to 2.8.1 (TIKA-1921).
+
+  * Upgrade Gson in tika-serialization to 2.6.2 (TIKA-1920).
+
+  * Upgrade commons-cli in tika-batch to 1.3.1 (TIKA-1919).
+
+  * Add XMPMM support to PDFParser and JpegParser via Jempbox (TIKA-1894).
+
+  * Move serialization of TikaConfig to tika-core and enable dumping
+    of the config file via tika-app (TIKA-1657).
+
+  * Tika now incorporates the Natural Language Toolkit (NLTK) from the
+    Python community as an option for Named Entity Recognition (TIKA-1876).
+
+  * Add support for XFA extraction via Pascal Essiembre (TIKA-1857).
+
+  * Upgrade to sqlite-jdbc 3.8.11.2 (TIKA-1861).  NOTE: this dependency
+    is still <scope>provided</scope>.  You need to include this dependency
+    in order to parse sqlite files.
+
+  * Upgrade to POI 3.15-beta1 (TIKA-1895).
+
+  * Upgrade to Jackson 2.7.1 (TIKA-1869).
+
+  * Upgrade to Apache SIS 0.6 (TIKA-1878).
+
+  * RichTextContentHandler moved from the Server package to Core (TIKA-1870).
+
+  * Added ZeroSizeFileDetector to support application/x-zerovalue via
+    Adesh Gupta (TIKA-1885).  
+  
+  * Addition of types information to Grobid quantities parser via 
+    Can Menekse (TIKA-1965).
+
+Release 1.12 - 01/24/2016
+
+  * Support for iFrames and element link extraction is provided in
+    the link Content Handler (TIKA-1835).
+
+  * Slide notes are now linked to the slide XHTML in the PPT output
+    (TIKA-1840).
+
+  * JSON tests in Tika server were updated to remove impossible casts
+    (Github-73).
+
+  * Fix bug in GeoTopicParser where NER is reused instead of instantiated
+    with each request (TIKA-1834).
+
+  * Upgrade rome to 1.5.1 && Downgrade Rome dependency to 0.9 to avoid 
+    nasty NPE (TIKA-1820, TIKA-1516)
+
+  * The NamedEntityParser was enhanced to generate text content
+    in addition to metadata (TIKA-1815, TIKA-1816).
+
+  * A significant speed-up is made to the GeoTopicParser by
+    using the new REST server capabilities from Lucene Geo
+    Gazetteer (TIKA-1803).
+
+  * A parser to compute motion properties in Videos, e.g., 
+    Histogram of Oriented Gradients and Histogram of Optical Flows
+    using the Pooled Time Series algorithm, was added (TIKA-1798).
+
+  * Provide NamedEntityParser which exposes Named Entity Recognition
+    from OpenNLP and Stanford NER providers (TIKA-1787, GitHub-61,
+    GitHub-62).
+
+  * Allow XHTMLContentHandler to pass attributes of html element 
+    via Markus Jelsma (TIKA-1782).
+
+  * Fix regression with spacing in PPT via Andreas Beeker (TIKA-1777).
+
+  * Tika Facade parse methods for Path and File added which take a
+    Metadata object, to mirror the existing InputStream one (GitHub-60)
+
+  * GeoParser fix for loading the NER model from a jar file (TIKA-1791)
+
+
+Release 1.11 - 10/18/2015
+
+  * Java7 API support for allowing java.nio.file.Path as method arguments
+    was added to Tika and to ParsingReader, TikaFileTypeDetector, and to
+    Tika Config (TIKA-1745, TIKA-1746, TIKA-1751).
+
+  * MIME support was added for WebVTT: The Web Video Text Tracks Format
+    files (TIKA-1772).
+
+  * MIME magic improved to ensure emails detected as message/rfc822
+    (TIKA-1771).
+
+  * Upgrade to Jackcess Encrypt 2.1.1 to avoid binary incompatibility
+    with Bouncy Castle (TIKA-1736).
+  
+  * Make div and other markup more consistent between PPT and 
+    PPTX (TIKA-1755).
+
+  * Parse multiple authors from MSOffice's semi-colon delimited
+    author field (TIKA-1765).
+  
+  * Include CTAKESConfig.properties within tika-parsers resources 
+    by default (TIKA-1741).
+  
+  * Prevent infinite recursion when processing inline images
+    in PDF files by limiting extraction of duplicate images
+    within the same page (TIKA-1742).
+
+  * Upgrade to POI 3.13-final (via Andreas Beeker) (TIKA-1707).
+
+  * Upgraded tika-batch to use Path throughout (TIKA-1747 and
+    TIKA-1754).
+
+  * Upgraded to Path in TikaInputStream (via Yaniv Kunda) (TIKA-1744).
+
+  * Changed default content handler type for "/rmeta" in tika-server
+    to "xml" to align with "-J" option in tika-app.  
+    Clients can now specify handler types via PathParam. (TIKA-1716).
+
+  * The fantastic GROBID (or Grobid) GeneRation Of BIbliographic Data
+    for machine learning from PDF files is now integrated as a 
+    Tika parser (TIKA-1699, TIKA-1712).
+
+  * The ability to specify the Tesseract Config Path was added
+    to the OCR Parser (TIKA-1703).
+
+  * Upgraded to ASM 5.0.4 (TIKA-1705).
+
+  * Corrected Tika Config XML detector definition explicit loading 
+    of MimeTypes (TIKA-1708)
+
+  * In Tika Parsers, Batch, Server, App and Examples, use Apache
+    Commons IO instead of inlined ex-Commons classes, and the Java 7
+    Standard Charset definitions (TIKA-1710)
+
+  * Upgraded to Commons Compress 1.10, which enables zlib compressed
+    archives support (TIKA-1718)
+
+
+Release 1.10 - 8/1/2015
+
+  * Tika Config XML can now be used to create composite detectors,
+    and exclude detectors that DefaultDetector would otherwise
+    have used. This brings support in-line with Parsers. (TIKA-1702)
+
+  * Reverted to legacy sort order of parsers that was 
+    mistakenly reversed in Tika 1.9 (TIKA-1689).
+
+  * Upgrade to POI 3.13-beta1 (TIKA-1667).
+
+  * Upgrade to PDFBox 1.8.10 (TIKA-1588).
+
+  * MimeTypes now tries to find a registered type with and 
+    without parameters (TIKA-1692).
+
+  * Added more robust error handling for encoding detection
+    of .MSG files (TIKA-1238).
+
+  * Fixed bug in Tika's use of the Jackcess parser that 
+    prevented reading of v97 Access files (TIKA-1681).
+
+  * Upgrade xerial.org's sqlite-jdbc to 3.8.10.1. NOTE: 
+    as of Tika 1.9, this jar is "provided." Make sure 
+    to upgrade your provided jar! (TIKA-1687).
+
+  * Add header/footer extraction to xls (via Aeham Abushwashi)
+    (TIKA-1400).
+
+  * Drop the source file name from the embedded file path in
+    RecursiveParserWrapper's "X-TIKA:embedded_resource_path" 
+    (TIKA-1673).
+
+  * Upgraded to Java 7 (TIKA-1536).
+
+  * Non-standards compliant emails are now correctly detected
+    as message/rfc822 (TIKA-1602).
+
+  * Added parser for MS Access files via Jackcess. Many thanks 
+    to Health Market Science, Brian O'Neill and James Ahlborn 
+    for relicensing Jackcess to Apache v2! (TIKA-1601)
+
+  * GDALParser now correctly sets "nitf" as a supported 
+    MediaType (TIKA-1664).
+
+  * Added DigestingParser to calculate digest hashes 
+    and record them in metadata. Integrated with
+    tika-app and tika-server (TIKA-1663).
+
+  * Fixed ZipContainerDetector to detect all IPA files
+    (TIKA-1659).
+
+
+Release 1.9 - 6/6/2015
+
+  * The ability to use the cTAKES clinical text
+    knowledge extraction system for biomedical data is 
+    now included as a Tika parser (TIKA-1645, TIKA-1642).
+
+  * Tika-server allows a user to specify the Tika config
+    from the command line (TIKA-1652, TIKA-1426).
+
+  * Matlab file detection has been improved (TIKA-1634).
+
+  * The EXIFTool was added as an External parser
+    (TIKA-1639).
+
+  * If FFMPEG is installed and on the PATH, it is a 
+    usable Parser in Tika now (TIKA-1510).
+
+  * Fixes have been applied to the ExternalParser to make
+    it functional (TIKA-1638).
+
+  * Tika service loading can now be more verbose with the 
+    org.apache.tika.service.error.warn system property (TIKA-1636).
+
+  * Tika Server now allows for metadata extraction from remote
+    URLs and in addition it outputs the detected language as a
+    metadata field (TIKA-1625).
+
+  * OUTPUT_FILE_TOKEN not being replaced in ExternalParser 
+    contributed by Pascal Essiembre (TIKA-1620).
+
+  * Tika REST server now supports language identification
+    (TIKA-1622).
+
+  * All of the example code from the Tika in Action book has 
+    been donated to Tika and added to tika-examples (TIKA-1562).
+
+  * Tika server now logs errors determining ContentDisposition
+    (TIKA-1621).
+
+  * An algorithm for using Byte Histogram frequencies to construct
+    a Neural Network and to perform MIME detection was added
+    (TIKA-1582).
+
+  * A Bayesian algorithm for MIME detection by probabilistic
+    means was added (TIKA-1517).
+
+  * Tika now incorporates the Apache Spatial Information
+    System capability of parsing Geographic ISO 19139 
+    files (TIKA-443). It can also detect those files as
+    well.
+
+  * Update the MimeTypes code to support inheritance
+    (TIKA-1535).
+
+  * Provide ability to parse and identify Global Change 
+    Master Directory Interchange Format (GCMD DIF) 
+    scientific data files (TIKA-1532).
+
+  * Improvements to detect CBOR files by extension (TIKA-1610).
+
+  * Change xerial.org's sqlite-jdbc jar to "provided" (TIKA-1511).
+    Users will now need to add sqlite-jdbc to their classpath for
+    the Sqlite3Parser to work.
+
+  * ExternalParser.check now catches (suppresses) SecurityException
+    and returns false, so it's OK to run Tika with a security policy
+    that does not allow execution of external processes (TIKA-1628).
+
+Release 1.8 - 4/13/2015
+
+  * Fix null pointer when processing ODT footer styles (TIKA-1600).
+
+  * Upgrade to com.drewnoakes' metadata-extractor to 2.0 and
+    add parser for webp metadata (TIKA-1594).
+
+  * Duration extracted from MP3s with no ID3 tags (TIKA-1589).
+
+  * Upgraded to PDFBox 1.8.9 (TIKA-1575).
+
+  * Tika now supports the IsaTab data standard for bioinformatics
+    both in terms of MIME identification and in terms of parsing
+    (TIKA-1580).
+
+  * Tika server can now enable CORS requests with the command line
+    "--cors" or "-C" option (TIKA-1586).
+
+  * Update jhighlight dependency to avoid using LGPL license. Thanks
+    to @kkrugler for his great contribution (TIKA-1581).
+  
+  * Updated HDF and NetCDF parsers to output file version in 
+    metadata (TIKA-1578 and TIKA-1579).
+
+  * Upgraded to POI 3.12-beta1 (TIKA-1531).
+
+  * Added tika-batch module for directory to directory batch
+    processing.  This is a new, experimental capability, and the API will 
+    likely change in future releases (TIKA-1330).
+
+  * Translator.translate() Exceptions are now restricted to
+    TikaException and IOException (TIKA-1416).
+
+  * Tika now supports MIME detection for Microsoft Enhanced
+    Metafiles (EMF) (TIKA-1554).
+
+  * Tika has improved delineation in XML and HTML MIME detection
+    (TIKA-1365).
+
+  * Upgraded the Drew Noakes metadata-extractor to version 2.7.2
+    (TIKA-1576).
+
+  * Added basic style support for ODF documents, contributed by
+    Axel Dörfler (TIKA-1063).
+
+  * Move Tika server resources and writers to separate
+    org.apache.tika.server.resource and writer packages (TIKA-1564).
+
+  * Upgrade UCAR dependencies to 4.5.5 (TIKA-1571).
+  
+  * Fix Paths in Tika server welcome page (TIKA-1567).
+
+  * Fixed infinite recursion while parsing some PDFs (TIKA-1038).
+
+  * XHTMLContentHandler now properly passes along body attributes,
+    contributed by Markus Jelsma (TIKA-995).
+
+  * TikaCLI option --compare-file-magic to report mime types known to
+    the file(1) tool but not known / fully known to Tika.
+
+  * MediaTypeRegistry support for returning known child types.
+
+  * Support for excluding (blacklisting) certain Parsers from being
+    used by DefaultParser via the Tika Config file, using the new
+    parser-exclude tag (TIKA-1558).
+
+  * Detect Global Change Master Directory (GCMD) Directory
+    Interchange Format (DIF) files (TIKA-1561).
+
+  * Tika's JAX-RS server can now return stacktraces for
+    parse exceptions (TIKA-1323).
+
+  * Added MockParser for testing handling of exceptions, errors
+    and hangs in code that uses parsers (TIKA-1553).
+
+  * The ForkParser service was removed from Activator; rollback of TIKA-1354.
+
+  * Increased the speed of language identification by 
+    a factor of two -- contributed by Toke Eskildsen (TIKA-1549).
+
+  * Added parser for Sqlite3 db files. Some users will need to 
+    exclude the dependency on xerial.org's sqlite-jdbc because
+    it contains native libs (TIKA-1511).
+
+  * Use POST instead of PUT for tika-server form methods
+    (TIKA-1547).
+
+  * A basic wrapper around the UNIX file command was
+    added to extract Strings. In addition, a parser to
+    handle Strings parsing from octet-streams using Latin1
+    charsets was added (TIKA-1541, TIKA-1483).
+
+  * Add test files and detection mechanism for Gridded
+    Binary (GRIB) files (TIKA-1539).
+
+  * The RAR parser was updated to handle Chinese characters 
+    using the functionality provided by allowing encoding to
+    be used within ZipArchiveInputStream (TIKA-936).
+
+  * Fix out of memory error in surefire plugin (TIKA-1537).
+
+  * Build a parser to extract data from GRIB formats (TIKA-1423).
+
+  * Upgrade to Commons Compress 1.9 (TIKA-1534).
+
+  * Include media duration in metadata parsed by MP4Parser (TIKA-1530).
+
+  * Support password protected 7zip files (using a PasswordProvider,
+    in keeping with the other password supporting formats) (TIKA-1521).
+
+  * Password protected Zip files should not trigger an exception (TIKA-1028).
+
+Release 1.7 - 1/9/2015
+
+  * Fixed resource leak in OutlookPSTParser that caused TikaException 
+    when invoked via AutoDetectParser on Windows (TIKA-1506).
+
+  * HTML tags are properly stripped from content by FeedParser
+    (TIKA-1500).
+
+  * Tika Server support for selecting a single metadata key;
+    wrapped MetadataEP into MetadataResource (TIKA-1499).
+
+  * Tika Server support for JSON and XMP views of metadata (TIKA-1497).
+
+  * Tika Parent uses dependency management to keep duplicate 
+    dependencies in different modules the same version (TIKA-1384).
+
+  * Upgraded slf4j to version 1.7.7 (TIKA-1496).
+
+  * Tika Server support for RecursiveParserWrapper's JSON output
+    (endpoint=rmeta) equivalent to (TIKA-1451's) -J option 
+    in tika-app (TIKA-1498).
+
+  * Tika Server support for providing the password for files on a 
+    per-request basis through the Password http header (TIKA-1494).
+
+  * Simple support for the BPG (Better Portable Graphics) image format
+    (TIKA-1491, TIKA-1495).
+
+  * Prevent exceptions from being thrown for some malformed
+    mp3 files (TIKA-1218).
+
+  * Reformat pom.xml files to use two spaces per indent (TIKA-1475).
+
+  * Fix warning of slf4j logger on Tika Server startup (TIKA-1472).
+
+  * Tika CLI and GUI now have option to view JSON rendering of output
+    of RecursiveParserWrapper (TIKA-1451).
+
+  * Tika now integrates the Geospatial Data Abstraction Library
+    (GDAL) for parsing hundreds of geospatial formats (TIKA-605,
+    TIKA-1503).
+
+  * ExternalParsers can now use Regexs to specify dynamic keys
+    (TIKA-1441).
+
+  * Thread safety issues in ImageMetadataExtractor were resolved
+    (TIKA-1369).
+ 
+  * The ForkParser service is now registered in Activator
+    (TIKA-1354).
+
+  * The Rome Library was upgraded to version 1.5 (TIKA-1435).
+
+  * Add markup for files embedded in PDFs (TIKA-1427).
+ 
+  * Extract files embedded in annotations in PDFS (TIKA-1433).
+
+  * Upgrade to PDFBox 1.8.8 (TIKA-1419, TIKA-1442).
+
+  * Add RecursiveParserWrapper (aka Jukka's and Nick's
+    RecursiveMetadataParser) (TIKA-1329)
+
+  * Add example for how to dump TikaConfig to XML (TIKA-1418).
+
+  * Allow users to specify a tika config file for tika-app (TIKA-1426).
+
+  * PackageParser includes the last-modified date from the archive
+    in the metadata, when handling embedded entries (TIKA-1246)
+
+  * Created a new Tesseract OCR Parser to extract text from images.
+    Requires installation of Tesseract before use (TIKA-93).
+
+  * Basic parser for older Excel formats, such as Excel 4, 5 and 95,
+    which can get simple text, and metadata for Excel 5+95 (TIKA-1490)
+
+
+Release 1.6 - 08/31/2014
+
+  * Parse output should indicate which Parser was actually used
+    (TIKA-674).
+
+  * Use the forbidden-apis Maven plugin to check for unsafe Java
+    operations (TIKA-1387).
+
+  * Created an ExternalTranslator class to interface with command
+    line Translators (TIKA-1385).
+
+  * Created a MosesTranslator as a subclass of ExternalTranslator
+    that calls the Moses Decoder machine translation program (TIKA-1385).
+
+  * Created the tika-example module. It will have examples of how to
+    use the main Tika interfaces (TIKA-1390).
+
+  * Upgraded to Commons Compress 1.8.1 (TIKA-1275).
+
+  * Upgraded to POI 3.11-beta1 (TIKA-1380).
+
+  * Tika now extracts SDTCell content from tables in .docx files (TIKA-1317).
+
+  * Tika now supports detection of the Persian/Farsi language.
+    (TIKA-1337)
+  
+  * The Tika Detector interface is now exposed through the JAX-RS
+    server (TIKA-1336, TIKA-1336).
+
+  * Tika now has support for parsing binary Matlab files as part of 
+    our larger effort to increase the number of scientific data formats 
+    supported. (TIKA-1327)
+
+  * The Tika Server URLs for the unpacker resources have been changed,
+    to bring them under a common prefix (TIKA-1324). The mapping is
+    /unpacker/{id} -> /unpack/{id}
+    /all/{id}      -> /unpack/all/{id}
+
+  * Added module and core Tika interface for translating text between
+    languages and added a default implementation that calls Microsoft's
+    translate service (TIKA-1319)
+
+  * Added a Translator implementation that calls Lingo24's Premium
+    Machine Translation API (TIKA-1381)
+
+  * Made RTFParser's list handling slightly more robust against corrupt
+    list metadata (TIKA-1305)
+
+  * Fixed bug in CLI json output (TIKA-1291/TIKA-1310)
+
+  * Added ability to turn off image extraction from PDFs (TIKA-1294).
+    Users must now turn on this capability via the PDFParserConfig.
+
+  * Upgrade to PDFBox 1.8.6 (TIKA-1290, TIKA-1231, TIKA-1233, TIKA-1352)
+
+  * Zip Container Detection for DWFX and XPS formats, which are OPC
+    based (TIKA-1204, TIKA-1221)
+
+  * Added a user facing welcome page to the Tika Server, which
+    says what it is, and a very brief summary of what is available. 
+    (TIKA-1269)
+
+  * Added Tika Server endpoints to list the available mime types,
+    Parsers and Detectors, similar to the --list-<foo> methods on
+    the Tika CLI App (TIKA-1270)
+
+  * Improvements to NetCDF and HDF parsing to mimic the output of
+    ncdump and extract text dimensions and spatial and variable
+    information from scientific data files (TIKA-1265)
+
+  * Extract attachments from RTF files (TIKA-1010)
+
+  * Support Outlook Personal Folders File Format *.pst (TIKA-623)
+  
+  * Added mime entries for additional Ogg based formats (TIKA-1259)
+
+  * Updated the Ogg Vorbis plugin to v0.4, which adds detection for a wider
+    range of Ogg formats, and parsers for more Ogg Audio ones (TIKA-1113)
+
+  * PDF: Images in PDF documents can now be extracted as embedded resources.
+    (TIKA-1268)
+
+  * Fixed RuntimeException thrown for certain Word Documents (TIKA-1251).
+
+  * CLI: TikaCLI now has another option: --list-parser-details-apt, which outputs
+    the list of supported parsers in APT format. This is used to generate the list
+    on the formats page (TIKA-411).
+
+Release 1.5 - 02/04/2014
+
+  * Fixed bug in handling of embedded file processing in PDFs (TIKA-1228).
+
+  * Added SourceCodeParser to support Java, Groovy, C++ files (TIKA-1224).
+  
+  * Updated Tika Server to support multipart/form-data payloads (TIKA-1198).
+
+  * Updated Tika Server to CXF 2.7.8 (TIKA-1197).
+
+  * Updated Tika Server to accept requests over wildcard addresses (TIKA-1196).
+
+  * Added option to use alternate NonSequentialPDFParser (TIKA-1201).
+
+  * Content from PDF AcroForms is now extracted (TIKA-973).
+
+  * Fixed invalid asterisks from master slide in PPT (TIKA-1171).
+
+  * Added test cases to confirm handling of auto-date in PPT and PPTX (TIKA-817).
+ 
+  * Text from tables in PPT files is once again extracted correctly (TIKA-1076).
+  
+  * Text is extracted from text boxes in XLSX (TIKA-1100).
+
+  * Tika no longer hangs when processing Excel files with custom fraction format (TIKA-1132).
+
+  * Disconcerting stacktrace from missing beans no longer printed for some DOCX files (TIKA-792).
+
+  * Upgraded POI to 3.10-beta2 (TIKA-1173).
+
+  * Upgraded PDFBox to 1.8.4 (TIKA-1230).
+
+  * Made HtmlEncodingDetector more flexible in finding meta 
+    header charset (TIKA-1001).
+
+  * Added sanitized test HTML file for local file test (TIKA-1139).
+
+  * Fixed bug that prevented attachments within a PDF from being processed
+    if the PDF itself was an attachment (TIKA-1124).
+
+  * Text from paragraph-level structured document tags in DOCX files is now extracted (TIKA-1130).
+
+  * RTF: Fixed ArrayIndexOutOfBoundsException when parsing list override (TIKA-1192).
+
+  * CLI: TikaCLI now escapes invalid filename characters as hex
+    characters (TIKA-1078).
+
+Release 1.4 - 06/15/2013
+
+  * Removed a test HTML file with a poorly chosen GPL text in it (TIKA-1129).
+
+  * Improvements to tika-server to allow it to produce text/html and
+    text/xml content (TIKA-1126, TIKA-1127).
+
+  * Improvements were made to the Compressor Parser to handle g'zipped files
+    that require the decompressConcatenated option set to true (TIKA-1096).
+
+  * Addressed a typographic error that was preventing detection of
+    awk files (TIKA-1081).
+
+  * Added a new end-point to Tika's JAX-RS REST server that only detects
+    the media-type based on a small portion of the document submitted
+    (TIKA-1047).
+
+  * RTF: Ordered and unordered lists are now extracted (TIKA-1062).
+
+  * MP3: Audio duration is now extracted (TIKA-991)
+
+  * Java .class files: upgraded from ASM 3.1 to ASM 4.1 for parsing
+    the Java bytecodes (TIKA-1053).
+
+  * Mime Types: Definitions extended to optionally include Link (URL) and
+    UTI, along with details for several common formats (TIKA-1012 / TIKA-1083)
+
+  * Exceptions when parsing OLE10 embedded documents, when parsing
+    summary information from Office documents, and when saving
+    embedded documents in TikaCLI are now logged instead
+    of aborting extraction (TIKA-1074)
+
+  * MS Word: line tabular character is now replaced with newline
+    (TIKA-1128)
+
+  * XML: ElementMetadataHandlers can now optionally accept duplicate
+    and empty values (TIKA-1133)
+
+Release 1.3 - 01/19/2013
+
+  * Mimetype definitions added for more common programming languages,
+    including common extensions, but not magic patterns. (TIKA-1055)
+
+  * MS Word: When a Word (.doc) document contains embedded files or
+    links to external documents, Tika now places a <div
+    class="embedded" id="_XXX"/> placeholder into the XHTML so you can
+    see where in the main text the embedded document occurred
+    (TIKA-956, TIKA-1019).  Embedded Wordpad/RTF documents are now
+    recognized (TIKA-982).
+
+  * PDF: Text from pop-up annotations is now extracted (TIKA-981).
+    Text from bookmarks is now extracted (TIKA-1035).
+
+  * PKCS7: Detached signatures no longer throw NullPointerException
+    (TIKA-986).
+
+  * iWork: The chart name for charts embedded in numbers documents is
+    now extracted (TIKA-918).
+
+  * CLI: TikaCLI -m now handles multi-valued metadata keys correctly
+    (previously it only printed the first value).  (TIKA-920)
+
+  * MS Word (.docx): When a Word (.docx) document contains embedded
+    files, Tika now places a <div class="embedded" id="XXX"/> into the
+    XHTML so you can see where in the main text the embedded document
+    occurred.  The id (rId) is included in the Metadata of each
+    embedded document as the new Metadata.EMBEDDED_RELATIONSHIP_ID
+    key, and TikaCLI prepends the rId (if present) onto the filename
+    it extracts (TIKA-989).  Fixed NullPointerException when style is
+    null (TIKA-1006).  Text inside text boxes is now extracted
+    (TIKA-1005).
+
+  * RTF: Page, word, character count and creation date metadata are
+    now extracted for RTF documents (TIKA-999).
+
+  * MS PowerPoint (.pptx): When a PowerPoint (.pptx) document contains
+    embedded files, Tika now places a <div class="embedded" id="XXX"/> into the
+    XHTML so you can see where in the main text the embedded document
+    occurred.  The id (rId) is included in the Metadata of each
+    embedded document as the new Metadata.EMBEDDED_RELATIONSHIP_ID
+    key, and TikaCLI prepends the rId (if present) onto the filename
+    it extracts (TIKA-997, TIKA-1032).
+
+  * MS PowerPoint (.ppt): When a PowerPoint (.ppt) document contains
+    embedded files, Tika now places a <div class="embedded" id="XXX"/> into the
+    XHTML so you can see where in the main text the embedded document
+    occurred (TIKA-1025).  Text from the master slide is now extracted
+    (TIKA-712).
+
+  * MHTML: fixed Null charset name exception when a mime part has an
+    unrecognized charset (TIKA-1011).
+
+  * MP3: if an ID3 tag was encoded in UTF-16 with only the BOM then on
+    certain JVMs this would incorrectly extract the BOM as the tag's
+    value (TIKA-1024).
+
+  * ZIP: placeholders (<div class="embedded" id="<entry name>"/>) are
+    now left in the XHTML so you can see where each archive member
+    appears (TIKA-1036). Fixed FileNotFoundException in TikaCLI when
+    extracting files that were under sub-directories from a ZIP
+    archive, caused by failing to create the parent directories first
+    (TIKA-1031).
+
+  * XML: a space character is now added before each element
+    (TIKA-1048).
+
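The ZIP entry above (TIKA-1031) describes a failure that comes from writing a nested archive member before its parent directories exist. The sketch below illustrates that general technique in Python using only the standard library; it is not TikaCLI's code, and the names `extract_member` and `demo` are hypothetical. It handles file entries only and assumes "/"-separated entry names.

```python
import io
import os
import tempfile
import zipfile


def extract_member(archive, name, dest_dir):
    """Write one file entry under dest_dir, creating parent directories first."""
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(root, *name.split("/")))
    # Refuse entry names such as "../x" that would land outside dest_dir.
    if os.path.commonpath([root, target]) != root:
        raise ValueError("entry escapes destination: " + name)
    # Without this step, opening a nested target fails with a file-not-found error.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with archive.open(name) as src, open(target, "wb") as dst:
        dst.write(src.read())
    return target


def demo():
    """Build a tiny in-memory ZIP with a nested entry, extract it, return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sub/dir/a.txt", "hello")
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest, zipfile.ZipFile(buf) as zf:
        path = extract_member(zf, "sub/dir/a.txt", dest)
        with open(path, "rb") as fh:
            return fh.read()
```

The path check is an extra precaution added for the sketch; the changelog entry itself only concerns the missing parent directories.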
+Release 1.2 - 07/10/2012
+---------------------------------
+
+  * Tika's JAX-RS based network server is now based on Apache CXF,
+    which is available in Maven Central; this allows the server
+    module to be packaged and included in our release
+    (TIKA-593, TIKA-901).
+
+  * Tika: parseToString now lets you specify the max string length
+    per-call, in addition to per-Tika-instance. (TIKA-870)
+
+  * Tika now has the ability to detect FITS (Flexible Image Transport System) 
+    files (TIKA-874).
+
+  * Images: Fixed file handle leak in ImageParser. (TIKA-875)
+
+  * iWork: Comments in Pages files are now extracted (TIKA-907).
+    Headers, footers and footnotes in Pages files are now extracted
+    (TIKA-906).  Don't throw NullPointerException on password
+    protected iWork files, even though we can't parse their contents
+    yet (TIKA-903).  Text extracted from Keynote text boxes and bullet
+    points no longer runs together (TIKA-910). Also extract text for
+    Pages documents created in layout mode (TIKA-904).  Table names
+    are now extracted in Numbers documents (TIKA-924).  Content added
+    to master slides is also extracted (TIKA-923).
+
+  * Archive and compression formats: The Commons Compress dependency was
+    upgraded from 1.3 to 1.4.1. With this change Tika can now also parse
+    Unix dump archives and documents compressed using the XZ and Pack200
+    compression formats. (TIKA-932)
+
+  * KML: Tika now has basic support for Keyhole Markup Language documents
+    (KML and KMZ) used by tools like Google Earth. See also
+    http://www.opengeospatial.org/standards/kml/. (TIKA-941)
+
+  * CLI: You can now use the TIKA_PASSWORD environment variable or the
+    --password=X command line option to specify the password that Tika CLI
+    should use for opening encrypted documents (TIKA-943).
+
+  * Character encodings: Tika's character encoding detection mechanism was
+    improved by adding integration with the juniversalchardet library that
+    implements Mozilla's universal charset detection algorithm. The slower
+    ICU4J algorithms are still used as a fallback thanks to their wider
+    coverage of custom character encodings. (TIKA-322, TIKA-471)
+
+  * Charset parameter: Related to the character encoding improvements
+    mentioned above, Tika now returns the detected character encoding as
+    a "charset" parameter of the content type metadata field for text/plain
+    and text/html documents. For example, instead of just "text/plain", the
+    returned content type will be something like "text/plain; charset=UTF-8"
+    for a UTF-8 encoded text document. Character encoding information is still
+    present also in the content encoding metadata field for backwards
+    compatibility, but that field should be considered deprecated. (TIKA-431)
+
+  * Extraction of embedded resources from OLE2 Office Documents, where
+    the resource isn't another office document, has been fixed (TIKA-948)
+
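The "charset" parameter described in the Release 1.2 notes above (TIKA-431) can be read back on the consumer side. The following is a minimal sketch, assuming the content type value has already been obtained as a plain string such as "text/plain; charset=UTF-8"; the helper name `split_content_type` is hypothetical and the parsing uses Python's standard `email.message.Message`, not any Tika API.

```python
from email.message import Message


def split_content_type(value):
    """Return (media_type, charset); charset is None when the parameter is absent."""
    msg = Message()
    msg["Content-Type"] = value
    # Both accessors normalise their result to lower case.
    return msg.get_content_type(), msg.get_content_charset()
```

Because both values come back lower-cased, a caller comparing against "UTF-8" should compare case-insensitively.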
+Release 1.1 - 3/7/2012
+---------------------------------
+
+ * Link Extraction: The rel attribute is now extracted from
+   links per the LinkContentHandler. (TIKA-824)
+
+ * MP3: Fixed handling of UTF-16 (two byte) ID3v2 tags (previously
+   the last character in a UTF-16 tag could be corrupted) (TIKA-793)
+
+ * Performance: Loading of the default media type registry is now
+   significantly faster. (TIKA-780)
+
+ * PDF: Allow controlling whether overlapping duplicated text should
+   be removed.  Disabling this (the default) can give big
+   speedups to text extraction and may work around cases where
+   non-duplicated characters were incorrectly removed (TIKA-767).
+   Allow controlling whether text tokens should be sorted by their x/y
+   position before extracting text (TIKA-612); this is necessary for
+   certain PDFs.  Fixed cases where too many </p> tags appear in the
+   XHTML output, causing NPE when opening some PDFs with the GUI
+   (TIKA-778).
+
+ * RTF: Fixed case where a font change would result in processing
+   bytes in the wrong font's charset, producing bogus text output
+   (TIKA-777).  Don't output whitespace in ignored group states,
+   avoiding excessive whitespace output (TIKA-781).  Binary embedded
+   content (using \bin control word) is now skipped correctly;
+   previously it could cause the parser to incorrectly extract binary
+   content as text (TIKA-782).
+
+ * CLI: New TikaCLI option "--list-detectors", which displays the
+   mimetype detectors that are available, similar to the existing
+   "--list-parsers" option for parsers. (TIKA-785).
+
+ * Detectors: The order of detectors, as supplied via the service
+   registry loader, is now controlled. User supplied detectors are
+   preferred, then Tika detectors (such as the container aware ones),
+   and finally the core Tika MimeTypes is used as a backup. This
+   allows for specific, detailed detectors to take preference over
+   the default mime magic + filename detector. (TIKA-786)
+
+ * Microsoft Project (MPP): Filetype detection has been fixed,
+   and basic metadata (but no text) is now extracted. (TIKA-789)
+
+ * Outlook: fixed NullPointerException in TikaGUI when messages with
+   embedded RTF or HTML content were filtered (TIKA-801).
+
+ * Ogg Vorbis and FLAC: Parser added for Ogg Vorbis and FLAC audio
+   files, which extracts audio metadata and tags (TIKA-747)
+
+ * MP4: Improved mime magic detection for MP4 based formats (including
+   QuickTime, MP4 Video and Audio, and 3GPP) (TIKA-851)
+
+ * MP4: Basic metadata extracting parser for MP4 files added, which includes
+   limited audio and video metadata, along with the iTunes media metadata
+   (such as Artist and Title) (TIKA-852)
+
+ * Document Passwords: A new ParseContext object, PasswordProvider,
+   has been added. This provides a way to supply the password for 
+   a document during processing. Currently, only password protected
+   PDFs and Microsoft OOXML Files are supported. (TIKA-850)
+
+Release 1.0 - 11/4/2011
+---------------------------------
+
+The most notable changes in Tika 1.0 over previous releases are:
+
+ * API: All methods, classes and interfaces that were marked as
+   deprecated in Tika 0.10 have been removed to clean up the API
+   (TIKA-703). You may need to adjust and recompile client code
+   accordingly. The declared OSGi package versions are now 1.0, and
+   will thus not resolve for client bundles that still refer to 0.x
+   versions (TIKA-565).
+
+ * Configuration: The context class loader of the current thread is
+   no longer used as the default for loading configured parser and
+   detector classes. You can still pass an explicit class loader
+   to the configuration mechanism to get the previous behaviour.
+   (TIKA-565)
+
+ * OSGi: The tika-core bundle will now automatically pick up and use
+   any available Parser and Detector services when deployed to an OSGi
+   environment. The tika-parsers bundle provides such services
+   for all the supported file formats for which the upstream parser library
+   is available. If you don't want to track all the parser libraries as
+   separate OSGi bundles, you can use the tika-bundle bundle that packages
+   tika-parsers together with all its upstream dependencies. (TIKA-565)
+
+ * RTF: Hyperlinks in RTF documents are now extracted as an <a
+   href=...>...</a> element (TIKA-632). The RTF parser is also now
+   more robust when encountering too many closing }'s vs. opening {'s
+   (TIKA-733).
+
+ * MS Word: From Word (.doc) documents we now extract optional hyphen
+   as Unicode zero-width space (U+200B), and non-breaking hyphen as
+   Unicode non-breaking hyphen (U+2011). (TIKA-711)
+
+ * Outlook: Tika can now also process attachments in Outlook messages.
+   (TIKA-396)
+
+ * MS Office: Performance of extracting embedded office docs was improved.
+   (TIKA-753)
+
+ * PDF: The PDF parser now extracts paragraphs within each page
+   (TIKA-742) and can now optionally extract text from PDF
+   annotations (TIKA-738). There's also an option to enable (the
+   default) or disable auto-space insertion (TIKA-724).
+
+ * Language detection: Tika can now detect Belarusian, Catalan,
+   Esperanto, Galician, Lithuanian (TIKA-582), Romanian, Slovak,
+   Slovenian, and Ukrainian (TIKA-681).
+
+ * Java: Tika no longer ships retrotranslated Java 1.4 binaries along
+   with the normal ones that work with Java 5 and higher. (TIKA-744)
+
+ * OpenOffice documents: header/footer text is now extracted for text,
+   presentation and spreadsheet documents (TIKA-736)
+
+Tika 1.0 relies on the following set of major dependencies (generated using
+mvn dependency:tree from tika-parsers):
+
+   org.apache.tika:tika-parsers:bundle:1.0
+   +- org.apache.tika:tika-core:jar:1.0:compile
+   +- edu.ucar:netcdf:jar:4.2-min:compile
+   |  \- org.slf4j:slf4j-api:jar:1.5.6:compile
+   +- org.apache.james:apache-mime4j-core:jar:0.7:compile
+   +- org.apache.james:apache-mime4j-dom:jar:0.7:compile
+   +- org.apache.commons:commons-compress:jar:1.3:compile
+   +- commons-codec:commons-codec:jar:1.5:compile
+   +- org.apache.pdfbox:pdfbox:jar:1.6.0:compile
+   |  +- org.apache.pdfbox:fontbox:jar:1.6.0:compile
+   |  +- org.apache.pdfbox:jempbox:jar:1.6.0:compile
+   |  \- commons-logging:commons-logging:jar:1.1.1:compile
+   +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
+   +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
+   +- org.apache.poi:poi:jar:3.8-beta4:compile
+   +- org.apache.poi:poi-scratchpad:jar:3.8-beta4:compile
+   +- org.apache.poi:poi-ooxml:jar:3.8-beta4:compile
+   |  +- org.apache.poi:poi-ooxml-schemas:jar:3.8-beta4:compile
+   |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
+   |  \- dom4j:dom4j:jar:1.6.1:compile
+   +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
+   +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
+   +- asm:asm:jar:3.1:compile
+   +- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+   +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
+   +- rome:rome:jar:0.9:compile
+      \- jdom:jdom:jar:1.0:compile
+
+The following people have contributed to Tika 1.0 by submitting or commenting
+on the issues resolved in this release:
+
+Andrzej Bialecki
+Antoni Mylka
+Benson Margulies
+Chris A. Mattmann
+Cristian Vat
+Dave Meikle
+David Smiley
+Dennis Adler
+Erik Hetzner
+Ingo Renner
+Jeremias Maerki
+Jeremy Anderson
+Jeroen van Vianen
+John Bartak
+Jukka Zitting
+Julien Nioche
+Ken Krugler
+Mark Butler
+Maxim Valyanskiy
+Michael Bryant
+Michael McCandless 
+Nick Burch
+Pablo Queixalos
+Uwe Schindler
+Žygimantas Medelis
+
+
+See http://s.apache.org/Zk6 for more details on these contributions.
+
+
+Release 0.10 - 09/25/2011
+-------------------------
+
+The most notable changes in Tika 0.10 over previous releases are:
+
+ * A parser for CHM help files was added. (TIKA-245)
+
+ * TIKA-698: Invalid characters are now replaced with the Unicode
+   replacement character (U+FFFD), whereas before such characters were
+   replaced with spaces, so you may need to change your processing of
+   Tika's output to now handle U+FFFD.
+
+ * The RTF parser was rewritten to perform its own direct shallow
+   parse of the RTF content, instead of using RTFEditorKit from
+   javax.swing.  This fixes several issues in the old parser,
+   including doubling of Unicode characters in certain cases
+   (TIKA-683), exceptions on mal-formed RTF docs (TIKA-666), and
+   missing text from some elements (header/footer, hyperlinks,
+   footnotes, text inside pictures).
+
+ * Handling of temporary files within Tika was much improved
+   (TIKA-701, TIKA-654, TIKA-645, TIKA-153)
+
+ * The Tika GUI got a facelift and some extra features (TIKA-635)
+
+ * The apache-mime4j dependency of the email message parser was upgraded
+   from version 0.6 to 0.7 (TIKA-716). The parser also now accepts a
+   MimeConfig object in the ParseContext as configuration (TIKA-640).
+
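The TIKA-698 note above says downstream processing may need to handle U+FFFD where it previously saw spaces. As a minimal sketch, assuming the extracted text is already available as a Python string, a consumer could either count the replacement characters or map them back to the old behaviour; the helper names below are hypothetical and nothing here calls Tika.

```python
# U+FFFD REPLACEMENT CHARACTER, emitted in place of invalid characters.
REPLACEMENT_CHAR = "\ufffd"


def count_replaced(text):
    """Number of invalid characters that were replaced in the extracted text."""
    return text.count(REPLACEMENT_CHAR)


def to_legacy_spaces(text):
    """Mimic the earlier output by turning each U+FFFD back into a space."""
    return text.replace(REPLACEMENT_CHAR, " ")
```

Counting first can be useful as a rough quality signal before the markers are discarded.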
+Tika 0.10 relies on the following set of major dependencies (generated using
+mvn dependency:tree from tika-parsers):
+
+   org.apache.tika:tika-parsers:bundle:0.10
+   +- org.apache.tika:tika-core:jar:0.10:compile
+   +- edu.ucar:netcdf:jar:4.2-min:compile
+   |  \- org.slf4j:slf4j-api:jar:1.5.6:compile
+   +- org.apache.james:apache-mime4j-core:jar:0.7:compile
+   +- org.apache.james:apache-mime4j-dom:jar:0.7:compile
+   +- org.apache.commons:commons-compress:jar:1.1:compile
+   +- commons-codec:commons-codec:jar:1.4:compile
+   +- org.apache.pdfbox:pdfbox:jar:1.6.0:compile
+   |  +- org.apache.pdfbox:fontbox:jar:1.6.0:compile
+   |  +- org.apache.pdfbox:jempbox:jar:1.6.0:compile
+   |  \- commons-logging:commons-logging:jar:1.1.1:compile
+   +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
+   +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
+   +- org.apache.poi:poi:jar:3.8-beta4:compile
+   +- org.apache.poi:poi-scratchpad:jar:3.8-beta4:compile
+   +- org.apache.poi:poi-ooxml:jar:3.8-beta4:compile
+   |  +- org.apache.poi:poi-ooxml-schemas:jar:3.8-beta4:compile
+   |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
+   |  \- dom4j:dom4j:jar:1.6.1:compile
+   +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
+   +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
+   +- asm:asm:jar:3.1:compile
+   +- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+   +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
+   +- rome:rome:jar:0.9:compile
+      \- jdom:jdom:jar:1.0:compile
+
+The following people have contributed to Tika 0.10 by submitting or commenting
+on the issues resolved in this release:
+
+   Alain Viret
+   Alex Ott
+   Alexander Chow
+   Andreas Kemkes
+   Andrew Khoury
+   Babak Farhang
+   Benjamin Douglas
+   Benson Margulies
+   Chris A. Mattmann
+   chris hudson
+   Chris Lott
+   Cristian Vat
+   Curt Arnold
+   Cynthia L Wong
+   Dave Brosius
+   David Benson
+   Enrico Donelli
+   Erik Hetzner
+   Erna de Groot
+   Gabriele Columbro
+   Gavin
+   Geoff Jarrad
+   Gregory Kanevsky
+   gunter rombauts
+   Henning Gross
+   Henri Bergius
+   Ingo Renner
+   Ingo Wiarda
+   Izaak Alpert
+   Jan Høydahl
+   Jens Wilmer
+   Jeremy Anderson
+   Joseph Vychtrle
+   Joshua Turner
+   Jukka Zitting
+   Julien Nioche
+   Karl Heinz Marbaise
+   Ken Krugler
+   Kostya Gribov
+   Luciano Leggieri
+   Mads Hansen
+   Mark Butler
+   Matt Sheppard
+   Maxim Valyanskiy
+   Michael McCandless
+   Michael Pisula
+   Murad Shahid
+   Nick Burch
+   Oleg Tikhonov
+   Pablo Queixalos
+   Paul Jakubik
+   Raimund Merkert
+   Rajiv Kumar
+   Robert Trickey
+   Sami Siren
+   samraj
+   Selva Ganesan
+   Sjoerd Smeets
+   Stephen Duncan Jr
+   Tran Nam Quang
+   Uwe Schindler
+   Vitaliy Filippov
+
+See http://s.apache.org/vR for more details on these contributions.
+
+
+Release 0.9 - 02/13/2011
+------------------------
+
+The most notable changes in Tika 0.9 over previous releases are:
+
+ * A critical bug that prevented metadata from printing to the
+   command line when the underlying Parser didn't generate
+   XHTML output was fixed. (TIKA-596)
+   
+ * The 0.8 version of Tika included a NetCDF jar file that pulled 
+   in tremendous amounts of redundant dependencies. This has 
+   been addressed in Tika 0.9 by republishing a minimal NetCDF 
+   jar and changing Tika to depend on that. (TIKA-556)
+   
+ * MIME detection for iWork and OpenXML documents has been
+   improved. (TIKA-533, TIKA-562, TIKA-588)
+   
+ * A critical backwards incompatible bug in PDF parsing that 
+   was introduced in Tika 0.8 has been fixed. (TIKA-548)
+   
+ * Support for forked parsing in separate processes was added. 
+   (TIKA-416)
+ 
+ * Tika's language identifier now supports the Lithuanian 
+   language. (TIKA-582)
+
+Tika 0.9 relies on the following set of major dependencies (generated using
+mvn dependency:tree from tika-parsers):
+
+   org.apache.tika:tika-parsers:bundle:0.9
+   +- org.apache.tika:tika-core:jar:0.9:compile
+   +- edu.ucar:netcdf:jar:4.2-min:compile
+   |  \- org.slf4j:slf4j-api:jar:1.5.6:compile
+   +- commons-httpclient:commons-httpclient:jar:3.1:compile
+   |  +- commons-logging:commons-logging:jar:1.1.1:compile (version managed from 1.0.4)
+   |  \- commons-codec:commons-codec:jar:1.2:compile
+   +- org.apache.james:apache-mime4j:jar:0.6:compile
+   +- org.apache.commons:commons-compress:jar:1.1:compile
+   +- org.apache.pdfbox:pdfbox:jar:1.4.0:compile
+   |  +- org.apache.pdfbox:fontbox:jar:1.4.0:compile
+   |  \- org.apache.pdfbox:jempbox:jar:1.4.0:compile
+   +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
+   +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
+   +- org.apache.poi:poi:jar:3.7:compile
+   +- org.apache.poi:poi-scratchpad:jar:3.7:compile
+   +- org.apache.poi:poi-ooxml:jar:3.7:compile
+   |  +- org.apache.poi:poi-ooxml-schemas:jar:3.7:compile
+   |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
+   |  \- dom4j:dom4j:jar:1.6.1:compile
+   +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
+   +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
+   +- asm:asm:jar:3.1:compile
+   +- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+   +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
+   +- rome:rome:jar:0.9:compile
+      \- jdom:jdom:jar:1.0:compile
+
+The following people have contributed to Tika 0.9 by submitting or commenting
+on the issues resolved in this release:
+
+   Alex Skochin
+   Alexander Chow
+   Antoine L.
+   Antoni Mylka
+   Benjamin Douglas
+   Benson Margulies
+   Chris A. Mattmann
+   Cristian Vat
+   Cyriel Vringer
+   David Benson
+   Erik Hetzner
+   Gabriel Miklos
+   Geoff Jarrad
+   Jukka Zitting
+   Ken Krugler
+   Kostya Gribov
+   Leszek Piotrowicz
+   Martijn van Groningen
+   Maxim Valyanskiy
+   Michel Tremblay
+   Nick Burch
+   paul
+   Paul Pearcy
+   Peter van Raamsdonk
+   Piotr Bartosiewicz
+   Reinhard Schwab
+   Scott Severtson
+   Shinsuke Sugaya
+   Staffan Olsson
+   Steve Kearns
+   Tom Klonikowski
+   Žygimantas Medelis
+
+See http://s.apache.org/qi for more details on these contributions.
+
+
+Release 0.8 - 11/07/2010
+------------------------
+
+The most notable changes in Tika 0.8 over previous releases are:
+
+ * Language identification is now dynamically configurable, 
+   managed via a config file loaded from the classpath. (TIKA-490)
+
+ * Tika now supports parsing Feeds by wrapping the underlying
+   Rome library. (TIKA-466)
+
+ * A quick-start guide for Tika parsing was contributed. (TIKA-464)
+
+ * An approach for plumbing through XHTML attributes was added. (TIKA-379)
+
+ * Media type hierarchy information is now taken into account when
+   selecting the best parser for a given input document. (TIKA-298)
+
+ * Support for parsing common scientific data formats including netCDF
+   and HDF4/5 was added (TIKA-400 and TIKA-399).
+
+ * Unit tests for Windows have been fixed, allowing TestParsers
+   to complete. (TIKA-398)
+
+Tika 0.8 relies on the following set of major dependencies (generated using
+mvn dependency:tree from tika-parsers):
+
+   org.apache.tika:tika-parsers:bundle:0.8
+   +- org.apache.tika:tika-core:jar:0.8:compile
+   +- edu.ucar:netcdf:jar:4.2:compile
+   |  \- org.slf4j:slf4j-api:jar:1.5.6:compile
+   +- commons-httpclient:commons-httpclient:jar:3.1:compile
+   |  +- commons-logging:commons-logging:jar:1.1.1:compile (version managed from 1.0.4)
+   |  \- commons-codec:commons-codec:jar:1.2:compile
+   +- org.apache.commons:commons-compress:jar:1.1:compile
+   +- org.apache.pdfbox:pdfbox:jar:1.3.1:compile
+   |  +- org.apache.pdfbox:fontbox:jar:1.3.1:compile
+   |  \- org.apache.pdfbox:jempbox:jar:1.3.1:compile
+   +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
+   +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
+   +- org.apache.poi:poi:jar:3.7:compile
+   +- org.apache.poi:poi-scratchpad:jar:3.7:compile
+   +- org.apache.poi:poi-ooxml:jar:3.7:compile
+   |  +- org.apache.poi:poi-ooxml-schemas:jar:3.7:compile
+   |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
+   |  \- dom4j:dom4j:jar:1.6.1:compile
+   +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
+   +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
+   +- asm:asm:jar:3.1:compile
+   +- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+   +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
+   +- rome:rome:jar:0.9:compile
+      \- jdom:jdom:jar:1.0:compile
+
+The following people have contributed to Tika 0.8 by submitting or commenting
+on the issues resolved in this release:
+
+   Łukasz Wiktor
+   Adam Wilmer
+   Alex Baranau
+   Alex Ott
+   André Ricardo
+   Andrey Barhatov
+   Andrey Sidorenko
+   Antoni Mylka
+   Arturo Beltran
+   Attila Király
+   Brad Greenlee
+   Bruno Dumon
+   Chris A. Mattmann
+   Chris Bamford
+   Christophe Gourmelon
+   Dave Meikle
+   David Weekly
+   Dmitry Kuzmenko
+   Erik Hetzner
+   Geoff Jarrad
+   Gerd Bremer
+   Grant Ingersoll
+   Jan Høydahl
+   Jean-Philippe Ricard
+   Jeremias Maerki
+   Joao Garcia
+   Jukka Zitting
+   Julien Nioche
+   Ken Krugler
+   Liam O'Boyle
+   Mads Hansen
+   Marcel May
+   Markus Goldbach
+   Martijn van Groningen
+   Maxim Valyanskiy
+   Mike Hays
+   Miroslav Pokorny
+   Nick Burch
+   Otis Gospodnetic
+   Peter van Raamsdonk
+   Peter Wolanin
+   Peter_Lenahan@ibi.com
+   Piotr Bartosiewicz
+   Radek
+   Rajiv Kumar
+   Reinhard Schwab
+   rick cameron
+   Robert Muir
+   Sanjeev Rao
+   Simon Tyler
+   Sjoerd Smeets
+   Slavomir Varchula
+   Staffan Olsson
+   Tom De Leu
+   Uwe Schindler
+   Victor Kazakov
+
+See http://s.apache.org/ab0 for more details on these contributions.
+
+
+Release 0.7 - 3/31/2010
+-----------------------
+
+The most notable changes in Tika 0.7 over previous releases are:
+
+ * MP3 file parsing was improved, including Channel and SampleRate 
+   extraction and ID3v2 support (TIKA-368, TIKA-372). Further, audio
+   parsing mime detection was also improved for the MIDI format. (TIKA-199)
+
+ * Tika no longer relies on X11 for its RTF parsing functionality. (TIKA-386)
+
+ * A thread-safety bug in the AutoDetectParser was discovered and
+   addressed. (TIKA-374)
+
+ * Upgrade to PDFBox 1.0.0. The new PDFBox version improves PDF parsing
+   performance and fixes a number of text extraction issues. (TIKA-380)
+
+The following people have contributed to Tika 0.7 by submitting or commenting
+on the issues resolved in this release:
+
+   Adam Rauch
+   Benson Margulies
+   Brett S.
+   Chris A. Mattmann
+   Daan de Wit
+   Dave Meikle
+   Durville
+   Ingo Renner
+   Jukka Zitting
+   Ken Krugler
+   Kenny Neal
+   Markus Goldbach
+   Maxim Valyanskiy
+   Nick Burch
+   Sami Siren
+   Uwe Schindler
+
+See http://tinyurl.com/yklopby for more details on these contributions.
+
+
+Release 0.6 - 01/20/2010
+------------------------
+
+The most notable changes in Tika 0.6 over the previous release are:
+
+ * Mime-type detection for HTML (and all types) has been improved:
+   malformed HTML files, and HTML files that require a bit more observed
+   content before the type is properly detected, are now correctly
+   identified by the AutoDetectParser. (TIKA-327, TIKA-357, TIKA-366, TIKA-367)
+
+ * Tika now has an additional OSGi bundle packaging that includes all the
+   required parser libraries. This bundle package makes it easy to use all
+   Tika features in an OSGi environment. (TIKA-340, TIKA-342)
+
+ * The Apache POI dependency used for parsing Microsoft Office file formats
+   has been upgraded to version 3.6. The most visible improvement in this
+   version is the notably reduced ooxml jar file size. The tika-app jar size
+   is now down to 15MB from the 25MB in Tika 0.5. (TIKA-353)
+
+ * Handling of character encoding information in input metadata and HTML
+   <meta> tags has been improved. When no applicable encoding information is
+   available, the encoding is detected by looking at the input data.
+   (TIKA-332, TIKA-334, TIKA-335, TIKA-341) 
+
+ * Some document types like Excel spreadsheets contain content like
+   numbers or formulas whose exact text format depends on the current locale.
+   So far Tika has used the platform default locale in such cases, but
+   clients can now explicitly specify the locale by passing a Locale instance
+   in the parse context. (TIKA-125)
+
+ * The default text output encoding of the tika-app jar is now UTF-8
+   when running on Mac OS X. This is because the default encoding used
+   by Java is not compatible with the console application in Mac OS X.
+   On all other platforms the text output from tika-app still uses
+   the platform default encoding. (TIKA-324)
+
+ * A flash video (video/x-flv) parser has been added. (TIKA-328)
+ 
+ * Handling of Number and Date cell formatting within Microsoft Excel
+   documents has been added. This includes currencies, percentages and
+   scientific formats. (TIKA-103)
+   
+The following people have contributed to Tika 0.6 by submitting or commenting
+on the issues resolved in this release:
+
+   Andrzej Bialecki
+   Bertrand Delacretaz
+   Chris A. Mattmann
+   Dave Meikle
+   Erik Hetzner
+   Felix Meschberger
+   Jukka Zitting
+   Julien Nioche
+   Ken Krugler
+   Luke Nezda
+   Maxim Valyanskiy
+   Niall Pemberton
+   Peter Wolanin
+   Piotr B.
+   Sami Siren
+   Yuan-Fang Li
+
+See http://tinyurl.com/yc3dk67 for more details on these contributions.
+
+
+Release 0.5 - 11/14/2009
+------------------------
+
+The most notable changes in Tika 0.5 over the previous release are:
+
+ * Improved RDF/OWL mime detection using both MIME magic as well as
+   pattern matching (TIKA-309)
+
+ * An org.apache.tika.Tika facade class has been added to simplify common
+   text extraction and type detection use cases. (TIKA-269)
+
+ * A new parse context argument was added to the Parser.parse() method.
+   This context map can be used to pass things like a delegate parser or
+   other settings to the parsing process. The previous parse() method
+   signature has been deprecated and will be removed in Tika 1.0. (TIKA-275)
+
+ * A simple ngram-based language detection mechanism has been added along
+   with predefined language profiles for 18 languages. (TIKA-209)
+
+ * The media type registry in Tika was synchronized with the MIME type
+   configuration in the Apache HTTP Server. Tika now knows about 1274
+   different media types and can detect 672 of those using 927 file
+   extension and 280 magic byte patterns. (TIKA-285)
+
+ * Tika now uses the Apache PDFBox version 0.8.0-incubating for parsing PDF
+   documents. This version is notably better than the 0.7.3 release used
+   earlier. (TIKA-158)
+
+The following people have contributed to Tika 0.5 by submitting or commenting
+on the issues resolved in this release:
+
+   Alex Baranov
+   Bart Hanssens
+   Benson Margulies
+   Chris A. Mattmann
+   Daan de Wit
+   Erik Hetzner
+   Frank Hellwig
+   Jeff Cadow
+   Joachim Zittmayr
+   Jukka Zitting
+   Julien Nioche
+   Ken Krugler
+   Maxim Valyanskiy
+   MRIT64
+   Paul Borgermans
+   Piotr B.
+   Robert Newson
+   Sascha Szott
+   Ted Dunning
+   Thilo Goetz
+   Uwe Schindler
+   Yuan-Fang Li
+
+See http://tinyurl.com/yl9prwp for more details on these contributions.
+
+
+Release 0.4 - 07/14/2009
+------------------------
+
+The most notable changes in Tika 0.4 over the previous release are:
+
+  * Tika has been split into three different components for increased
+    modularity. The tika-core component contains the key interfaces and
+    core functionality of Tika, tika-parsers contains all the adapters
+    to external parser libraries, and tika-app bundles everything together
+    in a single executable jar file. (TIKA-219)
+
+  * All the three Tika components are packaged as OSGi bundles. (TIKA-228)
+
+  * Tika now uses the new Commons Compress library for improved support
+    of compression and packaging formats like gzip, bzip2, tar, cpio,
+    ar, zip and jar. (TIKA-204)
+
+  * The memory use of parsing Excel sheets with lots of numbers
+    has been considerably reduced. (TIKA-211)
+
+  * The AutoDetectParser now has basic protection against "zip bomb"
+    attacks, where a specially crafted input document can expand to
+    a practically infinite amount of output text. (TIKA-216)
+
+  * The ParsingReader class can now use a thread pool or a more complex
+    execution model (java.util.concurrent.Executor) for the background
+    parsing task. (TIKA-215)
+
+  * Automatic type detection of text- and XML-based documents has been
+    improved. (TIKA-225)
+
+  * Charset detection functionality from the ICU4J library was inlined
+    in Tika to avoid the dependency on the large ICU4J jar. (TIKA-229)
+
+  * Composite parsers like the AutoDetectParser now make sure that any
+    RuntimeExceptions, IOExceptions or SAXExceptions unrelated to the given
+    document stream or content handler are converted to TikaExceptions
+    before being passed to the client. (TIKA-198, TIKA-237)
+    
+The following people have contributed to Tika 0.4 by submitting or commenting
+on the issues resolved in this release:
+
+   Chris A. Mattmann
+   Daan de Wit
+   Dave Meikle
+   David Weekly
+   Jeremias Maerki
+   Jonathan Koren
+   Jukka Zitting
+   Karl Heinz Marbaise
+   Keith R. Bennett
+   Maxim Valyanskiy
+   Niall Pemberton
+   Robert Burrell Donkin
+   Sami Siren
+   Siddharth Gargate
+   Uwe Schindler
+
+See http://tinyurl.com/mgv9o3 for more details on these contributions.
+
+
+Release 0.3 - 03/09/2009
+------------------------
+
+The most notable changes in Tika 0.3 over the previous release are:
+
+ * Tika now supports mime type glob patterns specified using
+   standard JDK 1.4 (and beyond) regular expression syntax via the
+   isregex attribute on the glob tag. See:
+
+     http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
+
+   for more information. (TIKA-194)
+
+ * Tika now supports the Office Open XML format used by
+   Microsoft Office 2007. (TIKA-152)
+
+ * All the metadata keys for Microsoft Office document properties are now
+   included as constants in the MSOffice interface. Clients should use
+   these constants instead of the raw string values to refer to specific
+   metadata items. (TIKA-186)
+
+ * Automatic detection of document types in Tika has been improved.
+   For example Tika can now detect plain text just by looking at the first
+   few bytes of the document. (TIKA-154)
+
+ * Tika now disables the loading of all external entities in XML files
+   that it parses as input documents. This improves security and avoids
+   problems with potentially broken references. (TIKA-185)
+
+ * Tika now replaces all invalid XML characters in the extracted text
+   content with spaces. This prevents problems when output from Tika
+   is processed with XML tools. (TIKA-180)
+
+ * The Tika CLI now correctly flushes its buffers when invoked with the
+   --text argument. This prevents the end of the text output from being
+   lost. (TIKA-179)
+
+ * Embedded text in MIDI files is now extracted. For example many karaoke
+   files contain song lyrics embedded as MIDI text.
+
+ * The text content of Microsoft Outlook message files no longer appears as
+   multiple copies in the extracted text. (TIKA-197)
+
+ * The ParsingReader class now makes most document metadata available
+   already before any of the extracted text is consumed. This makes it
+   easier for example to construct Lucene Document instances that contain
+   both extracted text and metadata. (TIKA-203)
+
+See http://tinyurl.com/tika-0-3-changes for a list of all changes in Tika 0.3.
+
+The following people have contributed to Tika 0.3 by submitting or commenting
+on the issues resolved in this release:
+
+   Andrzej Rusin
+   Chris A. Mattmann
+   Dave Meikle
+   Georger Araújo
+   Guillermo Arribas
+   Jonathan Koren
+   Jukka Zitting
+   Karl Heinz Marbaise
+   Kumar Raja Jana
+   Paul Borgermans
+   Peter Becker
+   Sébastien Michel
+   Uwe Schindler
+
+See http://tinyurl.com/tika-0-3-contributions for more details on
+these contributions.
+
+
+Release 0.2 - 12/04/2008
+------------------------
+
+1.  TIKA-109 - WordParser fails on some Word files (Dave Meikle)
+
+2.  TIKA-105 - Excel parser implementation based on POI's Event API
+               (Niall Pemberton)
+
+3.  TIKA-116 - Streaming parser for OpenDocument files (Jukka Zitting)
+
+4.  TIKA-117 - Drop JDOM and Jaxen dependencies (Jukka Zitting)
+
+5.  TIKA-115 - Tika package with all the dependencies (Jukka Zitting)
+
+6.  TIKA-97  - Tika GUI (Jukka Zitting)
+
+7.  TIKA-96  - Tika CLI (Jukka Zitting)
+
+8.  TIKA-112 - Use Commons IO 1.4 (Jukka Zitting)
+
+9.  TIKA-127 - Add support for Visio files (Jukka Zitting)
+
+10. TIKA-129 - node() support for the streaming XPath utility (Jukka Zitting)
+
+11. TIKA-130 - self-or-descendant axis does not match self in streaming XPath
+               (Jukka Zitting)
+
+12. TIKA-131 - Lazy XHTML prefix generation (Jukka Zitting)
+
+13. TIKA-128 - HTML parser should produce XHTML SAX events (Jukka Zitting)
+
+14. TIKA-133 - TeeContentHandler constructor should use varargs (Jukka Zitting)
+
+15. TIKA-132 - Refactor Excel extractor to parse per sheet and add
+               hyperlink support (Niall Pemberton)
+
+16. TIKA-134 - mvn package does not produce packages for bin/src
+               (Karl Heinz Marbaise)
+
+17. TIKA-138 - Ignore HTML style and script content (Jukka Zitting)
+
+18. TIKA-113 - Metadata (such as title) should not be part of content
+               (Jukka Zitting)
+
+19. TIKA-139 - Add a composite parser (Jukka Zitting)
+
+20. TIKA-142 - Include application/xhtml+xml as valid mime type for XMLParser
+               (mattmann)
+
+21. TIKA-143 - Add ParsingReader (Jukka Zitting)
+
+22. TIKA-144 - Upgrade nekohtml dependency (Jukka Zitting)
+
+23. TIKA-145 - Separate NOTICEs and LICENSEs for binary and source packages
+               (Jukka Zitting)
+
+24. TIKA-146 - Upgrade to POI 3.1 (Jukka Zitting)
+
+25. TIKA-99  - Support external parser programs (Jukka Zitting)
+
+26. TIKA-149 - Parser for Zip files (Dave Meikle & Jukka Zitting)
+
+27. TIKA-150 - Parser for tar files (Jukka Zitting)
+
+28. TIKA-151 - Stream compression support (Jukka Zitting)
+
+29. TIKA-156 - Some MIME magic patterns are ignored by MimeTypes
+               (Jukka Zitting)
+
+30. TIKA-155 - Java class file parser (Dave Brosius & Jukka Zitting)
+
+31. TIKA-108 - New Tika logos (Yongqian Li & Jukka Zitting)
+
+32. TIKA-120 - Add support for retrieving ID3 tags from MP3 files
+               (Dave Meikle & Jukka Zitting)
+
+33. TIKA-54  - Outlook msg parser
+               (Rida Benjelloun, Dave Meikle & Jukka Zitting)
+
+34. TIKA-114 - PDFParser: Getting content of the document using
+               "writer.ToString()", some words are stuck together
+               (Dave Meikle)
+
+35. TIKA-161 - Enable PMD reports (Jukka Zitting)
+
+36. TIKA-159 - Add support for parsing basic audio types: wav, aiff, au, midi
+               (Sami Siren)
+
+37. TIKA-140 - HTML parser unable to extract text
+               (Julien Nioche & Jukka Zitting)
+
+38. TIKA-163 - GUI does not support drag and drop in Gnome or KDE (Dave Meikle)
+
+39. TIKA-166 - Update HTMLParser to parse contents of meta tags (Dave Meikle)
+
+40. TIKA-164 - Upgrade of the nekohtml dependency to 1.9.9 (Jukka Zitting)
+
+41. TIKA-165 - Upgrade of the ICU4J dependency to version 3.8 (Jukka Zitting)
+
+42. TIKA-172 - New Open Document Parser that emits structured XHTML content
+               (Uwe Schindler & Jukka Zitting)
+
+43. TIKA-175 - Retrotranslate Tika for use in Java 1.4 environments (Jukka Zitting)
+
+44. TIKA-177 - Improvements to build instruction in README (Chris Hostetter & Jukka Zitting)
+
+45. TIKA-171 - New ContentHandler for plain text output that has no problem with
+               missing white space after XHTML block tags (Uwe Schindler & Jukka Zitting)
+
+
+Release 0.1-incubating - 12/27/2007
+-----------------------------------
+
+1. TIKA-5 - Port Metadata Framework from Nutch (mattmann)
+
+2. TIKA-11 - Consolidate test classes into a src/test/java directory tree (mattmann)
+
+3. TIKA-15 - Utils.print does not print a Content having no value (jukka)
+
+4. TIKA-19 - org.apache.tika.TestParsers fails (bdelacretaz)
+
+5. TIKA-16 - Issues with data files used for testing by TestParsers (bdelacretaz)
+
+6. TIKA-14 - MimeTypeUtils.getMimeType() returns the default mime type for
+             .odt (Open Office) file (bdelacretaz)
+
+7. TIKA-12 - Add URL capability to MimeTypesUtils (jukka)
+
+8. TIKA-13 - Fix obsolete package names in config.xml (siren)
+
+9. TIKA-10 - Remove MimeInfoException catch clauses and import from TestParsers (siren)
+
+10. TIKA-8 - Replaced the jmimeinfo dependency with a trivial mime type detector (jukka)
+
+11. TIKA-7 - Added the Lius Lite code. Added missing dependencies to POM (jukka)
+
+12. TIKA-18 - "Office" interface should be renamed "MSOffice" (mattmann)
+
+13. TIKA-23 - Decouple Parser from ParserConfig (jukka)
+
+14. TIKA-6 - Port Nutch (or better) MimeType detection system into Tika (J. Charron & mattmann)
+
+15. TIKA-25 - Removed hardcoded reference to C:\oo.xml in OpenOfficeParser (K. Bennett & jukka)
+
+16. TIKA-17 - Need to support URLs for input resources. (K. Bennett & mattmann)
+
+17. TIKA-22 - Remove @author tags from the java source (mattmann)
+
+18. TIKA-21 - Simplified configuration code (jukka)
+
+19. TIKA-17 - Rename all "Lius" classes to be "Tika" classes (jukka)
+
+20. TIKA-30 - Added utility constructors to TikaConfig (K. Bennett & jukka)
+
+21. TIKA-28 - Rename config.xml to tika-config.xml or similar (mattmann)
+
+22. TIKA-26 - Use Map<String, Content> instead of List<Content> (jukka)
+
+23. TIKA-31 - protected Parser.parse(InputStream stream,
+              Iterable<Content> contents) (jukka & K. Bennett)
+
+24. TIKA-36 - A convenience method for getting a document's content's text
+              would be helpful (K. Bennett & mattmann)
+
+25. TIKA-33 - Stateless parsers (jukka)
+
+26. TIKA-38 - TXTParser adds a space to the content it reads from a file (K. Bennett & ridabenjelloun)
+
+27. TIKA-35 - Extract MsOffice properties, use RereadableInputStream developed by K. Bennett (ridabenjelloun & K. Bennett)
+
+28. TIKA-39 - Excel parsing improvements (siren & ridabenjelloun)
+
+29. TIKA-34 - Provide a method that will return a default configuration
+              (TikaConfig) (K. Bennett & mattmann)
+
+30. TIKA-42 - Content class needs (String, String, String) constructor (K. Bennett)
+
+31. TIKA-43 - Parser interface (jukka)
+
+32. TIKA-47 - Remove TikaLogger (jukka)
+
+33. TIKA-46 - Use Metadata in Parser (jukka & mattmann)
+
+34. TIKA-48 - Merge MS Extractors and Parsers (jukka)
+
+35. TIKA-45 - RereadableInputStream needs to be able to read to
+              the end of the original stream on first rewind. (K. Bennett)
+
+36. TIKA-41 - Resource files occur twice in jar file. (jukka)
+
+37. TIKA-49 - Some files have old-style license headers, fixed (Robert Burrell Donkin & bdelacretaz)
+
+38. TIKA-51 - Leftover temp files after running Tika tests, fixed (bdelacretaz)
+
+39. TIKA-40 - Tika needs to support diverse character encodings (jukka)
+
+40. TIKA-55 - ParseUtils.getParser() method variants should have consistent parameter orders
+              (K. Bennett)
+
+41. TIKA-52 - RereadableInputStream needs to support not closing the input stream it wraps.
+              (K. Bennett via bdelacretaz)
+
+42. TIKA-53 - XHTML SAX events from parsers (jukka)
+
+43. TIKA-57 - Rename org.apache.tika.ms to org.apache.tika.parser.ms (jukka)
+
+44. TIKA-62 - Use TikaConfig.getDefaultConfig() instead of a hardcoded
+              config path in TestParsers (jukka)
+
+45. TIKA-58 - Replace jtidy html parser with nekohtml based parser (siren)
+
+46. TIKA-60 - Rename Microsoft parser classes (jukka)
+
+47. TIKA-63 - Avoid multiple passes over the input stream in Microsoft parsers
+              (jukka)
+
+48. TIKA-66 - Use Java 5 features in org.apache.tika.mime (jukka)
+
+49. TIKA-56 - Mime type detection fails with upper case file extensions such as "PDF"
+             (mattmann)
+
+50. TIKA-65 - Add encode detection support for HTML parser (siren)
+
+51. TIKA-68 - Add dummy parser classes to be used as sentinels (jukka)
+
+52. TIKA-67 - Add an auto-detecting Parser implementation (jukka)
+
+53. TIKA-70 - Better MIME information for the Open Document formats (jukka)
+
+54. TIKA-71 - Remove ParserConfig and ParserFactory (jukka)
+
+55. TIKA-83 - Create a org.apache.tika.sax package for SAX utilities (jukka)
+
+56. TIKA-84 - Add MimeTypes.getMimeType(InputStream) (jukka)
+
+57. TIKA-85 - Add glob patterns from the ASF svn:eol-style documentation (jukka)
+
+58. TIKA-100 - Structured PDF parsing (jukka)
+
+59. TIKA-101 - Improve site and build (mattmann)
+
+60. TIKA-102 - Parser implementations loading a large amount of content
+               into a single String could be problematic (Niall Pemberton)
+
+61. TIKA-107 - Remove use of assertions for argument checking (Niall Pemberton)
+
+62. TIKA-104 - Add utility methods to throw IOException with the cause
+               initialized (jukka & Niall Pemberton)
+
+63. TIKA-106 - Remove dependency on Jakarta ORO - use JDK 1.4 Regex
+               (Niall Pemberton)
+
+64. TIKA-111 - Missing license headers (jukka)
+
+65. TIKA-112 - XMLParser improvement (ridabenjelloun)
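
The Release 0.3 notes above state that invalid XML characters in the extracted text are replaced with spaces (TIKA-180). The following is a minimal Python sketch of that idea, not Tika's own implementation. It assumes the XML 1.0 definition of a valid character, and the name replace_invalid_xml_chars is hypothetical.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, line feed, carriage return, U+0020..U+D7FF,
# U+E000..U+FFFD and U+10000..U+10FFFF.
# The negated class below matches everything else.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_invalid_xml_chars(text):
    """Return text with each character that XML 1.0 forbids replaced by a space."""
    return _INVALID_XML_CHARS.sub(" ", text)
```

Replacing each offending character with a single space, rather than deleting it, keeps the text length and word boundaries unchanged, which matches the behaviour the note describes.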

Added: dev/tika/tika-1.14-src.zip
==============================================================================
Binary file - no diff available.

Propchange: dev/tika/tika-1.14-src.zip
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/tika/tika-1.14-src.zip.asc
==============================================================================
--- dev/tika/tika-1.14-src.zip.asc (added)
+++ dev/tika/tika-1.14-src.zip.asc Wed Oct 19 18:47:33 2016
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJYB7RuAAoJEE6q+LYMHmVLAFsH/jl32rYZ4PT5XDOAruwPA5l6
+/1c8UixDcAdVymH9Y4LutBH/W6Zb7tHZpThQnTpRwkBwqPOjYiOCQC5AGGH6m4e2
+x+SPmBi4nbqKoghsflyp+H4r/alxdqk+RPLrNMMNO3Q821/MTKt549ZYVTL6cmi7
+5l4KYXVpNqwT/LC4cf+4ewN0KtNZV+fReequYyYcWIrQOzaVowLnpPmuYlsIy1gj
++1ujFlh8g/xFg+GE/E35+/4GcCJa7eZrFURuPKSduVx9+IwL2nMQiInufWcxtD1B
+wMHSOI+5XTpZ2LJNaltT0qQhy+UGUuwnCSE+1iGWYTI2L06b4ljS1Wrjf8Rtr30=
+=FK1h
+-----END PGP SIGNATURE-----

Added: dev/tika/tika-1.14-src.zip.md5
==============================================================================
--- dev/tika/tika-1.14-src.zip.md5 (added)
+++ dev/tika/tika-1.14-src.zip.md5 Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+f445d95629038f6af40ef1531632b9c8

Added: dev/tika/tika-1.14-src.zip.sha
==============================================================================
--- dev/tika/tika-1.14-src.zip.sha (added)
+++ dev/tika/tika-1.14-src.zip.sha Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+ad9152392ffe6b620c8102ab538df0579b36c520

Added: dev/tika/tika-app-1.14.jar
==============================================================================
Binary file - no diff available.

Propchange: dev/tika/tika-app-1.14.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/tika/tika-app-1.14.jar.asc
==============================================================================
--- dev/tika/tika-app-1.14.jar.asc (added)
+++ dev/tika/tika-app-1.14.jar.asc Wed Oct 19 18:47:33 2016
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJYB7PnAAoJEE6q+LYMHmVLDkgIAJrh+nHUYHe7FkphxQw4FiFS
+/eZCMvlyKSf9eezpGClE21egWDJloHVurtSC+WApXea3TR5MWnKDo6XIZMtnK5Y3
+OG164Yv0iQGmz6wOckMSLeu17fRl2s1aqD7YJaa2Dj9qBvP2Xcn7A7VmA6Yj+0oY
+iFb9mM19wrDS3J4bN8zDll7oNOCfv1onhnamewaZyzORjNi6f2dUYHbLsox8IhV7
+Vz8Mjaaehdr/pAEOrLJYuIDRz/yMZN7qyrOvHHpxDWPWxXWGelnsC0D9ZBjVvtld
+ky+RufipWNTXIBfckzsnk7F3ZQwbv1/d9cXCKR2NeSjkWfwwpWkSf9wdA3RC3+o=
+=5yK6
+-----END PGP SIGNATURE-----

Added: dev/tika/tika-app-1.14.jar.md5
==============================================================================
--- dev/tika/tika-app-1.14.jar.md5 (added)
+++ dev/tika/tika-app-1.14.jar.md5 Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+16d33ca3ec334ea4733d92a61b22f0c4

Added: dev/tika/tika-app-1.14.jar.sha
==============================================================================
--- dev/tika/tika-app-1.14.jar.sha (added)
+++ dev/tika/tika-app-1.14.jar.sha Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+72497d43c1070c55596d2bc3605f1c05f5e42971

Added: dev/tika/tika-server-1.14.jar
==============================================================================
Binary file - no diff available.

Propchange: dev/tika/tika-server-1.14.jar
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/tika/tika-server-1.14.jar.asc
==============================================================================
--- dev/tika/tika-server-1.14.jar.asc (added)
+++ dev/tika/tika-server-1.14.jar.asc Wed Oct 19 18:47:33 2016
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJYB7RAAAoJEE6q+LYMHmVLgxsH/AhvXEsJjh4LvZO5elMz+Ih0
+D1Tw3xpS1NSLfvcK/99Ds6IMipWiKd0Isflc2/HQTcV60LW+TbK2AXQoldsEkYlG
+07yKU1yp6g4hVoMmvw/AlVaqntuc7bWe2E3henI7hdFBQAkQBG5g84f0hmmai23J
+4N+TCX3Hg6FxVLMajUr+McAHp8Djael9LUXC4GUCGX5kJLWZe2R1ElXSoDe8hpWt
+hvzbIkJIohEe4MPnhxCzHmZh1FYFJhvlgW3Uv6ChL+UBGoqdnzBcy5V+lu9KldRf
+/m7f3tzDcXwSOhM0pp3ZDw8pOSBIFe+tSrJ5Y/ZtPhmBZsszCCcRBktovrk6Ra4=
+=EtVw
+-----END PGP SIGNATURE-----

Added: dev/tika/tika-server-1.14.jar.md5
==============================================================================
--- dev/tika/tika-server-1.14.jar.md5 (added)
+++ dev/tika/tika-server-1.14.jar.md5 Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+39055fc71358d774b9da066f80b1141c

Added: dev/tika/tika-server-1.14.jar.sha
==============================================================================
--- dev/tika/tika-server-1.14.jar.sha (added)
+++ dev/tika/tika-server-1.14.jar.sha Wed Oct 19 18:47:33 2016
@@ -0,0 +1 @@
+4b651f0fcde4954986f5025826f5e0520befb389
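
The .md5 and .sha files above each carry a hex digest for one release artifact. The following is a rough Python sketch of comparing a downloaded file against such a value. It assumes the 40-digit .sha values are SHA-1 digests (judging by their length), and the names file_digest and matches_published are hypothetical.

```python
import hashlib


def file_digest(path, algorithm):
    """Return the lowercase hex digest of the file at path."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm):
    """Compare a file's digest with a published value, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

For example, matches_published("tika-app-1.14.jar", "72497d43c1070c55596d2bc3605f1c05f5e42971", "sha1") returns True only if the local copy hashes to the value listed above. A digest match checks integrity only. Checking the .asc signatures requires an OpenPGP tool such as GnuPG and the signer's public key.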