Posted to commits@tika.apache.org by ju...@apache.org on 2011/10/26 19:41:36 UTC
svn commit: r1189334 - /tika/trunk/CHANGES.txt
Author: jukka
Date: Wed Oct 26 17:41:36 2011
New Revision: 1189334
URL: http://svn.apache.org/viewvc?rev=1189334&view=rev
Log:
Summarize changelog entries by feature rather than by issue
Modified:
tika/trunk/CHANGES.txt
Modified: tika/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev=1189334&r1=1189333&r2=1189334&view=diff
==============================================================================
--- tika/trunk/CHANGES.txt (original)
+++ tika/trunk/CHANGES.txt Wed Oct 26 17:41:36 2011
@@ -9,27 +9,24 @@ Release 1.0 - Current Development
to the configuration mechanism to get the previous behaviour.
(TIKA-565)
- * TIKA-632: Hyperlinks in RTF documents are now extracted as an <a
- href=...>...</a> element.
+ * RTF: Hyperlinks in RTF documents are now extracted as an
+ <a href=...>...</a> element. The RTF parser is also now more
+ robust when encountering too many closing }'s vs. opening {'s.
+ (TIKA-632, TIKA-733)
- * TIKA-733: Try to be robust when an RTF has too many closing {'s vs
- opening {'s.
-
- * TIKA-711: From Word (.doc) documents we now extract optional hyphen
+ * MS Word: From Word (.doc) documents we now extract optional hyphen
as Unicode zero-width space (U+200B), and non-breaking hyphen as
- Unicode non-breaking hyphen (U+2011).
-
- * TIKA-742: Paragraphs are now extracted within each page of a PDF
- document.
-
- * TIKA-753: Improve performance when extracting embedded office docs.
+ Unicode non-breaking hyphen (U+2011). (TIKA-711)
- * TIKA-738: Optionally extract text from PDF annotations.
+ * MS Office: Performance of extracting embedded office docs was improved.
+ (TIKA-753)
- * TIKA-724: Added option to PDFParser to enable (the default) or
- disable auto-space insertion.
+ * PDF: The PDF parser now extracts paragraphs within each page and
+ can also optionally extract text from PDF annotations. There's also
+ an option to enable (the default) or disable auto-space insertion.
+ (TIKA-742, TIKA-738, TIKA-724)
- * TIKA-582: Lithuanian was never detected by LanguageIdentifier.
+ * Language detection: Tika can now detect Lithuanian. (TIKA-582)
Release 0.10 - 09/25/2011
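The TIKA-711 entry describes a character mapping. The sketch below illustrates it under one assumption: the raw Word (.doc) text stream encodes the non-breaking hyphen as control character 0x1E and the optional (soft) hyphen as 0x1F. The helper name `map_word_hyphens` is hypothetical and is not Tika's API.

```python
# Hypothetical illustration, not Tika's WordExtractor code.
# Assumption: Word's binary text stream uses 0x1E for a non-breaking hyphen
# and 0x1F for an optional (soft) hyphen.
NON_BREAKING_HYPHEN_DOC = "\x1e"
OPTIONAL_HYPHEN_DOC = "\x1f"

_HYPHEN_MAP = str.maketrans({
    OPTIONAL_HYPHEN_DOC: "\u200b",      # U+200B ZERO WIDTH SPACE
    NON_BREAKING_HYPHEN_DOC: "\u2011",  # U+2011 NON-BREAKING HYPHEN
})


def map_word_hyphens(raw: str) -> str:
    """Replace Word's hyphen control characters with their Unicode equivalents."""
    return raw.translate(_HYPHEN_MAP)


if __name__ == "__main__":
    sample = "co\x1fop\x1eerate"
    print(repr(map_word_hyphens(sample)))
```

A translation table keeps the mapping in one place and leaves ordinary ASCII hyphens untouched.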
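The TIKA-733 entry only states that the RTF parser became more robust when it meets extra closing braces. One common way to get that robustness is to ignore an unmatched `}` instead of failing. The sketch below shows that idea. It is not Tika's RTFParser code, and the function name `scan_groups` is hypothetical.

```python
def scan_groups(rtf: str) -> tuple[int, int]:
    """Track RTF group depth, ignoring unmatched closing braces.

    Returns (final_depth, ignored_closes). Hypothetical sketch only.
    """
    depth = 0
    ignored = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            # Skip the backslash and the next character so that escaped
            # braces (\{ and \}) and \\ are not treated as group delimiters.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                ignored += 1  # unmatched close: tolerate it rather than raise
            else:
                depth -= 1
        i += 1
    return depth, ignored


if __name__ == "__main__":
    print(scan_groups("{\\rtf1 hello}}}"))
```

Clamping the depth at zero lets extraction continue past malformed input. The count of ignored braces could be logged if needed.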