Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/12/02 15:26:32 UTC
[Tika Wiki] Update of "Tika2_0RoadMap" by TimothyAllison
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "Tika2_0RoadMap" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/Tika2_0RoadMap?action=diff&rev1=4&rev2=5
= Background =
This page is intended for a discussion of changes anticipated in Tika 2.0.
- This is only a first draft from one voice. Please contribute!
+ This is only a first draft initially from one voice. Please contribute!
= Major Planned Changes =
@@ -18, +18 @@
* Allow users to build composite parsers with configurable strategies via the config file ([[https://issues.apache.org/jira/browse/TIKA-1509|TIKA-1509]] and CompositeParserDiscussion). We will be working towards this gradually through Tika 1.8 and 1.9. By Tika 2.0, however, this will be the default.
+ * Allow for easily configurable parser sub-packages. The tika-app, tika-server and tika-bundle jars are now pushing or are > 50MB. It would be great if users easily could specify a subset of parsers they care about, either a la carte or by category (image, common office files (MSOffice, PDF, etc.), environmental data) and only get the dependencies required for that subset of parsers.
+
- * Move to Java 1.7 (???)
+ * Move to Java 1.8 (???)
+
+ * Solve the complex metadata challenge; see: [[https://issues.apache.org/jira/browse/TIKA-1607|TIKA-1607]] and [[https://issues.apache.org/jira/browse/TIKA-1691|TIKA-1691]] and [[http://mail-archives.apache.org/mod_mbox/incubator-tika-dev/201510.mbox/%3c561B8B26.30105@geomatys.com%3e|ISO 19115 discussion]] .... Or at least come to some accommodation that will allow for both easy key/values access and more advanced access for those who know what they're doing.
= Minor Planned Changes =
= Wishes =
- * Allow for easily configurable parser sub-packages. The tika-app, tika-server and tika-bundle jars are now pushing or are > 30MB. It would be great if users easily could specify a subset of parsers they care about, either a la carte or by category (image, common office files (MSOffice, PDF, etc.), environmental data) and only get the dependencies required for that subset of parsers.
+