- Container Extractor? - posted by Nick Burch <ni...@alfresco.com> on 2010/09/01 11:54:20 UTC, 16 replies.
- [jira] Created: (TIKA-502) Add programming language mime-types - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/01 19:51:53 UTC, 0 replies.
- [jira] Updated: (TIKA-502) Add programming language mime-types - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/01 19:55:53 UTC, 0 replies.
- [jira] Created: (TIKA-503) Add a ContentHandler for collecting links from parser output - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/02 12:55:52 UTC, 1 replies.
- [jira] Commented: (TIKA-503) Add a ContentHandler for collecting links from parser output - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/02 15:44:54 UTC, 0 replies.
- [jira] Created: (TIKA-504) Support XMP metadata keys for more of the common EXIF tags - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/03 15:46:33 UTC, 0 replies.
- [jira] Commented: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/03 17:04:32 UTC, 8 replies.
- [jira] Commented: (TIKA-451) Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/03 17:06:33 UTC, 0 replies.
- [jira] Commented: (TIKA-504) Support XMP metadata keys for more of the common EXIF tags - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/03 19:04:34 UTC, 2 replies.
- [jira] Resolved: (TIKA-504) Support XMP metadata keys for more of the common EXIF tags - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/03 19:04:34 UTC, 0 replies.
- [jira] Issue Comment Edited: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API - posted by "Staffan Olsson (JIRA)" <ji...@apache.org> on 2010/09/06 06:43:32 UTC, 0 replies.
- [jira] Created: (TIKA-505) set sortByPosition option by default - posted by "Sandor Dj (JIRA)" <ji...@apache.org> on 2010/09/06 09:03:32 UTC, 0 replies.
- [jira] Assigned: (TIKA-461) RFC822 messages not parsed - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/09/06 11:25:33 UTC, 0 replies.
- [jira] Commented: (TIKA-461) RFC822 messages not parsed - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/09/06 11:25:34 UTC, 3 replies.
- [jira] Updated: (TIKA-461) RFC822 messages not parsed - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/09/06 14:51:33 UTC, 1 replies.
- [jira] Commented: (TIKA-486) ContainerAwareDetector doesn't support non-MSOffice files which use the same magic - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/06 17:39:33 UTC, 0 replies.
- [jira] Resolved: (TIKA-486) ContainerAwareDetector doesn't support non-MSOffice files which use the same magic - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/06 19:48:33 UTC, 0 replies.
- [jira] Resolved: (TIKA-485) ContainerAwareDetector doesn't support truncated POI files - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/06 19:52:34 UTC, 0 replies.
- [jira] Resolved: (TIKA-451) Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/06 19:54:32 UTC, 0 replies.
- [jira] Commented: (TIKA-484) xlsx files created with open office are detected as application/zip - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/06 19:58:33 UTC, 1 replies.
- Jpeg parsing issues - posted by Ken Krugler <kk...@transpac.com> on 2010/09/07 05:03:03 UTC, 3 replies.
- [jira] Created: (TIKA-506) Improve doc and docx parsing to include more things - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/07 17:56:32 UTC, 0 replies.
- [jira] Created: (TIKA-507) Parser for font files - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/08 00:06:33 UTC, 0 replies.
- [jira] Created: (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/08 03:15:35 UTC, 0 replies.
- [jira] Created: (TIKA-509) Container contents extraction - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/08 15:50:33 UTC, 0 replies.
- [jira] Commented: (TIKA-509) Container contents extraction - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/08 19:11:32 UTC, 8 replies.
- [jira] Updated: (TIKA-509) Container contents extraction - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/08 20:11:32 UTC, 0 replies.
- Build failed in Hudson: Tika-trunk #365 - posted by Apache Hudson Server <hu...@hudson.apache.org> on 2010/09/09 11:03:07 UTC, 0 replies.
- Build failed in Hudson: Tika-trunk » Apache Tika parent #365 - posted by Apache Hudson Server <hu...@hudson.apache.org> on 2010/09/09 11:03:07 UTC, 0 replies.
- [jira] Created: (TIKA-510) Use POI API for text extraction from XSLF shape - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2010/09/09 13:19:32 UTC, 0 replies.
- [jira] Updated: (TIKA-510) Use POI API for text extraction from XSLF shape - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2010/09/09 13:21:32 UTC, 0 replies.
- [jira] Created: (TIKA-511) NPE when POI is configured to prefer event extractors - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2010/09/09 13:27:32 UTC, 0 replies.
- [jira] Updated: (TIKA-511) NPE when POI is configured to prefer event extractors - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2010/09/09 13:27:33 UTC, 0 replies.
- Build failed in Hudson: Tika-trunk #366 - posted by Apache Hudson Server <hu...@hudson.apache.org> on 2010/09/09 18:13:58 UTC, 0 replies.
- Hudson build is back to normal : Tika-trunk » Apache Tika parent #367 - posted by Apache Hudson Server <hu...@hudson.apache.org> on 2010/09/09 18:24:23 UTC, 0 replies.
- Hudson build is back to normal : Tika-trunk #367 - posted by Apache Hudson Server <hu...@hudson.apache.org> on 2010/09/09 18:24:24 UTC, 0 replies.
- Error thrown with TikaConfig() constructor - posted by Ken Krugler <kk...@transpac.com> on 2010/09/10 05:22:50 UTC, 13 replies.
- buildbot failure in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2010/09/10 19:17:33 UTC, 0 replies.
- buildbot success in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2010/09/10 19:26:28 UTC, 0 replies.
- Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java - posted by Jukka Zitting <ju...@gmail.com> on 2010/09/10 20:19:43 UTC, 0 replies.
- Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java - posted by Nick Burch <ni...@alfresco.com> on 2010/09/10 21:29:20 UTC, 0 replies.
- [jira] Created: (TIKA-512) Print the supported Metadata models and their associated met keys in tika-app - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/09/12 03:07:39 UTC, 0 replies.
- [jira] Resolved: (TIKA-512) Print the supported Metadata models and their associated met keys in tika-app - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/09/12 04:40:33 UTC, 0 replies.
- [jira] Updated: (TIKA-512) Print the supported Metadata models and their associated met keys in tika-app - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/09/12 04:44:32 UTC, 0 replies.
- [jira] Commented: (TIKA-419) Allow parser lookup from a custom class loader - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/12 21:17:33 UTC, 0 replies.
- [jira] Commented: (TIKA-373) Upgrade to POI 3.7 - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/13 15:08:32 UTC, 1 replies.
- [jira] Created: (TIKA-513) Support of Deja Vu (DjVu) format - posted by "Oleg Tikhonov (JIRA)" <ji...@apache.org> on 2010/09/14 13:46:33 UTC, 0 replies.
- [jira] Commented: (TIKA-408) Word 6.0/7.0 documents support in office parser - posted by "Adam Wilmer (JIRA)" <ji...@apache.org> on 2010/09/14 16:37:33 UTC, 0 replies.
- [jira] Created: (TIKA-514) Provide constructor for AutoDetectParser that has explicit list of supported parsers - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/14 18:42:36 UTC, 0 replies.
- [jira] Commented: (TIKA-514) Provide constructor for AutoDetectParser that has explicit list of supported parsers - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/14 18:44:34 UTC, 2 replies.
- [jira] Updated: (TIKA-514) Provide constructor for AutoDetectParser that has explicit list of supported parsers - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/14 18:56:36 UTC, 0 replies.
- [jira] Resolved: (TIKA-514) Provide constructor for AutoDetectParser that has explicit list of supported parsers - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/14 19:00:34 UTC, 0 replies.
- [jira] Updated: (TIKA-506) Improve doc and docx parsing to include more things - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/14 21:12:32 UTC, 4 replies.
- [jira] Resolved: (TIKA-408) Word 6.0/7.0 documents support in office parser - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/14 22:18:34 UTC, 0 replies.
- [jira] Resolved: (TIKA-484) xlsx files created with open office are detected as application/zip - posted by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/09/15 02:43:34 UTC, 0 replies.
- [jira] Created: (TIKA-515) MimeType.getDescription() often returns nothing when "tika-mimetypes.xml" has a useful description already available. - posted by "Miroslav Pokorny (JIRA)" <ji...@apache.org> on 2010/09/15 05:15:46 UTC, 0 replies.
- [jira] Updated: (TIKA-515) MimeType.getDescription() often returns nothing when "tika-mimetypes.xml" has a useful description already available. - posted by "Miroslav Pokorny (JIRA)" <ji...@apache.org> on 2010/09/15 05:19:33 UTC, 0 replies.
- [jira] Assigned: (TIKA-515) MimeType.getDescription() often returns nothing when "tika-mimetypes.xml" has a useful description already available. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/09/15 06:07:38 UTC, 0 replies.
- [jira] Created: (TIKA-516) Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel" - posted by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/09/16 00:44:32 UTC, 0 replies.
- [jira] Updated: (TIKA-516) Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel" - posted by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/09/16 00:44:33 UTC, 0 replies.
- [jira] Commented: (TIKA-516) Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel" - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/16 12:21:33 UTC, 0 replies.
- [jira] Created: (TIKA-517) java.io.UnsupportedEncodingException with Russian, Chinese, ... document - posted by "Dominique Béjean (JIRA)" <ji...@apache.org> on 2010/09/18 17:39:33 UTC, 0 replies.
- [jira] Commented: (TIKA-407) Push NetCDF4 lib dependency to Maven Central and Update Tika POM - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/09/18 22:38:39 UTC, 2 replies.
- [jira] Updated: (TIKA-490) Support for adding language profiles dynamically - posted by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2010/09/20 16:00:40 UTC, 1 replies.
- [jira] Assigned: (TIKA-517) java.io.UnsupportedEncodingException with Russian, Chinese, ... document - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/20 19:44:45 UTC, 0 replies.
- [jira] Commented: (TIKA-517) java.io.UnsupportedEncodingException with Russian, Chinese, ... document - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/20 19:54:33 UTC, 0 replies.
- [jira] Commented: (TIKA-385) Incorrect handling of hyperlinks in .docx - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/20 21:32:34 UTC, 1 replies.
- [jira] Created: (TIKA-518) Attribute values are not indexed - posted by "Ovidiu Cilnician (JIRA)" <ji...@apache.org> on 2010/09/23 16:16:34 UTC, 0 replies.
- [jira] Commented: (TIKA-506) Improve doc and docx parsing to include more things - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/24 17:47:32 UTC, 7 replies.
- [jira] Assigned: (TIKA-518) Attribute values are not indexed - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/24 19:33:33 UTC, 0 replies.
- Great 2-part blog article on Apache Tika - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2010/09/25 05:47:50 UTC, 1 replies.
- [jira] Resolved: (TIKA-506) Improve doc and docx parsing to include more things - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/25 15:36:32 UTC, 0 replies.
- [jira] Resolved: (TIKA-385) Incorrect handling of hyperlinks in .docx - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/25 15:40:35 UTC, 0 replies.
- improving odf / general questions on forms and deleted text - posted by Hanssens Bart <Ba...@fedict.be> on 2010/09/25 16:56:34 UTC, 1 replies.
- [jira] Created: (TIKA-519) Display embedded images in the GUI Formatted Text pane where they occur in the document - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/28 17:35:33 UTC, 0 replies.
- [jira] Updated: (TIKA-519) Display embedded images in the GUI Formatted Text pane where they occur in the document - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/28 17:35:34 UTC, 1 replies.
- [jira] Commented: (TIKA-519) Display embedded images in the GUI Formatted Text pane where they occur in the document - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/29 00:37:45 UTC, 2 replies.
- [jira] Created: (TIKA-520) DWG parser throws ArrayIndexOutOfBoundsException when address to the header is 0x00 - posted by "Sjoerd Smeets (JIRA)" <ji...@apache.org> on 2010/09/30 02:43:33 UTC, 0 replies.
- [jira] Updated: (TIKA-520) DWG parser throws ArrayIndexOutOfBoundsException when address to the header is 0x00 - posted by "Sjoerd Smeets (JIRA)" <ji...@apache.org> on 2010/09/30 02:45:33 UTC, 0 replies.
- [jira] Resolved: (TIKA-519) Display embedded images in the GUI Formatted Text pane where they occur in the document - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/30 12:53:33 UTC, 0 replies.
- [jira] Resolved: (TIKA-520) DWG parser throws ArrayIndexOutOfBoundsException when address to the header is 0x00 - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/30 13:19:34 UTC, 0 replies.
- [jira] Updated: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API - posted by "Staffan Olsson (JIRA)" <ji...@apache.org> on 2010/09/30 15:54:32 UTC, 0 replies.
- [jira] Created: (TIKA-521) OutOfMemoryError Parsing XSLX File - posted by "Stephen Duncan Jr (JIRA)" <ji...@apache.org> on 2010/09/30 17:27:33 UTC, 0 replies.
- [jira] Updated: (TIKA-521) OutOfMemoryError Parsing XSLX File - posted by "Stephen Duncan Jr (JIRA)" <ji...@apache.org> on 2010/09/30 17:27:34 UTC, 0 replies.
- [jira] Commented: (TIKA-521) OutOfMemoryError Parsing XSLX File - posted by "Stephen Duncan Jr (JIRA)" <ji...@apache.org> on 2010/09/30 17:31:37 UTC, 2 replies.
- [jira] Commented: (TIKA-383) new option for TIKA CLI to get only the languages of a document - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/30 18:07:33 UTC, 0 replies.
- [jira] Resolved: (TIKA-383) new option for TIKA CLI to get only the languages of a document - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/30 21:14:33 UTC, 0 replies.