- [jira] [Created] (TIKA-1117) IWorkPackageParser should not close the InputStream - posted by "Andrew Jackson (JIRA)" <ji...@apache.org> on 2013/05/01 15:26:16 UTC, 0 replies.
- [jira] [Assigned] (TIKA-1115) ExifHandler throws NullPointerException - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2013/05/01 18:50:15 UTC, 0 replies.
- [jira] [Commented] (TIKA-1115) ExifHandler throws NullPointerException - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2013/05/01 18:58:15 UTC, 1 replies.
- [jira] [Resolved] (TIKA-1115) ExifHandler throws NullPointerException - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2013/05/01 19:46:17 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #994 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/05/01 23:01:28 UTC, 3 replies.
- Jenkins build is back to normal : Tika-trunk » Apache Tika parsers #995 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/05/02 03:47:50 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #995 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2013/05/02 03:47:52 UTC, 0 replies.
- [jira] [Commented] (TIKA-788) DWG parser infinite loop on possibly corrupt file - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/09 13:55:17 UTC, 0 replies.
- [jira] [Commented] (TIKA-992) OpenGraph meta tags to allow multiple values - posted by "kiran (JIRA)" <ji...@apache.org> on 2013/05/12 21:49:15 UTC, 4 replies.
- [jira] [Updated] (TIKA-967) Tika comes with transitive Maven dependency to a test artifact of vorbis-java-core - posted by "Andreas Hubold (JIRA)" <ji...@apache.org> on 2013/05/13 11:39:16 UTC, 0 replies.
- Wanting to contribute to Tika (was Re: [jira] [Commented] (TIKA-992) OpenGraph meta tags to allow multiple values) - posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov> on 2013/05/13 22:33:31 UTC, 0 replies.
- [jira] [Resolved] (TIKA-992) OpenGraph meta tags to allow multiple values - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2013/05/14 11:03:16 UTC, 0 replies.
- [jira] [Resolved] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2013/05/14 17:41:16 UTC, 0 replies.
- [jira] [Created] (TIKA-1118) OOXML parser throws when relationship points to 0 byte embedded part - posted by "Lee Graber (JIRA)" <ji...@apache.org> on 2013/05/14 22:55:16 UTC, 0 replies.
- [jira] [Commented] (TIKA-1118) OOXML parser throws when relationship points to 0 byte embedded part - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/14 23:07:16 UTC, 2 replies.
- [jira] [Created] (TIKA-1119) HSLFExtractor throws if PictureData is not readable - posted by "Lee Graber (JIRA)" <ji...@apache.org> on 2013/05/15 02:12:13 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1118) OOXML parser throws when relationship points to 0 byte embedded part - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/15 02:34:13 UTC, 0 replies.
- [jira] [Commented] (TIKA-1119) HSLFExtractor throws if PictureData is not readable - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/15 02:42:13 UTC, 1 replies.
- [jira] [Created] (TIKA-1120) Enable direct use of org.apache.tika.mime.MediaType.detect(...) - posted by "Oliver Kopp (JIRA)" <ji...@apache.org> on 2013/05/18 18:13:15 UTC, 0 replies.
- [jira] [Created] (TIKA-1121) Socket server text parsing error on large text files - posted by "Dave Meikle (JIRA)" <ji...@apache.org> on 2013/05/20 00:33:16 UTC, 0 replies.
- [jira] [Created] (TIKA-1122) Tika fails to parse chm files - posted by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/05/21 04:00:15 UTC, 0 replies.
- [jira] [Commented] (TIKA-1122) Tika fails to parse chm files - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/21 18:33:16 UTC, 0 replies.
- [jira] [Created] (TIKA-1123) Add more mimetypes for famous programming languages - posted by "Bernhard Berger (JIRA)" <ji...@apache.org> on 2013/05/22 09:51:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-1123) Add more mimetypes for famous programming languages - posted by "Bernhard Berger (JIRA)" <ji...@apache.org> on 2013/05/22 09:53:20 UTC, 0 replies.
- [jira] [Created] (TIKA-1124) Nested documents not extracted if a PDF file is in the chain - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2013/05/23 20:43:25 UTC, 0 replies.
- [jira] [Updated] (TIKA-1124) Nested documents not extracted if a PDF file is in the chain - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2013/05/23 20:45:31 UTC, 0 replies.
- [jira] [Created] (TIKA-1125) Why does tika-app-0.9.jar contain slf4j? - posted by "Stenger (JIRA)" <ji...@apache.org> on 2013/05/24 12:53:19 UTC, 0 replies.
- [jira] [Created] (TIKA-1126) text/html procuder for tika-server - posted by "Ali Mosavian (JIRA)" <ji...@apache.org> on 2013/05/24 15:24:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-1126) text/html procuder for tika-server - posted by "Ali Mosavian (JIRA)" <ji...@apache.org> on 2013/05/24 15:26:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-1123) Add more mimetypes for famous programming languages - posted by "Dave Meikle (JIRA)" <ji...@apache.org> on 2013/05/25 10:52:21 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1123) Add more mimetypes for famous programming languages - posted by "Dave Meikle (JIRA)" <ji...@apache.org> on 2013/05/25 10:52:21 UTC, 0 replies.
- [jira] [Commented] (TIKA-1125) Why does tika-app-0.9.jar contain slf4j? - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/26 00:35:20 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1125) Why does tika-app-0.9.jar contain slf4j? - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/05/26 00:35:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-1126) text/html procuder for tika-server - posted by "Dave Meikle (JIRA)" <ji...@apache.org> on 2013/05/26 13:38:20 UTC, 1 replies.
- [jira] [Resolved] (TIKA-1126) text/html procuder for tika-server - posted by "Dave Meikle (JIRA)" <ji...@apache.org> on 2013/05/26 13:38:20 UTC, 0 replies.
- tika pull request: Similar to TIKA-1126, this commit adds the ability to pr... - posted by stdexcept <gi...@git.apache.org> on 2013/05/27 16:26:41 UTC, 0 replies.
- [jira] [Updated] (TIKA-1086) Tika-bundle 1.3 does not import org.w3c.dom package - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:50:21 UTC, 1 replies.
- [jira] [Updated] (TIKA-1079) Word document hits AIOOBE in SummaryExtractor.parseSummaries - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:20 UTC, 1 replies.
- [jira] [Updated] (TIKA-1109) Metadata not extracted before the context in OOXML (pptx) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:20 UTC, 1 replies.
- [jira] [Updated] (TIKA-1067) Tika extracts non-existent asterisks (*) from .ppt files - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-1078) TikaCLI: invalid characters in embedded document name causes FNFE when trying to save - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:21 UTC, 1 replies.
- [jira] [Updated] (TIKA-1072) AIOOBE when handling embedded document in .doc file - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:21 UTC, 1 replies.
- [jira] [Updated] (TIKA-1046) Get "java.util.zip.ZipException: unknown compression method" when indexing ppf97-file containing wmf-image - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1054) Problem with parsing excel date formats - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:52:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1045) Unsupported AutoCAD drawing version: AC1014 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-1037) No text extracted from Excel file (rus chars) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-1108) Represent individual slides in pptx - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:21 UTC, 1 replies.
- [jira] [Updated] (TIKA-1017) DefaultHtmlMapper misses some safe elements - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-1111) Class loading issues when running in OSGi environment - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-1107) Can't parse velocity file - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1102) Can we add to the list of heuristics for bad html fragments? - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1110) Incorrectly declared SUPPORTED_TYPES in ChmParser. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:22 UTC, 1 replies.
- [jira] [Updated] (TIKA-988) We don't extract a placeholder for a Word document embedded in an Excel document - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:54:22 UTC, 1 replies.
- [jira] [Updated] (TIKA-993) Language Detection Fault - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:56:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-1057) document content property "Status" is not extracted for *.doc files - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:56:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1004) Support "ansi" as an alias for windows-1252 charset - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:56:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-1120) Enable direct use of org.apache.tika.mime.MediaType.detect(...) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:56:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-978) OSGi bundle build fails if space exists in build path - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:56:23 UTC, 0 replies.
- [jira] [Updated] (TIKA-1106) CLAVIN Integration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-995) XHTMLContentHandler doesn't pass attributes of body element - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-817) (PPT/PPTX) Missing date/time in text content. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-1076) Upgrade to Apache POI 3.9 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-1059) Better Handling of InterruptedException in ExternalParser and ExternalEmbedder - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true) - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-605) Tika GDAL parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-774) ExifTool Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:23 UTC, 0 replies.
- [jira] [Updated] (TIKA-980) MicrodataContentHandler for Apache Tika - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:23 UTC, 0 replies.
- [jira] [Updated] (TIKA-987) Embedded drawing (SHAPE MERGEFORMAT) sometimes not extracted - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:23 UTC, 0 replies.
- [jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:23 UTC, 0 replies.
- [jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:24 UTC, 0 replies.
- [jira] [Updated] (TIKA-1122) Tika fails to parse chm files - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:25 UTC, 0 replies.
- [jira] [Updated] (TIKA-820) Locator is unset for HTML parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:26 UTC, 0 replies.
- [jira] [Updated] (TIKA-985) Support for HTML5 elements - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:26 UTC, 0 replies.
- [jira] [Updated] (TIKA-891) Use POST in addition to PUT on method calls in tika-server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-776) ExifTool Embedder - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:27 UTC, 0 replies.
- [jira] [Created] (TIKA-1127) text/xml for tika-server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 18:58:28 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1127) text/xml for tika-server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/27 19:05:20 UTC, 0 replies.
- [DISCUSS] Apache Tika 1.4 RC? - posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov> on 2013/05/27 19:06:27 UTC, 2 replies.
- [jira] [Issue Comment Deleted] (TIKA-1127) text/xml for tika-server - posted by "Ali Mosavian (JIRA)" <ji...@apache.org> on 2013/05/27 22:00:22 UTC, 0 replies.
- [jira] [Commented] (TIKA-1127) text/xml for tika-server - posted by "Ali Mosavian (JIRA)" <ji...@apache.org> on 2013/05/27 22:00:22 UTC, 2 replies.
- [jira] [Commented] (TIKA-1102) Can we add to the list of heuristics for bad html fragments? - posted by "David Morana (JIRA)" <ji...@apache.org> on 2013/05/28 15:39:20 UTC, 1 replies.
- [jira] [Resolved] (TIKA-1102) Can we add to the list of heuristics for bad html fragments? - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2013/05/28 16:28:20 UTC, 0 replies.
- Offsets in - posted by Ken Krugler <kk...@transpac.com> on 2013/05/28 16:30:12 UTC, 0 replies.
- MP4Parser Triggers no ContentHandler.startDocument() and ContentHandler.endDocument() in one case - posted by Christian Reuschling <re...@dfki.uni-kl.de> on 2013/05/28 16:49:06 UTC, 3 replies.
- [jira] [Created] (TIKA-1128) Replace line tabulation with line break - posted by "Privezentsev Konstantin (JIRA)" <ji...@apache.org> on 2013/05/29 13:27:19 UTC, 0 replies.
- [jira] [Updated] (TIKA-1128) Replace line tabulation with line break - posted by "Privezentsev Konstantin (JIRA)" <ji...@apache.org> on 2013/05/29 13:29:20 UTC, 3 replies.
- Parser does not produce proper sentence breaks? - posted by Shai Erera <se...@gmail.com> on 2013/05/29 22:43:21 UTC, 0 replies.
- [jira] [Assigned] (TIKA-1128) Replace line tabulation with line break - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/05/30 13:22:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-1128) Replace line tabulation with line break - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/05/30 13:22:21 UTC, 1 replies.
- [ANNOUNCE] Open Source Summit 3.0: Communities Meeting: June 25,26, Washington DC USA - posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov> on 2013/05/30 13:34:08 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1128) Replace line tabulation with line break - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/05/30 17:19:20 UTC, 0 replies.
- [jira] [Created] (TIKA-1129) Test HTML file has poorly chosen GPL text in it - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/31 17:03:21 UTC, 0 replies.
- [jira] [Assigned] (TIKA-1129) Test HTML file has poorly chosen GPL text in it - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2013/05/31 17:09:20 UTC, 0 replies.