- [jira] [Created] (TIKA-735) OpenOffice parser: embedded OLE docs are extracted at the end, as extra ... - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/01 12:13:45 UTC, 0 replies.
- [jira] [Updated] (TIKA-735) OpenOffice parser: embedded OLE docs are extracted at the end, as extra ... - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/01 12:15:45 UTC, 0 replies.
- [jira] [Created] (TIKA-736) OpenOffice parser: master footer text isn't extracted - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/01 12:33:45 UTC, 0 replies.
- [jira] [Updated] (TIKA-736) OpenOffice parser: master footer text isn't extracted - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/01 12:35:45 UTC, 1 replies.
- buildbot failure in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2011/10/01 12:57:28 UTC, 1 replies.
- [jira] [Resolved] (TIKA-632) Rtf parsing ignores links - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/01 12:57:45 UTC, 0 replies.
- buildbot success in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2011/10/01 13:05:20 UTC, 1 replies.
- Jenkins build became unstable: Tika-trunk » Apache Tika parsers #657 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/01 13:07:58 UTC, 0 replies.
- Jenkins build became unstable: Tika-trunk #657 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/01 13:07:59 UTC, 0 replies.
- Jenkins build is back to stable : Tika-trunk » Apache Tika parsers #658 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/01 14:06:37 UTC, 0 replies.
- Jenkins build is back to stable : Tika-trunk #658 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/01 14:06:38 UTC, 0 replies.
- [jira] [Commented] (TIKA-735) OpenOffice parser: embedded OLE docs are extracted at the end, as extra ... - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/01 14:45:34 UTC, 2 replies.
- [jira] [Commented] (TIKA-736) OpenOffice parser: master footer text isn't extracted - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/01 14:47:34 UTC, 7 replies.
- Re: Jenkins build became unstable: Tika-trunk » Apache Tika parsers #657 - posted by Michael McCandless <lu...@mikemccandless.com> on 2011/10/01 16:25:48 UTC, 0 replies.
- [jira] [Created] (TIKA-737) Use (Incubating) ODFToolkit to improve ODF file format processing - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/01 16:55:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-737) Use (Incubating) ODFToolkit to improve ODF file format processing - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/01 17:31:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-720) EBCDIC encoding not detected - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/01 18:26:34 UTC, 0 replies.
- [RESULT] [VOTE] Add Any23 to the Apache Incubator - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/10/01 18:38:02 UTC, 0 replies.
- [HEADS UP] Added Tika ApacheCon NA 2011 news item - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/10/01 20:27:13 UTC, 0 replies.
- [jira] [Commented] (TIKA-711) Word parser doesn't extract optional hyphen correctly - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/01 22:52:33 UTC, 0 replies.
- [jira] [Assigned] (TIKA-711) Word parser doesn't extract optional hyphen correctly - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/02 13:00:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-711) Word parser doesn't extract optional hyphen correctly - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/02 13:02:33 UTC, 0 replies.
- [jira] [Assigned] (TIKA-721) UTF16-LE not detected - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/02 15:17:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-721) UTF16-LE not detected - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/02 17:04:35 UTC, 3 replies.
- [jira] [Updated] (TIKA-721) UTF16-LE not detected - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/02 18:20:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-713) Tika can not parse all of the persian pdf files - posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org> on 2011/10/02 22:18:34 UTC, 5 replies.
- [jira] [Commented] (TIKA-717) Comment/annotation is sometimes not extracted - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 12:53:34 UTC, 1 replies.
- [jira] [Resolved] (TIKA-717) Comment/annotation is sometimes not extracted - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/03 12:56:11 UTC, 0 replies.
- [jira] [Created] (TIKA-738) Tika fails to extract text from PDF annotations - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/03 12:56:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-738) Tika fails to extract text from PDF annotations - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 14:38:33 UTC, 1 replies.
- Newb: IDE + Maven? - posted by "Albert Law (Logik)" <al...@logik.com> on 2011/10/03 16:42:46 UTC, 4 replies.
- [jira] [Commented] (TIKA-722) Arabic PDF doesn't extract correctly - posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 19:07:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-722) Arabic PDF doesn't extract correctly - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/03 19:15:35 UTC, 0 replies.
- [jira] [Commented] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException - posted by "Jeremy Anderson (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 20:01:37 UTC, 3 replies.
- [jira] [Issue Comment Edited] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException - posted by "Jeremy Anderson (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/10/03 20:01:38 UTC, 4 replies.
- [jira] [Resolved] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/03 20:21:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-711) Word parser doesn't extract optional hyphen correctly - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/03 20:27:33 UTC, 0 replies.
- [jira] [Created] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage - posted by "John Bartak (Created) (JIRA)" <ji...@apache.org> on 2011/10/03 21:10:34 UTC, 0 replies.
- [jira] [Issue Comment Edited] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage - posted by "John Bartak (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/10/03 21:12:34 UTC, 1 replies.
- [jira] [Updated] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage - posted by "John Bartak (Updated) (JIRA)" <ji...@apache.org> on 2011/10/03 21:12:34 UTC, 3 replies.
- [jira] [Commented] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 22:47:34 UTC, 5 replies.
- [jira] [Created] (TIKA-740) SAX parser used for HTML - posted by "Erik Hetzner (Created) (JIRA)" <ji...@apache.org> on 2011/10/04 03:12:35 UTC, 0 replies.
- [jira] [Created] (TIKA-741) Make "Zip bomb" (XML nesting) detection level configurable? - posted by "Erik Hetzner (Created) (JIRA)" <ji...@apache.org> on 2011/10/04 03:16:34 UTC, 0 replies.
- [jira] [Created] (TIKA-742) PDF2XHTML fails to insert nor space around page marker - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/04 12:32:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-742) PDF2XHTML fails to insert nor space around page marker - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/04 12:32:33 UTC, 1 replies.
- [jira] [Commented] (TIKA-623) Add support for Outlook PST - posted by "Mark Kerzner (Commented) (JIRA)" <ji...@apache.org> on 2011/10/05 04:15:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-622) Switch from POIFSFileSystem to NPOIFSFileSystem, for speed and memory improvements - posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 11:05:34 UTC, 0 replies.
- Download-Link to tika-app-0.10.jar doesn't work - posted by Bernhard Berger <be...@yahoo.de> on 2011/10/05 11:06:02 UTC, 1 replies.
- [jira] [Resolved] (TIKA-742) PDF2XHTML fails to insert nor space around page marker - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 12:44:33 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #664 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/05 13:13:00 UTC, 2 replies.
- Build failed in Jenkins: Tika-trunk » Apache Tika parsers #664 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/05 13:13:00 UTC, 0 replies.
- [jira] [Created] (TIKA-743) Upgrade to Apache parent POM version 10 - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/05 15:05:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-743) Upgrade to Apache parent POM version 10 - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 15:12:34 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk » Apache Tika parsers #665 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/05 15:12:45 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #665 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/05 15:12:45 UTC, 0 replies.
- [jira] [Resolved] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 15:52:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-740) SAX parser used for HTML - posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org> on 2011/10/05 16:02:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict - posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org> on 2011/10/05 17:01:37 UTC, 0 replies.
- [jira] [Resolved] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 17:14:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/05 17:17:34 UTC, 4 replies.
- [jira] [Resolved] (TIKA-730) WriteOutContentHandler concatenates title tag and body text. - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 17:21:36 UTC, 0 replies.
- [jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict - posted by "Erik Hetzner (Commented) (JIRA)" <ji...@apache.org> on 2011/10/05 17:45:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-605) Tika GDAL parser - posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org> on 2011/10/05 18:41:34 UTC, 1 replies.
- [jira] [Resolved] (TIKA-699) Automatic checks against backwards-incompatible API changes - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 18:59:35 UTC, 0 replies.
- [jira] [Commented] (TIKA-605) Tika GDAL parser - posted by "Chris A. Mattmann (Commented) (JIRA)" <ji...@apache.org> on 2011/10/05 19:03:34 UTC, 0 replies.
- [jira] [Created] (TIKA-744) Drop support for Java 1.4 - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/05 19:07:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-744) Drop support for Java 1.4 - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 19:09:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/05 19:11:39 UTC, 0 replies.
- [jira] [Resolved] (TIKA-642) Few of RTF files not extracting properly - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/05 19:13:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-468) Missing Slide-Count metadata for PPT files - posted by "Łukasz Wiktor (Updated JIRA)" <ji...@apache.org> on 2011/10/06 15:58:29 UTC, 0 replies.
- [jira] [Created] (TIKA-745) MP3 parser should handle genres not in ID3v1 - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 17:33:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-745) MP3 parser should handle genres not in ID3v1 - posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/06 17:33:30 UTC, 0 replies.
- HSLFExtractor Bug - posted by Joe Gallo <jo...@gmail.com> on 2011/10/06 20:45:57 UTC, 2 replies.
- [jira] [Created] (TIKA-746) Support custom mime types - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 22:29:30 UTC, 0 replies.
- [jira] [Commented] (TIKA-746) Support custom mime types - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/06 22:33:29 UTC, 0 replies.
- Re: Appending Mime Types - posted by Nick Burch <ni...@alfresco.com> on 2011/10/06 22:48:44 UTC, 0 replies.
- [jira] [Created] (TIKA-747) Ogg Vorbis and FLAC Parsers - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 23:17:30 UTC, 0 replies.
- [jira] [Updated] (TIKA-423) Parse docx and output to text file missing words - posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org> on 2011/10/07 10:18:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-410) textbox content extaction for word documents - posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org> on 2011/10/07 10:20:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-396) Parser Attachements from Outlook Messages - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 10:22:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-381) HtmlParser should strip linefeeds out of links - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/07 10:32:30 UTC, 0 replies.
- [jira] [Commented] (TIKA-272) Expose characters offsets information while parsing text-based inputs. - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/07 10:40:30 UTC, 0 replies.
- [jira] [Resolved] (TIKA-123) Structured MS Office parsing - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 10:42:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-448) Tika FLVParser hangs - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 10:54:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-487) ContainerAwareDetector doesn't support truncated Open XML files - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 10:58:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-433) Tika + Hadoop - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:00:30 UTC, 0 replies.
- [jira] [Resolved] (TIKA-429) Error parsing DTD - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:00:30 UTC, 0 replies.
- [jira] [Commented] (TIKA-513) Support of Deja Vu (DjVu) format - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/07 11:02:29 UTC, 1 replies.
- [jira] [Resolved] (TIKA-554) ParseUtils.getStringContent needs an option to set the write limit that can be passed into the BodyContentHandler - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:04:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date. - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:04:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-581) Parser fails on files that parsed with v0.7 - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:08:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-509) Container contents extraction - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:10:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-576) OutofMemory issues while building Tika - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:10:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-685) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1a8402c - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 11:12:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-682) Creative Suite formats are not supported - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/07 18:35:29 UTC, 1 replies.
- [jira] [Created] (TIKA-748) RTF parser fails to extract the body - posted by "Andrzej Bialecki (Created) (JIRA)" <ji...@apache.org> on 2011/10/07 19:48:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-748) RTF parser fails to extract the body - posted by "Andrzej Bialecki (Updated) (JIRA)" <ji...@apache.org> on 2011/10/07 19:50:30 UTC, 1 replies.
- [jira] [Resolved] (TIKA-541) Use commons-cli in lieu of writing our own option parser - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 21:32:29 UTC, 0 replies.
- [jira] [Created] (TIKA-749) Avoid using POI's LittleEndian in non-POI parsers - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/07 22:52:29 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk » Apache Tika parsers #674 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/07 23:03:07 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #674 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/07 23:03:07 UTC, 0 replies.
- [jira] [Commented] (TIKA-749) Avoid using POI's LittleEndian in non-POI parsers - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/07 23:06:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-749) Avoid using POI's LittleEndian in non-POI parsers - posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/07 23:06:29 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk » Apache Tika parsers #675 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/08 00:14:54 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #675 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/08 00:15:00 UTC, 0 replies.
- [jira] [Created] (TIKA-750) JavaDoc of Tika XPathParser should mention descendant:node() - posted by "David Smiley (Created) (JIRA)" <ji...@apache.org> on 2011/10/09 06:52:29 UTC, 0 replies.
- [jira] [Assigned] (TIKA-748) RTF parser fails to extract the body - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/10 01:11:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-748) RTF parser fails to extract the body - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/10 01:27:29 UTC, 2 replies.
- [jira] [Resolved] (TIKA-748) RTF parser fails to extract the body - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/10 20:16:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-750) JavaDoc of Tika XPathParser should mention descendant:node() - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/10 23:40:30 UTC, 0 replies.
- [jira] [Resolved] (TIKA-575) Links on the Web-Site for 0.8 to API not correct - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/11 00:10:30 UTC, 0 replies.
- [jira] [Resolved] (TIKA-670) MD5 sum is wrong on http://tika.apache.org/download.html - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/11 00:16:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-681) eight new n-gram language profiles - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/11 00:28:29 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk » Apache Tika core #678 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/11 01:51:14 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #678 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/11 01:52:11 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk » Apache Tika core #679 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/11 11:08:48 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #679 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/11 11:08:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB - posted by "Anirban Mitra (Updated) (JIRA)" <ji...@apache.org> on 2011/10/11 20:09:12 UTC, 0 replies.
- [jira] [Commented] (TIKA-93) OCR support - posted by "Enrico Stahn (Commented) (JIRA)" <ji...@apache.org> on 2011/10/12 13:57:12 UTC, 0 replies.
- [jira] [Created] (TIKA-751) Small improvements to how embedded docs are parsed in AbstractPOIFSExtractor.handleEmbeddedOfficeDoc - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/12 14:55:12 UTC, 0 replies.
- [jira] [Updated] (TIKA-751) Small improvements to how embedded docs are parsed in AbstractPOIFSExtractor.handleEmbeddedOfficeDoc - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/12 14:57:11 UTC, 0 replies.
- [jira] [Resolved] (TIKA-751) Small improvements to how embedded docs are parsed in AbstractPOIFSExtractor.handleEmbeddedOfficeDoc - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/12 21:21:11 UTC, 0 replies.
- [jira] [Created] (TIKA-752) Typo in timezone used in Metadata.iso8601Format - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/13 17:28:11 UTC, 0 replies.
- [jira] [Resolved] (TIKA-752) Typo in timezone used in Metadata.iso8601Format - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/13 21:15:12 UTC, 0 replies.
- [jira] [Resolved] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/13 21:35:15 UTC, 0 replies.
- [jira] [Resolved] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/13 21:37:13 UTC, 0 replies.
- [jira] [Commented] (TIKA-657) Email parser gets into trouble on malformed html in enron corpus - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/13 23:19:12 UTC, 0 replies.
- Jenkins build became unstable: Tika-trunk » Apache Tika parsers #683 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/14 00:10:10 UTC, 0 replies.
- Jenkins build became unstable: Tika-trunk #683 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/14 00:10:11 UTC, 0 replies.
- Re: Jenkins build became unstable: Tika-trunk » Apache Tika parsers #683 - posted by Jukka Zitting <ju...@gmail.com> on 2011/10/14 10:52:42 UTC, 0 replies.
- Jenkins build is back to stable : Tika-trunk » Apache Tika parsers #684 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/14 11:13:00 UTC, 0 replies.
- Jenkins build is back to stable : Tika-trunk #684 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/14 11:13:01 UTC, 0 replies.
- [jira] [Resolved] (TIKA-657) Email parser gets into trouble on malformed html in enron corpus - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/14 15:38:12 UTC, 0 replies.
- [jira] [Created] (TIKA-753) Improve performance when parsing embedded Office docs - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/14 20:28:12 UTC, 0 replies.
- [jira] [Commented] (TIKA-753) Improve performance when parsing embedded Office docs - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/14 20:30:11 UTC, 3 replies.
- [jira] [Updated] (TIKA-753) Improve performance when parsing embedded Office docs - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/14 20:32:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-712) Master slide text isn't extracted - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/15 17:59:11 UTC, 0 replies.
- TikaConfig.getDetector? - posted by Nick Burch <ni...@alfresco.com> on 2011/10/17 14:32:19 UTC, 6 replies.
- [jira] [Created] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler - posted by "Pablo Queixalos (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 11:58:10 UTC, 0 replies.
- [jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler - posted by "Pablo Queixalos (Updated) (JIRA)" <ji...@apache.org> on 2011/10/18 12:04:10 UTC, 1 replies.
- [jira] [Commented] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/18 12:14:10 UTC, 0 replies.
- [jira] [Created] (TIKA-755) Add getDetector() method to TikaConfig - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 12:56:10 UTC, 0 replies.
- [jira] [Commented] (TIKA-755) Add getDetector() method to TikaConfig - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/18 15:54:10 UTC, 1 replies.
- [jira] [Assigned] (TIKA-738) Tika fails to extract text from PDF annotations - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/18 19:43:10 UTC, 0 replies.
- [jira] [Updated] (TIKA-738) Tika fails to extract text from PDF annotations - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/18 19:59:10 UTC, 0 replies.
- tika-parsers maven dependencies (commons-logging) - posted by gross <gr...@gmail.com> on 2011/10/18 20:39:43 UTC, 0 replies.
- [jira] [Created] (TIKA-756) XMP output from Tika CLI - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 20:41:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-756) XMP output from Tika CLI - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/18 21:13:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-718) PDF bookmark text isn't extracted - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/18 21:13:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-755) Add getDetector() method to TikaConfig - posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/18 22:59:10 UTC, 0 replies.
- Updating CHANGES.txt? - posted by Nick Burch <ni...@alfresco.com> on 2011/10/19 13:06:00 UTC, 11 replies.
- [jira] [Assigned] (TIKA-724) PDF text sometimes has extra space between letters - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/19 13:10:10 UTC, 0 replies.
- [jira] [Commented] (TIKA-724) PDF text sometimes has extra space between letters - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/19 13:12:10 UTC, 0 replies.
- [jira] [Updated] (TIKA-724) PDF text sometimes has extra space between letters - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/19 13:12:10 UTC, 0 replies.
- [jira] [Created] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5) - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 14:37:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-753) Improve performance when parsing embedded Office docs - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/20 14:37:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-705) Valid OOXML PPT file hits InvalidFormatException thrown in POI - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/20 14:45:10 UTC, 0 replies.
- [jira] [Updated] (TIKA-705) Valid OOXML PPT file hits InvalidFormatException thrown in POI - posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org> on 2011/10/20 14:45:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-738) Tika fails to extract text from PDF annotations - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/20 14:51:10 UTC, 0 replies.
- [jira] [Created] (TIKA-758) Address TODOs when we upgrade to next PDFBox release - posted by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 14:52:15 UTC, 0 replies.
- [jira] [Resolved] (TIKA-724) PDF text sometimes has extra space between letters - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/20 14:56:10 UTC, 0 replies.
- [jira] [Created] (TIKA-759) Better handling of content type metadata - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/21 13:04:32 UTC, 0 replies.
- [jira] [Created] (TIKA-760) NPE XHTMLContentHandler in characters Method - posted by "Torsten Krah (Created) (JIRA)" <ji...@apache.org> on 2011/10/21 16:06:32 UTC, 0 replies.
- [jira] [Commented] (TIKA-759) Better handling of content type metadata - posted by "Chris A. Mattmann (Commented) (JIRA)" <ji...@apache.org> on 2011/10/21 18:38:32 UTC, 0 replies.
- DZone article on Tika - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/10/21 20:38:44 UTC, 0 replies.
- [jira] [Commented] (TIKA-245) Support of CHM Format - posted by "Tran Nam Quang (Commented) (JIRA)" <ji...@apache.org> on 2011/10/22 08:08:32 UTC, 1 replies.
- [jira] [Commented] (TIKA-760) NPE XHTMLContentHandler in characters Method - posted by "Pablo Queixalos (Commented) (JIRA)" <ji...@apache.org> on 2011/10/24 09:58:32 UTC, 1 replies.
- Tika is waiting for ODFToolkit to improve ODF file format processing - posted by Devin Han <de...@apache.org> on 2011/10/24 10:54:57 UTC, 4 replies.
- failure in parsing pdf files with tika 0.9 with nutch 1.3 - posted by digho <di...@oracle.com> on 2011/10/24 13:48:45 UTC, 1 replies.
- [jira] [Created] (TIKA-761) Provide version number by CLI argument -V - posted by "Ingo Renner (Created) (JIRA)" <ji...@apache.org> on 2011/10/24 15:17:32 UTC, 0 replies.
- Google's Compact Language Detector - posted by Jérôme Charron <je...@gmail.com> on 2011/10/24 15:18:57 UTC, 11 replies.
- [jira] [Commented] (TIKA-761) Provide version number by CLI argument -V - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/24 15:27:32 UTC, 15 replies.
- [jira] [Updated] (TIKA-761) Provide version number by CLI argument -V - posted by "Ingo Renner (Updated) (JIRA)" <ji...@apache.org> on 2011/10/24 15:27:33 UTC, 3 replies.
- [jira] [Issue Comment Edited] (TIKA-761) Provide version number by CLI argument -V - posted by "Ingo Renner (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/10/24 17:25:32 UTC, 2 replies.
- [jira] [Updated] (TIKA-746) Support custom mime types - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:32 UTC, 0 replies.
- [jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:32 UTC, 0 replies.
- [jira] [Updated] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5) - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-747) Ogg Vorbis and FLAC Parsers - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-593) Tika network server - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-758) Address TODOs when we upgrade to next PDFBox release - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-565) Improved OSGi bundling - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-703) Drop deprecated methods/classes/interfaces - posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org> on 2011/10/25 23:12:33 UTC, 0 replies.
- Tika 1.0 RC? - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/10/26 03:16:23 UTC, 3 replies.
- [jira] [Assigned] (TIKA-736) OpenOffice parser: master footer text isn't extracted - posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org> on 2011/10/26 12:55:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-746) Support custom mime types - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/26 12:55:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-582) Lithuanian language identification - posted by "Žygimantas Medelis (Updated) (JIRA)" <ji...@apache.org> on 2011/10/26 13:09:32 UTC, 0 replies.
- [jira] [Reopened] (TIKA-582) Lithuanian language identification - posted by "Michael McCandless (Reopened) (JIRA)" <ji...@apache.org> on 2011/10/26 14:49:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-582) Lithuanian language identification - posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/26 14:53:32 UTC, 3 replies.
- [jira] [Resolved] (TIKA-582) Lithuanian language identification - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/26 16:39:32 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #692 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/26 20:21:09 UTC, 0 replies.
- [jira] [Created] (TIKA-762) EXIF extraction from PNG images - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/26 23:13:32 UTC, 0 replies.
- [jira] [Updated] (TIKA-762) EXIF extraction from PNG images - posted by "Nick Burch (Updated) (JIRA)" <ji...@apache.org> on 2011/10/26 23:15:32 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #693 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/27 15:09:22 UTC, 0 replies.
- [jira] [Resolved] (TIKA-703) Drop deprecated methods/classes/interfaces - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/27 18:26:32 UTC, 0 replies.
- [jira] [Resolved] (TIKA-761) Provide version number by CLI argument -V - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/28 18:09:32 UTC, 0 replies.
- [jira] [Created] (TIKA-763) Update license metadata - posted by "Jukka Zitting (Created) (JIRA)" <ji...@apache.org> on 2011/10/28 18:57:32 UTC, 0 replies.
- [jira] [Commented] (TIKA-763) Update license metadata - posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org> on 2011/10/28 19:13:32 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #696 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/28 19:18:14 UTC, 0 replies.
- [jira] [Resolved] (TIKA-736) OpenOffice parser: master footer text isn't extracted - posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/28 20:05:32 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #697 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/28 21:26:20 UTC, 0 replies.
- [jira] [Created] (TIKA-764) OpenDocumentMetaParser should use common metadata keys for document statistics - posted by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/29 04:19:32 UTC, 0 replies.
- [jira] [Commented] (TIKA-764) OpenDocumentMetaParser should use common metadata keys for document statistics - posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/29 04:23:32 UTC, 0 replies.
- [jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files - posted by "Ahmad Ajiloo (Updated) (JIRA)" <ji...@apache.org> on 2011/10/31 14:15:32 UTC, 1 replies.
- A problem in the right-to-left languages - posted by ahmad ajiloo <ah...@gmail.com> on 2011/10/31 18:35:07 UTC, 1 replies.
- location of pdfbox in sources of Tika - posted by ahmad ajiloo <ah...@gmail.com> on 2011/10/31 18:36:44 UTC, 1 replies.
- [jira] [Resolved] (TIKA-565) Improved OSGi bundling - posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/31 22:49:32 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk » Apache Tika OSGi bundle #703 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/31 23:32:26 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #703 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/10/31 23:32:27 UTC, 0 replies.