- sub - posted by Heitor Peralles <he...@gmail.com> on 2011/08/01 21:21:16 UTC, 0 replies.
- [jira] [Commented] (TIKA-593) Tika network server - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2011/08/02 13:49:27 UTC, 0 replies.
- Re: svn commit: r1153097 - in /tika/trunk/tika-server: ./ src/main/java/org/apache/tika/server/ src/main/resources/ src/test/java/org/apache/tika/server/ - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/08/02 14:00:01 UTC, 2 replies.
- [Tika Parser 0.9] Errors in parsing of mp3 files - posted by Alexander Sherbakov <al...@dsr-company.com> on 2011/08/02 14:45:27 UTC, 1 reply.
- [jira] [Commented] (TIKA-638) Language recognition - Failed trying to load language profile for language lt . Error: java.lang.IllegalArgumentException: Unable to add an ngram of incorrect length: 5 != 3 - posted by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2011/08/02 16:53:28 UTC, 3 replies.
- 1.0 RC in next 2 weeks - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/08/03 01:32:30 UTC, 1 reply.
- [jira] [Commented] (TIKA-369) Improve accuracy of language detection - posted by "Georger Araújo (JIRA)" <ji...@apache.org> on 2011/08/03 06:55:27 UTC, 0 replies.
- Re: WMA Parser - posted by Jukka Zitting <ju...@gmail.com> on 2011/08/03 15:14:08 UTC, 0 replies.
- [jira] [Issue Comment Edited] (TIKA-369) Improve accuracy of language detection - posted by "Georger Araújo (JIRA)" <ji...@apache.org> on 2011/08/03 19:31:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-632) Rtf parsing ignores links - posted by "Cristian Vat (JIRA)" <ji...@apache.org> on 2011/08/06 22:15:28 UTC, 0 replies.
- [jira] [Commented] (TIKA-642) Few of RTF files not extracting properly - posted by "Cristian Vat (JIRA)" <ji...@apache.org> on 2011/08/06 22:51:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-666) Unable to extract content from RTF files - posted by "Cristian Vat (JIRA)" <ji...@apache.org> on 2011/08/07 00:13:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-683) RTF Parser issues with non european characters - posted by "Cristian Vat (JIRA)" <ji...@apache.org> on 2011/08/07 01:11:27 UTC, 3 replies.
- [jira] [Commented] (TIKA-683) RTF Parser issues with non european characters - posted by "Cristian Vat (JIRA)" <ji...@apache.org> on 2011/08/07 01:27:27 UTC, 10 replies.
- [jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app - posted by "Nicholas Dodd (JIRA)" <ji...@apache.org> on 2011/08/08 15:46:27 UTC, 2 replies.
- [jira] [Created] (TIKA-688) Enhance content-type detector to recognize almost plain text - posted by "Chris Lott (JIRA)" <ji...@apache.org> on 2011/08/09 20:56:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-688) Enhance content-type detector to recognize almost plain text - posted by "Chris Lott (JIRA)" <ji...@apache.org> on 2011/08/09 21:08:27 UTC, 0 replies.
- Supporting quotes for Apache POI - posted by Yegor Kozlov <ye...@dinom.ru> on 2011/08/12 13:02:32 UTC, 0 replies.
- [jira] [Created] (TIKA-689) MimeTypes detector detects text/plain content type of a PPT file - posted by "Joseph Vychtrle (JIRA)" <ji...@apache.org> on 2011/08/14 11:05:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-689) MimeTypes detector detects text/plain content type of a PPT file - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/14 11:12:29 UTC, 3 replies.
- [jira] [Closed] (TIKA-689) MimeTypes detector detects text/plain content type of a PPT file - posted by "Joseph Vychtrle (JIRA)" <ji...@apache.org> on 2011/08/14 13:45:27 UTC, 0 replies.
- [jira] [Created] (TIKA-690) WordExtractor doesn't extract text from HWPFDocument - posted by "Joseph Vychtrle (JIRA)" <ji...@apache.org> on 2011/08/14 15:19:27 UTC, 0 replies.
- [jira] [Created] (TIKA-691) java.lang.ArrayIndexOutOfBoundsException by MS Word CDF V2 Document - posted by "Eddie Verkhoturov (JIRA)" <ji...@apache.org> on 2011/08/14 16:57:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-691) java.lang.ArrayIndexOutOfBoundsException by MS Word CDF V2 Document - posted by "Eddie Verkhoturov (JIRA)" <ji...@apache.org> on 2011/08/14 16:59:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-691) java.lang.ArrayIndexOutOfBoundsException by MS Word CDF V2 Document - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/14 17:51:27 UTC, 3 replies.
- [jira] [Commented] (TIKA-690) WordExtractor doesn't extract text from HWPFDocument - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/14 17:51:27 UTC, 3 replies.
- Re: [Tika Wiki] Trivial Update of "ReleaseProcess" by MikeMcCandless - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/08/14 18:24:25 UTC, 1 reply.
- [jira] [Closed] (TIKA-690) WordExtractor doesn't extract text from HWPFDocument - posted by "Joseph Vychtrle (JIRA)" <ji...@apache.org> on 2011/08/14 20:30:32 UTC, 0 replies.
- [jira] [Updated] (TIKA-422) Wrong charset conversion in some RTF documents. - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/15 12:15:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-648) Parsing HTML anchors with embedded div faulty - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/08/15 16:26:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-565) Improved OSGi bundling - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/15 23:25:27 UTC, 0 replies.
- Failed test: testBMP(org.apache.tika.parser.image.ImageParserTest) - posted by Steve Aulenbach <sa...@neoninc.org> on 2011/08/16 23:38:33 UTC, 3 replies.
- [jira] [Assigned] (TIKA-683) RTF Parser issues with non european characters - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2011/08/17 16:58:28 UTC, 1 reply.
- [jira] [Commented] (TIKA-676) Boilerpipe fails - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/08/17 17:10:29 UTC, 4 replies.
- [jira] [Commented] (TIKA-422) Wrong charset conversion in some RTF documents. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2011/08/17 17:48:27 UTC, 1 reply.
- Appending Mime Types - posted by Tom Grant <tg...@sms-fed.com> on 2011/08/19 00:04:53 UTC, 7 replies.
- Tika 0.9 integration in Solr 3.3.0 - posted by nirnaydewan <ni...@gmail.com> on 2011/08/19 13:44:32 UTC, 7 replies.
- Issue in text extraction in Solr / Tika - posted by nirnaydewan <ni...@gmail.com> on 2011/08/19 13:49:48 UTC, 13 replies.
- [jira] [Commented] (TIKA-392) RTF parser smashes words together in subsequent table cells - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/19 14:34:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-392) RTF parser smashes words together in subsequent table cells - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/19 14:40:27 UTC, 0 replies.
- Normalizing meta tag names - posted by Ken Krugler <kk...@transpac.com> on 2011/08/20 03:21:07 UTC, 0 replies.
- Preview of Rich Documents - posted by nirnaydewan <ni...@gmail.com> on 2011/08/20 15:39:34 UTC, 6 replies.
- [jira] [Created] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/20 17:39:27 UTC, 0 replies.
- [jira] [Updated] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/20 17:51:27 UTC, 5 replies.
- [jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/08/20 18:25:27 UTC, 7 replies.
- [jira] [Issue Comment Edited] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/20 18:31:27 UTC, 0 replies.
- [jira] [Reopened] (TIKA-651) Unescaped attribute value generated - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/20 20:27:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-651) Unescaped attribute value generated - posted by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/08/20 20:59:27 UTC, 3 replies.
- [jira] [Resolved] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/21 15:37:27 UTC, 0 replies.
- [jira] [Resolved] (TIKA-447) Container aware mimetype detection - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/21 16:15:27 UTC, 0 replies.
- [jira] [Resolved] (TIKA-677) Installing Tika 0.9 using Maven fails tests - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/21 16:37:27 UTC, 0 replies.
- [jira] [Commented] (TIKA-648) Parsing HTML anchors with embedded div faulty - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/21 16:55:27 UTC, 2 replies.
- [jira] [Resolved] (TIKA-667) Changes to RFC822Parser to support turning off strict parsing - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/21 19:39:27 UTC, 0 replies.
- [jira] [Created] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2011/08/22 13:39:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2011/08/22 13:43:29 UTC, 0 replies.
- [jira] [Resolved] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser - posted by "Maxim Valyanskiy (JIRA)" <ji...@apache.org> on 2011/08/22 13:45:29 UTC, 0 replies.
- http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException - posted by prince shah <pr...@gmail.com> on 2011/08/22 18:33:48 UTC, 1 reply.
- [jira] [Created] (TIKA-694) On extraction, get properties AND / OR content extraction - posted by "Etienne Jouvin (JIRA)" <ji...@apache.org> on 2011/08/23 09:22:29 UTC, 0 replies.
- [jira] [Created] (TIKA-695) Custom properties on xlsx, docx, pptx - posted by "Etienne Jouvin (JIRA)" <ji...@apache.org> on 2011/08/23 09:28:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-695) Custom properties on xlsx, docx, pptx - posted by "Etienne Jouvin (JIRA)" <ji...@apache.org> on 2011/08/23 09:30:28 UTC, 0 replies.
- [jira] [Commented] (TIKA-434) Bug in TagSoup causes IOException - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/23 11:44:29 UTC, 0 replies.
- buildbot failure in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2011/08/23 11:44:44 UTC, 0 replies.
- [jira] [Commented] (TIKA-694) On extraction, get properties AND / OR content extraction - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/23 13:14:29 UTC, 0 replies.
- buildbot success in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2011/08/23 14:49:25 UTC, 0 replies.
- [jira] [Updated] (TIKA-696) Extract watermarks from Word documents - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2011/08/23 15:44:29 UTC, 1 reply.
- [jira] [Created] (TIKA-696) Extract watermarks from Word documents - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2011/08/23 15:44:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-696) Extract watermarks from Word documents - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2011/08/23 15:46:30 UTC, 2 replies.
- [jira] [Created] (TIKA-697) Tika reports the content type of AR archives as "text/plain" - posted by "PNS (JIRA)" <ji...@apache.org> on 2011/08/23 20:16:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-697) Tika reports the content type of AR archives as "text/plain" - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/24 12:17:29 UTC, 0 replies.
- [jira] [Reopened] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/24 12:41:29 UTC, 0 replies.
- [jira] [Commented] (TIKA-611) PDFParser mixes the text from separate columns - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/24 14:19:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-612) Specify PDFBox options via ParseContext - posted by "Julien Nioche (JIRA)" <ji...@apache.org> on 2011/08/24 15:13:29 UTC, 1 reply.
- [jira] [Commented] (TIKA-394) Missing spaces on html parsing - posted by "Andrey Barhatov (JIRA)" <ji...@apache.org> on 2011/08/25 18:29:29 UTC, 0 replies.
- [jira] [Created] (TIKA-698) "Invalid UTF-16 surrogate detected:" parsing PowerPoint 97-2003 - posted by "Pablo Queixalos (JIRA)" <ji...@apache.org> on 2011/08/26 10:03:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-698) "Invalid UTF-16 surrogate detected:" parsing PowerPoint 97-2003 - posted by "Pablo Queixalos (JIRA)" <ji...@apache.org> on 2011/08/26 10:03:29 UTC, 0 replies.
- [jira] [Created] (TIKA-699) Automatic checks against backwards-incompatible API changes - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/26 12:22:29 UTC, 1 reply.
- [jira] [Commented] (TIKA-699) Automatic checks against backwards-incompatible API changes - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/26 14:35:29 UTC, 0 replies.
- [jira] [Updated] (TIKA-699) Automatic checks against backwards-incompatible API changes - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/26 15:43:29 UTC, 0 replies.
- Welcome Mike McCandless to the Tika PMC and as a Tika Committer - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/08/29 16:48:41 UTC, 3 replies.
- [jira] [Commented] (TIKA-489) Embedded Documents within documents - posted by "Jeremy Anderson (JIRA)" <ji...@apache.org> on 2011/08/29 22:43:38 UTC, 2 replies.
- [jira] [Created] (TIKA-700) Upgrade to POI 3.8 as available - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/30 16:59:38 UTC, 0 replies.
- [jira] [Commented] (TIKA-700) Upgrade to POI 3.8 as available - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/08/30 17:01:39 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk » Apache Tika parsers #591 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/08/30 17:06:50 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #591 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/08/30 17:06:52 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk » Apache Tika parsers #592 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/08/30 18:10:54 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #592 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2011/08/30 18:10:55 UTC, 0 replies.
- [jira] [Updated] (TIKA-489) Embedded Documents within documents - posted by "Jeremy Anderson (JIRA)" <ji...@apache.org> on 2011/08/30 18:29:37 UTC, 0 replies.
- Jira karma - posted by Michael McCandless <lu...@mikemccandless.com> on 2011/08/30 19:20:15 UTC, 2 replies.
- Re: svn commit: r1163336 - in /tika/trunk/tika-parsers/src/test: java/org/apache/tika/parser/rtf/ resources/test-documents/ - posted by Jukka Zitting <ju...@gmail.com> on 2011/08/30 23:35:38 UTC, 1 reply.
- when Tika closes InputStreams - posted by Michael McCandless <lu...@mikemccandless.com> on 2011/08/31 14:25:06 UTC, 1 reply.
- [jira] [Created] (TIKA-701) Fix problems with TemporaryFiles - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/08/31 15:03:09 UTC, 0 replies.