- [jira] [Commented] (TIKA-2018) Attempt to get Title from Full text if not present in MetaData ( Application/Pdf ) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 11:42:11 UTC, 1 replies.
- office dissector - posted by "Allison, Timothy B." <ta...@mitre.org> on 2016/07/01 17:50:44 UTC, 0 replies.
- [jira] [Updated] (TIKA-2027) Missing some text from docx - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:21:11 UTC, 0 replies.
- [jira] [Created] (TIKA-2027) Missing some text from docx - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:21:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-2027) Missing some text from docx - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:23:11 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2027) Missing some text from docx - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:32:11 UTC, 0 replies.
- [jira] [Created] (TIKA-2028) Extract text from drawings in MSOffice - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:45:11 UTC, 0 replies.
- [jira] [Updated] (TIKA-2028) Extract text from drawings in MSOffice - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/01 18:46:11 UTC, 0 replies.
- Re: TIKA-1164 - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2016/07/04 15:45:41 UTC, 3 replies.
- [vm] Initial test-run for POI mass-testing - posted by Dominik Stadler <do...@gmx.at> on 2016/07/04 21:09:53 UTC, 0 replies.
- [jira] [Created] (TIKA-2029) Add link string to hrefs in PDF - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/06 16:24:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-2029) Add link string to hrefs in PDF - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/07/06 20:53:11 UTC, 1 replies.
- Sentiment Analysis Parser updates - posted by Anastasija Mensikova <me...@gmail.com> on 2016/07/06 21:06:01 UTC, 1 replies.
- [GitHub] tika pull request #126: fix for TIKA-2021 contributed by Zarana Parekh - posted by asfgit <gi...@git.apache.org> on 2016/07/07 06:39:31 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2021) Improving accuracy of Tesseract parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/07/07 06:40:11 UTC, 0 replies.
- [jira] [Updated] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/07/07 06:40:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-2021) Improving accuracy of Tesseract parser - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/07 06:40:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/07/07 08:51:11 UTC, 3 replies.
- [jira] [Created] (TIKA-2030) A space is suppressed when parsing Odt file - posted by "David Pilato (JIRA)" <ji...@apache.org> on 2016/07/07 08:56:11 UTC, 0 replies.
- [jira] [Updated] (TIKA-2030) A space is suppressed when parsing Odt file - posted by "David Pilato (JIRA)" <ji...@apache.org> on 2016/07/07 08:57:10 UTC, 0 replies.
- [jira] [Assigned] (TIKA-2030) A space is suppressed when parsing Odt file - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/08 12:50:11 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2030) A space is suppressed when parsing Odt file - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/08 18:23:11 UTC, 0 replies.
- [jira] [Commented] (TIKA-2030) A space is suppressed when parsing Odt file - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/07/08 18:49:11 UTC, 3 replies.
- tika-2.x-windows - Build # 25 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/08 19:16:59 UTC, 0 replies.
- [jira] [Commented] (TIKA-1164) InputStream get modified by content type detection - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/08 19:33:11 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/08 19:34:11 UTC, 0 replies.
- [jira] [Updated] (TIKA-2029) Add link string to hrefs in PDF - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/08 19:36:10 UTC, 0 replies.
- [jira] [Created] (TIKA-2031) Update Tesseract OCR Parser - posted by "Zarana Parekh (JIRA)" <ji...@apache.org> on 2016/07/09 16:35:10 UTC, 0 replies.
- [jira] [Commented] (TIKA-2025) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results - posted by "Aeham Abushwashi (JIRA)" <ji...@apache.org> on 2016/07/12 09:22:20 UTC, 9 replies.
- [jira] [Comment Edited] (TIKA-2025) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results - posted by "Aeham Abushwashi (JIRA)" <ji...@apache.org> on 2016/07/12 09:23:20 UTC, 2 replies.
- [jira] [Created] (TIKA-2032) OptimaizeLangDetector can not be resolved - posted by "Christoph (JIRA)" <ji...@apache.org> on 2016/07/12 11:13:20 UTC, 0 replies.
- SentimentAnalysisParser updates - posted by Anastasija Mensikova <me...@gmail.com> on 2016/07/13 19:36:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1967) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@10b8c32 - posted by "kostali (JIRA)" <ji...@apache.org> on 2016/07/14 09:06:20 UTC, 2 replies.
- [jira] [Commented] (TIKA-1967) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@10b8c32 - posted by "kostali (JIRA)" <ji...@apache.org> on 2016/07/14 09:22:20 UTC, 7 replies.
- [jira] [Comment Edited] (TIKA-1967) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@10b8c32 - posted by "kostali (JIRA)" <ji...@apache.org> on 2016/07/14 14:10:20 UTC, 0 replies.
- ApacheCon Europe call for papers open - posted by Rich Bowen <rb...@apache.org> on 2016/07/14 18:11:44 UTC, 0 replies.
- [jira] [Created] (TIKA-2033) Value attributes of input elements not extracted from HTML - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/14 20:39:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2033) Value attributes of input elements not extracted from HTML - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2016/07/14 20:54:20 UTC, 2 replies.
- [jira] [Created] (TIKA-2034) Upgrade XMPCore to 5.1.3 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/15 14:00:26 UTC, 0 replies.
- [jira] [Created] (TIKA-2035) Infinite restart of child process if jvm can't be started in tika-batch - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/15 15:39:20 UTC, 0 replies.
- xmpcore in Maven Central? - posted by "Allison, Timothy B." <ta...@mitre.org> on 2016/07/15 17:26:14 UTC, 3 replies.
- [jira] [Created] (TIKA-2036) Deleted Text from Word File Shows Up in Extract - posted by "Steve Gullion (JIRA)" <ji...@apache.org> on 2016/07/15 23:49:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2036) Deleted Text from Word File Shows Up in Extract - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/16 00:13:20 UTC, 1 replies.
- [jira] [Updated] (TIKA-2036) Deleted Text from Word File Shows Up in Extract - posted by "Steve Gullion (JIRA)" <ji...@apache.org> on 2016/07/16 15:48:20 UTC, 0 replies.
- Your project VM needs to be migrated. - posted by Gav <gm...@apache.org> on 2016/07/17 02:41:53 UTC, 1 replies.
- [jira] [Commented] (TIKA-2032) OptimaizeLangDetector can not be resolved - posted by "Eli Trucco (JIRA)" <ji...@apache.org> on 2016/07/20 12:48:20 UTC, 1 replies.
- [jira] [Closed] (TIKA-2032) OptimaizeLangDetector can not be resolved - posted by "Christoph (JIRA)" <ji...@apache.org> on 2016/07/20 12:56:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2037) Problems with email attachments - posted by "Eli Trucco (JIRA)" <ji...@apache.org> on 2016/07/20 15:42:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2037) Problems with email attachments - posted by "Eli Trucco (JIRA)" <ji...@apache.org> on 2016/07/20 15:44:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2037) Problems with email attachments - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/20 15:49:20 UTC, 7 replies.
- [jira] [Resolved] (TIKA-2037) Problems with email attachments - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/07/20 17:21:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/07/21 22:25:20 UTC, 0 replies.
- A more accurate facility for detecting Charset Encoding of HTML documents - posted by Shabanali Faghani <sh...@gmail.com> on 2016/07/21 22:31:59 UTC, 0 replies.
- [jira] [Updated] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2016/07/21 22:40:20 UTC, 1 replies.
- [jira] [Resolved] (TIKA-2025) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/22 12:57:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/22 13:12:20 UTC, 8 replies.
- tika-2.x-windows - Build # 26 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/22 13:16:20 UTC, 0 replies.
- tika-2.x - Build # 122 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/22 13:19:22 UTC, 0 replies.
- [jira] [Created] (TIKA-2039) Upgrade jackcess to 2.1.4 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/22 19:49:20 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2039) Upgrade jackcess to 2.1.4 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/22 19:58:20 UTC, 0 replies.
- tika-2.x-windows - Build # 27 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/22 20:16:14 UTC, 0 replies.
- [jira] [Commented] (TIKA-2039) Upgrade jackcess to 2.1.4 - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/07/22 20:16:20 UTC, 1 replies.
- tika-2.x - Build # 123 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/22 21:08:07 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/07/23 11:53:20 UTC, 3 replies.
- [jira] [Created] (TIKA-2040) OOM when parsing a corrupted CHM - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/25 19:47:21 UTC, 0 replies.
- [jira] [Updated] (TIKA-2040) OOM when parsing a corrupted CHM - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/25 19:50:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/26 02:52:21 UTC, 0 replies.
- [jira] [Created] (TIKA-2042) MBOX file detected wrongly as text/html - posted by "Vjeran Marcinko (JIRA)" <ji...@apache.org> on 2016/07/26 05:43:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2042) MBOX file detected wrongly as text/html - posted by "Vjeran Marcinko (JIRA)" <ji...@apache.org> on 2016/07/26 05:45:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2042) MBOX file detected wrongly as text/html - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/07/26 10:47:20 UTC, 3 replies.
- [jira] [Resolved] (TIKA-2042) MBOX file detected wrongly as text/html - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/07/26 10:47:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Florian Leitner (JIRA)" <ji...@apache.org> on 2016/07/26 11:05:20 UTC, 15 replies.
- tika-2.x - Build # 124 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/26 11:22:52 UTC, 0 replies.
- [jira] [Updated] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Christian Aistleitner (JIRA)" <ji...@apache.org> on 2016/07/26 11:35:20 UTC, 2 replies.
- tika-2.x-windows - Build # 28 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/26 12:16:26 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Christian Aistleitner (JIRA)" <ji...@apache.org> on 2016/07/26 12:30:20 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/26 12:59:20 UTC, 7 replies.
- [jira] [Resolved] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/27 00:48:20 UTC, 0 replies.
- tika-2.x-windows - Build # 29 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/27 01:16:20 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2040) OOM when parsing a corrupted CHM - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/27 01:34:20 UTC, 0 replies.
- tika-2.x-windows - Build # 30 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/07/27 02:16:23 UTC, 0 replies.
- [jira] [Commented] (TIKA-2040) OOM when parsing a corrupted CHM - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/07/27 02:17:20 UTC, 3 replies.
- [jira] [Reopened] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/27 11:39:21 UTC, 0 replies.
- [jira] [Created] (TIKA-2043) junrar tika outofmemoryerror - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2016/07/27 18:51:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2044) MboxParser wrongly concatenates multiple text lines into single header line - posted by "Vjeran Marcinko (JIRA)" <ji...@apache.org> on 2016/07/27 18:54:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2043) junrar tika outofmemoryerror - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2016/07/27 18:57:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2044) MboxParser wrongly concatenates multiple text lines into single header line - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/07/27 21:54:20 UTC, 1 replies.
- [jira] [Created] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF - posted by "Egbert (JIRA)" <ji...@apache.org> on 2016/07/28 11:27:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF - posted by "Egbert (JIRA)" <ji...@apache.org> on 2016/07/28 11:32:20 UTC, 5 replies.
- [jira] [Closed] (TIKA-1267) Improve Mbox file detection - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2016/07/28 11:35:20 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF - posted by "Egbert (JIRA)" <ji...@apache.org> on 2016/07/28 13:07:20 UTC, 1 replies.
- [jira] [Commented] (TIKA-2043) junrar tika outofmemoryerror - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/07/29 17:54:20 UTC, 0 replies.
- [GitHub] tika pull request #128: fix for TIKA-2031 contributed by Zarana-Parekh - posted by Zarana-Parekh <gi...@git.apache.org> on 2016/07/29 18:31:19 UTC, 0 replies.
- [jira] [Commented] (TIKA-2031) Update Tesseract OCR Parser - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/29 18:31:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2031) Update Tesseract OCR Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/07/29 22:16:20 UTC, 2 replies.
- [jira] [Assigned] (TIKA-2031) Update Tesseract OCR Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/07/29 22:16:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2046) Can not read PDF correctly - posted by "gopalbhalala (JIRA)" <ji...@apache.org> on 2016/07/31 09:21:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2046) Can not read PDF correctly - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/07/31 10:31:20 UTC, 0 replies.