- [jira] [Created] (TIKA-2803) Apache Tika not properly extracting text from PDF for Indian languages - posted by "Subramanian (JIRA)" <ji...@apache.org> on 2019/01/01 06:15:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2803) Apache Tika not properly extracting text from PDF for Indian languages - posted by "Subramanian (JIRA)" <ji...@apache.org> on 2019/01/01 06:22:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box - posted by "Markus Mandalka (JIRA)" <ji...@apache.org> on 2019/01/02 13:12:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-2801) Tika includes 2 vulnerable components - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 16:20:00 UTC, 3 replies.
- [jira] [Created] (TIKA-2804) Blanket dependency upgrades for next release cycle - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 16:51:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2804) Blanket dependency upgrades for next release cycle - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 17:17:00 UTC, 3 replies.
- [jira] [Commented] (TIKA-2787) Make WriteLimitReachedException public and not subclass of SAXException - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 17:42:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2787) Make WriteLimitReachedException public and not subclass of SAXException - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 17:45:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2803) Apache Tika not properly extracting text from PDF for Indian languages - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 17:55:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2802) Out of memory issues when extracting large files (pst) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 18:30:00 UTC, 15 replies.
- tika-2.x-windows - Build # 369 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/03 20:16:56 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2765) Regression extracting text from corrupted docx files - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 20:33:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2765) Regression extracting text from corrupted docx files - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 20:33:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2726) Handle truncated ooxml more robustly - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/03 20:34:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/03 21:00:00 UTC, 5 replies.
- JDK 12 Early Access build 26 & JDK 13 Early Access builds available - posted by Rory O'Donnell <ro...@oracle.com> on 2019/01/04 10:22:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-2802) Out of memory issues when extracting large files (pst) - posted by "Caleb Ott (JIRA)" <ji...@apache.org> on 2019/01/04 16:32:00 UTC, 2 replies.
- [jira] [Created] (TIKA-2805) Should the HTML parser by default just ignore the - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/06 00:04:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2805) Should the HTML parser by default just ignore the - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/06 00:05:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2806) QP - posted by "christianbo (JIRA)" <ji...@apache.org> on 2019/01/07 09:34:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2806) QP decode problem - posted by "christianbo (JIRA)" <ji...@apache.org> on 2019/01/07 09:35:00 UTC, 1 replies.
- [jira] [Created] (TIKA-2807) .docx text extract leaves out rich text content-control inside of a text box - posted by "Claudia Mickiewicz (JIRA)" <ji...@apache.org> on 2019/01/07 12:43:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2802) Out of memory issues when extracting large files (pst) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 14:02:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-2806) QP decode problem - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 14:13:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2807) .docx text extract leaves out rich text content-control inside of a text box - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 14:39:00 UTC, 3 replies.
- [jira] [Comment Edited] (TIKA-2807) .docx text extract leaves out rich text content-control inside of a text box - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 14:40:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2807) .docx text extract leaves out rich text content-control inside of a text box - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 16:09:00 UTC, 0 replies.
- tika-2.x-windows - Build # 372 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2019/01/07 16:32:51 UTC, 0 replies.
- [jira] [Created] (TIKA-2808) Skip h2 1.4.197 in ossindex-maven-plugin in tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 17:23:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2808) Skip h2 1.4.197 in ossindex-maven-plugin in tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 17:36:00 UTC, 3 replies.
- [jira] [Updated] (TIKA-2808) Skip h2 1.4.197 in ossindex-maven-plugin in tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 17:39:00 UTC, 1 replies.
- [jira] [Created] (TIKA-2809) Add reports for structure tags to tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 19:35:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2810) Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 19:54:00 UTC, 0 replies.
- Preferred logging implementation - posted by Andreas Beeker <ki...@apache.org> on 2019/01/07 19:58:38 UTC, 2 replies.
- [jira] [Resolved] (TIKA-2809) Add reports for structure tags to tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 19:59:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2810) Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/07 19:59:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2810) Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/07 20:38:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-2809) Add reports for structure tags to tika-eval - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/07 20:38:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/07 23:04:00 UTC, 3 replies.
- Re: revoking signing key - posted by Tim Allison <ta...@apache.org> on 2019/01/09 13:23:24 UTC, 1 replies.
- [jira] [Created] (TIKA-2811) Illegal IOException from org.apache.tika.parser.jpeg.JpegParser@7f416310 - posted by "Maxence SAUNIER (JIRA)" <ji...@apache.org> on 2019/01/10 14:02:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2811) Illegal IOException from org.apache.tika.parser.jpeg.JpegParser@7f416310 - posted by "Maxence SAUNIER (JIRA)" <ji...@apache.org> on 2019/01/10 14:02:00 UTC, 1 replies.
- [jira] [Created] (TIKA-2812) NPE when parsing text with write limit set on IBM JDK - posted by "Sergiy Shyrkov (JIRA)" <ji...@apache.org> on 2019/01/11 11:22:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2812) NPE when parsing text with write limit set on IBM JDK - posted by "Sergiy Shyrkov (JIRA)" <ji...@apache.org> on 2019/01/11 11:23:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-2812) NPE when parsing text with write limit set on IBM JDK - posted by "Sergiy Shyrkov (JIRA)" <ji...@apache.org> on 2019/01/11 11:48:00 UTC, 0 replies.
- [jira] [Assigned] (TIKA-2812) NPE when parsing text with write limit set on IBM JDK - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/11 18:56:00 UTC, 0 replies.
- [jira] [Assigned] (TIKA-2811) Illegal IOException from org.apache.tika.parser.jpeg.JpegParser@7f416310 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/11 18:57:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2813) AutoCAD support for AC1012 - posted by "Caleb Ott (JIRA)" <ji...@apache.org> on 2019/01/14 15:34:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2813) AutoCAD support for AC1012 - posted by "Caleb Ott (JIRA)" <ji...@apache.org> on 2019/01/14 15:43:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2224) Mime magic for OneNote formats - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/15 03:33:00 UTC, 5 replies.
- [jira] [Updated] (TIKA-2224) Mime magic for OneNote formats - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/15 03:35:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2224) Mime magic for OneNote formats - posted by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/15 03:35:00 UTC, 2 replies.
- [jira] [Created] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" - posted by "Edwin Yeo Zheng Lin (JIRA)" <ji...@apache.org> on 2019/01/15 11:35:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" - posted by "Edwin Yeo Zheng Lin (JIRA)" <ji...@apache.org> on 2019/01/15 11:37:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2815) Priority of processing EML file should be TEXT_PLAIN instead of TEXT_HTML - posted by "Edwin Yeo Zheng Lin (JIRA)" <ji...@apache.org> on 2019/01/15 11:49:00 UTC, 0 replies.
- [ANNOUNCE] Apache Roadshow Chicago, Call for Presentations - posted by Trevor Grant <ra...@apache.org> on 2019/01/15 14:41:58 UTC, 0 replies.
- [jira] [Created] (TIKA-2816) Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr - posted by "Anssi Törmä (JIRA)" <ji...@apache.org> on 2019/01/15 15:13:00 UTC, 0 replies.
- [jira] [Assigned] (TIKA-2816) Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/15 16:23:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2816) Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/15 18:11:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2816) Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/15 18:41:00 UTC, 6 replies.
- [jira] [Commented] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/15 19:15:00 UTC, 9 replies.
- [jira] [Comment Edited] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/15 19:20:00 UTC, 4 replies.
- [jira] [Resolved] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/17 16:29:00 UTC, 0 replies.
- Chinese and Korea being detected as Lithuanian by LanguageDetector - posted by Mike Thomsen <mi...@gmail.com> on 2019/01/17 17:39:09 UTC, 3 replies.
- [jira] [Created] (TIKA-2817) Tika doesn't respect gzip filename - posted by "Tom Brisland (JIRA)" <ji...@apache.org> on 2019/01/18 00:16:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2817) Tika doesn't respect gzip filename - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/18 00:40:00 UTC, 4 replies.
- [jira] [Updated] (TIKA-1975) Different behaviour between tika-app and tika-server - posted by "Finn Woelm (JIRA)" <ji...@apache.org> on 2019/01/18 06:52:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-1975) Different behaviour between tika-app and tika-server - posted by "Finn Woelm (JIRA)" <ji...@apache.org> on 2019/01/18 06:52:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2818) RarParser throws EncryptedDocumentException only when whole archive is encrypted - posted by "Pavel Arnošt (JIRA)" <ji...@apache.org> on 2019/01/18 09:29:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2818) RarParser throws EncryptedDocumentException only when whole archive is encrypted - posted by "Pavel Arnošt (JIRA)" <ji...@apache.org> on 2019/01/18 09:29:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-2818) RarParser throws EncryptedDocumentException only when whole archive is encrypted - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/18 17:01:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES - posted by "Roberto Benedetti (JIRA)" <ji...@apache.org> on 2019/01/18 22:31:00 UTC, 2 replies.
- [jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES - posted by "Roberto Benedetti (JIRA)" <ji...@apache.org> on 2019/01/18 22:46:00 UTC, 8 replies.
- JDK 12 enters Rampdown Phase Two - posted by Rory O'Donnell <ro...@oracle.com> on 2019/01/21 11:33:27 UTC, 0 replies.
- [jira] [Created] (TIKA-2819) Update jaxb & activation - posted by "Hans Brende (JIRA)" <ji...@apache.org> on 2019/01/23 04:50:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2820) Magic patterns for Unix Dump files (x-tika-unix-dump) - posted by "Johan van der Knijff (JIRA)" <ji...@apache.org> on 2019/01/24 16:27:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2820) Magic patterns for Unix Dump files (x-tika-unix-dump) - posted by "Johan van der Knijff (JIRA)" <ji...@apache.org> on 2019/01/25 13:34:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-94) Speech recognition - posted by "Furkan KAMACI (JIRA)" <ji...@apache.org> on 2019/01/26 14:22:00 UTC, 0 replies.
- [jira] [Created] (TIKA-2821) RFC822 messages erroneously parsing continuations as new headers - posted by "Joshua Turner (JIRA)" <ji...@apache.org> on 2019/01/28 16:18:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2821) RFC822 messages erroneously parsing continuations as new headers - posted by "Joshua Turner (JIRA)" <ji...@apache.org> on 2019/01/28 16:44:00 UTC, 1 replies.
- [jira] [Updated] (TIKA-2822) Update common tokens files for tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/28 16:47:00 UTC, 1 replies.
- [jira] [Created] (TIKA-2822) Update common tokens files - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/28 16:47:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2822) Update common tokens files for tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/28 16:49:00 UTC, 8 replies.
- [jira] [Updated] (TIKA-2821) RFC822 messages erroneously parsing continuations as new headers - posted by "Joshua Turner (JIRA)" <ji...@apache.org> on 2019/01/28 17:10:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2822) Update common tokens files for tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/28 17:26:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-2147) ClassCastException on a valid Word template - posted by "Jawahar (JIRA)" <ji...@apache.org> on 2019/01/29 11:35:00 UTC, 1 replies.
- [jira] [Created] (TIKA-2823) Remove printstacktrace in XMLReaderUtils - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/29 17:17:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2147) ClassCastException on a valid Word template - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/29 17:20:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2823) Remove printstacktrace in XMLReaderUtils - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/29 17:30:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2823) Remove printstacktrace in XMLReaderUtils - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/29 18:07:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-2717) Sonatype Nexus auditor is reporting that Jackson databind version used by Apache Tika is vulnerable - posted by "Abhijit Rajwade (JIRA)" <ji...@apache.org> on 2019/01/30 13:22:00 UTC, 4 replies.
- [jira] [Reopened] (TIKA-2822) Update common tokens files for tika-eval - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/30 16:22:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2717) Sonatype Nexus auditor is reporting that Jackson databind version used by Apache Tika is vulnerable - posted by "Abhijit Rajwade (JIRA)" <ji...@apache.org> on 2019/01/31 08:33:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-2717) Sonatype Nexus auditor is reporting that Jackson databind version used by Apache Tika is vulnerable - posted by "Abhijit Rajwade (JIRA)" <ji...@apache.org> on 2019/01/31 08:33:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2819) Update jaxb & activation - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/31 12:44:00 UTC, 3 replies.
- [jira] [Created] (TIKA-2824) General dependency/plugin upgrades for next release - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/31 13:08:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2802) Out of memory issues when extracting large files (pst) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/31 14:01:01 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2819) Update jaxb & activation - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/31 14:02:01 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2717) Sonatype Nexus auditor is reporting that Jackson databind version used by Apache Tika is vulnerable - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/31 14:02:01 UTC, 0 replies.
- [jira] [Commented] (TIKA-2824) General dependency/plugin upgrades for next release - posted by "Hudson (JIRA)" <ji...@apache.org> on 2019/01/31 15:09:00 UTC, 2 replies.