- [jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/01 06:28:20 UTC, 21 replies.
- [jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/01 06:33:20 UTC, 10 replies.
- [jira] [Commented] (TIKA-2046) Can not read PDF correctly - posted by "gopalbhalala (JIRA)" <ji...@apache.org> on 2016/08/01 06:40:20 UTC, 5 replies.
- [jira] [Commented] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF - posted by "Egbert (JIRA)" <ji...@apache.org> on 2016/08/01 12:23:20 UTC, 1 replies.
- [jira] [Updated] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/01 12:52:21 UTC, 4 replies.
- [jira] [Resolved] (TIKA-2046) Can not read PDF correctly - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/02 16:55:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-721) UTF16-LE not detected - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/03 11:57:20 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-721) UTF16-LE not detected - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/03 12:04:20 UTC, 0 replies.
- [jira] [Issue Comment Deleted] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/04 10:22:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies - posted by "Manfred Schenk (JIRA)" <ji...@apache.org> on 2016/08/05 09:36:20 UTC, 3 replies.
- [jira] [Comment Edited] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies - posted by "Manfred Schenk (JIRA)" <ji...@apache.org> on 2016/08/05 09:38:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2047) TXTParser overwrites mime type/masks types that are subtype of text - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 12:11:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2047) TXTParser overwrites mime type/masks types that are subtype of text - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 12:32:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2048) Add space for elements in MSWord 2003XML - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 16:21:20 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2048) Add space for elements in MSWord 2003XML - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 16:30:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2048) Add space for elements in MSWord 2003XML - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 16:30:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2049) Add parser for vcal - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/05 16:47:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2048) Add space for elements in MSWord 2003XML - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/08/05 17:15:20 UTC, 2 replies.
- tika-2.x-windows - Build # 31 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/08/05 17:16:29 UTC, 0 replies.
- tika-2.x - Build # 127 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/08/05 17:31:35 UTC, 0 replies.
- [jira] [Created] (TIKA-2050) HTMLEncodingDetector class fails on some HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/05 21:21:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2050) HTMLEncodingDetector class fails on some HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/05 21:22:20 UTC, 0 replies.
- [jira] [Created] (TIKA-2051) Upgrade to PDFBox 2.0.3 when available - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/08 10:51:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2050) HTMLEncodingDetector class fails on some HTML documents - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/08 11:11:20 UTC, 3 replies.
- [jira] [Created] (TIKA-2052) Words are concatenated where there is a clear separation in the PDF document - posted by "Sebastian Landwehr (JIRA)" <ji...@apache.org> on 2016/08/09 14:13:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2052) Words are separated where there the letters are spaced together in the PDF document - posted by "Sebastian Landwehr (JIRA)" <ji...@apache.org> on 2016/08/09 14:15:20 UTC, 1 replies.
- [jira] [Commented] (TIKA-2052) Words are separated where there the letters are spaced together in the PDF document - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/09 14:24:20 UTC, 2 replies.
- [jira] [Closed] (TIKA-2052) Words are separated where there the letters are spaced together in the PDF document - posted by "Sebastian Landwehr (JIRA)" <ji...@apache.org> on 2016/08/09 15:09:20 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2050) HTMLEncodingDetector class fails on some HTML documents - posted by "Shabanali Faghani (JIRA)" <ji...@apache.org> on 2016/08/10 09:06:20 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2050) HTMLEncodingDetector class fails on some HTML documents - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/11 12:34:20 UTC, 0 replies.
- Tika 1.14? - posted by "Allison, Timothy B." <ta...@mitre.org> on 2016/08/11 18:59:56 UTC, 8 replies.
- tika-2.x-windows - Build # 32 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/08/11 19:16:48 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/11 20:05:22 UTC, 0 replies.
- [jira] [Updated] (TIKA-2031) Update Tesseract OCR Parser - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/11 20:06:20 UTC, 0 replies.
- tika-2.x-windows - Build # 33 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2016/08/11 20:16:42 UTC, 0 replies.
- [jira] [Commented] (TIKA-2041) Charset detection doesn't appear to be thread-safe - posted by "Hudson (JIRA)" <ji...@apache.org> on 2016/08/11 20:17:21 UTC, 2 replies.
- [GitHub] tika pull request #129: Adding TagRatio Parser to Tika - posted by AravindRam <gi...@git.apache.org> on 2016/08/11 22:42:25 UTC, 1 replies.
- [jira] [Created] (TIKA-2053) Adding TagRatio to Tika Parser - posted by "Aravind Ram Nathan (JIRA)" <ji...@apache.org> on 2016/08/12 06:49:20 UTC, 0 replies.
- [GitHub] tika pull request #130: fix for TIKA-2053 contributed by AravindRam - posted by AravindRam <gi...@git.apache.org> on 2016/08/12 07:09:15 UTC, 0 replies.
- [jira] [Commented] (TIKA-2053) Adding TagRatio to Tika Parser - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/08/12 07:09:20 UTC, 2 replies.
- [jira] [Created] (TIKA-2054) Problem with ligatures converting from PDF to HTML with Tika - posted by "Angela Onslow (JIRA)" <ji...@apache.org> on 2016/08/12 11:46:20 UTC, 0 replies.
- [jira] [Updated] (TIKA-2054) Problem with ligatures converting from PDF to HTML with Tika - posted by "Angela Onslow (JIRA)" <ji...@apache.org> on 2016/08/12 11:48:20 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2054) Problem with ligatures converting from PDF to HTML with Tika - posted by "Angela Onslow (JIRA)" <ji...@apache.org> on 2016/08/12 11:48:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-2054) Problem with ligatures converting from PDF to HTML with Tika - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/12 12:25:22 UTC, 1 replies.
- [jira] [Commented] (TIKA-1980) HTML head tags found after first script not parsed by HtmlParser (regression) - posted by "Joseph Naegele (JIRA)" <ji...@apache.org> on 2016/08/12 15:03:20 UTC, 4 replies.
- [jira] [Assigned] (TIKA-1980) HTML head tags found after first script not parsed by HtmlParser (regression) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/12 15:51:20 UTC, 0 replies.
- [GitHub] tika pull request #121: fix for TIKA-1980 contributed by naegelejd - posted by asfgit <gi...@git.apache.org> on 2016/08/12 16:03:18 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1980) HTML head tags found after first script not parsed by HtmlParser (regression) - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/08/12 16:47:20 UTC, 0 replies.
- [jira] [Commented] (TIKA-1938) HtmlParser drops