- [jira] [Commented] (TIKA-3101) Include XMPSchemaBasic metadata in xmp metadata extraction - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/01 14:14:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-3104) Detection of memgraph files exported from Xcode - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/01 20:47:00 UTC, 57 replies.
- [jira] [Updated] (TIKA-3104) Detection of memgraph files exported from Xcode - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/02 13:44:00 UTC, 1 replies.
- [jira] [Resolved] (TIKA-3101) Include XMPSchemaBasic metadata in xmp metadata extraction - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/02 13:56:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension. - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/02 15:13:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-2961) When detecting a txt document that starts with caff, Tika misidentifies it as the audio/x-caf audio type - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/02 15:13:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3105) OFT format detection based on file content - posted by "Ondřej Duchoň (Jira)" <ji...@apache.org> on 2020/06/03 12:19:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3105) OFT format detection based on file name (extension) instead of file content - posted by "Ondřej Duchoň (Jira)" <ji...@apache.org> on 2020/06/03 13:04:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3105) OFT format detection based on file name (extension) instead of file content - posted by "Nick Burch (Jira)" <ji...@apache.org> on 2020/06/03 13:05:00 UTC, 0 replies.
- [GitHub] [tika] KranthiGV commented on a change in pull request #317: fix for TIKA-3089 contributed by pvanderweerd - posted by GitBox <gi...@apache.org> on 2020/06/03 16:10:26 UTC, 0 replies.
- [jira] [Commented] (TIKA-3089) Text should be wrapped in pre-tags instead of in p-tags - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/03 16:11:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-3104) Detection of memgraph files exported from Xcode - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/03 17:19:00 UTC, 4 replies.
- [jira] [Created] (TIKA-3106) Tika Fails to detect some EML files if extension is not .eml - posted by "Xiaohong Yang (Jira)" <ji...@apache.org> on 2020/06/03 18:55:00 UTC, 0 replies.
- Problem in resolving tika parser in Gradle projects - posted by Dupinder Singh <du...@gmail.com> on 2020/06/04 01:47:06 UTC, 1 replies.
- [jira] [Commented] (TIKA-3106) Tika Fails to detect some EML files if extension is not .eml - posted by "Nick Burch (Jira)" <ji...@apache.org> on 2020/06/04 05:04:00 UTC, 5 replies.
- Fwd: New mailing list queued for creation: corpora-dev@tika.apache.org - posted by Tim Allison <ta...@apache.org> on 2020/06/04 12:56:44 UTC, 2 replies.
- [jira] [Created] (TIKA-3107) AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read" - posted by "Xiaohong Yang (Jira)" <ji...@apache.org> on 2020/06/04 13:42:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3107) AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read" - posted by "Nick Burch (Jira)" <ji...@apache.org> on 2020/06/05 02:59:00 UTC, 1 replies.
- new mailing list for corpora vm - posted by Tim Allison <ta...@apache.org> on 2020/06/05 13:20:32 UTC, 0 replies.
- [jira] [Commented] (TIKA-2929) tika-parsers not usable on module path (Java 11) - posted by "Marcos Bori (Jira)" <ji...@apache.org> on 2020/06/05 15:27:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-2929) tika-parsers not usable on module path (Java 11) - posted by "Marcos Bori (Jira)" <ji...@apache.org> on 2020/06/05 15:29:00 UTC, 1 replies.
- [jira] [Created] (TIKA-3108) Extract XMP from JPEG - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/08 14:49:00 UTC, 0 replies.
- Mime type magic and repeated similar blocks - thoughts? - posted by Nick Burch <ni...@apache.org> on 2020/06/09 11:04:46 UTC, 1 replies.
- [jira] [Commented] (TIKA-3097) Out of memory while parsing docx - posted by "suchendra (Jira)" <ji...@apache.org> on 2020/06/09 16:32:01 UTC, 14 replies.
- [GitHub] [tika] pszemus opened a new pull request #320: tika-mimetypes: Add mimetypes for .mpd, .m3u8 and .m4s - posted by GitBox <gi...@apache.org> on 2020/06/10 09:09:49 UTC, 0 replies.
- [jira] [Created] (TIKA-3109) Ingest attachment: failed to extract text from iframe - posted by "Younes (Jira)" <ji...@apache.org> on 2020/06/10 14:11:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3110) cannot extract metadata from 7z .tar archive - posted by "Alex (Jira)" <ji...@apache.org> on 2020/06/10 20:57:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3110) cannot extract metadata from 7z .tar archive - posted by "Alex (Jira)" <ji...@apache.org> on 2020/06/10 20:58:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe - posted by "Kenneth William Krugler (Jira)" <ji...@apache.org> on 2020/06/10 21:07:00 UTC, 8 replies.
- [jira] [Comment Edited] (TIKA-3109) Ingest attachment: failed to extract text from iframe - posted by "Younes (Jira)" <ji...@apache.org> on 2020/06/10 21:22:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3110) cannot extract metadata from 7z .tar archive - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/10 21:29:00 UTC, 16 replies.
- [jira] [Comment Edited] (TIKA-3110) cannot extract metadata from 7z .tar archive - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/10 21:30:00 UTC, 3 replies.
- [jira] [Created] (TIKA-3111) Upgrade to PDFBox 2.0.20 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/10 22:02:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar - posted by "Ip Smile (Jira)" <ji...@apache.org> on 2020/06/11 18:35:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar - posted by "Ip Smile (Jira)" <ji...@apache.org> on 2020/06/11 18:50:00 UTC, 3 replies.
- [jira] [Commented] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar - posted by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/06/11 19:16:00 UTC, 4 replies.
- [jira] [Comment Edited] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar - posted by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/06/11 19:19:00 UTC, 2 replies.
- [jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/11 20:48:00 UTC, 16 replies.
- [jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/11 20:49:00 UTC, 8 replies.
- [jira] [Created] (TIKA-3113) Currently Tika is detecting a .aux file as text/html - posted by "Danny McKinney (Jira)" <ji...@apache.org> on 2020/06/11 23:29:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3114) Error reading transcript from document - posted by "Dushyanth Balasubramanian (Jira)" <ji...@apache.org> on 2020/06/12 00:00:10 UTC, 0 replies.
- [jira] [Commented] (TIKA-3114) Error reading transcript from document - posted by "Kenneth William Krugler (Jira)" <ji...@apache.org> on 2020/06/12 00:06:00 UTC, 6 replies.
- [jira] [Commented] (TIKA-3113) Currently Tika is detecting a .aux file as text/html - posted by "Nick Burch (Jira)" <ji...@apache.org> on 2020/06/12 05:47:00 UTC, 2 replies.
- [jira] [Created] (TIKA-3115) Detect parquet files - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/12 19:49:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3115) Detect parquet files - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/12 19:52:00 UTC, 7 replies.
- [jira] [Updated] (TIKA-3111) Upgrade to PDFBox 2.0.20 - posted by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2020/06/13 10:13:00 UTC, 1 replies.
- [GitHub] [tika] deathy opened a new pull request #321: fix for TIKA-3008 contributed by deathy - posted by GitBox <gi...@apache.org> on 2020/06/14 11:05:04 UTC, 0 replies.
- [jira] [Commented] (TIKA-3008) Word Doc/Docx Formatting Extraction - Superscript/Subscript - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/14 11:06:00 UTC, 1 replies.
- [jira] [Updated] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-gui - posted by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/06/15 16:21:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI - posted by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/06/15 16:22:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3116) .docx can't extract text in nested text content-control - posted by "lee james (Jira)" <ji...@apache.org> on 2020/06/16 04:24:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3116) .docx can't extract text in nested text content-control - posted by "lee james (Jira)" <ji...@apache.org> on 2020/06/16 04:25:00 UTC, 4 replies.
- [jira] [Closed] (TIKA-3116) .docx can't extract text in nested text content-control - posted by "lee james (Jira)" <ji...@apache.org> on 2020/06/16 14:18:00 UTC, 0 replies.
- [GitHub] [tika] matthewford opened a new pull request #322: Update PDFParser.properties - posted by GitBox <gi...@apache.org> on 2020/06/16 16:20:51 UTC, 0 replies.
- [GitHub] [tika] tballison merged pull request #322: Update PDFParser.properties - posted by GitBox <gi...@apache.org> on 2020/06/16 16:45:00 UTC, 0 replies.
- [GitHub] [tika] tballison merged pull request #278: TIKA-2830 add heif mimetype support - posted by GitBox <gi...@apache.org> on 2020/06/16 16:50:55 UTC, 0 replies.
- [jira] [Commented] (TIKA-2830) Detect Media type of HEIF file correctly - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/16 16:51:00 UTC, 1 replies.
- [GitHub] [tika] tballison commented on pull request #278: TIKA-2830 add heif mimetype support - posted by GitBox <gi...@apache.org> on 2020/06/16 16:53:01 UTC, 0 replies.
- [GitHub] [tika] tballison merged pull request #320: tika-mimetypes: Add MIME types for .mpd, .m3u8 and .m4s - posted by GitBox <gi...@apache.org> on 2020/06/16 16:53:32 UTC, 0 replies.
- [GitHub] [tika] tballison merged pull request #272: TIKA-2888 Add wmv2 codec detection for WMV files - posted by GitBox <gi...@apache.org> on 2020/06/16 16:56:36 UTC, 0 replies.
- [GitHub] [tika] tballison merged pull request #276: Disable external DTD + Stylesheets with the TransformerFactory - posted by GitBox <gi...@apache.org> on 2020/06/16 16:57:02 UTC, 0 replies.
- [jira] [Commented] (TIKA-2888) Add wmv2 codec detection to ASF container - posted by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/16 16:57:02 UTC, 2 replies.
- [jira] [Created] (TIKA-3117) Upgrade to metadata-extractor 2.14.0 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/16 17:02:00 UTC, 0 replies.
- renaming master? - posted by Tim Allison <ta...@apache.org> on 2020/06/16 17:31:26 UTC, 3 replies.
- [jira] [Resolved] (TIKA-3117) Upgrade to metadata-extractor 2.14.0 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/16 18:19:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-2888) Add wmv2 codec detection to ASF container - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/16 18:20:00 UTC, 0 replies.
- [jira] [Resolved] (TIKA-3104) Detection of memgraph files exported from Xcode - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/16 18:20:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/16 19:16:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3117) Upgrade to metadata-extractor 2.14.0 - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/16 19:16:00 UTC, 1 replies.
- Re: [EXTERNAL] renaming master? - posted by Chris Mattmann <ma...@apache.org> on 2020/06/16 19:22:59 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-3097) Out of memory while parsing docx - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/17 10:18:00 UTC, 1 replies.
- [jira] [Created] (TIKA-3118) PDFParser: totalCharsPerPage vs. actual chars per page after parsing - posted by "Jeroen Steggink (Jira)" <ji...@apache.org> on 2020/06/19 07:11:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3118) PDFParser: totalCharsPerPage vs. actual chars per page after parsing - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 13:52:00 UTC, 3 replies.
- [jira] [Created] (TIKA-3119) General upgrades for 1.25 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 17:12:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3120) Remove whitelist/blacklist terminology - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 17:25:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3121) Rename master branch - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 17:32:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3121) Rename master branch - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 17:38:00 UTC, 4 replies.
- [jira] [Updated] (TIKA-3119) General upgrades for 1.25 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 18:54:00 UTC, 1 replies.
- [jira] [Commented] (TIKA-3119) General upgrades for 1.25 - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 18:56:00 UTC, 6 replies.
- [jira] [Resolved] (TIKA-3120) Remove whitelist/blacklist terminology - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/19 20:59:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3120) Remove whitelist/blacklist terminology - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/19 21:38:00 UTC, 2 replies.
- [jira] [Comment Edited] (TIKA-3119) General upgrades for 1.25 - posted by "Thamme Gowda (Jira)" <ji...@apache.org> on 2020/06/20 04:30:00 UTC, 2 replies.
- Request for access to edit the ASF Tika wiki - posted by Vegard Stikbakke <ve...@gmail.com> on 2020/06/22 08:56:04 UTC, 7 replies.
- [jira] [Created] (TIKA-3122) Extract inline image metadata without rendering for PDFs - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/22 15:45:00 UTC, 0 replies.
- JDK 15 is in Rampdown Phase One - posted by Rory O'Donnell <ro...@oracle.com> on 2020/06/22 15:51:10 UTC, 0 replies.
- [jira] [Resolved] (TIKA-3122) Extract inline image metadata without rendering for PDFs - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/22 16:49:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3122) Extract inline image metadata without rendering for PDFs - posted by "Hudson (Jira)" <ji...@apache.org> on 2020/06/22 17:19:00 UTC, 1 replies.
- [jira] [Created] (TIKA-3123) request to parse Chinese, but return Russian - posted by "阿里木 (Jira)" <ji...@apache.org> on 2020/06/23 07:36:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3123) request to parse Chinese, but return Russian - posted by "阿里木 (Jira)" <ji...@apache.org> on 2020/06/23 07:36:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3123) request to parse Chinese, but return Russian - posted by "Kenneth William Krugler (Jira)" <ji...@apache.org> on 2020/06/23 13:23:00 UTC, 1 replies.
- [jira] [Created] (TIKA-3124) .MOV file crashes Tika app, causes exception on server - posted by "Patrick Maloney (Jira)" <ji...@apache.org> on 2020/06/24 16:39:00 UTC, 0 replies.
- [jira] [Reopened] (TIKA-3104) Detection of memgraph files exported from Xcode - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/24 17:01:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3124) .MOV file crashes Tika app, causes exception on server - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/24 21:58:00 UTC, 2 replies.
- [jira] [Comment Edited] (TIKA-3124) .MOV file crashes Tika app, causes exception on server - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/24 22:00:00 UTC, 1 replies.
- Is there a way to use Tika Fork parser along with the Tika Server? - posted by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/25 13:40:57 UTC, 2 replies.
- Need some help understanding why this code gets stuck in timeout exceptions - posted by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/25 18:10:59 UTC, 0 replies.
- How do you read the __METADATA__ file from tika server programmatically? - posted by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/25 22:22:53 UTC, 1 replies.
- Tika Server - Getting the log output with MDC to associate the file being parsed - posted by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/26 18:30:27 UTC, 2 replies.
- [jira] [Created] (TIKA-3125) rmeta/text and unpack - the __DATA__ file and X-TIKA:content differ by some leading new line characters - posted by "Nicholas DiPiazza (Jira)" <ji...@apache.org> on 2020/06/27 12:44:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3125) rmeta/text and unpack - the __TEXT__ file and X-TIKA:content differ by some leading new line characters - posted by "Nicholas DiPiazza (Jira)" <ji...@apache.org> on 2020/06/27 12:46:00 UTC, 4 replies.
- [jira] [Updated] (TIKA-3125) rmeta/text and unpack - the __DATA__ file and X-TIKA:content differ by some leading new line characters - posted by "Nicholas DiPiazza (Jira)" <ji...@apache.org> on 2020/06/27 12:46:00 UTC, 1 replies.
- What directory does tika server use as it's work directory? - posted by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/27 15:12:37 UTC, 1 replies.
- [jira] [Commented] (TIKA-3125) rmeta/text and unpack - the __TEXT__ file and X-TIKA:content differ by some leading new line characters - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/27 18:32:00 UTC, 2 replies.
- Announcing ApacheCon @Home 2020 - posted by Rich Bowen <rb...@apache.org> on 2020/06/29 12:54:01 UTC, 0 replies.
- [jira] [Created] (TIKA-3126) Consider new endpoint (metadata + content non recursive) - posted by "Carina Antunes (Jira)" <ji...@apache.org> on 2020/06/30 09:57:00 UTC, 0 replies.
- [jira] [Commented] (TIKA-3126) Consider new endpoint (metadata + content non recursive) - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/30 15:47:00 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-3126) Consider new endpoint (metadata + content non recursive) - posted by "Tim Allison (Jira)" <ji...@apache.org> on 2020/06/30 15:47:00 UTC, 0 replies.
- [jira] [Created] (TIKA-3127) When using html parser any empty attribute sets value to attribute name e.g. link gives href="href" - posted by "Milan Vereščák (Jira)" <ji...@apache.org> on 2020/06/30 17:52:00 UTC, 0 replies.
- [jira] [Updated] (TIKA-3127) When using html parser any empty attribute sets value to attribute name e.g. link gives href="href" - posted by "Milan Vereščák (Jira)" <ji...@apache.org> on 2020/06/30 17:54:00 UTC, 1 replies.