- Re: OCR with tika-server - posted by "Ramirez, Paul M (398J)" <pa...@jpl.nasa.gov> on 2014/10/01 01:28:39 UTC, 10 replies.
- [jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats - posted by "Vineet Ghatge (JIRA)" <ji...@apache.org> on 2014/10/01 07:05:34 UTC, 21 replies.
- [jira] [Resolved] (TIKA-1427) PDF Images don't appear in structured view - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/01 16:14:41 UTC, 0 replies.
- [jira] [Commented] (TIKA-1427) PDF Images don't appear in structured view - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/01 16:50:33 UTC, 9 replies.
- [jira] [Updated] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/02 05:11:34 UTC, 4 replies.
- [jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/10/02 07:35:34 UTC, 5 replies.
- [jira] [Created] (TIKA-1434) Plain text file reported as binary - posted by "Marco Quaranta (JIRA)" <ji...@apache.org> on 2014/10/02 17:18:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1434) Plain text file reported as binary - posted by "Marco Quaranta (JIRA)" <ji...@apache.org> on 2014/10/02 17:20:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/02 21:52:35 UTC, 20 replies.
- Trying To Index A Visio document - posted by mdemarco123 <mi...@hotmail.com> on 2014/10/03 00:30:47 UTC, 1 replies.
- [jira] [Created] (TIKA-1435) Update rome dependency to 1.5 - posted by "Johannes Mockenhaupt (JIRA)" <ji...@apache.org> on 2014/10/03 11:26:33 UTC, 0 replies.
- [GitHub] tika pull request: TIKA-1435: Upgrade Rome to 1.5 - posted by jotomo <gi...@git.apache.org> on 2014/10/03 11:27:34 UTC, 1 replies.
- [jira] [Commented] (TIKA-1435) Update rome dependency to 1.5 - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/03 11:27:34 UTC, 13 replies.
- [jira] [Issue Comment Deleted] (TIKA-1435) Update rome dependency to 1.5 - posted by "Johannes Mockenhaupt (JIRA)" <ji...@apache.org> on 2014/10/03 11:28:33 UTC, 1 replies.
- [jira] [Comment Edited] (TIKA-1427) PDF Images don't appear in structured view - posted by "James Baker (JIRA)" <ji...@apache.org> on 2014/10/03 13:59:35 UTC, 1 replies.
- [jira] [Resolved] (TIKA-1435) Update rome dependency to 1.5 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/04 01:51:35 UTC, 0 replies.
- [GitHub] tika pull request: [TIKA-1354] Register ForkParser service in Acti... - posted by asfgit <gi...@git.apache.org> on 2014/10/04 01:54:04 UTC, 0 replies.
- [jira] [Commented] (TIKA-1354) ForkParser doesn't work in OSGI container - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/04 01:54:33 UTC, 4 replies.
- [GitHub] tika pull request: Similar to TIKA-1126, this commit adds the abil... - posted by asfgit <gi...@git.apache.org> on 2014/10/04 01:58:46 UTC, 0 replies.
- [jira] [Commented] (TIKA-1126) text/html procuder for tika-server - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/04 01:59:33 UTC, 0 replies.
- [GitHub] tika pull request: TIKA-1369 Resolve thread safety issue in ImageM... - posted by asfgit <gi...@git.apache.org> on 2014/10/04 04:22:23 UTC, 0 replies.
- [jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor - posted by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/10/04 04:22:34 UTC, 5 replies.
- [jira] [Resolved] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/04 04:24:34 UTC, 0 replies.
- [jira] [Reopened] (TIKA-1435) Update rome dependency to 1.5 - posted by "Johannes Mockenhaupt (JIRA)" <ji...@apache.org> on 2014/10/04 13:10:33 UTC, 0 replies.
- [jira] [Assigned] (TIKA-1435) Update rome dependency to 1.5 - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/04 23:41:34 UTC, 0 replies.
- Re: [PDFParser] - patch proposal - posted by Stefano Fornari <st...@gmail.com> on 2014/10/05 11:57:03 UTC, 2 replies.
- [jira] [Created] (TIKA-1436) improvement to PDFParser - posted by "Stefano Fornari (JIRA)" <ji...@apache.org> on 2014/10/05 21:20:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1436) improvement to PDFParser - posted by "Stefano Fornari (JIRA)" <ji...@apache.org> on 2014/10/05 21:40:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1437) encoding issue in AutoDetectReader - posted by "Shuai Liu (JIRA)" <ji...@apache.org> on 2014/10/05 22:19:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1437) encoding issue in AutoDetectReader - posted by "Shuai Liu (JIRA)" <ji...@apache.org> on 2014/10/05 22:24:33 UTC, 3 replies.
- [GitHub] tika pull request: TIKA-1369 Avoid ThreadLocal usage from Memory L... - posted by vilmospapp <gi...@git.apache.org> on 2014/10/06 14:14:09 UTC, 1 replies.
- [jira] [Commented] (TIKA-1437) encoding issue in AutoDetectReader - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/06 16:05:33 UTC, 1 replies.
- [jira] [Comment Edited] (TIKA-1437) encoding issue in AutoDetectReader - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/06 16:05:34 UTC, 3 replies.
- [GitHub] tika pull request: TIKA-1354 Add test method with nonfunctional fo... - posted by hlavki <gi...@git.apache.org> on 2014/10/06 16:27:24 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1435) Update rome dependency to 1.5 - posted by "Johannes Mockenhaupt (JIRA)" <ji...@apache.org> on 2014/10/06 21:13:35 UTC, 2 replies.
- [jira] [Updated] (TIKA-1435) Update rome dependency to 1.5 - posted by "Johannes Mockenhaupt (JIRA)" <ji...@apache.org> on 2014/10/06 21:14:34 UTC, 1 replies.
- Tesseract OCR always activeated parser for images - posted by Lewis John Mcgibbney <le...@gmail.com> on 2014/10/07 01:47:36 UTC, 3 replies.
- Apache Tika 1.6 Fails SHA1 and Key Checks - posted by Shannon Brown <sb...@abbacan.com> on 2014/10/07 14:21:17 UTC, 3 replies.
- [jira] [Commented] (TIKA-93) OCR support - posted by "Stef Ald (JIRA)" <ji...@apache.org> on 2014/10/07 23:26:36 UTC, 3 replies.
- [jira] [Updated] (TIKA-1427) PDF Images don't appear in structured view - posted by "James Baker (JIRA)" <ji...@apache.org> on 2014/10/08 18:10:34 UTC, 0 replies.
- buildbot success in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2014/10/08 19:30:20 UTC, 5 replies.
- buildbot failure in ASF Buildbot on tika-trunk - posted by bu...@apache.org on 2014/10/08 20:30:13 UTC, 3 replies.
- [jira] [Created] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/10/08 22:50:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/10/08 22:51:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/08 22:58:33 UTC, 1 replies.
- [jira] [Closed] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/09 00:24:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1439) PDF embeded with document can not parse. - posted by "sunxingzhe (JIRA)" <ji...@apache.org> on 2014/10/09 09:55:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-1439) PDF embeded with document can not parse. - posted by "sunxingzhe (JIRA)" <ji...@apache.org> on 2014/10/09 10:01:33 UTC, 2 replies.
- tika-trunk-jdk1.6 - Build # 231 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/09 10:13:19 UTC, 0 replies.
- [jira] [Commented] (TIKA-1439) PDF embeded with document can not parse. - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/09 14:17:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1419) Upgrade to PDFBox 1.8.7 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/09 17:26:34 UTC, 1 replies.
- [jira] [Created] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document - posted by "Steve Gullion (JIRA)" <ji...@apache.org> on 2014/10/09 18:58:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-605) Tika GDAL parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/10 06:52:33 UTC, 3 replies.
- [jira] [Created] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/10 06:52:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/10 06:54:33 UTC, 0 replies.
- Review Request 26542: Tika GDAL parser - posted by Chris Mattmann <ma...@apache.org> on 2014/10/10 08:22:53 UTC, 2 replies.
- [jira] [Commented] (TIKA-605) Tika GDAL parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/10 08:23:34 UTC, 9 replies.
- [jira] [Created] (TIKA-1442) Upgrade to PDFBox 1.8.8 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/10 14:28:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/10 14:31:34 UTC, 21 replies.
- [jira] [Created] (TIKA-1443) Add a junk text detector to Tika - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/10 14:40:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1419) Upgrade to PDFBox 1.8.7 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/10 14:42:35 UTC, 0 replies.
- [jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/10 16:53:34 UTC, 8 replies.
- [jira] [Created] (TIKA-1444) Detection for VirtualPC VHD files - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2014/10/11 15:50:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/11 17:22:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/11 17:44:34 UTC, 1 replies.
- [jira] [Resolved] (TIKA-605) Tika GDAL parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/11 18:13:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/12 18:20:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/12 18:32:33 UTC, 1 replies.
- [jira] [Updated] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/12 18:38:33 UTC, 5 replies.
- [jira] [Created] (TIKA-1446) CHM parser : wrong decompression of aligned blocks - posted by "Bin Hawking (JIRA)" <ji...@apache.org> on 2014/10/12 21:50:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1447) CHM parser: wrong directory list - posted by "Bin Hawking (JIRA)" <ji...@apache.org> on 2014/10/12 22:03:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1448) CHM parser : defect in file extraction - posted by "Bin Hawking (JIRA)" <ji...@apache.org> on 2014/10/12 22:11:33 UTC, 0 replies.
- [jira] [Closed] (TIKA-1397) Can Tika make the metadata extraction of time stamps as timezone sensitive - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/13 09:48:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser - posted by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2014/10/13 10:57:34 UTC, 5 replies.
- [jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks - posted by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2014/10/13 11:05:34 UTC, 6 replies.
- [jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries - posted by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2014/10/13 11:56:35 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1444) Detection for VirtualPC VHD files - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/10/13 12:18:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-1444) Detection for VirtualPC VHD files - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/13 12:45:33 UTC, 2 replies.
- [jira] [Created] (TIKA-1449) Extract Images from PDF at Correct Location - posted by "James Baker (JIRA)" <ji...@apache.org> on 2014/10/13 19:01:33 UTC, 0 replies.
- [GitHub] tika pull request: Rome 1.5 retry - posted by jotomo <gi...@git.apache.org> on 2014/10/13 22:06:36 UTC, 0 replies.
- [jira] [Commented] (TIKA-1443) Add a junk text detector to Tika - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/14 05:33:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-1242) Update CXF version to 3.0.2 - posted by "Sergey Beryozkin (JIRA)" <ji...@apache.org> on 2014/10/14 13:06:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1242) Update CXF version to 3.0.2 - posted by "Sergey Beryozkin (JIRA)" <ji...@apache.org> on 2014/10/14 13:08:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1242) Update CXF version to 3.0.2 - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/14 13:44:34 UTC, 3 replies.
- javax.mail and WADL gen dependencies in Tika JAX-RS server - posted by Sergey Beryozkin <sb...@gmail.com> on 2014/10/14 14:11:21 UTC, 1 replies.
- Re: Nutch vs Lucidworks Fusion - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/10/16 02:52:05 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1391) Create Parser.parse() example - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/16 05:18:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1442) Upgrade to PDFBox 1.8.8 - posted by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/10/16 08:13:33 UTC, 6 replies.
- 1.7 release? - posted by Andrzej Białecki <ab...@getopt.org> on 2014/10/16 10:51:37 UTC, 14 replies.
- [jira] [Created] (TIKA-1450) Tika does not detect the correct mime-type for webp images - posted by "Nelson Monterroso (JIRA)" <ji...@apache.org> on 2014/10/17 22:17:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1450) Tika does not detect the correct mime-type for webp images - posted by "Nelson Monterroso (JIRA)" <ji...@apache.org> on 2014/10/17 22:18:33 UTC, 6 replies.
- [jira] [Resolved] (TIKA-1450) Tika does not detect the correct mime-type for webp images - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/10/18 02:04:34 UTC, 0 replies.
- [jira] [Commented] (TIKA-1450) Tika does not detect the correct mime-type for webp images - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/18 02:46:34 UTC, 1 replies.
- tika-trunk-jdk1.7 - Build # 270 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/19 10:12:07 UTC, 0 replies.
- tika-trunk-jdk1.6 - Build # 251 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/20 10:03:58 UTC, 0 replies.
- tika-trunk-jdk1.7 - Build # 271 - Still Failing - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/20 10:04:10 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1302) Let's run Tika against a large batch of docs nightly - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/20 19:23:35 UTC, 1 replies.
- Re: Tika 1.6 update in Maven Central? - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/10/21 05:18:32 UTC, 3 replies.
- [jira] [Created] (TIKA-1451) Add Recursive Metadata Parser Wrapper output to tika-app and gui - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/21 05:21:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-1451) Add Recursive Metadata Parser Wrapper output to tika-app and gui - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/21 05:22:34 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails - posted by "Oleg Tikhonov (JIRA)" <ji...@apache.org> on 2014/10/21 08:19:34 UTC, 2 replies.
- [jira] [Assigned] (TIKA-1423) Build a parser to extract data from GRIB formats - posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/10/21 08:53:34 UTC, 0 replies.
- tika-trunk-jdk1.7 - Build # 273 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/21 11:42:13 UTC, 0 replies.
- parser.parse() throws exception after which the procesed file is not getting renamed/moved. - posted by Tony Braganza <br...@gmail.com> on 2014/10/21 14:37:37 UTC, 0 replies.
- [jira] [Created] (TIKA-1452) parser.parse() throws exception after which the procesed file is not getting renamed/moved/deleted - posted by "Abhishek (JIRA)" <ji...@apache.org> on 2014/10/21 14:43:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/21 14:54:34 UTC, 1 replies.
- [jira] [Commented] (TIKA-1452) parser.parse() throws exception after which the procesed file is not getting renamed/moved/deleted - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/10/21 14:57:33 UTC, 2 replies.
- Re: svn commit: r1633325 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java test/java/org/apache/tika/parser/mail/RFC822ParserTest.java - posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/10/21 15:49:54 UTC, 2 replies.
- [jira] [Updated] (TIKA-1446) CHM parser : wrong decompression of aligned blocks - posted by "Bin Hawking (JIRA)" <ji...@apache.org> on 2014/10/21 21:09:33 UTC, 2 replies.
- import (re)ordering? - posted by "Allison, Timothy B." <ta...@mitre.org> on 2014/10/21 22:59:19 UTC, 5 replies.
- [jira] [Created] (TIKA-1453) fails to parse RFC3464 documents - posted by "Rob Tulloh (JIRA)" <ji...@apache.org> on 2014/10/21 23:13:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1451) Add Recursive Metadata Parser Wrapper output to tika-app and gui - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/22 02:37:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1451) Add Recursive Metadata Parser Wrapper output to tika-app and gui - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/22 02:47:34 UTC, 3 replies.
- [jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8 - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/22 13:20:36 UTC, 7 replies.
- [jira] [Created] (TIKA-1454) Extracting as HTML loses links in xlsx, ppt, and pptx files - posted by "Chris Bryant (JIRA)" <ji...@apache.org> on 2014/10/22 21:01:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1454) Extracting as HTML loses links in xlsx, ppt, and pptx files - posted by "Chris Bryant (JIRA)" <ji...@apache.org> on 2014/10/22 21:03:33 UTC, 1 replies.
- [jira] [Reopened] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails - posted by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/10/22 22:41:34 UTC, 0 replies.
- [GitHub] tika pull request: TIKA-1446 - posted by thaichat04 <gi...@git.apache.org> on 2014/10/23 17:47:30 UTC, 1 replies.
- [jira] [Created] (TIKA-1455) Upgrade GSON dependency - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/23 17:56:33 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1455) Upgrade GSON dependency - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/23 17:57:34 UTC, 0 replies.
- [GitHub] tika pull request: CHM Parser Improvement - posted by thaichat04 <gi...@git.apache.org> on 2014/10/23 18:17:16 UTC, 0 replies.
- [jira] [Commented] (TIKA-1098) not able to parse pdfs/docs/ppts using 1.1 tika parser - posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/10/23 19:45:33 UTC, 0 replies.
- [jira] [Assigned] (TIKA-443) Geographic Information Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/24 07:34:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-443) Geographic Information Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/24 07:36:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1456) Visual Sentiment API parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/24 08:35:33 UTC, 0 replies.
- [jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:34 UTC, 0 replies.
- [jira] [Updated] (TIKA-985) Support for HTML5 elements - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1108) Represent individual slides in pptx - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-988) We don't extract a placeholder for a Word document embedded in an Excel document - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1300) Switch default PDFBox parser to NonSequentialParser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:36 UTC, 0 replies.
- [jira] [Updated] (TIKA-1072) AIOOBE when handling embedded document in .doc file - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:36 UTC, 0 replies.
- [jira] [Updated] (TIKA-1343) Create a Tika Translator implementation that uses JoshuaDecoder - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:37 UTC, 0 replies.
- [jira] [Updated] (TIKA-1395) Create embedded image extraction example - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:37 UTC, 0 replies.
- [jira] [Updated] (TIKA-1426) Let's allow users to specify a tika config file on the commandline for tika-app and tika-server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:37 UTC, 0 replies.
- [jira] [Updated] (TIKA-987) Embedded drawing (SHAPE MERGEFORMAT) sometimes not extracted - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-891) Use POST in addition to PUT on method calls in tika-server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-1387) Add forbidden-apis checker to TIKA build - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-1318) Use of Deprecated Word6Extractor.getParagraphText() Method - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-1301) Establish TikaServer on Apache hosted VM - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:39 UTC, 0 replies.
- [jira] [Updated] (TIKA-1425) Automatic batching of Microsoft service calls - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:39 UTC, 0 replies.
- [jira] [Updated] (TIKA-1417) Create Extract Embedded Images from PDFs Example - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:39 UTC, 0 replies.
- [jira] [Updated] (TIKA-995) XHTMLContentHandler doesn't pass attributes of body element - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:40 UTC, 0 replies.
- [jira] [Updated] (TIKA-1106) CLAVIN Integration - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:41 UTC, 0 replies.
- [jira] [Updated] (TIKA-1384) Use tika-parent dependency management for common dependencies - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:41 UTC, 0 replies.
- [jira] [Updated] (TIKA-1220) Parser implementration for IFC files - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-776) ExifTool Embedder - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-980) MicrodataContentHandler for Apache Tika - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:43 UTC, 0 replies.
- [jira] [Updated] (TIKA-1079) Word document hits AIOOBE in SummaryExtractor.parseSummaries - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:44 UTC, 0 replies.
- [jira] [Updated] (TIKA-774) ExifTool Parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:44 UTC, 0 replies.
- [jira] [Updated] (TIKA-1328) Translate Metadata and Content - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:44 UTC, 0 replies.
- [jira] [Updated] (TIKA-1383) Simplify TikeServerCli endpoint setup code - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:45 UTC, 0 replies.
- [jira] [Updated] (TIKA-1388) Tika IOUtils java.lang.OutOfMemoryError - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:45 UTC, 0 replies.
- [jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:45 UTC, 0 replies.
- [jira] [Updated] (TIKA-1366) Update some of Tika Server services to support JAX-RS 2.0 AsyncResponse - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:46 UTC, 0 replies.
- [jira] [Updated] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:46 UTC, 0 replies.
- [jira] [Updated] (TIKA-1308) Support in memory parse mode(don't create temp file): to support run Tika in GAE - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:46 UTC, 0 replies.
- [jira] [Updated] (TIKA-1295) Make some Dublin Core items multi-valued - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:46 UTC, 0 replies.
- [jira] [Updated] (TIKA-1324) Use a common path for the Tika Server unpacker resources - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:47 UTC, 0 replies.
- [jira] [Updated] (TIKA-1269) Self-hosted documentation for the JAX-RS Server - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:47 UTC, 0 replies.
- [jira] [Updated] (TIKA-1167) Embedded object not extracted - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:48 UTC, 0 replies.
- [jira] [Updated] (TIKA-1059) Better Handling of InterruptedException in ExternalParser and ExternalEmbedder - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:48 UTC, 0 replies.
- [jira] [Updated] (TIKA-1390) Create tika-example module - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-1307) Jenkins Java7 job requires a profile in order to build 'tika-java7' module. - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-1416) Refactor Translator Exception Handling - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-1408) Fix version for tikadotnet to be tracked along with trunk and release version - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:49 UTC, 0 replies.
- [jira] [Updated] (TIKA-1379) error in Tika().detect for xml files with xades signature - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:50 UTC, 0 replies.
- [jira] [Updated] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:50 UTC, 0 replies.
- [jira] [Updated] (TIKA-1273) old tika-server jar artifact contains no manifest so not able to invoke from shell - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:50 UTC, 0 replies.
- [jira] [Updated] (TIKA-1306) ClassCastException WARN [main] (COSDocument.java:303) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName in o.a.t.parser.pdf.PDFParserTest - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:50 UTC, 0 replies.
- [jira] [Updated] (TIKA-1315) Basic list support in WordExtractor - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:51 UTC, 0 replies.
- [jira] [Updated] (TIKA-1456) Visual Sentiment API parser - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:51 UTC, 0 replies.
- [jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build - posted by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2014/10/25 14:19:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1387) Add forbidden-apis checker to TIKA build - posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 17:27:34 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1446) CHM parser : wrong decompression of aligned blocks - posted by "Bin Hawking (JIRA)" <ji...@apache.org> on 2014/10/25 22:27:34 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1452) parser.parse() throws exception after which the procesed file is not getting renamed/moved/deleted - posted by "Abhishek (JIRA)" <ji...@apache.org> on 2014/10/27 07:55:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1457) NullPointerException in tika-app, parsing PDF content - posted by "Tadeu Alves (JIRA)" <ji...@apache.org> on 2014/10/27 15:00:35 UTC, 0 replies.
- [jira] [Updated] (TIKA-1457) NullPointerException in tika-app, parsing PDF content - posted by "Tadeu Alves (JIRA)" <ji...@apache.org> on 2014/10/27 15:16:34 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1457) NullPointerException in tika-app, parsing PDF content - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/27 15:46:33 UTC, 0 replies.
- [jira] [Created] (TIKA-1458) Matlab parser throws exception in case of MATLAB fig file - posted by "Johan van der Knijff (JIRA)" <ji...@apache.org> on 2014/10/27 15:49:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1457) NullPointerException in tika-app, parsing PDF content - posted by "Tadeu Alves (JIRA)" <ji...@apache.org> on 2014/10/27 16:21:34 UTC, 7 replies.
- [jira] [Comment Edited] (TIKA-1457) NullPointerException in tika-app, parsing PDF content - posted by "Tadeu Alves (JIRA)" <ji...@apache.org> on 2014/10/27 16:23:35 UTC, 2 replies.
- [jira] [Created] (TIKA-1459) Fix write limit bug in BasicContentHandlerFactory for BodyContentHandler - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/27 17:59:35 UTC, 0 replies.
- [jira] [Resolved] (TIKA-1459) Fix write limit bug in BasicContentHandlerFactory for BodyContentHandler - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/27 18:04:36 UTC, 0 replies.
- tika-trunk-jdk1.7 - Build # 286 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/27 18:38:16 UTC, 0 replies.
- [jira] [Commented] (TIKA-1459) Fix write limit bug in BasicContentHandlerFactory for BodyContentHandler - posted by "Hudson (JIRA)" <ji...@apache.org> on 2014/10/27 18:38:34 UTC, 1 replies.
- tika-trunk-jdk1.6 - Build # 266 - Failure - posted by Apache Jenkins Server <je...@builds.apache.org> on 2014/10/27 18:57:32 UTC, 0 replies.
- [jira] [Comment Edited] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser - posted by "Tim Allison (JIRA)" <ji...@apache.org> on 2014/10/27 20:06:35 UTC, 2 replies.
- [jira] [Created] (TIKA-1460) Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2' - posted by "onyas (JIRA)" <ji...@apache.org> on 2014/10/29 04:00:39 UTC, 0 replies.
- [jira] [Updated] (TIKA-1460) Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2' - posted by "onyas (JIRA)" <ji...@apache.org> on 2014/10/29 04:02:33 UTC, 4 replies.
- [jira] [Commented] (TIKA-1460) Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2' - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/10/29 10:55:34 UTC, 0 replies.
- [jira] [Created] (TIKA-1461) Bad mime detection of certain JAR file - posted by "Cservenak, Tamas (JIRA)" <ji...@apache.org> on 2014/10/29 11:16:33 UTC, 0 replies.
- [jira] [Commented] (TIKA-1461) Bad mime detection of certain JAR file - posted by "Cservenak, Tamas (JIRA)" <ji...@apache.org> on 2014/10/29 11:29:33 UTC, 7 replies.
- [jira] [Resolved] (TIKA-1461) Bad mime detection of certain JAR file - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/10/29 20:24:35 UTC, 0 replies.
- PDF test failing on trunk - posted by Nick Burch <ni...@apache.org> on 2014/10/29 21:39:47 UTC, 7 replies.
- [jira] [Created] (TIKA-1462) PDFont consumes all heap space - posted by "James Hardwick (JIRA)" <ji...@apache.org> on 2014/10/29 22:37:33 UTC, 0 replies.
- [jira] [Closed] (TIKA-1462) PDFont consumes all heap space - posted by "James Hardwick (JIRA)" <ji...@apache.org> on 2014/10/29 22:40:34 UTC, 0 replies.