- [jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader - posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org> on 2012/08/01 03:02:34 UTC, 1 reply.
- Re: Custom parser error - posted by 122jxgcn <yw...@gmail.com> on 2012/08/01 03:14:22 UTC, 1 reply.
- [jira] [Comment Edited] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/01 12:14:03 UTC, 0 replies.
- [jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/01 12:14:03 UTC, 3 replies.
- [jira] [Updated] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/01 13:31:03 UTC, 0 replies.
- [jira] [Resolved] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2012/08/01 15:47:05 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #906 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/08/01 16:17:12 UTC, 2 replies.
- [jira] [Commented] (TIKA-966) org.apache.tika.Tika missing from tika-bundle-1.2.jar - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/01 16:58:02 UTC, 7 replies.
- [jira] [Created] (TIKA-967) Tika comes with transitive Maven dependency to a test artifact of vorbis-java-core - posted by "Andreas Hubold (JIRA)" <ji...@apache.org> on 2012/08/02 09:26:02 UTC, 0 replies.
- Executing file inside Parser - posted by 122jxgcn <yw...@gmail.com> on 2012/08/02 09:50:01 UTC, 2 replies.
- [jira] [Commented] (TIKA-967) Tika comes with transitive Maven dependency to a test artifact of vorbis-java-core - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2012/08/02 10:40:03 UTC, 1 reply.
- [jira] [Resolved] (TIKA-709) Tika network server does not print anything in response to, for example, Word documents - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/02 11:33:02 UTC, 0 replies.
- [jira] [Created] (TIKA-968) tika-bundle missing org.apache.commons.logging.LogFactory - posted by "Gary Karasiuk (JIRA)" <ji...@apache.org> on 2012/08/02 19:42:02 UTC, 0 replies.
- [jira] [Commented] (TIKA-968) tika-bundle missing org.apache.commons.logging.LogFactory - posted by "Gary Karasiuk (JIRA)" <ji...@apache.org> on 2012/08/02 19:46:02 UTC, 1 reply.
- Tika at ApacheCon - posted by Jukka Zitting <ju...@gmail.com> on 2012/08/03 10:54:58 UTC, 2 replies.
- [jira] [Created] (TIKA-969) Exception "org.apache.tika.exception.TikaException: Can't read JPEG metada" / "com.drew.metadata.MetadataException: Tag '34855' cannot be cast to int. It is of type 'class [I" when indexing some items - posted by "Richard Eccles (JIRA)" <ji...@apache.org> on 2012/08/03 14:05:03 UTC, 0 replies.
- [jira] [Commented] (TIKA-969) Exception "org.apache.tika.exception.TikaException: Can't read JPEG metada" / "com.drew.metadata.MetadataException: Tag '34855' cannot be cast to int. It is of type 'class [I" when indexing some items - posted by "Nick Burch (JIRA)" <ji...@apache.org> on 2012/08/03 14:07:02 UTC, 0 replies.
- [jira] [Updated] (TIKA-969) Exception "org.apache.tika.exception.TikaException: Can't read JPEG metada" / "com.drew.metadata.MetadataException: Tag '34855' cannot be cast to int. It is of type 'class [I" when indexing some items - posted by "Richard Eccles (JIRA)" <ji...@apache.org> on 2012/08/03 14:09:02 UTC, 0 replies.
- [jira] [Updated] (TIKA-969) TikaException Thrown When Handling Unknown Fields for Some JPEGs - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2012/08/03 14:25:02 UTC, 0 replies.
- [jira] [Commented] (TIKA-969) TikaException Thrown When Handling Unknown Fields for Some JPEGs - posted by "Richard Eccles (JIRA)" <ji...@apache.org> on 2012/08/03 14:27:03 UTC, 1 reply.
- [jira] [Resolved] (TIKA-969) TikaException Thrown When Handling Unknown Fields for Some JPEGs - posted by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2012/08/03 14:35:02 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #907 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/08/03 15:12:43 UTC, 0 replies.
- [jira] [Created] (TIKA-970) Full identification of the JPEG 2000 family of formats - posted by "Andrew Jackson (JIRA)" <ji...@apache.org> on 2012/08/03 15:21:03 UTC, 0 replies.
- [jira] [Updated] (TIKA-970) Full identification of the JPEG 2000 family of formats - posted by "Andrew Jackson (JIRA)" <ji...@apache.org> on 2012/08/03 15:21:04 UTC, 0 replies.
- [jira] [Commented] (TIKA-970) Full identification of the JPEG 2000 family of formats - posted by "Andrew Jackson (JIRA)" <ji...@apache.org> on 2012/08/03 15:35:04 UTC, 6 replies.
- [VOTE] Graduate Apache Any23 from the Apache Incubator - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/08/03 19:50:01 UTC, 4 replies.
- [jira] [Assigned] (TIKA-956) Embedded docs in Word doc are not inlined (text is always added to the end) - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/04 14:54:02 UTC, 0 replies.
- [jira] [Updated] (TIKA-956) Embedded docs in Word doc are not inlined (text is always added to the end) - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/04 15:01:02 UTC, 1 reply.
- [jira] [Commented] (TIKA-956) Embedded docs in Word doc are not inlined (text is always added to the end) - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/05 15:50:03 UTC, 1 reply.
- [jira] [Resolved] (TIKA-970) Full identification of the JPEG 2000 family of formats - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/05 16:46:02 UTC, 0 replies.
- [jira] [Resolved] (TIKA-966) org.apache.tika.Tika missing from tika-bundle-1.2.jar - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/05 19:15:03 UTC, 0 replies.
- [jira] [Resolved] (TIKA-968) tika-bundle missing org.apache.commons.logging.LogFactory - posted by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/08/05 19:58:04 UTC, 0 replies.
- AutoDetectParser not picking up custom parser - posted by 122jxgcn <yw...@gmail.com> on 2012/08/06 13:48:18 UTC, 3 replies.
- Detecting content type with file extension - posted by 122jxgcn <yw...@gmail.com> on 2012/08/07 11:02:45 UTC, 0 replies.
- [jira] [Resolved] (TIKA-956) Embedded docs in Word doc are not inlined (text is always added to the end) - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/07 23:43:10 UTC, 0 replies.
- [jira] [Created] (TIKA-971) The ToXMLContentHandler handler creates extra entry when reading ODT files - posted by "François Ouellette (JIRA)" <ji...@apache.org> on 2012/08/08 07:57:09 UTC, 0 replies.
- [jira] [Resolved] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/09 19:42:19 UTC, 0 replies.
- [jira] [Created] (TIKA-972) Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser . - posted by "Priya Kujur (JIRA)" <ji...@apache.org> on 2012/08/09 21:42:18 UTC, 0 replies.
- [jira] [Created] (TIKA-973) PDF form data isn't included in extracted content. - posted by "Michael Graessle (JIRA)" <ji...@apache.org> on 2012/08/09 21:54:19 UTC, 0 replies.
- TIKA-431 and CONTENT_ENCODING - posted by Ken Krugler <kk...@transpac.com> on 2012/08/09 22:56:51 UTC, 2 replies.
- TIKA-431 and CONTENT_ENCODING (updated) - posted by Ken Krugler <kk...@transpac.com> on 2012/08/09 23:24:15 UTC, 0 replies.
- [jira] [Commented] (TIKA-889) XHTMLContentHandler wont emit newline when html element matches ENDLINE set - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/09 23:47:19 UTC, 0 replies.
- [jira] [Resolved] (TIKA-869) IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/09 23:55:19 UTC, 0 replies.
- [jira] [Resolved] (TIKA-889) XHTMLContentHandler wont emit newline when html element matches ENDLINE set - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/09 23:59:19 UTC, 0 replies.
- Re: [ANNOUNCE] Welcome Jörg Ehrlich as new Tika PMC member and committer - posted by Ken Krugler <kk...@transpac.com> on 2012/08/10 00:04:36 UTC, 0 replies.
- [jira] [Assigned] (TIKA-728) Return RDFa meta tags via Metadata - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/10 00:05:20 UTC, 0 replies.
- InputStream reset issue - posted by Ken Krugler <kk...@transpac.com> on 2012/08/10 00:11:30 UTC, 0 replies.
- [jira] [Assigned] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/10 00:13:19 UTC, 0 replies.
- [jira] [Commented] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/10 00:13:19 UTC, 0 replies.
- [jira] [Commented] (TIKA-820) Locator is unset for HTML parser - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/10 00:37:20 UTC, 0 replies.
- [jira] [Assigned] (TIKA-820) Locator is unset for HTML parser - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/10 00:39:19 UTC, 0 replies.
- How can I let Tika know the resource name? - posted by 122jxgcn <yw...@gmail.com> on 2012/08/13 13:31:30 UTC, 1 reply.
- [jira] [Commented] (TIKA-792) NoSuchMethodException "CTMarkupImpl.(org.apache.xmlbeans.SchemaType, boolean)" processing a OOXML document - posted by "Eric Pascal (JIRA)" <ji...@apache.org> on 2012/08/13 16:00:54 UTC, 0 replies.
- [jira] [Commented] (TIKA-868) TXT parser does not honour the specified encoding - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 18:54:38 UTC, 0 replies.
- [jira] [Closed] (TIKA-868) TXT parser does not honour the specified encoding - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 18:56:38 UTC, 0 replies.
- [jira] [Commented] (TIKA-771) "Hello, World!" in UTF-8/ASCII gets detected as IBM500 - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 18:58:38 UTC, 0 replies.
- [jira] [Assigned] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true) - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 19:16:38 UTC, 0 replies.
- [jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true) - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 19:16:38 UTC, 3 replies.
- [jira] [Created] (TIKA-974) No longer return charset info in Metadata's CONTENT_ENCODING - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 19:36:38 UTC, 0 replies.
- [jira] [Resolved] (TIKA-771) "Hello, World!" in UTF-8/ASCII gets detected as IBM500 - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/13 19:55:38 UTC, 0 replies.
- Build failed in Jenkins: Tika-trunk #914 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/08/13 20:15:36 UTC, 0 replies.
- [jira] [Updated] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true) - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/15 14:53:38 UTC, 1 reply.
- [jira] [Created] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/15 15:30:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/15 15:32:38 UTC, 1 reply.
- [RESULT] [VOTE] Graduate Apache Any23 from the Apache Incubator - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/08/16 15:36:03 UTC, 0 replies.
- [jira] [Commented] (TIKA-595) HtmlHandler does not support multivalue metadata - posted by "Michael Kilgore (JIRA)" <ji...@apache.org> on 2012/08/17 22:44:38 UTC, 0 replies.
- Welcome to our new Tika PMC chair! - posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/08/19 19:14:50 UTC, 1 reply.
- [jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote) - posted by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/08/21 23:05:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-920) iWork Numbers sheetnames not being parsed into metadata - posted by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/08/21 23:07:38 UTC, 2 replies.
- [jira] [Updated] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote) - posted by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/08/21 23:07:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-921) iWork Numbers - Cell formats which parser is completely ignoring - posted by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/08/21 23:11:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-919) iWork Page's cell values not being parsed if calculated via formula - posted by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/08/21 23:17:38 UTC, 0 replies.
- [jira] [Updated] (TIKA-976) Inaccurate XLS detection trough POIFSContainerDetector - posted by "Marco Quaranta (JIRA)" <ji...@apache.org> on 2012/08/22 13:25:37 UTC, 0 replies.
- [jira] [Created] (TIKA-976) Inaccurate XLS detection trough POIFSContainerDetector - posted by "Marco Quaranta (JIRA)" <ji...@apache.org> on 2012/08/22 13:25:37 UTC, 0 replies.
- [jira] [Commented] (TIKA-954) Tika throws OOM and GC limited exceeded on Microsoft docx file - posted by "Jörg Ehrlich (JIRA)" <ji...@apache.org> on 2012/08/22 16:45:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-954) Tika throws OOM and GC limited exceeded on Microsoft docx file - posted by "Jörg Ehrlich (JIRA)" <ji...@apache.org> on 2012/08/22 17:21:43 UTC, 0 replies.
- [jira] [Resolved] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/22 20:54:42 UTC, 0 replies.
- Jenkins build is back to normal : Tika-trunk #915 - posted by Apache Jenkins Server <je...@builds.apache.org> on 2012/08/22 21:13:45 UTC, 0 replies.
- [jira] [Created] (TIKA-977) Compilation error building tika-1.2 --- identifier expected in CHM2XHTML.java - posted by "David Jameson (JIRA)" <ji...@apache.org> on 2012/08/23 14:41:43 UTC, 0 replies.
- [jira] [Commented] (TIKA-977) Compilation error building tika-1.2 --- identifier expected in CHM2XHTML.java - posted by "David Jameson (JIRA)" <ji...@apache.org> on 2012/08/23 14:43:41 UTC, 5 replies.
- [jira] [Created] (TIKA-978) OSGi bundle build fails if space exists in build path - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/23 18:40:42 UTC, 0 replies.
- [jira] [Closed] (TIKA-977) Compilation error building tika-1.2 --- identifier expected in CHM2XHTML.java - posted by "David Jameson (JIRA)" <ji...@apache.org> on 2012/08/23 22:48:42 UTC, 0 replies.
- [jira] [Created] (TIKA-979) Metadata not clean after tikaParser.parser. - posted by "Xujunjie (JIRA)" <ji...@apache.org> on 2012/08/24 08:02:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-979) Metadata not clean after tikaParser.parser. - posted by "Xujunjie (JIRA)" <ji...@apache.org> on 2012/08/24 08:04:42 UTC, 0 replies.
- [jira] [Updated] (TIKA-980) MicrodataContentHandler for Apache Tika - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/27 14:58:07 UTC, 1 reply.
- [jira] [Created] (TIKA-981) Text isn't extracted from PDF pop-up annotations - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/27 15:04:07 UTC, 0 replies.
- [jira] [Updated] (TIKA-981) Text isn't extracted from PDF pop-up annotations - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/27 15:06:07 UTC, 0 replies.
- [jira] [Created] (TIKA-982) RTF document embedded into Word (.doc) document is extracted as .unknown - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/27 15:20:07 UTC, 0 replies.
- [jira] [Updated] (TIKA-982) RTF document embedded into Word (.doc) document is extracted as .unknown - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/27 15:22:07 UTC, 0 replies.
- [jira] [Commented] (TIKA-980) MicrodataContentHandler for Apache Tika - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/27 15:54:07 UTC, 2 replies.
- [jira] [Created] (TIKA-983) HTML parser should add Open Graph meta tag data to Metadata returned by parser - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/28 00:07:07 UTC, 0 replies.
- [jira] [Updated] (TIKA-983) HTML parser should add Open Graph meta tag data to Metadata returned by parser - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/28 00:13:07 UTC, 0 replies.
- [jira] [Resolved] (TIKA-983) HTML parser should add Open Graph meta tag data to Metadata returned by parser - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/28 00:27:07 UTC, 0 replies.
- [jira] [Created] (TIKA-984) JpegParserTest fails for some locales - posted by "Oliver Heger (JIRA)" <ji...@apache.org> on 2012/08/28 13:50:09 UTC, 0 replies.
- [jira] [Commented] (TIKA-964) Ability to specify bind address - posted by "Clemens Fuchslocher (JIRA)" <ji...@apache.org> on 2012/08/28 20:34:08 UTC, 2 replies.
- [jira] [Commented] (TIKA-725) Empty title element makes Tika-generated HTML documents not open in Chromium - posted by "Kostya Gribov (JIRA)" <ji...@apache.org> on 2012/08/29 07:50:07 UTC, 0 replies.
- [jira] [Commented] (TIKA-895) Empty title element makes Tika-generated HTML documents not open - posted by "Kostya Gribov (JIRA)" <ji...@apache.org> on 2012/08/29 07:52:08 UTC, 0 replies.
- AutoDetectParser is not parsing UTF-16 content types - posted by chraj007 <ch...@gmail.com> on 2012/08/29 17:55:59 UTC, 4 replies.
- [jira] [Updated] (TIKA-984) JpegParserTest fails for some locales - posted by "Oliver Heger (JIRA)" <ji...@apache.org> on 2012/08/29 21:16:07 UTC, 0 replies.
- [jira] [Created] (TIKA-985) Support for HTML5 elements - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/30 14:13:07 UTC, 0 replies.
- [jira] [Updated] (TIKA-985) Support for HTML5 elements - posted by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/08/30 14:15:08 UTC, 1 reply.
- [jira] [Created] (TIKA-986) NullPointerException trying to parse detached .pk7s signature - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/30 18:22:07 UTC, 0 replies.
- [jira] [Updated] (TIKA-986) NullPointerException trying to parse detached .pk7s signature - posted by "Michael McCandless (JIRA)" <ji...@apache.org> on 2012/08/30 18:30:08 UTC, 2 replies.
- Question about XPath Matcher code & MatchingContentHandler - posted by Ken Krugler <kk...@transpac.com> on 2012/08/30 19:35:11 UTC, 0 replies.
- Standard practice with @author in comments - posted by Ken Krugler <kk...@transpac.com> on 2012/08/30 23:03:22 UTC, 1 reply.
- [jira] [Assigned] (TIKA-980) MicrodataContentHandler for Apache Tika - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/30 23:06:07 UTC, 0 replies.
- [jira] [Commented] (TIKA-539) Encoding detection is too biased by encoding in meta tag - posted by "Ken Krugler (JIRA)" <ji...@apache.org> on 2012/08/31 00:17:08 UTC, 0 replies.