Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/06/02 00:53:07 UTC

[jira] Resolved: (TIKA-236) Premature end of file Exception

     [ https://issues.apache.org/jira/browse/TIKA-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-236.
--------------------------------

    Resolution: Duplicate
      Assignee: Jukka Zitting

This problem is caused by the package containing a malformed XML file that the XMLParser fails to process. Such a failure should cause a TikaException, which the package parser would normally just ignore before proceeding to the next package entry. Due to TIKA-237, however, the XMLParser is incorrectly throwing a SAXException in that case, and that exception aborts the parsing of the whole package.

Now with TIKA-237 fixed this is no longer the case, and the problem described here no longer occurs. Thus I'm resolving this as a Duplicate of TIKA-237.
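
To illustrate the mechanism described above, here is a minimal, self-contained sketch. It is not the actual Tika code: PackageParsingSketch, EntryParser, EntryParseException (a stand-in for TikaException), FIXED and BUGGY are all hypothetical names invented for this example, and entries are modelled as plain strings where an empty string plays the role of the malformed XML file. Only org.xml.sax.SAXException is the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.SAXException;

public class PackageParsingSketch {

    /** Hypothetical stand-in for TikaException (not the real Tika class). */
    public static class EntryParseException extends Exception {
        public EntryParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical per-entry parser; an empty entry models a malformed XML file. */
    public interface EntryParser {
        String parse(String content) throws EntryParseException, SAXException;
    }

    /** Assumed post-fix behaviour: a malformed entry is reported as a parse failure. */
    public static final EntryParser FIXED = content -> {
        if (content.isEmpty()) {
            throw new EntryParseException("Premature end of file.");
        }
        return content.toUpperCase();
    };

    /** Assumed pre-fix behaviour: the same failure escapes as a SAXException. */
    public static final EntryParser BUGGY = content -> {
        if (content.isEmpty()) {
            throw new SAXException("Premature end of file.");
        }
        return content.toUpperCase();
    };

    /**
     * Models a package parser that skips entries which fail with the
     * parse exception, but lets a SAXException abort the whole package.
     */
    public static List<String> parseEntries(List<String> entries, EntryParser parser)
            throws SAXException {
        List<String> extracted = new ArrayList<>();
        for (String entry : entries) {
            try {
                extracted.add(parser.parse(entry));
            } catch (EntryParseException e) {
                // Ignore the broken entry and continue with the next one.
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "", "b");
        try {
            // The malformed middle entry is skipped.
            System.out.println("fixed: " + parseEntries(entries, FIXED));
            // The malformed middle entry aborts the whole run.
            System.out.println("buggy: " + parseEntries(entries, BUGGY));
        } catch (SAXException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

With the FIXED parser the broken entry is silently skipped and the remaining entries are still processed. With the BUGGY parser the SAXException is not caught inside the loop, so it propagates to the caller, which is the behaviour reported in this issue.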

> Premature end of file Exception
> -------------------------------
>
>                 Key: TIKA-236
>                 URL: https://issues.apache.org/jira/browse/TIKA-236
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 0.3
>         Environment: Windows / Unix
>            Reporter: Karl Heinz Marbaise
>            Assignee: Jukka Zitting
>            Priority: Critical
>
> I have reduced the problem down to the following:
> 	@Test
> 	public void testZipFile() throws IOException, SAXException, TikaException {
> 		String fileName = "lucene-2.2.0-src.zip";
> 		Metadata metadata = new Metadata();
> 		metadata.set(Metadata.RESOURCE_NAME_KEY, fileName);
> 		AutoDetectParser parser = new AutoDetectParser();
> 		DefaultHandler handler = new BodyContentHandler();
> 		FileInputStream fis = new FileInputStream(fileName);
> 		try {
> 			// Fails here with the SAXParseException shown below
> 			parser.parse(fis, handler, metadata);
> 		} finally {
> 			// Close the stream even when parsing fails
> 			fis.close();
> 		}
> 		System.out.println("Handler:" + handler.toString());
> 	}
> and the result of the above is the following:
> FAILED: testZipFile
> org.xml.sax.SAXParseException: Premature end of file.
> 	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
> 	at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
> 	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> 	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> 	at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> 	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> 	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> 	at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
> 	at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
> 	at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:59)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:108)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:78)
> 	at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:93)
> 	at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:56)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:108)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:78)
> 	at com.soebes.supose.scan.ScanZIPDocumentTest.testZipFile(ScanZIPDocumentTest.java:30)
> ... Removed 22 stack frames
> I have checked the ZIP file for errors with 7-Zip and with unzip on the command line, but there seemed to be none. If you need this file I can attach it, but it is about 7 MB in size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.