Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/04/26 22:10:17 UTC

[jira] [Commented] (TIKA-1112) Parsing for OGV file with invalid checksum

    [ https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643204#comment-13643204 ] 

Nick Burch commented on TIKA-1112:
----------------------------------

Do you know where the problem files come from? And are you able to use any of the Ogg file-level tools to check whether the checksum is present and valid on the streams?
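A minimal sketch of the kind of check suggested above. It is hypothetical: the class `OggCrcCheck` and its methods are illustrative, not part of Tika or the org.gagravarr Ogg library. It assumes the page layout from RFC 3533 ("OggS" capture pattern, 27-byte header, CRC stored little-endian at byte offset 22, then the segment table and page body). It also assumes the libogg CRC-32 convention: polynomial 0x04C11DB7, initial value 0, no bit reflection, no final XOR, computed over the whole page with the checksum field zeroed.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone checker; not part of Tika or the gagravarr Ogg library.
// Written without try-with-resources so it also compiles on JDK 1.6.
final class OggCrcCheck {
    private static final int[] TABLE = new int[256];
    static {
        // Table for the non-reflected CRC-32 with polynomial 0x04C11DB7
        for (int i = 0; i < 256; i++) {
            int r = i << 24;
            for (int j = 0; j < 8; j++) {
                r = (r & 0x80000000) != 0 ? (r << 1) ^ 0x04C11DB7 : (r << 1);
            }
            TABLE[i] = r;
        }
    }

    /** Assumed Ogg CRC-32: initial value 0, no reflection, no final XOR. */
    static int update(int crc, byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ buf[i]) & 0xFF];
        }
        return crc;
    }

    static int readLE32(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
             | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
    }

    /** Returns {pages checked, pages with a bad CRC}; throws EOFException on a truncated page. */
    static int[] check(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int pages = 0, bad = 0;
        byte[] header = new byte[27];
        while (true) {
            int first = in.read();
            if (first == -1) break;                     // clean end of input
            header[0] = (byte) first;
            in.readFully(header, 1, 26);
            if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S') {
                throw new IOException("Missing OggS capture pattern at page " + pages);
            }
            byte[] segments = new byte[header[26] & 0xFF]; // segment (lacing) table
            in.readFully(segments);
            int bodyLen = 0;
            for (byte s : segments) bodyLen += s & 0xFF;
            byte[] body = new byte[bodyLen];
            in.readFully(body);                         // EOFException if the body is cut short

            int stored = readLE32(header, 22);
            header[22] = header[23] = header[24] = header[25] = 0; // CRC field counts as zero
            int crc = update(0, header, 0, 27);
            crc = update(crc, segments, 0, segments.length);
            crc = update(crc, body, 0, body.length);
            if (crc != stored) bad++;
            pages++;
        }
        return new int[] {pages, bad};
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream(args[0]);
        try {
            int[] r = check(in);
            System.out.println(r[0] + " pages, " + r[1] + " with invalid checksum");
        } finally {
            in.close();
        }
    }
}
```

Running it as `java OggCrcCheck Typing_example.ogv` would report how many pages fail under these assumptions. A page whose declared body runs past the end of the input surfaces as an `EOFException` rather than being counted as a checksum failure.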
                
> Parsing for OGV file with invalid checksum
> ------------------------------------------
>
>                 Key: TIKA-1112
>                 URL: https://issues.apache.org/jira/browse/TIKA-1112
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.3
>         Environment: OS X 10.8.3
> JDK 1.6.0_45 64-bit
>            Reporter: Alexander Chow
>
> When parsing any OGV file (e.g., [Typing_example.ogv|http://commons.wikimedia.org/wiki/File:Typing_example.ogv]), the log shows output like the following:
> {code}
> Warning - invalid checksum on page 2 of stream 155f (5471)
> Warning - invalid checksum on page 3 of stream 155f (5471)
> Warning - invalid checksum on page 4 of stream 155f (5471)
> Warning - invalid checksum on page 5 of stream 155f (5471)
> Warning - invalid checksum on page 6 of stream 155f (5471)
> Warning - invalid checksum on page 7 of stream 155f (5471)
> ...
> Warning - invalid checksum on page 3071 of stream 155f (5471)
> Warning - invalid checksum on page 3072 of stream 155f (5471)
> Warning - invalid checksum on page 3073 of stream 155f (5471)
> Warning - invalid checksum on page 3074 of stream 155f (5471)
> Exception in thread "main" java.io.IOException: Asked to read 4228 bytes from 0 but hit EoF at 2884
> 	at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
> 	at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
> 	at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
> 	at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
> 	at org.gagravarr.tika.OggDetector.detect(OggDetector.java:79)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
> 	at com.test.OGVTest.main(OGVTest.java:31)
> {code}
> My test code was the following:
> {code:java}
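> 	// Needs java.io.*, org.apache.tika.metadata.Metadata, org.apache.tika.parser.AutoDetectParser,
> 	// org.apache.tika.parser.ParseContext, org.apache.tika.parser.Parser,
> 	// org.apache.tika.sax.WriteOutContentHandler and org.xml.sax.ContentHandler.
> 	// DummyWriter is the reporter's own java.io.Writer (presumably one that discards output).
> 	void parse(String fileName) throws Exception {
> 		InputStream inputStream = new FileInputStream(fileName);
> 		try {
> 			Metadata metadata = new Metadata();
> 			Parser parser = new AutoDetectParser();
>
> 			// Registering the parser lets embedded documents be parsed recursively
> 			ParseContext parserContext = new ParseContext();
> 			parserContext.set(Parser.class, parser);
> 			ContentHandler contentHandler = new WriteOutContentHandler(
> 				new DummyWriter());
> 			parser.parse(inputStream, contentHandler, metadata, parserContext);
>
> 			System.out.println(metadata);
> 		} finally {
> 			// JDK 1.6 has no try-with-resources, so close the stream explicitly
> 			inputStream.close();
> 		}
> 	}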
> {code}
