Posted to dev@tika.apache.org by "Yahav Amsalem (JIRA)" <ji...@apache.org> on 2016/10/19 11:57:58 UTC
[jira] [Updated] (TIKA-2123) CommonsDigester calculates wrong hashes on large files
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yahav Amsalem updated TIKA-2123:
--------------------------------
Description:
When more than one algorithm is passed to the CommonsDigester constructor and
a file larger than 7.5 MB is then digested, the hashes are calculated
incorrectly for every algorithm except the first.
The following code reproduces the bug:
// The file used was a simple plain text file larger than 7.5 MB
File file = new File("c:\\\\testLargeFile.txt");
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));
Metadata metadata = new Metadata();
CommonsDigester digester = new CommonsDigester(20000000,
        CommonsDigester.DigestAlgorithm.MD5,
        CommonsDigester.DigestAlgorithm.SHA1,
        CommonsDigester.DigestAlgorithm.SHA256);
digester.digest(bufferedInputStream, metadata, null);
// Prints the correct MD5 but incorrect SHA1 and SHA256 values
System.out.println(metadata);
Initial direction: the inner buffered stream does not appear to be reset to position 0 after the first algorithm has been computed.
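To judge which of the printed values are wrong, it helps to have reference hashes computed independently of Tika. The following is a minimal sketch using only the JDK's MessageDigest; the class name ReferenceDigests and the method hexDigests are hypothetical and not part of Tika. It feeds every digest from the same buffer in a single pass, so it never depends on the stream being reset. It is not a description of how CommonsDigester works internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for cross-checking hashes; not part of Tika.
public class ReferenceDigests {

    // Reads the stream once and updates every digest from the same buffer,
    // so no mark/reset on the stream is required.
    public static Map<String, String> hexDigests(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest[] digests = new MessageDigest[algorithms.length];
        for (int i = 0; i < algorithms.length; i++) {
            digests[i] = MessageDigest.getInstance(algorithms[i]);
        }

        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (MessageDigest digest : digests) {
                digest.update(buffer, 0, n);
            }
        }

        // LinkedHashMap keeps the algorithms in the order they were requested.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < algorithms.length; i++) {
            result.put(algorithms[i], toHex(digests[i].digest()));
        }
        return result;
    }

    // Lower-case hex encoding that preserves leading zeros.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, hash that file; otherwise hash the sample text "abc".
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(hexDigests(in, "MD5", "SHA-1", "SHA-256"));
        }
    }
}
```

Running it with the path of the large test file as the only argument prints one reference value per algorithm, which can then be compared with the metadata printed by the reproduction code.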
was:
When more than one algorithm is passed to the CommonsDigester constructor and
a file larger than 7.5 MB is then digested, the hashes are calculated
incorrectly for every algorithm except the first.
The following code reproduces the bug:
// The file used was a simple plain text file larger than 7.5 MB
File file = new File("c:\\testLargeFile.txt");
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));
Metadata metadata = new Metadata();
CommonsDigester digester = new CommonsDigester(20000000,
        CommonsDigester.DigestAlgorithm.MD5,
        CommonsDigester.DigestAlgorithm.SHA1,
        CommonsDigester.DigestAlgorithm.SHA256);
digester.digest(bufferedInputStream, metadata, null);
// Prints the correct MD5 but incorrect SHA1 and SHA256 values
System.out.println(metadata);
Initial direction: the inner buffered stream does not appear to be reset to position 0 after the first algorithm has been computed.
> CommonsDigester calculates wrong hashes on large files
> ------------------------------------------------------
>
> Key: TIKA-2123
> URL: https://issues.apache.org/jira/browse/TIKA-2123
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 1.13
> Reporter: Yahav Amsalem