Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/01/08 13:45:00 UTC

[jira] [Commented] (JENA-1812) Migrate blank node hash algorithm from MD5 to SHA-256

    [ https://issues.apache.org/jira/browse/JENA-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010684#comment-17010684 ] 

Andy Seaborne commented on JENA-1812:
-------------------------------------

Changing to another hash is a good idea.

The length of the hash is visible in N-Triples output (32 hex chars). 

I think keeping to a 128 bit length is better unless there is a need to change to a longer one.

Jena does not need cryptographically secure hashes for blank node id allocation. The hash is a way to generate unique ids for the {{_:a}} and {{[]}} forms at scale (i.e. without needing to keep a temporary map from label to parser-unique allocated label). Each parser run seeds the hash with a 122 bit random number.
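
As a rough illustration of the idea above (not Jena's actual code), the sketch below hashes a per-run random seed together with the label and keeps 128 bits, so the id is still 32 hex chars. The class name SeededLabelAllocator, the use of a version 4 UUID as the source of the 122 random bits, and the choice of SHA-256 truncated to 128 bits are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch of seeded, map-free label allocation; not Jena's allocator. */
final class SeededLabelAllocator {
    private final byte[] seed;

    /** One allocator per parser run. A version 4 UUID carries 122 random bits (assumed seed source). */
    SeededLabelAllocator() {
        this(UUID.randomUUID());
    }

    SeededLabelAllocator(UUID seed) {
        this.seed = ByteBuffer.allocate(16)
                .putLong(seed.getMostSignificantBits())
                .putLong(seed.getLeastSignificantBits())
                .array();
    }

    /** Maps a parser-local label such as "a" (from _:a) to a 32 hex char id. */
    String allocate(String label) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed);
            md.update(label.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest();
            // Keep only the first 16 bytes (128 bits) so the visible length is unchanged.
            StringBuilder sb = new StringBuilder(32);
            for (int i = 0; i < 16; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Within one run the same label always maps to the same id, so no label-to-id map is needed; a different run has a different seed, so the same label gives an unrelated id.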

A possible hash is murmur3_128, which is available in Google Guava; Jena already has Guava as a dependency, in shaded form.

murmur3_128 is fast, not secure.

There may be other suitable hashes.

(MD5 is also used in TDB1 and TDB2, where the hash is written to disk. It does not need to be secure in that usage either.)

> Migrate blank node hash algorithm from MD5 to SHA-256
> -----------------------------------------------------
>
>                 Key: JENA-1812
>                 URL: https://issues.apache.org/jira/browse/JENA-1812
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Nicolas Seydoux
>            Assignee: Andy Seaborne
>            Priority: Trivial
>              Labels: easyfix
>             Fix For: Jena 3.14.0
>
>   Original Estimate: 5m
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> MD5 is a deprecated hashing algorithm, and even though it is not used on sensitive data in the context of Jena, its usage is flagged by security software as a security flaw. This may reduce the incentive to use Jena in commercial products, and computing SHA-256 hashes is not prohibitively more expensive than MD5.
>  
> Therefore, I suggest migrating from MD5 hashes to SHA-256.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)