Posted to dev@tika.apache.org by "Dietrich Travkin (JIRA)" <ji...@apache.org> on 2018/10/24 12:07:00 UTC

[jira] [Comment Edited] (TIKA-2756) Switch to commons-lang 3

    [ https://issues.apache.org/jira/browse/TIKA-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661950#comment-16661950 ] 

Dietrich Travkin edited comment on TIKA-2756 at 10/24/18 12:06 PM:
-------------------------------------------------------------------

Actually, it seems that the dependency on org.apache.commons.lang (commons-lang:commons-lang) causes problems under Java 11.

I'm migrating a large software product from Oracle JDK 8 to OpenJDK 11 and found that utility classes in org.apache.commons.lang fail to parse the Java version because its format changed from e.g. "1.8" to "8" (see https://www.oracle.com/technetwork/java/javase/9-relnote-issues-3704069.html#JDK-8085822). Using OpenJDK 11, org.apache.tika:tika-parsers:1.19, com.healthmarketscience.jackcess:jackcess:2.1.8, and commons-lang:commons-lang:2.5, I get the following exception and stack trace (only the relevant excerpt is shown):
{noformat}
java.lang.ExceptionInInitializerError
	at org.apache.commons.lang.builder.ToStringStyle$MultiLineToStringStyle.<init>(ToStringStyle.java:2276)
	at org.apache.commons.lang.builder.ToStringStyle.<clinit>(ToStringStyle.java:94)
	at org.apache.commons.lang.builder.ToStringBuilder.<clinit>(ToStringBuilder.java:98)
	at org.apache.commons.lang.ArrayUtils.toString(ArrayUtils.java:180)
	at org.apache.commons.lang.ArrayUtils.toString(ArrayUtils.java:161)
[...]
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
	at java.base/java.lang.String.substring(String.java:1874)
	at org.apache.commons.lang.SystemUtils.getJavaVersionAsFloat(SystemUtils.java:1153)
	at org.apache.commons.lang.SystemUtils.<clinit>(SystemUtils.java:818)
	... 135 more
{noformat}
I think this issue is related to TIKA-2674 and should raise the priority of both tickets.
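
To illustrate the failure mode visible in the stack trace (a substring call with end index 3 on a version string of length 2), here is a minimal Java sketch. It is not the commons-lang source: the class and method names (JavaVersionParsing, fragileVersionAsFloat, featureVersion) are hypothetical, and the sketch assumes the java.version property is "11" on the affected JVM, which matches the reported "length 2".

```java
class JavaVersionParsing {

    // Hypothetical stand-in for the failing pattern: assumes the version
    // string has at least three characters, as in "1.8.0_181".
    static float fragileVersionAsFloat(String javaVersion) {
        String prefix = javaVersion.substring(0, 3); // throws for "11"
        return Float.parseFloat(prefix);
    }

    // Hypothetical tolerant alternative: returns the feature release number
    // for both the old ("1.8.0_181") and the new ("11", "11.0.1") schemes.
    static int featureVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Old scheme: "1.x" means feature release x.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(fragileVersionAsFloat("1.8.0_181")); // 1.8
        System.out.println(featureVersion("1.8.0_181"));        // 8
        System.out.println(featureVersion("11"));               // 11
        try {
            fragileVersionAsFloat("11");
        } catch (StringIndexOutOfBoundsException e) {
            // Same exception type as in the stack trace above.
            System.out.println("fragile parser failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that any parser assuming a "1.x" prefix of fixed length breaks on the shorter version strings reported by Java 9 and later.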


was (Author: travkin):
Actually, it seems that the dependency on org.apache.commons.lang (commons-lang:commons-lang) causes problems under Java 11.

I'm migrating a large software product from Oracle JDK 8 to OpenJDK 11 and found that utility classes in org.apache.commons.lang fail to parse the Java version because its format changed from e.g. "1.8" to "8" (see https://www.oracle.com/technetwork/java/javase/9-relnote-issues-3704069.html#JDK-8085822). Using OpenJDK 11, org.apache.tika:tika-parsers:1.19, com.healthmarketscience.jackcess:jackcess:2.1.8, and commons-lang:commons-lang:2.6, I get the following exception and stack trace (only the relevant excerpt is shown):
{noformat}
java.lang.ExceptionInInitializerError
	at org.apache.commons.lang.builder.ToStringStyle$MultiLineToStringStyle.<init>(ToStringStyle.java:2276)
	at org.apache.commons.lang.builder.ToStringStyle.<clinit>(ToStringStyle.java:94)
	at org.apache.commons.lang.builder.ToStringBuilder.<clinit>(ToStringBuilder.java:98)
	at org.apache.commons.lang.ArrayUtils.toString(ArrayUtils.java:180)
	at org.apache.commons.lang.ArrayUtils.toString(ArrayUtils.java:161)
[...]
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
	at java.base/java.lang.String.substring(String.java:1874)
	at org.apache.commons.lang.SystemUtils.getJavaVersionAsFloat(SystemUtils.java:1153)
	at org.apache.commons.lang.SystemUtils.<clinit>(SystemUtils.java:818)
	... 135 more
{noformat}
I think this issue is related to TIKA-2674 and should raise the priority of both tickets.

> Switch to commons-lang 3
> ------------------------
>
>                 Key: TIKA-2756
>                 URL: https://issues.apache.org/jira/browse/TIKA-2756
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Robert Munteanu
>            Priority: Major
>
> Tika 1.9.1 is using the legacy commons-lang 2.x series. This series will not receive any further updates and is completely superseded by commons-lang 3.x.
> Projects that use Tika are blocked from dropping commons-lang 2.x due to this dependency.
> The link that I found was from tika-parsers to jackcess and then to commons-lang 2.6:
> {noformat}
> [INFO] +- com.healthmarketscience.jackcess:jackcess:jar:2.1.12:compile
> [INFO] |  \- commons-lang:commons-lang:jar:2.6:compile
> {noformat}
> If I understand correctly, this is the only commons-lang 2.x dependency from the Tika runtime and it would be great to remove it.


