Posted to dev@tika.apache.org by "Yaniv Kunda (JIRA)" <ji...@apache.org> on 2015/10/01 16:57:26 UTC

[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core

     [ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1706:
------------------------------
    Attachment: TIKA-1706-2.patch
                TIKA-1706-1.patch

A proposed patch per [~grossws]'s suggestion from the dev mailing list -
The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- used the *uber* classifier, set via <shadedClassifierName>
(alternatives: shaded, nodep, all, etc.). Which one is best?
-- commons-io shaded under {{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid potential conflicts with other commons-io-shading dependencies e.g. as in org.ops4j.pax.url:pax-url-aether:2.3.0
-- automatic removal of unused classes using <minimizeJar>
- deprecated all classes that were copied from commons-io and modified them to extend their commons-io counterparts
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call existing JDK/commons-io alternatives, deprecated them, and referred to the alternatives used
_*Note: this was done only in IOUtils, where many methods that have the same signature as the ones in commons-io were modified along the way to use UTF-8 instead of the platform default._
- all things should remain backward-compatible, except one: org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a ClassCastException if the Object is not Serializable
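
To make the deprecation approach and the single incompatibility above concrete, here is a minimal, self-contained sketch. It is not taken from the patch: CommonsTaggedIOException is a simplified stand-in for org.apache.commons.io.TaggedIOException (whose constructor takes a Serializable tag), and LegacyTaggedIOException is a hypothetical name standing in for the deprecated org.apache.tika.io.TaggedIOException.

```java
import java.io.IOException;
import java.io.Serializable;

// Simplified stand-in for org.apache.commons.io.TaggedIOException,
// which accepts only Serializable tags. Not the real class.
class CommonsTaggedIOException extends IOException {
    private final Serializable tag;

    CommonsTaggedIOException(IOException original, Serializable tag) {
        super(original.getMessage(), original);
        this.tag = tag;
    }

    Serializable getTag() {
        return tag;
    }
}

// Hypothetical shape of the deprecated tika-core class: it keeps the old
// (IOException, Object) signature but extends the commons-io counterpart,
// so the tag has to be cast on the way through.
@Deprecated
class LegacyTaggedIOException extends CommonsTaggedIOException {
    @Deprecated
    LegacyTaggedIOException(IOException original, Object tag) {
        // Throws ClassCastException when tag is not Serializable.
        super(original, (Serializable) tag);
    }
}

class TaggedExceptionDemo {
    public static void main(String[] args) {
        IOException cause = new IOException("read failed");

        // A String is Serializable, so callers like this keep working.
        LegacyTaggedIOException ok = new LegacyTaggedIOException(cause, "stream-1");
        System.out.println("tag = " + ok.getTag());

        // A plain Object is not Serializable, so the cast fails.
        try {
            new LegacyTaggedIOException(cause, new Object());
        } catch (ClassCastException e) {
            System.out.println("non-serializable tag rejected");
        }
    }
}
```

Under these assumptions, existing callers compile unchanged, and only those passing a non-Serializable tag would see the new runtime failure.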

The second patch contains trivial import changes in tika-core from org.apache.tika.io to org.apache.commons.io

> Bring back commons-io to tika-core
> ----------------------------------
>
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>         Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which code should be used
> - Having the inlined classes causes more maintenance and/or technical debt (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.
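
As an illustration of the last point in the list above (Charset objects rather than encoding names, StringBuilder rather than StringBuffer), here is a small JDK-only sketch. The class and method names (CharsetReadDemo, readAll) are made up for this example and are not part of Tika or commons-io.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetReadDemo {
    // Reads the whole stream into a String using an explicit Charset object.
    // The encoding-name variant, new InputStreamReader(in, "UTF-8"), would
    // instead force callers to handle UnsupportedEncodingException.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder(); // unsynchronized, unlike StringBuffer
        Reader reader = new InputStreamReader(in, charset);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            out.append(buffer, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" is "cafe" with an acute accent on the last letter.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        String good = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoding with a different charset silently yields different text,
        // which is the risk of relying on an implicit platform default.
        String bad = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(good.length() + " vs " + bad.length());
    }
}
```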



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

RE: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core

Posted by Yaniv Kunda <ya...@answers.com>.
It’s been almost two months since I provided my patches for this.

Can a committer please review and submit?





*From:* Yaniv Kunda [mailto:yaniv.kunda@answers.com]
*Sent:* Monday, October 12, 2015 23:08
*To:* dev@tika.apache.org
*Subject:* Re: [jira] [Updated] (TIKA-1706) Bring back commons-io to
tika-core



Is this solution applicable?
I have some improvements waiting for this.


-- 


This email communication (including any attachments) contains information 
from Answers Corporation or its affiliates that is confidential and may be 
privileged. The information contained herein is intended only for the use 
of the addressee(s) named above. If you are not the intended recipient (or 
the agent responsible to deliver it to the intended recipient), you are 
hereby notified that any dissemination, distribution, use, or copying of 
this communication is strictly prohibited. If you have received this email 
in error, please immediately reply to sender, delete the message and 
destroy all copies of it. If you have questions, please email 
legal@answers.com. 

If you wish to unsubscribe to commercial emails from Answers and its 
affiliates, please go to the Answers Subscription Center 
http://campaigns.answers.com/subscriptions to opt out.  Thank you.

Re: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core

Posted by Yaniv Kunda <ya...@answers.com>.
Is this solution applicable?
I have some improvements waiting for this.
