Posted to dev@tika.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/09/19 07:10:12 UTC

[DISCUSS] Release Tika 1.11?

Hey Guys and Gals,

I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
allows some neat functionality in tika-python:
https://github.com/chrismattmann/tika-python/pull/67


Anything else to try and get into the release?

If not, I’ll produce an RC #1 by end of weekend.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
> Tim, was your check for File#getName done manually or it's present in tests somehow? If it's present in tests we can check it on major platforms (I can test on linux, win xp and maybe on macosx) with different jdks.

It was a unit test that initially uncovered the problem -- all worked well on my Windows dev environment but failed when I made the patch on Linux.  I didn't add unit tests to show that File#getName failed.  I can, though, if that would be of any use.  I agree that we should keep this as a tiny bit of code and rely on the rest of commons-io in core for 2.0.
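
For illustration only, here is a minimal sketch of the kind of "tiny bit of code" being discussed: a name extractor that does not depend on the platform the JVM runs on. The class name EmbeddedNames is hypothetical and this is not the actual Tika or commons-io implementation; it simply assumes that '/', '\' and the old Mac OS ':' should all count as separators.

```java
public final class EmbeddedNames {

    private EmbeddedNames() {
        // static helper only
    }

    /**
     * Returns the text after the last '/', '\' or ':' in the given path,
     * whatever platform the code is running on. Returns an empty string
     * for null or empty input, or when the path ends with a separator.
     */
    public static String getName(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        int unix = path.lastIndexOf('/');
        int windows = path.lastIndexOf('\\');
        int oldMac = path.lastIndexOf(':');
        // The right-most separator of any style marks the start of the name.
        int last = Math.max(unix, Math.max(windows, oldMac));
        return path.substring(last + 1);
    }
}
```

The trade-off in this sketch is that a ':' or '\' that is legitimately part of a Unix file name is also treated as a separator, which seems acceptable for paths stored inside documents but would be wrong for real file system access.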

> Also, I'll take a look at unresolved jdk7 issues/patches today.
+1, if you want to take over reviewing Yaniv's excellent work, please go for it.  I'm mostly tied up in end of fiscal year administrivia.  Do not wait for me on TIKA-1744 etc.  Thank you!



RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1 to branching.  Given some surprises we've had, I'd want to have a 1.12+-SNAPSHOT branch easily available, because I suspect that 2.0 is still at least 6 months off given the current pace of progress and what I've seen on other projects making major release changes.  Wish I had more hours in the day...



-----Original Message-----
From: Bob Paulin [mailto:bob@bobpaulin.com] 
Sent: Wednesday, September 23, 2015 1:45 PM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Release Tika 1.11?

+1 for the branching strategy.

With respect to slicing up the parsers it would be great to have more discussion on how the parsers should be organized.  I think Tim has a draft out on this mailing list that would benefit from some additional perspectives.  Really cool to be talking about doing this!

- Bob

On Wed, Sep 23, 2015 at 12:36 PM, Konstantin Gribov <gr...@gmail.com>
wrote:

> It seems to be a good idea to avoid inclusion of commons-io into 
> tika-core till 2.0 if we will release it in several months.
> In this case we will have trunk w/ ongoing development of 2.0-SNAPSHOT 
> and branch for 1.11+ bugfixes.
>
> Some changes related to java7 can be included to 1.11/1.12 with no 
> problems.
>
> Wed, Sep 23, 2015 at 19:33, Mattmann, Chris A (3980) <
> chris.a.mattmann@jpl.nasa.gov>:
>
> > I’m not so keen on fundamentally changing the organization of Tika 
> > until 2.x. This seems like a major change to me in the way people 
> > expect to consume Tika.
> >
> > Can we:
> >
> > 1. Release a 1.11 that doesn’t include these types of changes
> > 2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those types of 
> > issues there?
> >
> > Cheers,
> > Chris
> >
> > -----Original Message-----
> > From: Yaniv Kunda <ya...@answers.com>
> > Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
> > Date: Wednesday, September 23, 2015 at 9:30 AM
> > To: "dev@tika.apache.org" <de...@tika.apache.org>
> > Subject: Re: [DISCUSS] Release Tika 1.11?
> >
> > >+1 for the uber jar!
> > >
> > >Regarding jdk7 issues, I have a few more I will create and patch 
> > >later tonight - I'll post a list of issues as well.
> > >On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <gr...@gmail.com> wrote:
> > >
> > >> Tim, was your check for File#getName done manually or it's present in
> > >> tests somehow? If it's present in tests we can check it on major
> > >> platforms (I can test on linux, win xp and maybe on macosx) with
> > >> different jdks.
> > >>
> > >> In case commons-io doesn't support ':' as file separator we can have
> > >> simple utility class in Tika or send them a patch for it.
> > >>
> > >> I think, we can rethink Tika packaging in 1.11/1.12 and produce these
> > >> artifacts:
> > >> - tika-core w/ dependency on commons-io (and deprecate most of
> > >>   o.a.tika.io, forwarding calls to jdk or commons-io),
> > >> - tika-core-uber w/ shaded commons-io (rename and drop all things
> > >>   unnecessary for o.a.tika.io),
> > >> - sliced tika-parsers-* as Bob suggested earlier,
> > >> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
> > >> - other tika-* artifacts (like tika-server, tika-app etc).
> > >>
> > >> One who needs tika-core without dependencies would use tika-core-uber
> > >> instead of it, all others, who prefer using maven/ivy/gradle/sbt/lein
> > >> will depend on tika-core.
> > >> And we can drop o.a.tika.io in 2.0.
> > >>
> > >> Also, I'll take a look at unresolved jdk7 issues/patches today.
> > >>
> > >> Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <tallison@mitre.org>:
> > >>
> > >> > Thank _you_ for all of your work in modernizing us.  With your efforts,
> > >> > we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
> > >> >
> > >> > >>Regarding FilenameUtils.getName() - I believe that its functionality
> > >> > >>can be replaced by Path.getFileName() - and in a platform-aware
> > >> > >>manner, as each JVM distribution comes with a specific provider
> > >> > >>implementation for the OS it's for.
> > >> >
> > >> > I agree that we should use that anytime we're interacting with the file
> > >> > system.
> > >> >
> > >> > However, that's actually the problem for paths that are stored within
> > >> > the document (say, an embedded resource).  Let's say a user creates a
> > >> > file on Windows, the file path information for the embedded file
> > >> > (depending on the parser and the file format) may be in Windows-ese,
> > >> > which is a problem if you try to use Path.getFileName() (I think... I
> > >> > haven't actually tested this) on a Linux machine.  I have actually
> > >> > tested this with the old File getName(), and it did not work
> > >> > cross-platform IIRC.
> > >> >
> > >> > In short, Tika needs to have the ability to extract the file name from
> > >> > a path that was created on any platform (including old Mac and its ":"
> > >> > separator) while Tika is running on any platform.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Monday, September 21, 2015 11:31 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Thanks for the positive spirit!
> > >> >
> > >> > Regarding FilenameUtils.getName() - I believe that its functionality can
> > >> > be replaced by Path.getFileName() - and in a platform-aware manner, as
> > >> > each JVM distribution comes with a specific provider implementation for
> > >> > the OS it's for.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > >> > Sent: Monday, September 21, 2015 14:27
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
> > >> >
> > >> > I'll take TIKA-1734 by tomorrow EDT.
> > >> >
> > >> > As for the other 2, I'm personally ok waiting for 1.12, but I defer to
> > >> > the dev community.
> > >> >
> > >> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
> > >> > TIKA-1726, that might help move things forward.
> > >> >
> > >> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share
> > >> > Yaniv's point about duplication of code, bloat within Tika and missing
> > >> > out on updates.  Aside from one small bit of code I'd like to keep or
> > >> > perhaps try to move into commons-io (?)[1], I think I'm now +1 to going
> > >> > forward with TIKA-1706 in core...unless there is a -1 from the community.
> > >> >
> > >> > Best,
> > >> >
> > >> >              Tim
> > >> >
> > >> >
> > >> > [1] I added some customizations for old Mac OS behavior (treat ":" as
> > >> > file separator) in FilenameUtils.getName() that I don't want to lose.
> > >> >
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Sunday, September 20, 2015 7:15 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > I would really like to push the following:
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io
> > >> > to tika-core
> > >> > This requires a decision to re-include commons-io as a dependency of
> > >> > tika-core.
> > >> > All the pros and cons have been already debated, but no decision has
> > >> > been made.
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods
> > >> > that use a java.io.File with methods that use a java.nio.file.Path
> > >> > Since this adds new methods to the public API, I requested the group to
> > >> > make a decision about the new names - but have not received something
> > >> > definite.
> > >> > However, I did create a subtask -
> > >> > https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path
> > >> > in TemporaryResources - using [~tallison]'s suggestion, which has not
> > >> > been committed yet.
> > >> >
> > >> > If decisions are made on the above issues, I can quickly create patches
> > >> > for them.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > >> > Sent: Saturday, September 19, 2015 08:10
> > >> > To: dev@tika.apache.org
> > >> > Subject: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Hey Guys and Gals,
> > >> >
> > >> > I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
> > >> > allows some neat functionality in tika-python:
> > >> > https://github.com/chrismattmann/tika-python/pull/67
> > >> >
> > >> >
> > >> > Anything else to try and get into the release?
> > >> >
> > >> > If not, I’ll produce an RC #1 by end of weekend.
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> >
> > >> > This email communication (including any attachments) contains
> > >> > information from Answers Corporation or its affiliates that is
> > >> > confidential and may be privileged. The information contained herein is
> > >> > intended only for the use of the addressee(s) named above. If you are
> > >> > not the intended recipient (or the agent responsible to deliver it to
> > >> > the intended recipient), you are hereby notified that any
> > >> > dissemination, distribution, use, or copying of this communication is
> > >> > strictly prohibited. If you have received this email in error, please
> > >> > immediately reply to sender, delete the message and destroy all copies
> > >> > of it. If you have questions, please email legal@answers.com.
> > >> >
> > >> > If you wish to unsubscribe to commercial emails from Answers and its
> > >> > affiliates, please go to the Answers Subscription Center
> > >> > http://campaigns.answers.com/subscriptions to opt out.  Thank you.
> > >> >
> > >> >
> > >> --
> > >> Best regards,
> > >> Konstantin Gribov
> > >>
> > >
> >
> > --
> Best regards,
> Konstantin Gribov
>

RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
>>I think Tim has a draft out on this mailing list that would benefit from some additional perspectives.  Really cool to be talking about doing this!

Agreed...in 2.0 branch with living 1.12-SNAPSHOT as backup :)

There's still quite a bit to decide...my proposals are still very much strawmen with many details to be worked out.

I think this is the latest: http://mail-archives.apache.org/mod_mbox/incubator-tika-dev/201508.mbox/%3CDM2PR09MB071305DFD203E21BFBE7A63AC7620@DM2PR09MB0713.namprd09.prod.outlook.com%3E 



Re: [DISCUSS] Release Tika 1.11?

Posted by Bob Paulin <bo...@bobpaulin.com>.
+1 for the branching strategy.

With respect to slicing up the parsers it would be great to have more
discussion on how the parsers should be organized.  I think Tim has a draft
out on this mailing list that would benefit from some additional
perspectives.  Really cool to be talking about doing this!

- Bob

On Wed, Sep 23, 2015 at 12:36 PM, Konstantin Gribov <gr...@gmail.com>
wrote:

> It seems to be a good idea to avoid inclusion of commons-io into tika-core
> till 2.0 if we will release it in several months.
> In this case we will have trunk w/ ongoing development of 2.0-SNAPSHOT and
> branch for 1.11+ bugfixes.
>
> Some changes related to java7 can be included to 1.11/1.12 with no
> problems.
>
> Wed, Sep 23, 2015 at 19:33, Mattmann, Chris A (3980) <
> chris.a.mattmann@jpl.nasa.gov>:
>
> > I’m not so keen on fundamentally changing the organization of
> > Tika until 2.x. This seems like a major change to me in the
> > way people expect to consume Tika.
> >
> > Can we:
> >
> > 1. Release a 1.11 that doesn’t include these types of changes
> > 2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those
> > types of issues there?
> >
> > Cheers,
> > Chris
> >
> > -----Original Message-----
> > From: Yaniv Kunda <ya...@answers.com>
> > Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
> > Date: Wednesday, September 23, 2015 at 9:30 AM
> > To: "dev@tika.apache.org" <de...@tika.apache.org>
> > Subject: Re: [DISCUSS] Release Tika 1.11?
> >
> > >+1 for the uber jar!
> > >
> > >Regarding jdk7 issues, I have a few more I will create and patch later
> > >tonight - I'll post a list of issues as well.
> > >On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <gr...@gmail.com> wrote:
> > >
> > >> Tim, was your check for File#getName done manually or it's present in
> > >>tests
> > >> somehow? If it's present in tests we can check it on major platforms
> (I
> > >>can
> > >> test on linux, win xp and maybe on macosx) with different jdks.
> > >>
> > >> In case commons-io doesn't support ':' as file separator we can have
> > >>simple
> > >> utility class in Tika or send them a patch for it.
> > >>
> > >> I think, we can rethink Tika packaging in 1.11/1.12 and produce these
> > >> artifacts:
> > >> - tika-core w/ dependency on commons-io (and deprecate most of
> > >>o.a.tika.io
> > >> ,
> > >> forwarding calls to jdk or commons-io),
> > >> - tika-core-uber w/ shaded commons-io (rename and drop all things
> > >> unnecessary for o.a.tika.io),
> > >> - sliced tika-parsers-* as Bob suggested earlier,
> > >> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
> > >> - other tika-* artifacts (like tika-server, tika-app etc).
> > >>
> > >> One who needs tika-core without dependencies would use tika-core-uber
> > >> instead of it, all others, who prefer using maven/ivy/gradle/sbt/lein
> > >>will
> > >> depend on tika-core.
> > >> And we can drop o.a.tika.io in 2.0.
> > >>
> > >> Also, I'll take a look at unresolved jdk7 issues/patches today.
> > >>
> > >> вт, 22 сент. 2015 г. в 15:41, Allison, Timothy B. <tallison@mitre.org
> >:
> > >>
> > >> > Thank _you_ for all of your work in modernizing us.  With your
> > >>efforts,
> > >> > we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon.
> :)
> > >> >
> > >> > >>Regarding FilenameUtils.getName() - I believe that its
> functionality
> > >> can
> > >> > be replaced by Path.getFileName() - and in a platform-aware manner,
> as
> > >> each
> > >> > JVM distribution comes with a specific provider implementation for
> > >>the OS
> > >> > it's for.
> > >> >
> > >> > I agree that we should use that anytime we're interacting with the
> > >>file
> > >> > system.
> > >> >
> > >> > However, that's actually the problem for paths that are stored
> within
> > >>the
> > >> > document (say, an embedded resource).  Let's say a user creates a
> > >>file on
> > >> > Windows, the file path information for the embedded file (depending
> on
> > >> the
> > >> > parser and the file format) may be in Windows-ese, which is a
> > >>problem if
> > >> > you try to use Path.getFileName() (I think... I haven't actually
> > >>tested
> > >> > this) on a Linux machine.  I have actually tested this with the old
> > >>File
> > >> > getName(), and it did not work cross-platform IIRC.
> > >> >
> > >> > In short, Tika needs to have the ability to extract the file name
> > >>from a
> > >> > path that was created on any platform (including old Mac and its ":"
> > >> > separator) while Tika is running on any platform.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Monday, September 21, 2015 11:31 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Thanks for the positive spirit!
> > >> >
> > >> > Regarding FilenameUtils.getName() - I believe that its functionality
> > >>can
> > >> > be replaced by Path.getFileName() - and in a platform-aware manner,
> as
> > >> each
> > >> > JVM distribution comes with a specific provider implementation for
> > >>the OS
> > >> > it's for.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > >> > Sent: Monday, September 21, 2015 14:27
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
> > >> >
> > >> > I'll take TIKA-1734 by tomorrow EDT.
> > >> >
> > >> > As for the other 2, I'm personally ok waiting for 1.12, but I defer
> to
> > >> the
> > >> > dev community.
> > >> >
> > >> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in
> on
> > >> > TIKA-1726, that might help move things forward.
> > >> >
> > >> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share
> > >> Yaniv's
> > >> > point about duplication of code, bloat within Tika and missing out
> on
> > >> > updates.   Aside from one small bit of code I'd like to keep or
> > >>perhaps
> > >> try
> > >> > to move into commons-io (?)[0], I think I'm now +1 to going forward
> > >>with
> > >> > TIKA-1706 in core...unless there is a -1 from the community.
> > >> >
> > >> > Best,
> > >> >
> > >> >              Tim
> > >> >
> > >> >
> > >> > [1] I added some customizations for old MAC OS behavior (treat ":"
> as
> > >> file
> > >> > separator) in FileNameUtils.getName() that I don't want to lose.
> > >> >
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Sunday, September 20, 2015 7:15 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > I would really like to push the following:
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back
> > >>commons-io
> > >> > to tika-core This requires a decision to re-include commons-io as a
> > >> > dependency of tika-core.
> > >> > All the pros and cons have been already debated, but no decision has
> > >>been
> > >> > made.
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public
> > >>methods
> > >> > that use a java.io.File with methods that use a java.nio.file.Path
> > >>Since
> > >> > this adds new methods to the public API, I requested the group to
> > >>make a
> > >> > decision about the new names - but have not received something
> > >>definite.
> > >> > However, I did create a subtask -
> > >> > https://issues.apache.org/jira/browse/TIKA-1734 Use
> > java.nio.file.Path
> > >> in
> > >> > TemporaryResources - using [~tallison]'s suggestion, which has not
> > >>been
> > >> > committed yet.
> > >> >
> > >> > If decisions are made on the above issues, I can quickly create
> > >>patches
> > >> > for them.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Mattmann, Chris A (3980) [mailto:
> chris.a.mattmann@jpl.nasa.gov]
> > >> > Sent: Saturday, September 19, 2015 08:10
> > >> > To: dev@tika.apache.org
> > >> > Subject: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Hey Guys and Gals,
> > >> >
> > >> > I’d like to roll a 1.11 release. There is TIKA-1716 which in
> > >>particular
> > >> > allows some neat functionality in tika-python:
> > >> > https://github.com/chrismattmann/tika-python/pull/67
> > >> >
> > >> >
> > >> > Anything else to try and get into the release?
> > >> >
> > >> > If not, I’ll produce an RC #1 by end of weekend.
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> > Chris Mattmann, Ph.D.
> > >> > Chief Architect
> > >> > Instrument Software and Science Data Systems Section (398) NASA Jet
> > >> > Propulsion Laboratory Pasadena, CA 91109 USA
> > >> > Office: 168-519, Mailstop: 168-527
> > >> > Email: chris.a.mattmann@nasa.gov
> > >> > WWW:  http://sunset.usc.edu/~mattmann/
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> > Adjunct Associate Professor, Computer Science Department University
> of
> > >> > Southern California, Los Angeles, CA 90089 USA
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >
> > >> > --
> > >> >
> > >> >
> > >> > This email communication (including any attachments) contains
> > >>information
> > >> > from Answers Corporation or its affiliates that is confidential and
> > >>may
> > >> be
> > >> > privileged. The information contained herein is intended only for
> the
> > >>use
> > >> > of the addressee(s) named above. If you are not the intended
> recipient
> > >> (or
> > >> > the agent responsible to deliver it to the intended recipient), you
> > >>are
> > >> > hereby notified that any dissemination, distribution, use, or
> copying
> > >>of
> > >> > this communication is strictly prohibited. If you have received this
> > >> email
> > >> > in error, please immediately reply to sender, delete the message and
> > >> > destroy all copies of it. If you have questions, please email
> > >> > legal@answers.com.
> > >> >
> > >> > If you wish to unsubscribe to commercial emails from Answers and its
> > >> > affiliates, please go to the Answers Subscription Center
> > >> > http://campaigns.answers.com/subscriptions to opt out.  Thank you.
> > >> >
> > >> > --
> > >> >
> > >> >
> > >> > This email communication (including any attachments) contains
> > >>information
> > >> > from Answers Corporation or its affiliates that is confidential and
> > >>may
> > >> be
> > >> > privileged. The information contained herein is intended only for
> the
> > >>use
> > >> > of the addressee(s) named above. If you are not the intended
> recipient
> > >> (or
> > >> > the agent responsible to deliver it to the intended recipient), you
> > >>are
> > >> > hereby notified that any dissemination, distribution, use, or
> copying
> > >>of
> > >> > this communication is strictly prohibited. If you have received this
> > >> email
> > >> > in error, please immediately reply to sender, delete the message and
> > >> > destroy all copies of it. If you have questions, please email
> > >> > legal@answers.com.
> > >> >
> > >> > If you wish to unsubscribe to commercial emails from Answers and its
> > >> > affiliates, please go to the Answers Subscription Center
> > >> > http://campaigns.answers.com/subscriptions to opt out.  Thank you.
> > >> >
> > >> --
> > >> Best regards,
> > >> Konstantin Gribov
> > >>
> > >
> > >--
> > >
> > >
> > >This email communication (including any attachments) contains
> information
> > >from Answers Corporation or its affiliates that is confidential and may
> > >be
> > >privileged. The information contained herein is intended only for the
> use
> > >of the addressee(s) named above. If you are not the intended recipient
> > >(or
> > >the agent responsible to deliver it to the intended recipient), you are
> > >hereby notified that any dissemination, distribution, use, or copying of
> > >this communication is strictly prohibited. If you have received this
> > >email
> > >in error, please immediately reply to sender, delete the message and
> > >destroy all copies of it. If you have questions, please email
> > >legal@answers.com.
> > >
> > >If you wish to unsubscribe to commercial emails from Answers and its
> > >affiliates, please go to the Answers Subscription Center
> > >http://campaigns.answers.com/subscriptions to opt out.  Thank you.
> >
> --
> Best regards,
> Konstantin Gribov
>

Re: [DISCUSS] Release Tika 1.11?

Posted by Konstantin Gribov <gr...@gmail.com>.
It seems to be a good idea to avoid including commons-io in tika-core
until 2.0 if we release it within a few months.
In that case we will have trunk w/ ongoing development of 2.0-SNAPSHOT and
a branch for 1.11+ bugfixes.

Some changes related to java7 can be included in 1.11/1.12 with no problems.

On Wed, Sep 23, 2015 at 19:33, Mattmann, Chris A (3980)
<chris.a.mattmann@jpl.nasa.gov> wrote:

> I’m not so keen on fundamentally changing the organization of
> Tika until 2.x. This seems like a major change to me in the
> way people expect to consume Tika.
>
> Can we:
>
> 1. Release a 1.11 that doesn’t include these types of changes
> 2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those
> types of issues there?
>
> Cheers,
> Chris
>
--
Best regards,
Konstantin Gribov

Re: [DISCUSS] Release Tika 1.11?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
I’m not so keen on fundamentally changing the organization of
Tika until 2.x. This seems like a major change to me in the
way people expect to consume Tika.

Can we:

1. Release a 1.11 that doesn’t include these types of changes
2. After 1.11, change trunk to be 2.0-SNAPSHOT and work those
types of issues there?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Yaniv Kunda <ya...@answers.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Wednesday, September 23, 2015 at 9:30 AM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Release Tika 1.11?

>+1 for the uber jar!
>
>Regarding jdk7 issues, I have a few more I will create and patch later
>tonight - I'll post a list of issues as well.
>On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <gr...@gmail.com> wrote:
>
>> Tim, was your check for File#getName done manually, or is it present in
>> tests somehow? If it's present in tests we can check it on major
>> platforms (I can test on linux, win xp and maybe on macosx) with
>> different jdks.
>>
>> In case commons-io doesn't support ':' as a file separator, we can have
>> a simple utility class in Tika or send them a patch for it.
>>
>> I think we can rethink Tika packaging in 1.11/1.12 and produce these
>> artifacts:
>> - tika-core w/ dependency on commons-io (and deprecate most of
>>   o.a.tika.io, forwarding calls to jdk or commons-io),
>> - tika-core-uber w/ shaded commons-io (rename and drop all things
>>   unnecessary for o.a.tika.io),
>> - sliced tika-parsers-* as Bob suggested earlier,
>> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
>> - other tika-* artifacts (like tika-server, tika-app etc).
>>
>> Those who need tika-core without dependencies would use tika-core-uber
>> instead; all others, who prefer using maven/ivy/gradle/sbt/lein, will
>> depend on tika-core.
>> And we can drop o.a.tika.io in 2.0.
>>
>> Also, I'll take a look at unresolved jdk7 issues/patches today.
>>
>> On Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <ta...@mitre.org> wrote:
>>
>> > Thank _you_ for all of your work in modernizing us.  With your efforts,
>> > we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
>> >
>> > >>Regarding FilenameUtils.getName() - I believe that its functionality
>> > >>can be replaced by Path.getFileName() - and in a platform-aware
>> > >>manner, as each JVM distribution comes with a specific provider
>> > >>implementation for the OS it's for.
>> >
>> > I agree that we should use that anytime we're interacting with the file
>> > system.
>> >
>> > However, that's actually the problem for paths that are stored within
>> > the document (say, an embedded resource).  Let's say a user creates a
>> > file on Windows; the file path information for the embedded file
>> > (depending on the parser and the file format) may be in Windows-ese,
>> > which is a problem if you try to use Path.getFileName() (I think... I
>> > haven't actually tested this) on a Linux machine.  I have actually
>> > tested this with the old File#getName(), and it did not work
>> > cross-platform IIRC.
>> >
>> > In short, Tika needs to have the ability to extract the file name from
>> > a path that was created on any platform (including old Mac and its ":"
>> > separator) while Tika is running on any platform.
>> >
>> > -----Original Message-----
>> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
>> > Sent: Monday, September 21, 2015 11:31 AM
>> > To: dev@tika.apache.org
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > Thanks for the positive spirit!
>> >
>> > Regarding FilenameUtils.getName() - I believe that its functionality
>> > can be replaced by Path.getFileName() - and in a platform-aware manner,
>> > as each JVM distribution comes with a specific provider implementation
>> > for the OS it's for.
>> >
>> > -----Original Message-----
>> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
>> > Sent: Monday, September 21, 2015 14:27
>> > To: dev@tika.apache.org
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
>> >
>> > I'll take TIKA-1734 by tomorrow EDT.
>> >
>> > As for the other 2, I'm personally ok waiting for 1.12, but I defer to
>> > the dev community.
>> >
>> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
>> > TIKA-1726, that might help move things forward.
>> >
>> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share
>> > Yaniv's point about duplication of code, bloat within Tika and missing
>> > out on updates.  Aside from one small bit of code I'd like to keep or
>> > perhaps try to move into commons-io (?)[1], I think I'm now +1 to going
>> > forward with TIKA-1706 in core...unless there is a -1 from the
>> > community.
>> >
>> > Best,
>> >
>> >              Tim
>> >
>> >
>> > [1] I added some customizations for old Mac OS behavior (treat ":" as
>> > file separator) in FilenameUtils.getName() that I don't want to lose.
>> >
>> >
>> > -----Original Message-----
>> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
>> > Sent: Sunday, September 20, 2015 7:15 AM
>> > To: dev@tika.apache.org
>> > Subject: RE: [DISCUSS] Release Tika 1.11?
>> >
>> > I would really like to push the following:
>> >
>> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io
>> > to tika-core
>> > This requires a decision to re-include commons-io as a dependency of
>> > tika-core.
>> > All the pros and cons have already been debated, but no decision has
>> > been made.
>> >
>> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public
>> > methods that use a java.io.File with methods that use a
>> > java.nio.file.Path
>> > Since this adds new methods to the public API, I requested the group to
>> > make a decision about the new names - but have not received anything
>> > definite.
>> > However, I did create a subtask -
>> > https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path
>> > in TemporaryResources - using [~tallison]'s suggestion, which has not
>> > been committed yet.
>> >
>> > If decisions are made on the above issues, I can quickly create patches
>> > for them.
>> >
>> > -----Original Message-----
>> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> > Sent: Saturday, September 19, 2015 08:10
>> > To: dev@tika.apache.org
>> > Subject: [DISCUSS] Release Tika 1.11?
>> >
>> > Hey Guys and Gals,
>> >
>> > I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
>> > allows some neat functionality in tika-python:
>> > https://github.com/chrismattmann/tika-python/pull/67
>> >
>> >
>> > Anything else to try and get into the release?
>> >
>> > If not, I’ll produce an RC #1 by end of weekend.
>> >
>> > Cheers,
>> > Chris
>> >
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Chris Mattmann, Ph.D.
>> > Chief Architect
>> > Instrument Software and Science Data Systems Section (398)
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 168-519, Mailstop: 168-527
>> > Email: chris.a.mattmann@nasa.gov
>> > WWW:  http://sunset.usc.edu/~mattmann/
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Adjunct Associate Professor, Computer Science Department
>> > University of Southern California, Los Angeles, CA 90089 USA
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >
>> > --
>> >
>> >
>> > This email communication (including any attachments) contains
>> > information from Answers Corporation or its affiliates that is
>> > confidential and may be privileged. The information contained herein is
>> > intended only for the use of the addressee(s) named above. If you are
>> > not the intended recipient (or the agent responsible to deliver it to
>> > the intended recipient), you are hereby notified that any
>> > dissemination, distribution, use, or copying of this communication is
>> > strictly prohibited. If you have received this email in error, please
>> > immediately reply to sender, delete the message and destroy all copies
>> > of it. If you have questions, please email legal@answers.com.
>> >
>> > If you wish to unsubscribe to commercial emails from Answers and its
>> > affiliates, please go to the Answers Subscription Center
>> > http://campaigns.answers.com/subscriptions to opt out.  Thank you.
>> >
>> --
>> Best regards,
>> Konstantin Gribov
>>
>


Re: [DISCUSS] Release Tika 1.11?

Posted by Yaniv Kunda <ya...@answers.com>.
+1 for the uber jar!

Regarding jdk7 issues, I have a few more I will create and patch later
tonight - I'll post a list of issues as well.
On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <gr...@gmail.com> wrote:

> Tim, was your check for File#getName done manually or it's present in tests
> somehow? If it's present in tests we can check it on major platforms (I can
> test on linux, win xp and maybe on macosx) with different jdks.
>
> In case commons-io doesn't support ':' as file separator we can have simple
> utility class in Tika or send them a patch for it.
>
> I think, we can rethink Tika packaging in 1.11/1.12 and produce these
> artifacts:
> - tika-core w/ dependency on commons-io (and deprecate most of o.a.tika.io,
> forwarding calls to jdk or commons-io),
> - tika-core-uber w/ shaded commons-io (rename and drop all things
> unnecessary for o.a.tika.io),
> - sliced tika-parsers-* as Bob suggested earlier,
> - tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
> - other tika-* artifacts (like tika-server, tika-app etc).
>
> One who needs tika-core without dependencies would use tika-core-uber
> instead of it, all others, who prefer using maven/ivy/gradle/sbt/lein will
> depend on tika-core.
> And we can drop o.a.tika.io in 2.0.
>
> Also, I'll take a look at unresolved jdk7 issues/patches today.
>
> Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <ta...@mitre.org> wrote:
>
> > Thank _you_ for all of your work in modernizing us.  With your efforts,
> > we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
> >
> > >>Regarding FilenameUtils.getName() - I believe that its functionality can
> > be replaced by Path.getFileName() - and in a platform-aware manner, as each
> > JVM distribution comes with a specific provider implementation for the OS
> > it's for.
> >
> > I agree that we should use that anytime we're interacting with the file
> > system.
> >
> > However, that's actually the problem for paths that are stored within the
> > document (say, an embedded resource).  Let's say a user creates a file on
> > Windows, the file path information for the embedded file (depending on the
> > parser and the file format) may be in Windows-ese, which is a problem if
> > you try to use Path.getFileName() (I think... I haven't actually tested
> > this) on a Linux machine.  I have actually tested this with the old File
> > getName(), and it did not work cross-platform IIRC.
> >
> > In short, Tika needs to have the ability to extract the file name from a
> > path that was created on any platform (including old Mac and its ":"
> > separator) while Tika is running on any platform.
> >
> > -----Original Message-----
> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > Sent: Monday, September 21, 2015 11:31 AM
> > To: dev@tika.apache.org
> > Subject: RE: [DISCUSS] Release Tika 1.11?
> >
> > Thanks for the positive spirit!
> >
> > Regarding FilenameUtils.getName() - I believe that its functionality can
> > be replaced by Path.getFileName() - and in a platform-aware manner, as each
> > JVM distribution comes with a specific provider implementation for the OS
> > it's for.
> >
> > -----Original Message-----
> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > Sent: Monday, September 21, 2015 14:27
> > To: dev@tika.apache.org
> > Subject: RE: [DISCUSS] Release Tika 1.11?
> >
> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
> >
> > I'll take TIKA-1734 by tomorrow EDT.
> >
> > As for the other 2, I'm personally ok waiting for 1.12, but I defer to the
> > dev community.
> >
> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
> > TIKA-1726, that might help move things forward.
> >
> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's
> > point about duplication of code, bloat within Tika and missing out on
> > updates.   Aside from one small bit of code I'd like to keep or perhaps try
> > to move into commons-io (?)[0], I think I'm now +1 to going forward with
> > TIKA-1706 in core...unless there is a -1 from the community.
> >
> > Best,
> >
> >              Tim
> >
> >
> > [1] I added some customizations for old MAC OS behavior (treat ":" as file
> > separator) in FileNameUtils.getName() that I don't want to lose.
> >
> >
> > -----Original Message-----
> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > Sent: Sunday, September 20, 2015 7:15 AM
> > To: dev@tika.apache.org
> > Subject: RE: [DISCUSS] Release Tika 1.11?
> >
> > I would really like to push the following:
> >
> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io
> > to tika-core This requires a decision to re-include commons-io as a
> > dependency of tika-core.
> > All the pros and cons have been already debated, but no decision has been
> > made.
> >
> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods
> > that use a java.io.File with methods that use a java.nio.file.Path Since
> > this adds new methods to the public API, I requested the group to make a
> > decision about the new names - but have not received something definite.
> > However, I did create a subtask -
> > https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in
> > TemporaryResources - using [~tallison]'s suggestion, which has not been
> > committed yet.
> >
> > If decisions are made on the above issues, I can quickly create patches
> > for them.
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Saturday, September 19, 2015 08:10
> > To: dev@tika.apache.org
> > Subject: [DISCUSS] Release Tika 1.11?
> >
> > Hey Guys and Gals,
> >
> > I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
> > allows some neat functionality in tika-python:
> > https://github.com/chrismattmann/tika-python/pull/67
> >
> >
> > Anything else to try and get into the release?
> >
> > If not, I’ll produce an RC #1 by end of weekend.
> >
> > Cheers,
> > Chris
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398) NASA Jet
> > Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattmann@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Associate Professor, Computer Science Department University of
> > Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> --
> Best regards,
> Konstantin Gribov
>

Re: [DISCUSS] Release Tika 1.11?

Posted by Konstantin Gribov <gr...@gmail.com>.
Tim, was your check of File#getName done manually, or is it covered by tests
somehow? If it is covered by tests, we can check it on the major platforms (I
can test on Linux, Win XP and maybe Mac OS X) with different JDKs.

If commons-io doesn't support ':' as a file separator, we can keep a simple
utility class in Tika or send them a patch for it.

I think, we can rethink Tika packaging in 1.11/1.12 and produce these
artifacts:
- tika-core w/ dependency on commons-io (and deprecate most of o.a.tika.io,
forwarding calls to jdk or commons-io),
- tika-core-uber w/ shaded commons-io (rename and drop all things
unnecessary for o.a.tika.io),
- sliced tika-parsers-* as Bob suggested earlier,
- tika-parsers jar w/ all tika-parsers-* parts (for compatibility),
- other tika-* artifacts (like tika-server, tika-app etc).

Anyone who needs tika-core without dependencies would use tika-core-uber
instead; everyone else, who prefers maven/ivy/gradle/sbt/lein, would
depend on tika-core.
Then we can drop o.a.tika.io in 2.0.

Also, I'll take a look at unresolved jdk7 issues/patches today.

Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <ta...@mitre.org> wrote:

> Thank _you_ for all of your work in modernizing us.  With your efforts,
> we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
>
> >>Regarding FilenameUtils.getName() - I believe that its functionality can
> be replaced by Path.getFileName() - and in a platform-aware manner, as each
> JVM distribution comes with a specific provider implementation for the OS
> it's for.
>
> I agree that we should use that anytime we're interacting with the file
> system.
>
> However, that's actually the problem for paths that are stored within the
> document (say, an embedded resource).  Let's say a user creates a file on
> Windows, the file path information for the embedded file (depending on the
> parser and the file format) may be in Windows-ese, which is a  problem if
> you try to use Path.getFileName() (I think... I haven't actually tested
> this) on a Linux machine.  I have actually tested this with the old File
> getName(), and it did not work cross-platform IIRC.
>
> In short, Tika needs to have the ability to extract the file name from a
> path that was created on any platform (including old Mac and its ":"
> separator) while Tika is running on any platform.
>
> -----Original Message-----
> From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> Sent: Monday, September 21, 2015 11:31 AM
> To: dev@tika.apache.org
> Subject: RE: [DISCUSS] Release Tika 1.11?
>
> Thanks for the positive spirit!
>
> Regarding FilenameUtils.getName() - I believe that its functionality can
> be replaced by Path.getFileName() - and in a platform-aware manner, as each
> JVM distribution comes with a specific provider implementation for the OS
> it's for.
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:tallison@mitre.org]
> Sent: Monday, September 21, 2015 14:27
> To: dev@tika.apache.org
> Subject: RE: [DISCUSS] Release Tika 1.11?
>
> +1, it would be great to move a bit more into EOL'd Java 7 asap.
>
> I'll take TIKA-1734 by tomorrow EDT.
>
> As for the other 2, I'm personally ok waiting for 1.12, but I defer to the
> dev community.
>
> Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
> TIKA-1726, that might help move things forward.
>
> On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's
> point about duplication of code, bloat within Tika and missing out on
> updates.   Aside from one small bit of code I'd like to keep or perhaps try
> to move into commons-io (?)[0], I think I'm now +1 to going forward with
> TIKA-1706 in core...unless there is a -1 from the community.
>
> Best,
>
>              Tim
>
>
> [1] I added some customizations for old MAC OS behavior (treat ":" as file
> separator) in FileNameUtils.getName() that I don't want to lose.
>
>
> -----Original Message-----
> From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> Sent: Sunday, September 20, 2015 7:15 AM
> To: dev@tika.apache.org
> Subject: RE: [DISCUSS] Release Tika 1.11?
>
> I would really like to push the following:
>
> https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io
> to tika-core This requires a decision to re-include commons-io as a
> dependency of tika-core.
> All the pros and cons have been already debated, but no decision has been
> made.
>
> https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods
> that use a java.io.File with methods that use a java.nio.file.Path Since
> this adds new methods to the public API, I requested the group to make a
> decision about the new names - but have not received something definite.
> However, I did create a subtask -
> https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in
> TemporaryResources - using [~tallison]'s suggestion, which has not been
> committed yet.
>
> If decisions are made on the above issues, I can quickly create patches
> for them.
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Saturday, September 19, 2015 08:10
> To: dev@tika.apache.org
> Subject: [DISCUSS] Release Tika 1.11?
>
> Hey Guys and Gals,
>
> I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
> allows some neat functionality in tika-python:
> https://github.com/chrismattmann/tika-python/pull/67
>
>
> Anything else to try and get into the release?
>
> If not, I’ll produce an RC #1 by end of weekend.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398) NASA Jet
> Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department University of
> Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
-- 
Best regards,
Konstantin Gribov

RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank _you_ for all of your work in modernizing us.  With your efforts, we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)

>>Regarding FilenameUtils.getName() - I believe that its functionality can be replaced by Path.getFileName() - and in a platform-aware manner, as each JVM distribution comes with a specific provider implementation for the OS it's for.

I agree that we should use that anytime we're interacting with the file system.  

However, that's actually the problem for paths that are stored within the document (say, an embedded resource).  Let's say a user creates a file on Windows: the file path information for the embedded file (depending on the parser and the file format) may be in Windows-ese, which is a problem if you try to use Path.getFileName() on a Linux machine (I think... I haven't actually tested this).  I have actually tested this with the old File#getName(), and it did not work cross-platform IIRC.

In short, Tika needs to have the ability to extract the file name from a path that was created on any platform (including old Mac and its ":" separator) while Tika is running on any platform.
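
To make the requirement concrete, here is a minimal sketch of platform-independent name extraction. It is only an illustration under stated assumptions, not the code in o.a.tika.io or commons-io: the class name EmbeddedResourceNames and its getName method are hypothetical. It treats '/', '\' and ':' as separators regardless of the OS the JVM runs on. That is a heuristic, since on Unix a file name may legitimately contain '\' or ':'.

```java
// Hypothetical helper for illustration only; not the actual Tika or commons-io code.
final class EmbeddedResourceNames {
    private EmbeddedResourceNames() {}

    /**
     * Returns the text after the last '/', '\' or ':' in the given path,
     * or "" when the path is null, empty, or ends with a separator.
     * Works purely on the string, so the result does not depend on the host OS.
     */
    static String getName(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        int unix = path.lastIndexOf('/');     // Unix-style separator
        int windows = path.lastIndexOf('\\'); // Windows-style separator
        int oldMac = path.lastIndexOf(':');   // classic Mac OS separator (also a Windows drive colon)
        int last = Math.max(unix, Math.max(windows, oldMac));
        // If no separator was found, last is -1 and the whole string is the name.
        return path.substring(last + 1);
    }
}
```

With this sketch, "C:\docs\report.doc", "/home/user/report.doc" and "Macintosh HD:Documents:report.doc" all reduce to "report.doc" on any host platform.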

-----Original Message-----
From: Yaniv Kunda [mailto:yaniv.kunda@answers.com] 
Sent: Monday, September 21, 2015 11:31 AM
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Release Tika 1.11?

Thanks for the positive spirit!

Regarding FilenameUtils.getName() - I believe that its functionality can be replaced by Path.getFileName() - and in a platform-aware manner, as each JVM distribution comes with a specific provider implementation for the OS it's for.

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Monday, September 21, 2015 14:27
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Release Tika 1.11?

+1, it would be great to move a bit more into EOL'd Java 7 asap.

I'll take TIKA-1734 by tomorrow EDT.

As for the other 2, I'm personally ok waiting for 1.12, but I defer to the dev community.

Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on TIKA-1726, that might help move things forward.

On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's point about duplication of code, bloat within Tika and missing out on
updates.   Aside from one small bit of code I'd like to keep or perhaps try
to move into commons-io (?)[0], I think I'm now +1 to going forward with
TIKA-1706 in core...unless there is a -1 from the community.

Best,

             Tim


[1] I added some customizations for old MAC OS behavior (treat ":" as file
separator) in FileNameUtils.getName() that I don't want to lose.


-----Original Message-----
From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
Sent: Sunday, September 20, 2015 7:15 AM
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Release Tika 1.11?

I would really like to push the following:

https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to tika-core This requires a decision to re-include commons-io as a dependency of tika-core.
All the pros and cons have been already debated, but no decision has been made.

https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods that use a java.io.File with methods that use a java.nio.file.Path Since this adds new methods to the public API, I requested the group to make a decision about the new names - but have not received something definite.
However, I did create a subtask -
https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in TemporaryResources - using [~tallison]'s suggestion, which has not been committed yet.

If decisions are made on the above issues, I can quickly create patches for them.

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Saturday, September 19, 2015 08:10
To: dev@tika.apache.org
Subject: [DISCUSS] Release Tika 1.11?

Hey Guys and Gals,

I’d like to roll a 1.11 release. There is TIKA-1716 which in particular allows some neat functionality in tika-python:
https://github.com/chrismattmann/tika-python/pull/67


Anything else to try and get into the release?

If not, I’ll produce an RC #1 by end of weekend.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RE: [DISCUSS] Release Tika 1.11?

Posted by Yaniv Kunda <ya...@answers.com>.
Thanks for the positive spirit!

Regarding FilenameUtils.getName() - I believe that its functionality can be
replaced by Path.getFileName() - and in a platform-aware manner, as each JVM
distribution comes with a specific provider implementation for the OS it's
for.
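
As a rough illustration of the suggestion above, the following sketch parses a string with the default file system provider and takes the last name element. The class PathNameDemo and its nameOf method are hypothetical names chosen for the example; Paths.get and Path.getFileName are the standard java.nio.file calls available since Java 7.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class for illustration only; not part of Tika.
final class PathNameDemo {
    private PathNameDemo() {}

    /**
     * Parses the raw string with the default (host OS) file system provider
     * and returns the last name element, or "" if the path has none (e.g. a root).
     */
    static String nameOf(String raw) {
        Path name = Paths.get(raw).getFileName();
        return name == null ? "" : name.toString();
    }
}
```

Because the parsing rules come from the host provider, the result depends on where the JVM runs: a string such as "C:\docs\report.doc" should yield "report.doc" on Windows, while on Linux the backslash is an ordinary character and the whole string is expected to come back as the name.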

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Monday, September 21, 2015 14:27
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Release Tika 1.11?

+1, it would be great to move a bit more into EOL'd Java 7 asap.

I'll take TIKA-1734 by tomorrow EDT.

As for the other 2, I'm personally ok waiting for 1.12, but I defer to the
dev community.

Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on
TIKA-1726, that might help move things forward.

On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's
point about duplication of code, bloat within Tika and missing out on
updates.   Aside from one small bit of code I'd like to keep or perhaps try
to move into commons-io (?)[0], I think I'm now +1 to going forward with
TIKA-1706 in core...unless there is a -1 from the community.

Best,

             Tim


[1] I added some customizations for old MAC OS behavior (treat ":" as file
separator) in FileNameUtils.getName() that I don't want to lose.


-----Original Message-----
From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
Sent: Sunday, September 20, 2015 7:15 AM
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Release Tika 1.11?

I would really like to push the following:

https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to
tika-core This requires a decision to re-include commons-io as a dependency
of tika-core.
All the pros and cons have been already debated, but no decision has been
made.

https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods
that use a java.io.File with methods that use a java.nio.file.Path Since
this adds new methods to the public API, I requested the group to make a
decision about the new names - but have not received something definite.
However, I did create a subtask -
https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in
TemporaryResources - using [~tallison]'s suggestion, which has not been
committed yet.

If decisions are made on the above issues, I can quickly create patches for
them.

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Saturday, September 19, 2015 08:10
To: dev@tika.apache.org
Subject: [DISCUSS] Release Tika 1.11?

Hey Guys and Gals,

I’d like to roll a 1.11 release. There is TIKA-1716 which in particular
allows some neat functionality in tika-python:
https://github.com/chrismattmann/tika-python/pull/67


Anything else to try and get into the release?

If not, I’ll produce an RC #1 by end of weekend.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet
Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Need more coffee [0] = [1]...sorry...


RE: [DISCUSS] Release Tika 1.11?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1, it would be great to move a bit more into EOL'd Java 7 asap.

I'll take TIKA-1734 by tomorrow EDT.

As for the other 2, I'm personally ok waiting for 1.12, but I defer to the dev community.

Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on TIKA-1726, that might help move things forward.

On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's point about duplication of code, bloat within Tika and missing out on updates. Aside from one small bit of code I'd like to keep or perhaps try to move into commons-io (?) [0], I think I'm now +1 to going forward with TIKA-1706 in core...unless there is a -1 from the community.

Best,

             Tim


[1] I added some customizations for old Mac OS behavior (treating ":" as a file separator) in FileNameUtils.getName() that I don't want to lose.
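
For illustration only, here is a minimal sketch of the kind of separator handling the footnote describes: take everything after the last "/", "\" or ":" in a path string. The class name FileNameSketch and the exact rules are assumptions made for this example; this is not the actual FileNameUtils.getName() code from Tika.

```java
public class FileNameSketch {

    /**
     * Hypothetical helper (not Tika's FileNameUtils): returns the part of
     * the path after the last Unix ('/'), Windows ('\') or old Mac OS (':')
     * separator. A null or empty input yields an empty string.
     */
    public static String getName(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        int unix = path.lastIndexOf('/');
        int windows = path.lastIndexOf('\\');
        int mac = path.lastIndexOf(':');
        // The right-most separator of any kind marks the start of the name.
        int cut = Math.max(unix, Math.max(windows, mac));
        // If no separator is present, cut is -1 and the whole string is kept.
        return path.substring(cut + 1);
    }
}
```

With this sketch, getName("Macintosh HD:Documents:report.doc") returns "report.doc". Because it works on the string alone, the result does not depend on the platform's own separator, unlike java.io.File#getName.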



RE: [DISCUSS] Release Tika 1.11?

Posted by Yaniv Kunda <ya...@answers.com>.
I would really like to push the following:

https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to
tika-core
This requires a decision to re-include commons-io as a dependency of
tika-core.
All the pros and cons have already been debated, but no decision has been
made.

https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods
that use a java.io.File with methods that use a java.nio.file.Path
Since this adds new methods to the public API, I asked the group to decide
on the new names, but have not received a definite answer.
However, I did create a subtask -
https://issues.apache.org/jira/browse/TIKA-1734 Use java.nio.file.Path in
TemporaryResources - using [~tallison]'s suggestion, which has not been
committed yet.
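
As a rough illustration of what TIKA-1726 proposes, the sketch below pairs a java.io.File method with a java.nio.file.Path counterpart and lets the File variant delegate to the Path one. The class PathOverloadSketch and the method resourceName are hypothetical and exist only for this example; they are not part of the Tika API. The naming of the new methods is still undecided, so plain overloading is just one assumption.

```java
import java.io.File;
import java.nio.file.Path;

public class PathOverloadSketch {

    /** Existing-style method that accepts a java.io.File. */
    public static String resourceName(File file) {
        // Delegate so that both variants share a single implementation.
        return resourceName(file.toPath());
    }

    /** Added counterpart that accepts a java.nio.file.Path. */
    public static String resourceName(Path path) {
        Path name = path.getFileName();
        // getFileName() returns null for a path with no elements (e.g. a root).
        return name == null ? "" : name.toString();
    }
}
```

Delegating in this direction keeps the File-based method as a thin compatibility wrapper, so existing callers are unaffected while new code can pass a Path directly.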

If decisions are made on the above issues, I can quickly create patches for
them.
