Posted to dev@tika.apache.org by Tyler Palsulich <tp...@apache.org> on 2015/03/28 16:01:03 UTC

[DISCUSS] Tika 1.8 or 1.7.1

Hi Folks,

Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to
release a new version of Tika. I'll volunteer to be the release manager
again.

Should we release this as 1.8 or 1.7.1?

Does anyone have any last minute issues they'd like to finish and see in
Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
TIKA-1586). Any others?

Have a good weekend,
Tyler

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Once we fix TIKA-1584, I don't have a preference.  I defer to Chris's experience (so I guess, +1 for 1.8) given the amount of work required.

It'd be great if we could make sure we aren't bundling any pdfs in our tika-app jar, too.  Many apologies if that's been fixed!

________________________________________
From: Mattmann, Chris A (3980) <ch...@jpl.nasa.gov>
Sent: Saturday, March 28, 2015 11:41 AM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Tika 1.8 or 1.7.1

Hi Tyler - I would VOTE for 1.8. Given the stuff associated
with releasing (updating the website; sending emails; waiting
periods, etc.) let’s ship all the updates we have too along
with the jhighlight fix.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++








Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Oleg Tikhonov <ol...@apache.org>.
+1 for 1.8 release.
On 29 Mar 2015 02:04, "Konstantin Gribov" <gr...@gmail.com> wrote:

> Also, I think, we should resolve TIKA-1575 (upgrade to pdfbox 1.8.9) since
> pdfbox 1.8.8 hangs on some pdf forms.
>
> --
> Best regards,
> Konstantin Gribov
>

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Konstantin Gribov <gr...@gmail.com>.
Also, I think, we should resolve TIKA-1575 (upgrade to pdfbox 1.8.9) since
pdfbox 1.8.8 hangs on some pdf forms.

-- 
Best regards,
Konstantin Gribov

Sat, 28 March 2015 at 23:22, Konstantin Gribov <gr...@gmail.com>:

> +1 to releasing 1.8.
>
> --
> Best regards,
> Konstantin Gribov
>

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Konstantin Gribov <gr...@gmail.com>.
+1 to releasing 1.8.

-- 
Best regards,
Konstantin Gribov

Sat, 28 March 2015, 22:25, Tyler Palsulich <tp...@apache.org>:

> I'm also leaning toward 1.8. Especially given the newly identified
> regression in TIKA-1584.
>
> Tyler

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Tyler Palsulich <tp...@apache.org>.
I'm also leaning toward 1.8. Especially given the newly identified
regression in TIKA-1584.

Tyler
On Mar 28, 2015 11:47 AM, "Mattmann, Chris A (3980)" <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Tyler - I would VOTE for 1.8. Given the stuff associated
> with releasing (updating the website; sending emails; waiting
> periods, etc.) let’s ship all the updates we have too along
> with the jhighlight fix.
>
> Cheers,
> Chris
>

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Tyler - I would VOTE for 1.8. Given the stuff associated
with releasing (updating the website; sending emails; waiting
periods, etc.) let’s ship all the updates we have too along
with the jhighlight fix.

Cheers,
Chris

-----Original Message-----
From: Tyler Palsulich <tp...@apache.org>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Saturday, March 28, 2015 at 8:01 AM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: [DISCUSS] Tika 1.8 or 1.7.1

>Hi Folks,
>
>Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to
>release a new version of Tika. I'll volunteer to be the release manager
>again.
>
>Should we release this as 1.8 or 1.7.1?
>
>Does anyone have any last minute issues they'd like to finish and see in
>Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
>TIKA-1586). Any others?
>
>Have a good weekend,
>Tyler


Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Also, I can run the new RC on a subset of ImageCat [1] as a test
when it’s ready.

Cheers,
Chris

[1] https://github.com/chrismattmann/imagecat/


-----Original Message-----
From: Tyler Palsulich <tp...@gmail.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Monday, March 30, 2015 at 3:22 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Tika 1.8 or 1.7.1

>I just remembered TIKA-1509 and TIKA-1558 -- testing now for blacklist
>functionality through TIKA-1509. If that works, I'll back out TIKA-1558.
>
>Tim, I think you should run govdocs from the RC, in case something changes
>between your run and the cut.
>
>Tyler
>


Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
+1 to running tika-batch and govdocs. Woot.

-----Original Message-----
From: Tyler Palsulich <tp...@gmail.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Monday, March 30, 2015 at 3:22 PM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: Re: [DISCUSS] Tika 1.8 or 1.7.1

>I just remembered TIKA-1509 and TIKA-1558 -- testing now for blacklist
>functionality through TIKA-1509. If that works, I'll back out TIKA-1558.
>
>Tim, I think you should run govdocs from the RC, in case something changes
>between your run and the cut.
>
>Tyler
>


Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Tyler Palsulich <tp...@gmail.com>.
I just remembered TIKA-1509 and TIKA-1558 -- testing now for blacklist
functionality through TIKA-1509. If that works, I'll back out TIKA-1558.

Tim, I think you should run govdocs from the RC, in case something changes
between your run and the cut.

Tyler

On Mon, Mar 30, 2015 at 10:17 AM, Allison, Timothy B. <ta...@mitre.org>
wrote:

> All,
>
> I've made the changes that I had hoped to.  Grib pdf exclusion remains for
> any takers.
>
> Let me know when I should initiate the run against govdocs1 to see if
> there are any surprises on that corpus with Tika 1.8.
>
> Best,
>
>             Tim
>

RE: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Allison, Timothy B." <ta...@mitre.org>.
All,

I've made the changes that I had hoped to.  Grib pdf exclusion remains for any takers.

Let me know when I should initiate the run against govdocs1 to see if there are any surprises on that corpus with Tika 1.8.

Best,

            Tim

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Monday, March 30, 2015 7:03 AM
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Tika 1.8 or 1.7.1

Unless there are objections, I'd like these to be resolved before 1.8:

TIKA-1584 -- I'll fix
TIKA-1575 -- Resolved by Konstantin Gribov (thank you!)
TIKA-1512 -- I'll put in a temporary fix so that we don't get IndexOutOfBoundsExceptions (IOOBEs), but I'll leave this open and do some more digging to see if we need to open a ticket at the POI level
TIKA-1511 -- I'll remove "provided" for xerial

TIKA-1549 -- We should thank Toke Eskildsen in CHANGES.txt, no?

I'll have these fixes completed by noon EDT.  Should I run against govdocs1 before or after the RC?

My last build of Tika app (a few days ago) ballooned to ~43MB, and that's before I add ~3MB for xerial.  Tika server is now ~48MB.  As of my last build, we are still including ~4MB of pdfs (README.NLDAS1.pdf and README.NLDAS2.pdf) from the GRIB(?) parser in the tika-app and tika-server jars.

Best,

              Tim




RE: [DISCUSS] Tika 1.8 or 1.7.1

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Unless there are objections, I'd like these to be resolved before 1.8:

TIKA-1584 -- I'll fix
TIKA-1575 -- Resolved by Konstantin Gribov (thank you!)
TIKA-1512 -- I'll put in a temporary fix so that we don't get IndexOutOfBoundsExceptions (IOOBEs), but I'll leave this open and do some more digging to see if we need to open a ticket at the POI level
TIKA-1511 -- I'll remove "provided" for xerial

TIKA-1549 -- We should thank Toke Eskildsen in CHANGES.txt, no?

I'll have these fixes completed by noon EDT.  Should I run against govdocs1 before or after the RC?

My last build of Tika app (a few days ago) ballooned to ~43MB, and that's before I add ~3MB for xerial.  Tika server is now ~48MB.  As of my last build, we are still including ~4MB of pdfs (README.NLDAS1.pdf and README.NLDAS2.pdf) from the GRIB(?) parser in the tika-app and tika-server jars.

Best,

              Tim
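
One way to check for stray PDFs in a built jar, as raised above, is to list every entry whose name ends in ".pdf". The following is only a rough sketch under stated assumptions: the class name JarPdfScan and its method are hypothetical and not part of Tika, the jar path is whatever your own build produced, and only java.util.jar from the JDK is used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Tika): reports PDF files bundled in a jar.
public class JarPdfScan {

    // Maps each non-directory entry ending in ".pdf" to the size recorded
    // in the jar (JarEntry.getSize() may be -1 if the size is unknown).
    static Map<String, Long> findPdfEntries(Path jar) throws IOException {
        Map<String, Long> hits = new LinkedHashMap<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!entry.isDirectory()
                        && name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    hits.put(name, entry.getSize());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java JarPdfScan <path-to-jar>");
            return;
        }
        // Print one line per bundled PDF: entry name and size in bytes.
        for (Map.Entry<String, Long> hit
                : findPdfEntries(Paths.get(args[0])).entrySet()) {
            System.out.println(hit.getKey() + "\t" + hit.getValue());
        }
    }
}
```

After compiling, running it against a tika-app or tika-server jar should print one line per bundled PDF, which makes it easy to see whether files such as README.NLDAS1.pdf are still being packaged.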



-----Original Message-----
From: Tyler Palsulich [mailto:tpalsulich@gmail.com] 
Sent: Sunday, March 29, 2015 9:13 AM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Tika 1.8 or 1.7.1

Once TIKA-1584 and TIKA-1575 are resolved, I'll work up an RC (unless
something else pops up).

Thank you everyone.

Tyler

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Tyler Palsulich <tp...@gmail.com>.
Once TIKA-1584 and TIKA-1575 are resolved, I'll work up an RC (unless
something else pops up).

Thank you everyone.

Tyler
On Mar 29, 2015 4:43 AM, "Hong-Thai Nguyen" <th...@gmail.com> wrote:

> +1 for 1.8
>
> Hong-Thai
>

Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Hong-Thai Nguyen <th...@gmail.com>.
+1 for 1.8

Hong-Thai

> On 28 Mar 2015, at 16:01, Tyler Palsulich <tp...@apache.org> wrote:
> 
> Hi Folks,
> 
> Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to
> release a new version of Tika. I'll volunteer to be the release manager
> again.
> 
> Should we release this as 1.8 or 1.7.1?
> 
> Does anyone have any last minute issues they'd like to finish and see in
> Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
> TIKA-1586). Any others?
> 
> Have a good weekend,
> Tyler

RE: [DISCUSS] Tika 1.8 or 1.7.1

Posted by Ken Krugler <kk...@transpac.com>.
Given how recently we did a 1.7 release, my vote would be for 1.7.1.

And to keep this release as simple as possible, just cherry-pick the fix for TIKA-1581 into the 1.7 code base.

-- Ken

> From: Tyler Palsulich
> Sent: March 28, 2015 8:01:03am PDT
> To: dev@tika.apache.org
> Subject: [DISCUSS] Tika 1.8 or 1.7.1
> 
> Hi Folks,
> 
> Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to
> release a new version of Tika. I'll volunteer to be the release manager
> again.
> 
> Should we release this as 1.8 or 1.7.1?
> 
> Does anyone have any last minute issues they'd like to finish and see in
> Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
> TIKA-1586). Any others?
> 
> Have a good weekend,
> Tyler

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






Re: [DISCUSS] Tika 1.8 or 1.7.1

Posted by David Meikle <lo...@gmail.com>.
+1 for 1.8

> On 28 Mar 2015, at 15:01, Tyler Palsulich <tp...@apache.org> wrote:
> 
> Should we release this as 1.8 or 1.7.1?