Posted to dev@tika.apache.org by Christian Göller <go...@innosystec.de> on 2011/09/21 14:28:02 UTC

Release date of tika 1.0 or 0.10

Hello,

can anyone tell me if there is a date for the next TIKA release 1.0 or
0.10 ?

I'd like to release my application for a customer, and need a bugfix
from 1.0.


Best regards
Christian


Re: Release date of tika 1.0 or 0.10

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK committed!  Release away :)

Mike McCandless

http://blog.mikemccandless.com

On Sat, Sep 24, 2011 at 6:30 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> OK I will do that... I *think* it's just a matter of fixing XSLF and
> HSLF parsers to not visit the master slide.
>
> I'll commit that but Nick can you double check that this is correct?  Thanks.
>
> I'll put a TODO to re-enable once we address TIKA-712.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Sep 23, 2011 at 6:03 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hey Mike,
>>
>> That's fine by me. If you could turn it off and commit before this weekend I'd
>> appreciate it.
>>
>> Cheers,
>> Chris
>>
>> On Sep 23, 2011, at 12:26 PM, Michael McCandless wrote:
>>
>>> I think before we release 0.10 we should address TIKA-712?
>>>
>>> I don't think we should hold the release... I think we should just
>>> turn off the new functionality (to extract text from master slides)
>>> for the time being, until we work out how to fix it more correctly,
>>> because right now it's always extracting boilerplate text from the
>>> master slide onto each slide.  I.e., put Tika back to what it did before
>>> any of the TIKA-712 commits, for the 0.10 release.
>>>
>>> I've made some progress trying to understand what we can use in the
>>> OOXML format to not extract the boilerplate while keeping what the
>>> user had actually edited, but I'm not done yet and I think what's
>>> committed is worse than the original issue...
>>>
>>> Thoughts?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Sep 21, 2011 at 11:02 PM, Mattmann, Chris A (388J)
>>> <ch...@jpl.nasa.gov> wrote:
>>>> Hey Jukka,
>>>>
>>>> If everyone is cool with me doing it over the weekend, I'll bust it out,
>>>> no worries. Thanks for getting the RC all prepped up and
>>>> thanks to everyone for the hard work.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
>>>>>> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?
>>>>>
>>>>> As discussed in the other thread, we seem to have a rough consensus to
>>>>> make a 0.10 release pretty soon while we work on perfecting things for
>>>>> the 1.0 release.
>>>>>
>>>>> I think the trunk is pretty much ready to be released already, so I'd
>>>>> suggest we cut the release already this week, for example over the
>>>>> weekend. Chris, do you want to take care of it? I should also have
>>>>> some spare cycles to cut the release if needed.
>>>>>
>>>>> BR,
>>>>>
>>>>> Jukka Zitting
>>>>
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattmann@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>
>

Re: Release date of tika 1.0 or 0.10

Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks Nick.

I'll keep digging on TIKA-712 as to how we can figure out which master
elements should and should not be extracted...

Mike McCandless

http://blog.mikemccandless.com

On Sat, Sep 24, 2011 at 7:07 AM, Nick Burch <ni...@alfresco.com> wrote:
> On Sat, 24 Sep 2011, Michael McCandless wrote:
>>
>> OK I will do that... I *think* it's just a matter of fixing XSLF and HSLF
>> parsers to not visit the master slide.
>
> Yup, your commit looks to have done the trick. Then we just have the longer-term
> fix of understanding what we should and should not pull over in future!
>
> Nick
>

Re: Release date of tika 1.0 or 0.10

Posted by Nick Burch <ni...@alfresco.com>.
On Sat, 24 Sep 2011, Michael McCandless wrote:
> OK I will do that... I *think* it's just a matter of fixing XSLF and 
> HSLF parsers to not visit the master slide.

Yup, your commit looks to have done the trick. Then we just have the
longer-term fix of understanding what we should and should not pull over in
future!

Nick

Re: Release date of tika 1.0 or 0.10

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK I will do that... I *think* it's just a matter of fixing XSLF and
HSLF parsers to not visit the master slide.

I'll commit that but Nick can you double check that this is correct?  Thanks.

I'll put a TODO to re-enable once we address TIKA-712.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Sep 23, 2011 at 6:03 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hey Mike,
>
> That's fine by me. If you could turn it off and commit before this weekend I'd
> appreciate it.
>
> Cheers,
> Chris
>
> On Sep 23, 2011, at 12:26 PM, Michael McCandless wrote:
>
>> I think before we release 0.10 we should address TIKA-712?
>>
>> I don't think we should hold the release... I think we should just
>> turn off the new functionality (to extract text from master slides)
>> for the time being, until we work out how to fix it more correctly,
>> because right now it's always extracting boilerplate text from the
>> master slide onto each slide.  I.e., put Tika back to what it did before
>> any of the TIKA-712 commits, for the 0.10 release.
>>
>> I've made some progress trying to understand what we can use in the
>> OOXML format to not extract the boilerplate while keeping what the
>> user had actually edited, but I'm not done yet and I think what's
>> committed is worse than the original issue...
>>
>> Thoughts?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Sep 21, 2011 at 11:02 PM, Mattmann, Chris A (388J)
>> <ch...@jpl.nasa.gov> wrote:
>>> Hey Jukka,
>>>
>>> If everyone is cool with me doing it over the weekend, I'll bust it out,
>>> no worries. Thanks for getting the RC all prepped up and
>>> thanks to everyone for the hard work.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
>>>>> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?
>>>>
>>>> As discussed in the other thread, we seem to have a rough consensus to
>>>> make a 0.10 release pretty soon while we work on perfecting things for
>>>> the 1.0 release.
>>>>
>>>> I think the trunk is pretty much ready to be released already, so I'd
>>>> suggest we cut the release already this week, for example over the
>>>> weekend. Chris, do you want to take care of it? I should also have
>>>> some spare cycles to cut the release if needed.
>>>>
>>>> BR,
>>>>
>>>> Jukka Zitting

Re: Release date of tika 1.0 or 0.10

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Mike,

That's fine by me. If you could turn it off and commit before this weekend I'd 
appreciate it.

Cheers,
Chris

On Sep 23, 2011, at 12:26 PM, Michael McCandless wrote:

> I think before we release 0.10 we should address TIKA-712?
> 
> I don't think we should hold the release... I think we should just
> turn off the new functionality (to extract text from master slides)
> for the time being, until we work out how to fix it more correctly,
> because right now it's always extracting boilerplate text from the
> master slide onto each slide.  I.e., put Tika back to what it did before
> any of the TIKA-712 commits, for the 0.10 release.
> 
> I've made some progress trying to understand what we can use in the
> OOXML format to not extract the boilerplate while keeping what the
> user had actually edited, but I'm not done yet and I think what's
> committed is worse than the original issue...
> 
> Thoughts?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Wed, Sep 21, 2011 at 11:02 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hey Jukka,
>> 
>> If everyone is cool with me doing it over the weekend, I'll bust it out,
>> no worries. Thanks for getting the RC all prepped up and
>> thanks to everyone for the hard work.
>> 
>> Cheers,
>> Chris
>> 
>> On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote:
>> 
>>> Hi,
>>> 
>>> On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
>>>> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?
>>> 
>>> As discussed in the other thread, we seem to have a rough consensus to
>>> make a 0.10 release pretty soon while we work on perfecting things for
>>> the 1.0 release.
>>> 
>>> I think the trunk is pretty much ready to be released already, so I'd
>>> suggest we cut the release already this week, for example over the
>>> weekend. Chris, do you want to take care of it? I should also have
>>> some spare cycles to cut the release if needed.
>>> 
>>> BR,
>>> 
>>> Jukka Zitting


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Release date of tika 1.0 or 0.10

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think before we release 0.10 we should address TIKA-712?

I don't think we should hold the release... I think we should just
turn off the new functionality (to extract text from master slides)
for the time being, until we work out how to fix it more correctly,
because right now it's always extracting boilerplate text from the
master slide onto each slide.  I.e., put Tika back to what it did before
any of the TIKA-712 commits, for the 0.10 release.

I've made some progress trying to understand what we can use in the
OOXML format to not extract the boilerplate while keeping what the
user had actually edited, but I'm not done yet and I think what's
committed is worse than the original issue...

Thoughts?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 21, 2011 at 11:02 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hey Jukka,
>
> If everyone is cool with me doing it over the weekend, I'll bust it out,
> no worries. Thanks for getting the RC all prepped up and
> thanks to everyone for the hard work.
>
> Cheers,
> Chris
>
> On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote:
>
>> Hi,
>>
>> On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
>>> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?
>>
>> As discussed in the other thread, we seem to have a rough consensus to
>> make a 0.10 release pretty soon while we work on perfecting things for
>> the 1.0 release.
>>
>> I think the trunk is pretty much ready to be released already, so I'd
>> suggest we cut the release already this week, for example over the
>> weekend. Chris, do you want to take care of it? I should also have
>> some spare cycles to cut the release if needed.
>>
>> BR,
>>
>> Jukka Zitting

Re: Release date of tika 1.0 or 0.10

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Jukka,

If everyone is cool with me doing it over the weekend, I'll bust it out, 
no worries. Thanks for getting the RC all prepped up and 
thanks to everyone for the hard work.

Cheers,
Chris

On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote:

> Hi,
> 
> On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
>> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?
> 
> As discussed in the other thread, we seem to have a rough consensus to
> make a 0.10 release pretty soon while we work on perfecting things for
> the 1.0 release.
> 
> I think the trunk is pretty much ready to be released already, so I'd
> suggest we cut the release already this week, for example over the
> weekend. Chris, do you want to take care of it? I should also have
> some spare cycles to cut the release if needed.
> 
> BR,
> 
> Jukka Zitting


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Release date of tika 1.0 or 0.10

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Sep 21, 2011 at 2:28 PM, Christian Göller <go...@innosystec.de> wrote:
> can anyone tell me if there is a date for the next TIKA release 1.0 or 0.10 ?

As discussed in the other thread, we seem to have a rough consensus to
make a 0.10 release pretty soon while we work on perfecting things for
the 1.0 release.

I think the trunk is pretty much ready to be released already, so I'd
suggest we cut the release already this week, for example over the
weekend. Chris, do you want to take care of it? I should also have
some spare cycles to cut the release if needed.

BR,

Jukka Zitting