Posted to dev@poi.apache.org by Andreas Beeker <ki...@apache.org> on 2020/02/03 21:14:49 UTC

[VOTE] Apache POI 4.1.2 release (RC2)

Hi *,

I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).

The most notable changes in this release are:

- XDDF - some work on better chart support
- Common SL / EMF - ongoing rendering fixes - see #60656
- XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency on SparseBitSet 1.2 - see #64015

https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/

The heap problem is solved and back to the old values - Thank you Dominik.
I've added various XSLF fixes which I'd like to be mass tested.

Please vote to release the artifacts.
The vote stays open for 72 hours after the regression results are available.
Planned release announcement date is Friday, 2020-02-10.

Andi




Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Axel Howind <ax...@dua3.com>.
+1

> On 3. Feb 2020, at 22:14, Andreas Beeker <ki...@apache.org> wrote:
> 
> Hi *,
> 
> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> 
> The most notable changes in this release are:
> 
> - XDDF - some work on better chart support
> - Common SL / EMF - ongoing rendering fixes - see #60656
> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency to SparseBitSet 1.2 - see #64015
> 
> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> 
> The heap problem is solved and back to the old values - Thank you Dominik.
> I've added various XSLF fixes which I'd like to be mass tested.
> 
> Please vote to release the artifacts.
> The vote keeps open for 72hrs after the regression results are available.
> Planned release announcement date is Friday, 2020-02-10.
> 
> Andi
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
Yes. Please do use the Rackspace VM and let me know if there is anything that
can be improved.

I’m thinking about setting up a second with a modern version of Ubuntu and
then phasing

On Thu, Feb 6, 2020 at 7:48 AM Dominik Stadler <do...@gmx.at>
wrote:

> Hi,
>
> I'm slowly trying to make this a more documented, automated and repeatable
> process to allow others to run it as well if necessary. Some of the tooling
> is moved into POI Sources unter "src/integrationtest" already nowadays,
> there is a separate project which executes the actual testing by calling
> POI and collecting results into a SQL-Database for fetching results and
> building the result-HTML. I plan to release that somewhere, not sure if POI
> Sources are the right place or a separate project, e.g. on GitHub would be
> fine for this.
>
> Execution wise it is already as simple as setting up the version to test
> and then "./run.sh", the main obstacle for running it elsewhere is the
> >2mio corpus of documents that I collected from various sources over time.
>
> We also have a VM from RackSpace that we tried to start using for that
> before, I can take a look again if this VM is suitable for such runs.
>
> Dominik.
>
>
> On Thu, Feb 6, 2020 at 5:02 AM Dave Fisher <wa...@comcast.net> wrote:
>
> > Hi -
> >
> > What resources would be needed to replace Dominick’s equipment? Depending
> > on the resources it might be possible to have ASF infrastructure
> provision
> > a VM to be managed by POI PMC.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> > > On Feb 5, 2020, at 3:38 PM, Andreas Beeker <ki...@apache.org>
> wrote:
> > >
> > > Hi Tim,
> > >
> > > do you also perform a crawl/regression test? ... now that Dominiks
> > equipment is unavailable.
> > >
> > > Andi
> > >
> > >> On 05.02.20 01:05, Tim Allison wrote:
> > >> +1
> > >>
> > >> built without surprises, digests check out and Tika builds.  Thank
> you,
> > >> Andi and team!
> > >>
> > >>> On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
> > wrote:
> > >>>
> > >>> +1 ... the NOTICE file was still on 2019, but I don't think this
> > matters.
> > >>> Apart of it, my sample application works.
> > >>>
> > >>> On 03.02.20 22:55, PJ Fanning wrote:
> > >>>> +1 builds working and APIs are stable
> > >>>>    On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
> > >>> kiwiwings@apache.org> wrote:
> > >>>> Hi *,
> > >>>>
> > >>>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> > >>>>
> > >>>> The most notable changes in this release are:
> > >>>>
> > >>>> - XDDF - some work on better chart support
> > >>>> - Common SL / EMF - ongoing rendering fixes - see #60656
> > >>>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new
> dependency
> > >>> to SparseBitSet 1.2 - see #64015
> > >>>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> > >>>>
> > >>>> The heap problem is solved and back to the old values - Thank you
> > >>> Dominik.
> > >>>> I've added various XSLF fixes which I'd like to be mass tested.
> > >>>>
> > >>>> Please vote to release the artifacts.
> > >>>> The vote keeps open for 72hrs after the regression results are
> > available.
> > >>>> Planned release announcement date is Friday, 2020-02-10.
> > >>>>
> > >>>> Andi
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >
> > >
> >
> >
> >
> >
>

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

I'm slowly trying to make this a more documented, automated and repeatable
process so that others can run it as well if necessary. Some of the tooling
has already been moved into the POI sources under "src/integrationtest".
A separate project executes the actual testing by calling POI and collecting
the results in a SQL database, from which the result HTML is built. I plan
to release that somewhere; I'm not sure whether the POI sources are the right
place, or whether a separate project, e.g. on GitHub, would be fine for this.
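
A minimal sketch of the kind of harness described above, not the actual
tooling: it walks a corpus directory, runs a handler on every file and stores
one row per file in a SQL database. The function name run_corpus, the table
name "results", its columns and the handler callback are all assumptions made
for illustration; in the real setup the handler would call POI.

```python
import sqlite3
from pathlib import Path
from typing import Callable


def run_corpus(corpus_dir: Path, version: str,
               handler: Callable[[Path], None],
               db: sqlite3.Connection) -> None:
    """Run `handler` on every file below corpus_dir and record the outcome."""
    # Hypothetical schema: one row per (version under test, file).
    db.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "version TEXT, file TEXT, status TEXT, exception TEXT, "
        "PRIMARY KEY (version, file))"
    )
    for path in sorted(p for p in corpus_dir.rglob("*") if p.is_file()):
        try:
            handler(path)
            status, exc = "ok", None
        except Exception as e:  # one broken document must not stop the run
            status, exc = "failed", type(e).__name__
        db.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
            (version, path.relative_to(corpus_dir).as_posix(), status, exc),
        )
    db.commit()
```

Keeping the version in the key means two runs can sit in the same database
and be compared with a plain SQL query when the report is built.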

Execution-wise it is already as simple as setting up the version to test
and then running "./run.sh". The main obstacle to running it elsewhere is the
corpus of more than 2 million documents that I collected from various sources
over time.

We also have a VM from Rackspace that we tried to start using for that
before. I can take another look at whether this VM is suitable for such runs.

Dominik.


On Thu, Feb 6, 2020 at 5:02 AM Dave Fisher <wa...@comcast.net> wrote:

> Hi -
>
> What resources would be needed to replace Dominick’s equipment? Depending
> on the resources it might be possible to have ASF infrastructure provision
> a VM to be managed by POI PMC.
>
> Regards,
> Dave
>
> Sent from my iPhone
>
> > On Feb 5, 2020, at 3:38 PM, Andreas Beeker <ki...@apache.org> wrote:
> >
> > Hi Tim,
> >
> > do you also perform a crawl/regression test? ... now that Dominiks
> equipment is unavailable.
> >
> > Andi
> >
> >> On 05.02.20 01:05, Tim Allison wrote:
> >> +1
> >>
> >> built without surprises, digests check out and Tika builds.  Thank you,
> >> Andi and team!
> >>
> >>> On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
> wrote:
> >>>
> >>> +1 ... the NOTICE file was still on 2019, but I don't think this
> matters.
> >>> Apart of it, my sample application works.
> >>>
> >>> On 03.02.20 22:55, PJ Fanning wrote:
> >>>> +1 builds working and APIs are stable
> >>>>    On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
> >>> kiwiwings@apache.org> wrote:
> >>>> Hi *,
> >>>>
> >>>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> >>>>
> >>>> The most notable changes in this release are:
> >>>>
> >>>> - XDDF - some work on better chart support
> >>>> - Common SL / EMF - ongoing rendering fixes - see #60656
> >>>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
> >>> to SparseBitSet 1.2 - see #64015
> >>>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> >>>>
> >>>> The heap problem is solved and back to the old values - Thank you
> >>> Dominik.
> >>>> I've added various XSLF fixes which I'd like to be mass tested.
> >>>>
> >>>> Please vote to release the artifacts.
> >>>> The vote keeps open for 72hrs after the regression results are
> available.
> >>>> Planned release announcement date is Friday, 2020-02-10.
> >>>>
> >>>> Andi
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
>
>
>
>

[DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Dave Fisher <wa...@comcast.net>.
Hi -

What resources would be needed to replace Dominik’s equipment? Depending on the resources it might be possible to have ASF infrastructure provision a VM to be managed by POI PMC.

Regards,
Dave

Sent from my iPhone

> On Feb 5, 2020, at 3:38 PM, Andreas Beeker <ki...@apache.org> wrote:
> 
> Hi Tim,
> 
> do you also perform a crawl/regression test? ... now that Dominiks equipment is unavailable.
> 
> Andi
> 
>> On 05.02.20 01:05, Tim Allison wrote:
>> +1
>> 
>> built without surprises, digests check out and Tika builds.  Thank you,
>> Andi and team!
>> 
>>> On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org> wrote:
>>> 
>>> +1 ... the NOTICE file was still on 2019, but I don't think this matters.
>>> Apart of it, my sample application works.
>>> 
>>> On 03.02.20 22:55, PJ Fanning wrote:
>>>> +1 builds working and APIs are stable
>>>>    On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
>>> kiwiwings@apache.org> wrote:
>>>> Hi *,
>>>> 
>>>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
>>>> 
>>>> The most notable changes in this release are:
>>>> 
>>>> - XDDF - some work on better chart support
>>>> - Common SL / EMF - ongoing rendering fixes - see #60656
>>>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
>>> to SparseBitSet 1.2 - see #64015
>>>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
>>>> 
>>>> The heap problem is solved and back to the old values - Thank you
>>> Dominik.
>>>> I've added various XSLF fixes which I'd like to be mass tested.
>>>> 
>>>> Please vote to release the artifacts.
>>>> The vote keeps open for 72hrs after the regression results are available.
>>>> Planned release announcement date is Friday, 2020-02-10.
>>>> 
>>>> Andi
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
> 
> 



Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

I have now fixed my hardware and done a re-run; see the results at the
following locations. One NPE shows up, but I do not see how it could be
related to any changes, so I believe it is due to more testing compared to
4.1.1.


   - 4.1.1-RC2 to 4.1.2-RC3
   <http://people.apache.org/~centic/poi_regression/reports/index411RC2to412RC3.html>
   - 4.1.2-RC3-All
   <http://people.apache.org/~centic/poi_regression/reportsAll/index411RC2to412RC3.html>


So based on this I am +1

Dominik.


On Sat, Feb 8, 2020 at 11:30 PM Tim Allison <ta...@apache.org> wrote:

> I’m afk, but it looked like there was a regression on old excel
> files...check new exceptions file.
>
> I didn’t get a chance to confirm this wasn’t a Tika artifact w pure POI,
> but I did confirm at the Tika level on one govdocs file that there was a
> new exception.
>
> Sorry I don’t have more info and I’m afk.
>
> On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker <ki...@apache.org>
> wrote:
>
> > Hi *,
> >
> > just to be sure ... I'm waiting for Tims second +1 or should I release
> the
> > artifacts?
> > I.e. as far as I understand the reports we only have marginal
> differences.
> >
> > Andi
> >
> > On 07.02.20 13:05, Tim Allison wrote:
> > > Hi All,,
> > >   I haven't had the chance to look, but will do so later today::
> > > http://162.242.228.174/reports/poi_4.1.2_reports.tgz
> > >
> >
> >
> >
>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Andreas Beeker <ki...@apache.org>.
I'd have been happier if you had voted -1; that would have made the decision easier for me.
I'm -1 now ... Sorry - you (PMC members) are always welcome to jump in as release manager.

So we have 550 failing files or new exceptions. That may be just a small number, but it is still a regression.
I've decided to roll another RC - this is not much of an effort for me and saves a bit of discussion later.
I don't think we need another regression run.

Andi.


On 10.02.20 18:58, Tim Allison wrote:
> Sorry for the late reply.   See Bug 64130 for a regression in parsing old
> excel spreadsheets that have worksheets without names.  There were about
> 550 new exceptions caused by this in our regression corpus.
>
> On Sat, Feb 8, 2020 at 5:30 PM Tim Allison <ta...@apache.org> wrote:
>
>> I’m afk, but it looked like there was a regression on old excel
>> files...check new exceptions file.
>>
>> I didn’t get a chance to confirm this wasn’t a Tika artifact w pure POI,
>> but I did confirm at the Tika level on one govdocs file that there was a
>> new exception.
>>
>> Sorry I don’t have more info and I’m afk.
>>
>> On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker <ki...@apache.org>
>> wrote:
>>
>>> Hi *,
>>>
>>> just to be sure ... I'm waiting for Tims second +1 or should I release
>>> the artifacts?
>>> I.e. as far as I understand the reports we only have marginal differences.
>>>
>>> Andi
>>>
>>> On 07.02.20 13:05, Tim Allison wrote:
>>>> Hi All,,
>>>>   I haven't had the chance to look, but will do so later today::
>>>> http://162.242.228.174/reports/poi_4.1.2_reports.tgz
>>>>
>>>
>>>



Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
Sorry for the late reply. See Bug 64130 for a regression in parsing old
Excel spreadsheets that have worksheets without names. There were about
550 new exceptions caused by this in our regression corpus.
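
To make "new exceptions" concrete, here is a small sketch of how such a count
can be derived from two runs. It assumes each run is available as a mapping
from file name to exception name (None if the file parsed cleanly); the
function name, file names and exception names below are made up for
illustration and do not come from the actual reports.

```python
from typing import Dict, Optional


def new_exceptions(before: Dict[str, Optional[str]],
                   after: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Files that fail in `after` but did not fail (or were absent) in `before`."""
    return {
        name: exc
        for name, exc in after.items()
        if exc is not None and before.get(name) is None
    }


# Hypothetical results of the old and the new run.
run_a = {"a.xls": None, "b.xls": "SomeOldException", "c.xls": None}
run_b = {"a.xls": "SomeNewException", "b.xls": "SomeOldException", "c.xls": None}

print(new_exceptions(run_a, run_b))  # {'a.xls': 'SomeNewException'}
```

Files that already failed in the earlier run are deliberately not counted,
even if the exception type changed, so the number only reflects regressions.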

On Sat, Feb 8, 2020 at 5:30 PM Tim Allison <ta...@apache.org> wrote:

> I’m afk, but it looked like there was a regression on old excel
> files...check new exceptions file.
>
> I didn’t get a chance to confirm this wasn’t a Tika artifact w pure POI,
> but I did confirm at the Tika level on one govdocs file that there was a
> new exception.
>
> Sorry I don’t have more info and I’m afk.
>
> On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker <ki...@apache.org>
> wrote:
>
>> Hi *,
>>
>> just to be sure ... I'm waiting for Tims second +1 or should I release
>> the artifacts?
>> I.e. as far as I understand the reports we only have marginal differences.
>>
>> Andi
>>
>> On 07.02.20 13:05, Tim Allison wrote:
>> > Hi All,,
>> >   I haven't had the chance to look, but will do so later today::
>> > http://162.242.228.174/reports/poi_4.1.2_reports.tgz
>> >
>>
>>
>>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
I’m afk, but it looked like there was a regression on old Excel
files...check the new exceptions file.

I didn’t get a chance to confirm with pure POI that this wasn’t a Tika
artifact, but I did confirm at the Tika level on one govdocs file that there
was a new exception.

Sorry I don’t have more info and I’m afk.

On Sat, Feb 8, 2020 at 1:21 PM Andreas Beeker <ki...@apache.org> wrote:

> Hi *,
>
> just to be sure ... I'm waiting for Tims second +1 or should I release the
> artifacts?
> I.e. as far as I understand the reports we only have marginal differences.
>
> Andi
>
> On 07.02.20 13:05, Tim Allison wrote:
> > Hi All,,
> >   I haven't had the chance to look, but will do so later today::
> > http://162.242.228.174/reports/poi_4.1.2_reports.tgz
> >
>
>
>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Andreas Beeker <ki...@apache.org>.
Hi *,

just to be sure ... should I wait for Tim's second +1, or should I release the artifacts?
As far as I understand the reports, we only have marginal differences.

Andi

On 07.02.20 13:05, Tim Allison wrote:
> Hi All,,
>   I haven't had the chance to look, but will do so later today::
> http://162.242.228.174/reports/poi_4.1.2_reports.tgz
>



Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
On the metadata issue, those rows refer to the embedded bmps in that one
ppt.

TODO: I should add the embedded file name to that metadata value count
details file so that you can see which embedded file has diff counts.

There are, in fact, fewer metadata items. This is likely caused by a change
in Tika and our dependencies.

This is what we get with Tika 1.23:
        "Chroma ColorSpaceType": "RGB",
        "Chroma NumChannels": "3",
        "Compression CompressionTypeName": "BI_RGB",
        "Compression Lossless": "true",
        "Content-Type": "image/bmp",
        "Data BitsPerSample": "8 8 8",
        "Data SampleFormat": "UnsignedIntegral",
        "Dimension HorizontalPhysicalPixelSpacing": "0.26462027",
        "Dimension HorizontalPixelSize": "0.26462027",
        "Dimension PixelAspectRatio": "1.0",
        "Dimension VerticalPhysicalPixelSpacing": "0.26462027",
        "Dimension VerticalPixelSize": "0.26462027",
        "Document FormatVersion": "BMP v. 3.x",
        "Transparency Alpha": "none",
        "X-Parsed-By": [
            "org.apache.tika.parser.CompositeParser",
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.image.ImageParser"
        ],
        "X-TIKA:digest:MD5": "d29cb08f19bfd203d5517cdac8f36dd4",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_resource_path": "/embedded-4",
        "X-TIKA:parse_time_millis": "2",
        "height": "1",
        "tiff:BitsPerSample": "8 8 8",
        "tiff:ImageLength": "1",
        "tiff:ImageWidth": "2",
        "width": "2"

This is what we get with Tika 1.24-SNAPSHOT and POI 4.1.2:
        "Compression CompressionTypeName": "BI_RGB",
        "Content-Type": "image/bmp",
        "Data BitsPerSample": "8 8 8",
        "Dimension HorizontalPhysicalPixelSpacing": "0.26462027",
        "Dimension PixelAspectRatio": "1.0",
        "Dimension VerticalPhysicalPixelSpacing": "0.26462027",
        "X-Parsed-By": [
            "org.apache.tika.parser.CompositeParser",
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.image.ImageParser"
        ],
        "X-TIKA:digest:MD5": "d29cb08f19bfd203d5517cdac8f36dd4",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_resource_path": "/embedded-4",
        "X-TIKA:parse_time_millis": "3",
        "height": "1",
        "tiff:BitsPerSample": "8 8 8",
        "tiff:ImageLength": "1",
        "tiff:ImageWidth": "2",
        "width": "2"
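
The difference between the two listings is easier to see as a set difference
of the metadata keys. The sketch below uses exactly the keys shown above;
the variable and function names are only illustrative.

```python
def missing_keys(old_keys, new_keys):
    """Keys present in the old output but absent from the new one, sorted."""
    return sorted(set(old_keys) - set(new_keys))


# Keys reported for /embedded-4 in the first listing above.
KEYS_1_23 = [
    "Chroma ColorSpaceType", "Chroma NumChannels",
    "Compression CompressionTypeName", "Compression Lossless",
    "Content-Type", "Data BitsPerSample", "Data SampleFormat",
    "Dimension HorizontalPhysicalPixelSpacing", "Dimension HorizontalPixelSize",
    "Dimension PixelAspectRatio", "Dimension VerticalPhysicalPixelSpacing",
    "Dimension VerticalPixelSize", "Document FormatVersion",
    "Transparency Alpha", "X-Parsed-By", "X-TIKA:digest:MD5",
    "X-TIKA:embedded_depth", "X-TIKA:embedded_resource_path",
    "X-TIKA:parse_time_millis", "height", "tiff:BitsPerSample",
    "tiff:ImageLength", "tiff:ImageWidth", "width",
]

# Keys reported for the same embedded file in the second listing.
KEYS_1_24_SNAPSHOT = [
    "Compression CompressionTypeName", "Content-Type", "Data BitsPerSample",
    "Dimension HorizontalPhysicalPixelSpacing", "Dimension PixelAspectRatio",
    "Dimension VerticalPhysicalPixelSpacing", "X-Parsed-By",
    "X-TIKA:digest:MD5", "X-TIKA:embedded_depth",
    "X-TIKA:embedded_resource_path", "X-TIKA:parse_time_millis", "height",
    "tiff:BitsPerSample", "tiff:ImageLength", "tiff:ImageWidth", "width",
]

for key in missing_keys(KEYS_1_23, KEYS_1_24_SNAPSHOT):
    print(key)
```

This prints the eight keys that are no longer reported (the two Chroma
entries, Compression Lossless, Data SampleFormat, the two PixelSize
dimensions, Document FormatVersion and Transparency Alpha).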

On Fri, Feb 7, 2020 at 1:31 PM Tim Allison <ta...@apache.org> wrote:

> a) In the SQLs, I see the *_a/*_b tables - so _a is then result of using
> POI 4.1.1 and _b of POI 4.1.2?
> a is tika 1.23 (which used 4.1.1), b is tika 1.x branch with 4.1.2 --
> *WARNING -- diffs we observe may be changes in Tika btwn 1.23 and 1.x
> branch.
>
> b) Are the stats evaluated for both each time or is *_a cached from last
> run?
> I had to rerun 1.23 because I had wiped it out.
>
> b) If a) is true, it's interesting that the attachment-missing* have such
> similar numbers. I would expect one side to outweigh the other.
> That is unexpected.  Aligning attachments is tricky if one version is
> missing a version.  It is possible that this reflects failure to align.
> I'll look into this.
>
> c) I've checked one of metadata diffs (govdocs1/338/338907.ppt) and can't
> reproduce/don't understand the values in the report
> I've put the .json output here:
> http://162.242.228.174/share/338907_ppt.tgz. I haven't looked yet, but
> will.
>
> d) looking at the parse times: there are quite a few .ppt which only take
> 100-400ms in _a whereas in _b it takes them 3-5 sec.
>
> That _may_ be caused by diffs in loads on the m|vm...other stuff going on
> in the jvm.  Parse times per file can vary wildly
> even with the same versions on different runs.  The key for me is the
> rollup by parse time suggests _overall_ for ppt,
> the time is nearly identical.
>
>
>> On 07.02.20 13:05, Tim Allison wrote:
>> > Hi All,,
>> >   I haven't had the chance to look, but will do so later today::
>> > http://162.242.228.174/reports/poi_4.1.2_reports.tgz
>>
>>
>>

Re: [DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
> a) In the SQLs, I see the *_a/*_b tables - so _a is the result of using
> POI 4.1.1 and _b of POI 4.1.2?

_a is Tika 1.23 (which used POI 4.1.1), _b is the Tika 1.x branch with
POI 4.1.2.
WARNING: diffs we observe may be changes in Tika between 1.23 and the 1.x
branch.

> b) Are the stats evaluated for both each time or is *_a cached from last
> run?

I had to rerun 1.23 because I had wiped it out.

> b) If a) is true, it's interesting that the attachment-missing* have such
> similar numbers. I would expect one side to outweigh the other.

That is unexpected.  Aligning attachments is tricky if one version is
missing a version.  It is possible that this reflects failure to align.
I'll look into this.

> c) I've checked one of metadata diffs (govdocs1/338/338907.ppt) and can't
> reproduce/don't understand the values in the report

I've put the .json output here: http://162.242.228.174/share/338907_ppt.tgz.
I haven't looked yet, but will.

> d) looking at the parse times: there are quite a few .ppt which only take
> 100-400ms in _a whereas in _b it takes them 3-5 sec.

That _may_ be caused by differences in load on the machine/VM...other stuff
going on in the JVM.  Parse times per file can vary wildly even with the same
versions on different runs.  The key for me is that the rollup by parse time
suggests that _overall_ for ppt, the time is nearly identical.
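
As an illustration of why the rollup is the more useful number, the following
sketch sums per-file parse times by file extension for two runs. The function
name, file names and timings are invented for the example; they are not taken
from the report.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict


def rollup_by_extension(times_ms: Dict[str, int]) -> Dict[str, int]:
    """Sum per-file parse times (milliseconds) by lower-cased file extension."""
    totals: Dict[str, int] = defaultdict(int)
    for name, ms in times_ms.items():
        totals[PurePosixPath(name).suffix.lower()] += ms
    return dict(totals)


# Hypothetical timings: individual files swing by seconds between the runs ...
run_a = {"doc1.ppt": 100, "doc2.ppt": 4000, "doc3.ppt": 300}
run_b = {"doc1.ppt": 3500, "doc2.ppt": 600, "doc3.ppt": 350}

# ... while the totals per extension stay close together.
print(rollup_by_extension(run_a))  # {'.ppt': 4400}
print(rollup_by_extension(run_b))  # {'.ppt': 4450}
```

A single file going from 100 ms to several seconds looks alarming on its own,
but if load on the machine is the cause, other files move the other way and
the per-format total barely changes.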


> On 07.02.20 13:05, Tim Allison wrote:
> > Hi All,,
> >   I haven't had the chance to look, but will do so later today::
> > http://162.242.228.174/reports/poi_4.1.2_reports.tgz
>
>
>

[DISCUSS] Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Andreas Beeker <ki...@apache.org>.
Hi Tim,

while you are at it, please also shed some light on how you compare.

a) In the SQLs, I see the *_a/*_b tables - so _a is the result of using POI 4.1.1 and _b of POI 4.1.2?
b) Are the stats evaluated for both each time or is *_a cached from last run?
b) If a) is true, it's interesting that the attachment-missing* have such similar numbers. I would expect one side to outweigh the other.
c) I've checked one of metadata diffs (govdocs1/338/338907.ppt) and can't reproduce/don't understand the values in the report
d) looking at the parse times: there are quite a few .ppt which only take 100-400ms in _a whereas in _b it takes them 3-5 sec.

Best wishes,
Andi

On 07.02.20 13:05, Tim Allison wrote:
> Hi All,,
>   I haven't had the chance to look, but will do so later today::
> http://162.242.228.174/reports/poi_4.1.2_reports.tgz



Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
Hi All,
  I haven't had the chance to look, but will do so later today:
http://162.242.228.174/reports/poi_4.1.2_reports.tgz

On Wed, Feb 5, 2020 at 7:47 PM Tim Allison <ta...@apache.org> wrote:

> Might be faster than I thought...results tomorrow...perhaps.
>
> On Wed, Feb 5, 2020 at 5:51 PM Tim Allison <ta...@apache.org> wrote:
>
>> I did not.  I can kick it off now, but with travel and other stuff,
>> wouldn't have results until Monday.  Happy to do so if desired.
>>
>> On Wed, Feb 5, 2020 at 12:38 PM Andreas Beeker <ki...@apache.org>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> do you also perform a crawl/regression test? ... now that Dominiks
>>> equipment is unavailable.
>>>
>>> Andi
>>>
>>> On 05.02.20 01:05, Tim Allison wrote:
>>> > +1
>>> >
>>> > built without surprises, digests check out and Tika builds.  Thank you,
>>> > Andi and team!
>>> >
>>> > On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
>>> wrote:
>>> >
>>> >> +1 ... the NOTICE file was still on 2019, but I don't think this
>>> matters.
>>> >> Apart of it, my sample application works.
>>> >>
>>> >> On 03.02.20 22:55, PJ Fanning wrote:
>>> >>>  +1 builds working and APIs are stable
>>> >>>     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
>>> >> kiwiwings@apache.org> wrote:
>>> >>>  Hi *,
>>> >>>
>>> >>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
>>> >>>
>>> >>> The most notable changes in this release are:
>>> >>>
>>> >>> - XDDF - some work on better chart support
>>> >>> - Common SL / EMF - ongoing rendering fixes - see #60656
>>> >>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new
>>> dependency
>>> >> to SparseBitSet 1.2 - see #64015
>>> >>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
>>> >>>
>>> >>> The heap problem is solved and back to the old values - Thank you
>>> >> Dominik.
>>> >>> I've added various XSLF fixes which I'd like to be mass tested.
>>> >>>
>>> >>> Please vote to release the artifacts.
>>> >>> The vote keeps open for 72hrs after the regression results are
>>> available.
>>> >>> Planned release announcement date is Friday, 2020-02-10.
>>> >>>
>>> >>> Andi
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
Might be faster than I thought...results tomorrow...perhaps.

On Wed, Feb 5, 2020 at 5:51 PM Tim Allison <ta...@apache.org> wrote:

> I did not.  I can kick it off now, but with travel and other stuff,
> wouldn't have results until Monday.  Happy to do so if desired.
>
> On Wed, Feb 5, 2020 at 12:38 PM Andreas Beeker <ki...@apache.org>
> wrote:
>
>> Hi Tim,
>>
>> do you also perform a crawl/regression test? ... now that Dominiks
>> equipment is unavailable.
>>
>> Andi
>>
>> On 05.02.20 01:05, Tim Allison wrote:
>> > +1
>> >
>> > built without surprises, digests check out and Tika builds.  Thank you,
>> > Andi and team!
>> >
>> > On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
>> wrote:
>> >
>> >> +1 ... the NOTICE file was still on 2019, but I don't think this
>> matters.
>> >> Apart of it, my sample application works.
>> >>
>> >> On 03.02.20 22:55, PJ Fanning wrote:
>> >>>  +1 builds working and APIs are stable
>> >>>     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
>> >> kiwiwings@apache.org> wrote:
>> >>>  Hi *,
>> >>>
>> >>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
>> >>>
>> >>> The most notable changes in this release are:
>> >>>
>> >>> - XDDF - some work on better chart support
>> >>> - Common SL / EMF - ongoing rendering fixes - see #60656
>> >>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
>> >> to SparseBitSet 1.2 - see #64015
>> >>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
>> >>>
>> >>> The heap problem is solved and back to the old values - Thank you
>> >> Dominik.
>> >>> I've added various XSLF fixes which I'd like to be mass tested.
>> >>>
>> >>> Please vote to release the artifacts.
>> >>> The vote keeps open for 72hrs after the regression results are
>> available.
>> >>> Planned release announcement date is Friday, 2020-02-10.
>> >>>
>> >>> Andi
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>>
>>
>>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
I did not.  I can kick it off now, but with travel and other stuff, I
wouldn't have results until Monday.  Happy to do so if desired.

On Wed, Feb 5, 2020 at 12:38 PM Andreas Beeker <ki...@apache.org> wrote:

> Hi Tim,
>
> do you also perform a crawl/regression test? ... now that Dominiks
> equipment is unavailable.
>
> Andi
>
> On 05.02.20 01:05, Tim Allison wrote:
> > +1
> >
> > built without surprises, digests check out and Tika builds.  Thank you,
> > Andi and team!
> >
> > On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
> wrote:
> >
> >> +1 ... the NOTICE file was still on 2019, but I don't think this
> matters.
> >> Apart of it, my sample application works.
> >>
> >> On 03.02.20 22:55, PJ Fanning wrote:
> >>>  +1 builds working and APIs are stable
> >>>     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
> >> kiwiwings@apache.org> wrote:
> >>>  Hi *,
> >>>
> >>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> >>>
> >>> The most notable changes in this release are:
> >>>
> >>> - XDDF - some work on better chart support
> >>> - Common SL / EMF - ongoing rendering fixes - see #60656
> >>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
> >> to SparseBitSet 1.2 - see #64015
> >>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> >>>
> >>> The heap problem is solved and back to the old values - Thank you
> >> Dominik.
> >>> I've added various XSLF fixes which I'd like to be mass tested.
> >>>
> >>> Please vote to release the artifacts.
> >>> The vote keeps open for 72hrs after the regression results are
> available.
> >>> Planned release announcement date is Friday, 2020-02-10.
> >>>
> >>> Andi
> >>>
> >>>
> >>>
> >>
> >>
> >>
>
>
>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Andreas Beeker <ki...@apache.org>.
Hi Tim,

do you also perform a crawl/regression test? ... now that Dominik's equipment is unavailable.

Andi

On 05.02.20 01:05, Tim Allison wrote:
> +1
>
> built without surprises, digests check out and Tika builds.  Thank you,
> Andi and team!
>
> On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org> wrote:
>
>> +1 ... the NOTICE file was still on 2019, but I don't think this matters.
>> Apart of it, my sample application works.
>>
>> On 03.02.20 22:55, PJ Fanning wrote:
>>>  +1 builds working and APIs are stable
>>>     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
>> kiwiwings@apache.org> wrote:
>>>  Hi *,
>>>
>>> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
>>>
>>> The most notable changes in this release are:
>>>
>>> - XDDF - some work on better chart support
>>> - Common SL / EMF - ongoing rendering fixes - see #60656
>>> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
>> to SparseBitSet 1.2 - see #64015
>>> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
>>>
>>> The heap problem is solved and back to the old values - Thank you
>> Dominik.
>>> I've added various XSLF fixes which I'd like to be mass tested.
>>>
>>> Please vote to release the artifacts.
>>> The vote keeps open for 72hrs after the regression results are available.
>>> Planned release announcement date is Friday, 2020-02-10.
>>>
>>> Andi
>>>
>>>
>>>
>>
>>
>>



Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Dominik Stadler <do...@gmx.at>.
Unfortunately some hardware broke, so I am currently unable to do a
re-run. Overall, the changes in the latest build look fine, so I am +-0
unless I get the hardware fixed.

Dominik

On Wed, Feb 5, 2020, 01:05 Tim Allison <ta...@apache.org> wrote:

> +1
>
> built without surprises, digests check out and Tika builds.  Thank you,
> Andi and team!
>
> On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org>
> wrote:
>
> > +1 ... the NOTICE file was still on 2019, but I don't think this matters.
> > Apart of it, my sample application works.
> >
> > On 03.02.20 22:55, PJ Fanning wrote:
> > >  +1 builds working and APIs are stable
> > >     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
> > kiwiwings@apache.org> wrote:
> > >
> > >  Hi *,
> > >
> > > I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> > >
> > > The most notable changes in this release are:
> > >
> > > - XDDF - some work on better chart support
> > > - Common SL / EMF - ongoing rendering fixes - see #60656
> > > - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
> > to SparseBitSet 1.2 - see #64015
> > >
> > > https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> > >
> > > The heap problem is solved and back to the old values - Thank you
> > Dominik.
> > > I've added various XSLF fixes which I'd like to be mass tested.
> > >
> > > Please vote to release the artifacts.
> > > The vote keeps open for 72hrs after the regression results are
> available.
> > > Planned release announcement date is Friday, 2020-02-10.
> > >
> > > Andi
> > >
> > >
> > >
> >
> >
> >
> >
>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Tim Allison <ta...@apache.org>.
+1

Built without surprises, the digests check out and Tika builds.  Thank you,
Andi and team!
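
For anyone who wants to repeat the digest check, a minimal sketch follows. It
assumes the published digest is available as a plain hex string and defaults
to SHA-512; which algorithm and file layout a given release uses is an
assumption here, and the function name is illustrative.

```python
import hashlib
from pathlib import Path


def digest_matches(path: Path, expected_hex: str,
                   algorithm: str = "sha512") -> bool:
    """Hash the file in chunks and compare with the published hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large release archives need little memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    # Tolerate surrounding whitespace and upper-case hex in the published value.
    return h.hexdigest() == expected_hex.strip().lower()
```

This would be called once per downloaded artifact, with the expected value
taken from the digest file published next to it.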

On Tue, Feb 4, 2020 at 2:20 PM Andreas Beeker <ki...@apache.org> wrote:

> +1 ... the NOTICE file was still on 2019, but I don't think this matters.
> Apart of it, my sample application works.
>
> On 03.02.20 22:55, PJ Fanning wrote:
> >  +1 builds working and APIs are stable
> >     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <
> kiwiwings@apache.org> wrote:
> >
> >  Hi *,
> >
> > I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
> >
> > The most notable changes in this release are:
> >
> > - XDDF - some work on better chart support
> > - Common SL / EMF - ongoing rendering fixes - see #60656
> > - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency
> to SparseBitSet 1.2 - see #64015
> >
> > https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
> >
> > The heap problem is solved and back to the old values - Thank you
> Dominik.
> > I've added various XSLF fixes which I'd like to be mass tested.
> >
> > Please vote to release the artifacts.
> > The vote keeps open for 72hrs after the regression results are available.
> > Planned release announcement date is Friday, 2020-02-10.
> >
> > Andi
> >
> >
> >
>
>
>
>

Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by Andreas Beeker <ki...@apache.org>.
+1 ... the NOTICE file still says 2019, but I don't think this matters.
Apart from that, my sample application works.

On 03.02.20 22:55, PJ Fanning wrote:
>  +1 builds working and APIs are stable
>     On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <ki...@apache.org> wrote:  
>  
>  Hi *,
>
> I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).
>
> The most notable changes in this release are:
>
> - XDDF - some work on better chart support
> - Common SL / EMF - ongoing rendering fixes - see #60656
> - XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency to SparseBitSet 1.2 - see #64015
>
> https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/
>
> The heap problem is solved and back to the old values - Thank you Dominik.
> I've added various XSLF fixes which I'd like to be mass tested.
>
> Please vote to release the artifacts.
> The vote keeps open for 72hrs after the regression results are available.
> Planned release announcement date is Friday, 2020-02-10.
>
> Andi
>
>
>   




Re: [VOTE] Apache POI 4.1.2 release (RC2)

Posted by PJ Fanning <fa...@yahoo.com.INVALID>.
 +1 builds working and APIs are stable
    On Monday 3 February 2020, 21:14:53 GMT, Andreas Beeker <ki...@apache.org> wrote:  
 
 Hi *,

I've prepared artifacts for the release of Apache POI 4.1.2 (RC2).

The most notable changes in this release are:

- XDDF - some work on better chart support
- Common SL / EMF - ongoing rendering fixes - see #60656
- XSLF - OOM fixes when parsing arbitrary shape ids + a new dependency to SparseBitSet 1.2 - see #64015

https://dist.apache.org/repos/dist/dev/poi/4.1.2-RC2/

The heap problem is solved and back to the old values - Thank you Dominik.
I've added various XSLF fixes which I'd like to be mass tested.

Please vote to release the artifacts.
The vote keeps open for 72hrs after the regression results are available.
Planned release announcement date is Friday, 2020-02-10.

Andi