Posted to dev@poi.apache.org by Nick Burch <ni...@apache.org> on 2016/03/30 12:17:19 UTC

Beta time yet?

Hi All

I know it's not been very long at all since 3.14, but we've fixed a decent 
number of bugs already, including one for which Tika is awaiting a fix 
(TIKA-1836).

What do people think about doing a beta release in a few weeks time, to 
get those fixes out to our users? I figure we might want a few betas until 
it's worth calling it 3.15 final, but you never know especially if someone 
does something great! :)

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


RE: Beta time yet?

Posted by Javen O'Neal <ja...@gmail.com>.
This is a new module contained in 2 new files plus a unit test, so it's
understandable that it would be a larger commit.

Your code style and documentation are great, despite Java being a verbose
beast. I'm spending most of my time making sure it meshes well with the
current code base, and checking whether functionality is duplicated or
could be consolidated.

The main issue (brought up previously) is the overlap with the class you
considered replacing. For the sake of backwards compatibility, if I can
fit your changes into an existing class without creating a jackalope, it'll
be easiest for implementers to find your features. I'm less familiar with
cell styles in POI, so I'm having to familiarize myself with existing
capabilities before I can figure out where your patch best fits. I understand
your desire to get your code mainlined so you don't have to maintain forks
at your $DAYJOB, and also your wish to contribute to the community.

Thanks for the many months of patience!
On Mar 31, 2016 2:11 PM, "Nick Burch" <ni...@apache.org> wrote:

> On Thu, 31 Mar 2016, Murphy, Mark wrote:
>
>> If you have any questions or comments, I would love to hear your feedback.
>>
>
> I haven't looked at this patch of yours. However...
>
> For unrelated fixes or improvements, several small patches are easier to
> review and apply than one big one. However, one big patch is often easier
> to review for a big change. Patches with unit tests are safer than those
> without.
>
> Using a git fork can help with some of these things. It's possible to work
> on a bunch of stuff at once, committing locally, then edit (squashing +
> cherry-picking) into a logical set of changes to review. However, you can
> also shoot your own feet off more easily with git, so do beware ;-)
>
> Nick
>

RE: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Thu, 31 Mar 2016, Murphy, Mark wrote:
> If you have any questions or comments, I would love to hear your 
> feedback.

I haven't looked at this patch of yours. However...

For unrelated fixes or improvements, several small patches are easier to 
review and apply than one big one. However, one big patch is often easier 
to review for a big change. Patches with unit tests are safer than those 
without.

Using a git fork can help with some of these things. It's possible to work 
on a bunch of stuff at once, committing locally, then edit (squashing + 
cherry-picking) into a logical set of changes to review. However, you can 
also shoot your own feet off more easily with git, so do beware ;-)

Nick



RE: Beta time yet?

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.
If you have any questions or comments, I would love to hear your feedback. I am an experienced programmer (> 25 years), just not with Java. If I am doing something questionable, I would like to discuss it. That helps me learn new things. If there is something I could or should have done a different way, I would like to know what it is, and the reasoning for it.

-----Original Message-----
From: Javen O'Neal [mailto:javenoneal@gmail.com] 
Sent: Thursday, March 31, 2016 12:12 PM
To: POI Developers List
Subject: RE: Beta time yet?

I've been slowly reviewing 58787. It's a larger patch, so takes a little more time to review. I'll try my best to squeeze it in, but no promises.
On Mar 31, 2016 04:01, "Murphy, Mark" <mu...@metalexmfg.com> wrote:

> Can we get bug 58787 into the Beta?
>
> -----Original Message-----
> From: Nick Burch [mailto:nick@apache.org]
> Sent: Wednesday, March 30, 2016 5:50 PM
> To: POI Developers List
> Subject: RE: Beta time yet?
>
> On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> > I added a new report "mime_diffs_A_to_B_details.xlsx", which I'll 
> > send to you privately shortly.  I can share this if anyone wants it, 
> > and it will be included in all future report dumps.
>
> Checking a few from there, I think the detection as only ooxml is due to:
>
> java.lang.IllegalArgumentException: created  : Date not well formated at
> org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)
>
> When I run with a POI nightly build + Tika CLI, they're back to being 
> detected as 
> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>
> Nick
>



RE: Beta time yet?

Posted by Javen O'Neal <ja...@gmail.com>.
I've been slowly reviewing 58787. It's a larger patch, so it takes a little
more time to review. I'll try my best to squeeze it in, but no promises.
On Mar 31, 2016 04:01, "Murphy, Mark" <mu...@metalexmfg.com> wrote:

> Can we get bug 58787 into the Beta?
>
> -----Original Message-----
> From: Nick Burch [mailto:nick@apache.org]
> Sent: Wednesday, March 30, 2016 5:50 PM
> To: POI Developers List
> Subject: RE: Beta time yet?
>
> On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> > I added a new report "mime_diffs_A_to_B_details.xlsx", which I'll send
> > to you privately shortly.  I can share this if anyone wants it, and it
> > will be included in all future report dumps.
>
> Checking a few from there, I think the detection as only ooxml is due to:
>
> java.lang.IllegalArgumentException: created  : Date not well formated at
>
> org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)
>
> When I run with a POI nightly build + Tika CLI, they're back to being
> detected as
> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>
> Nick
>

RE: Beta time yet?

Posted by "Murphy, Mark" <mu...@metalexmfg.com>.
Can we get bug 58787 into the Beta?

-----Original Message-----
From: Nick Burch [mailto:nick@apache.org] 
Sent: Wednesday, March 30, 2016 5:50 PM
To: POI Developers List
Subject: RE: Beta time yet?

On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> I added a new report "mime_diffs_A_to_B_details.xlsx", which I'll send 
> to you privately shortly.  I can share this if anyone wants it, and it 
> will be included in all future report dumps.

Checking a few from there, I think the detection as only ooxml is due to:

java.lang.IllegalArgumentException: created  : Date not well formated at
org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)

When I run with a POI nightly build + Tika CLI, they're back to being detected as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Nick



RE: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> I added a new report "mime_diffs_A_to_B_details.xlsx", which I'll send 
> to you privately shortly.  I can share this if anyone wants it, and it 
> will be included in all future report dumps.

Checking a few from there, I think the detection as only ooxml is due to:

java.lang.IllegalArgumentException: created  : Date not well formated at 
org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)

When I run with a POI nightly build + Tika CLI, they're back to being 
detected as 
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Nick



RE: Beta time yet?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
I added a new report "mime_diffs_A_to_B_details.xlsx", which I'll send to you privately shortly.  I can share this if anyone wants it, and it will be included in all future report dumps.

To get to a file, prefix the file path from the report with:

162.242.228.174/docs

-----Original Message-----
From: Nick Burch [mailto:nick@apache.org] 
Sent: Wednesday, March 30, 2016 10:54 AM
To: POI Developers List <de...@poi.apache.org>
Subject: RE: Beta time yet?

On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> The only differences in exceptions were caused by Tika now identifying 
> several files as xlsx vs zip or ooxml...which I can't explain.

I did some work in 3.14 on improving the exceptions from POIFS when given non-POIFS stuff, and since then on OPCPackage when given non-OOXML stuff, so that might help a little bit.

Happy to take a quick look at some detection bits - how do I get from mimes/mime_diffs_A_to_B.xlsx to a file to look into?

Nick



RE: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Wed, 30 Mar 2016, Allison, Timothy B. wrote:
> The only differences in exceptions were caused by Tika now identifying 
> several files as xlsx vs zip or ooxml...which I can't explain.

I did some work in 3.14 on improving the exceptions from POIFS when given 
non-POIFS stuff, and since then on OPCPackage when given non-OOXML stuff, 
so that might help a little bit.

Happy to take a quick look at some detection bits - how do I get from 
mimes/mime_diffs_A_to_B.xlsx to a file to look into?

Nick



RE: Beta time yet?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1

I ran a comparison on ooxml before and after replacing the piccolo parser, and I didn't find any new problems.  I did not do any memory profiling or rigorous time tests, though.

The overall run for 3.14-final took 10.3 hours summed across threads (actual wall clock was < 1.5 hours).  The overall run for 3.15-beta1 took 11.9 hours summed across threads.  I'm not willing to draw any firm conclusions from this...could be an artifact of what was going on in other vms or something else.  This is not a rigorous test!

The only differences in exceptions were caused by Tika now identifying several files as xlsx vs zip or ooxml...which I can't explain.

Reports are here:

https://github.com/tballison/share/blob/master/poi_comparisons/poix_3_15-beta1_reports.zip?raw=true



-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at] 
Sent: Wednesday, March 30, 2016 7:05 AM
To: POI Developers List <de...@poi.apache.org>
Subject: Re: Beta time yet?

Hi,

I'm generally in favor of more releases to reduce the amount of time that people need to wait for fixes. Our unit tests and other ways of testing should catch the majority of problems quickly nowadays, and if the next release is coming up soon anyway, a regression loses some of its horror as well...

Dominik.

On Wed, Mar 30, 2016 at 12:17 PM, Nick Burch <ni...@apache.org> wrote:

> Hi All
>
> I know it's not been very long at all since 3.14, but we've fixed a 
> decent number of bugs already, including one for which Tika is 
> awaiting a fix (TIKA-1836).
>
> What do people think about doing a beta release in a few weeks time, 
> to get those fixes out to our users? I figure we might want a few 
> betas until it's worth calling it 3.15 final, but you never know 
> especially if someone does something great! :)
>
> Nick
>

Re: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Wed, 30 Mar 2016, Dominik Stadler wrote:
> I'm generally in favor of more releases to reduce the amount of time that
> people need to wait for fixes. Our unit tests and other ways of testing
> should catch the majority of problems quickly nowadays, and if the next
> release is coming up soon anyway, a regression loses some of its horror
> as well...

One area our unit tests + integration tests don't catch is things out in 
the wild that people haven't contributed back, and hence where we don't 
have test coverage. Frequent releases help there!


Speaking of which... Spotted yesterday on Stack Overflow: someone had 
contributed an XWPF fix there (for which we ought to try to get a proper 
fix in) which seems to have been broken in 3.14, in case anyone wants to 
have a go at solving it properly:
http://stackoverflow.com/questions/35097640/add-image-into-a-word-docx-document-header-using-poi-xwpf

Nick



Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

I'm generally in favor of more releases to reduce the amount of time that
people need to wait for fixes. Our unit tests and other ways of testing
should catch the majority of problems quickly nowadays, and if the next
release is coming up soon anyway, a regression loses some of its horror
as well...

Dominik.

On Wed, Mar 30, 2016 at 12:17 PM, Nick Burch <ni...@apache.org> wrote:

> Hi All
>
> I know it's not been very long at all since 3.14, but we've fixed a decent
> number of bugs already, including one for which Tika is awaiting a fix
> (TIKA-1836).
>
> What do people think about doing a beta release in a few weeks time, to
> get those fixes out to our users? I figure we might want a few betas until
> it's worth calling it 3.15 final, but you never know especially if someone
> does something great! :)
>
> Nick
>

Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
ah, you are right, I did not find that one with the search terms that I was
using...

Dominik.

On Thu, Apr 28, 2016 at 2:22 PM, Allison, Timothy B. <ta...@mitre.org>
wrote:

> >However I unearthed https://bz.apache.org/bugzilla/show_bug.cgi?id=58963
> >and https://bz.apache.org/bugzilla/show_bug.cgi?id=57031 which I think
> >were the bugs for the change. Maybe they contain the files you are looking
> >for?
>
> Thank you for the links, Dominik!
>
> I finally dug up what I was trying to earlier...this was the note.
>
>
> >2. An XML parsing related one:
>
> >Caused by: java.lang.ArrayIndexOutOfBoundsException: 8192
>         at
> com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:329)
>
> >This seems to be a bug in the JDK itself in relation to surrogate Unicode
> characters, see e.g.
> >https://bugs.openjdk.java.net/browse/JDK-7156085 and originally
> >https://issues.apache.org/jira/browse/XERCESJ-1257 for more detailed
> discussion.
>
> >Seems only JDK 9 has a fix for this :(
> >However it is very rare, only 6 times in 1 million documents, so I think
> >the gain from using the JDK XML Parser outweighs it.
>

RE: Beta time yet?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
>However I unearthed https://bz.apache.org/bugzilla/show_bug.cgi?id=58963
>and https://bz.apache.org/bugzilla/show_bug.cgi?id=57031 which I think were the bugs for the change. Maybe they contain the files you are looking for?

Thank you for the links, Dominik!  

I finally dug up what I was trying to find earlier; this was the note:


>2. An XML parsing related one:

>Caused by: java.lang.ArrayIndexOutOfBoundsException: 8192
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:329)

>This seems to be a bug in the JDK itself in relation to surrogate Unicode characters, see e.g.
>https://bugs.openjdk.java.net/browse/JDK-7156085 and originally
>https://issues.apache.org/jira/browse/XERCESJ-1257 for more detailed discussion.

>Seems only JDK 9 has a fix for this :(
>However it is very rare, only 6 times in 1 million documents, so I think the gain from using the JDK XML Parser outweighs it.
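
For anyone not familiar with the surrogate issue in the note above, here is a minimal Java sketch. It is not the JDK's UTF8Reader code, and SurrogateWidth and its methods are made-up names; it only shows that one supplementary character takes 4 bytes in UTF-8 but 2 chars in Java, so a decoder filling a fixed-size char buffer has to check for room before writing the pair. Whether this is exactly what goes wrong in JDK-7156085 is an assumption on my part; see the linked issues for the real analysis.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateWidth {
    /** Number of bytes the given code point occupies when encoded as UTF-8. */
    static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of Java chars (UTF-16 units) the given code point occupies. */
    static int utf16Chars(int codePoint) {
        return Character.charCount(codePoint);
    }

    /**
     * True if writing this code point at the given offset would run past
     * the end of a char buffer of the given size.
     */
    static boolean overflows(int codePoint, int offset, int bufferSize) {
        return offset + Character.charCount(codePoint) > bufferSize;
    }

    public static void main(String[] args) {
        int emoji = 0x1F600; // a code point outside the Basic Multilingual Plane
        System.out.println("UTF-8 bytes:  " + utf8Bytes(emoji));
        System.out.println("UTF-16 chars: " + utf16Chars(emoji));
        // Only one free slot left in an 8192-char buffer: the pair does not fit.
        System.out.println("overflows at 8191: " + overflows(emoji, 8191, 8192));
        System.out.println("'A' overflows at 8191: " + overflows('A', 8191, 8192));
    }
}
```

The buffer size of 8192 is taken from the index in the exception message; the offset of 8191 is just the boundary case chosen for illustration.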

Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
Yep, I think it is some missing class in the reduced schema-jar. If I add a
unit-test that triggers this code, the error goes away.

I will commit this test tomorrow to avoid this issue in the future.

Dominik.

On Fri, Apr 1, 2016 at 11:57 PM, Dominik Stadler <do...@gmx.at>
wrote:

> Hi,
>
> Yes, it is quite strange, I can currently reproduce it locally in the
> regression-test-project without any OOM, but not if I do the same thing in
> POI itself, seem to be somehow related to the full/partial -schema jars, I
> am testing some more to see why it happened here...
>
> If I try to access the array-object one-by-one, I get a better error
> message:
>
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.xmlbeans.impl.values.XmlComplexContentImpl cannot be cast to
> org.openxmlformats.schemas.drawingml.x2006.main.CTGeomGuide
>
> Maybe some piece is missing in the reduced schema-jar because of a missing
> unit-test?
>
> Dominik.
>
> On Fri, Apr 1, 2016 at 11:43 PM, Andreas Beeker <ki...@apache.org>
> wrote:
>
>> Hi Dominik,
>>
>> thanks again for running the regression tests.
>> @issue 1: I've tried to limit the classes to the lite-schemas, but the
>> error doesn't show up.
>> My guess is, the toArray fails when the children aren't returned as gd
>> elements but as xmlanytype [1].
>> Another wild guess is, I think something like this happens, when the
>> xmlbeans schema type classloader
>> is affected by out-of-memory / garbage collector.
>>
>> is this error happening before any of the out-of-memory errors?
>>
>> Btw. I'll wait with releasing until #58787 is fixed or I get a green
>> light from Javen.
>>
>> Andi.
>>
>> [1]
>> http://xmlbeans.996285.n3.nabble.com/ArrayStoreException-in-get-Array-method-in-custom-Ant-task-td1378.html#a1381
>>
>>
>

Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Yes, it is quite strange. I can currently reproduce it locally in the
regression-test project without any OOM, but not if I do the same thing in
POI itself. It seems to be somehow related to the full/partial schema jars;
I am testing some more to see why it happened here...

If I try to access the array-object one-by-one, I get a better error
message:

Exception in thread "main" java.lang.ClassCastException:
org.apache.xmlbeans.impl.values.XmlComplexContentImpl cannot be cast to
org.openxmlformats.schemas.drawingml.x2006.main.CTGeomGuide

Maybe some piece is missing in the reduced schema-jar because of a missing
unit-test?
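
To illustrate the two failure modes discussed here (the ArrayStoreException from toArray reported for getGdArray, and the clearer ClassCastException when going element by element), here is a minimal sketch in plain Java. The types XmlObjectLike, GeomGuide and UntypedContent are made-up stand-ins, not the real XMLBeans or POI classes, and this says nothing about why the untyped element shows up in the first place.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedListToArray {
    // Hypothetical stand-ins for CTGeomGuide and XmlComplexContentImpl.
    interface XmlObjectLike {}
    static final class GeomGuide implements XmlObjectLike {}
    static final class UntypedContent implements XmlObjectLike {}

    /** A list holding only GeomGuide instances. */
    static List<XmlObjectLike> onlyGuides() {
        List<XmlObjectLike> list = new ArrayList<>();
        list.add(new GeomGuide());
        list.add(new GeomGuide());
        return list;
    }

    /** The same list plus one element of a different runtime type. */
    static List<XmlObjectLike> mixed() {
        List<XmlObjectLike> list = onlyGuides();
        list.add(new UntypedContent());
        return list;
    }

    /** Bulk copy into a GeomGuide[]: fails without saying which element is wrong. */
    static String copyAll(List<XmlObjectLike> children) {
        try {
            children.toArray(new GeomGuide[0]);
            return "ok";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    /** Element-by-element cast: fails at the offending element. */
    static String castEach(List<XmlObjectLike> children) {
        for (int i = 0; i < children.size(); i++) {
            try {
                GeomGuide guide = (GeomGuide) children.get(i);
            } catch (ClassCastException e) {
                return "ClassCastException at index " + i;
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(copyAll(onlyGuides()));
        System.out.println(copyAll(mixed()));
        System.out.println(castEach(mixed()));
    }
}
```

The bulk copy only reports that some element did not fit the array type, while the per-element cast names the position and the offending class, which is why the one-by-one access gives the more useful message.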

Dominik.

On Fri, Apr 1, 2016 at 11:43 PM, Andreas Beeker <ki...@apache.org>
wrote:

> Hi Dominik,
>
> thanks again for running the regression tests.
> @issue 1: I've tried to limit the classes to the lite-schemas, but the
> error doesn't show up.
> My guess is, the toArray fails when the children aren't returned as gd
> elements but as xmlanytype [1].
> Another wild guess is, I think something like this happens, when the
> xmlbeans schema type classloader
> is affected by out-of-memory / garbage collector.
>
> is this error happening before any of the out-of-memory errors?
>
> Btw. I'll wait with releasing until #58787 is fixed or I get a green light
> from Javen.
>
> Andi.
>
> [1]
> http://xmlbeans.996285.n3.nabble.com/ArrayStoreException-in-get-Array-method-in-custom-Ant-task-td1378.html#a1381
>
>

Re: Beta time yet?

Posted by Andreas Beeker <ki...@apache.org>.
Hi Dominik,

thanks again for running the regression tests.
@issue 1: I've tried to limit the classes to the lite-schemas, but the error doesn't show up.
My guess is, the toArray fails when the children aren't returned as gd elements but as xmlanytype [1].
Another wild guess: something like this happens when the xmlbeans schema type classloader
is affected by out-of-memory / the garbage collector.

Is this error happening before any of the out-of-memory errors?

Btw. I'll wait with releasing until #58787 is fixed or I get a green light from Javen.

Andi.

[1] http://xmlbeans.996285.n3.nabble.com/ArrayStoreException-in-get-Array-method-in-custom-Ant-task-td1378.html#a1381




RE: Beta time yet?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you, Dominik!  For the regression tests and for fixing #1 already.

As for #2, I wonder if we should look into using the actual Apache xerces for the next beta? Not another dependency....

-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at] 
Sent: Friday, April 01, 2016 4:13 PM
To: POI Developers List <de...@poi.apache.org>
Subject: Re: Beta time yet?

Hi,

the POI specific regression test report is now updated for comparing
3.14-beta2 (nearly 3.14 final) with the current 3.15-beta1 (as of Tuesday) at http://people.apache.org/~centic/poi_regression/reports/

Runtime-wise, the run took approx. 25h 15min now compared to 25h before, so I would say there is no actual change in runtime, as I was doing other stuff on the machine for some time, which might have affected these timings somewhat!

I see two issues:

1.
A new XSLF-related error:

java.lang.ArrayStoreException
	at java.util.ArrayList.toArray(ArrayList.java:408)
	at org.openxmlformats.schemas.drawingml.x2006.main.impl.CTGeomGuideListImpl.getGdArray(Unknown Source)
	at o.a.p.xslf.usermodel.XSLFSimpleShape.getAdjustValue(XSLFSimpleShape.java:823)

Seems like list members with different types are used here so that putting them all into an array fails.
However I could not reproduce this locally, so not sure how it was triggered in the regression testing?!?

2.
An XML parsing related one:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 8192
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:329)

This seems to be a bug in the JDK itself in relation to surrogate Unicode characters, see e.g.
https://bugs.openjdk.java.net/browse/JDK-7156085 and originally
https://issues.apache.org/jira/browse/XERCESJ-1257 for more detailed discussion.

Seems only JDK 9 has a fix for this :(
However it is very rare, only 6 times in 1 million documents, so I think the gain from using the JDK XML Parser outweighs it.

Dominik.


On Wed, Mar 30, 2016 at 4:52 PM, Dominik Stadler <do...@gmx.at>
wrote:

>
> Ok, I'll try to squeeze in a run of my version of the large regression 
> test-suite with the latest trunk as it may cover a few different areas 
> not used in Tika.
>
> Dominik.
>
> On Wed, Mar 30, 2016 at 1:36 PM, kiwiwings <ki...@apache.org> wrote:
>
>> Nick Burch-8 wrote
>> > What do people think about doing a beta release in a few weeks time ...?
>>
>> Last time it took ages between the thread start and the actual release.
>> As the general consensus is anyway having short release cycles, I'll 
>> simply start the releasing trip on the next weekend (Friday evening).
>>
>> Maybe I should spend some time to script more of the manual work ... 
>> - as we use Jenkins here at my $dayjob, I'm used to (nearly) 
>> one-click-releasing the artifacts - I guess something like this would 
>> be possible with Ant too ...
>>
>> Andi
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-poi.1045710.n5.nabble.com/Beta-time-yet-tp5722477p5722481.html
>> Sent from the POI - Dev mailing list archive at Nabble.com.
>>
>



Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

the POI specific regression test report is now updated for comparing
3.14-beta2 (nearly 3.14 final) with the current 3.15-beta1 (as of Tuesday)
at http://people.apache.org/~centic/poi_regression/reports/

Runtime-wise, the run took approx. 25h 15min now compared to 25h before,
so I would say there is no actual change in runtime, as I was doing other
stuff on the machine for some time, which might have affected these
timings somewhat!

I see two issues:

1.
A new XSLF-related error:

java.lang.ArrayStoreException
	at java.util.ArrayList.toArray(ArrayList.java:408)
	at org.openxmlformats.schemas.drawingml.x2006.main.impl.CTGeomGuideListImpl.getGdArray(Unknown Source)
	at o.a.p.xslf.usermodel.XSLFSimpleShape.getAdjustValue(XSLFSimpleShape.java:823)

Seems like list members with different types are used here so that
putting them all into an array fails.
However I could not reproduce this locally, so not sure how it was
triggered in the regression testing?!?

2.
An XML parsing related one:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 8192
	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:329)

This seems to be a bug in the JDK itself in relation to surrogate
Unicode characters, see e.g.
https://bugs.openjdk.java.net/browse/JDK-7156085 and originally
https://issues.apache.org/jira/browse/XERCESJ-1257 for more detailed
discussion.

Seems only JDK 9 has a fix for this :(
However it is very rare, only 6 times in 1 million documents, so I think
the gain from using the JDK XML Parser outweighs it.

Dominik.


On Wed, Mar 30, 2016 at 4:52 PM, Dominik Stadler <do...@gmx.at>
wrote:

>
> Ok, I'll try to squeeze in a run of my version of the large regression
> test-suite with the latest trunk as it may cover a few different areas not
> used in Tika.
>
> Dominik.
>
> On Wed, Mar 30, 2016 at 1:36 PM, kiwiwings <ki...@apache.org> wrote:
>
>> Nick Burch-8 wrote
>> > What do people think about doing a beta release in a few weeks time ...?
>>
>> Last time it took ages between the thread start and the actual release.
>> As the general consensus is anyway having short release cycles,
>> I'll simply start the releasing trip on the next weekend (Friday evening).
>>
>> Maybe I should spend some time to script more of the manual work ... -
>> as we use Jenkins here at my $dayjob, I'm used to (nearly)
>> one-click-releasing
>> the artifacts - I guess something like this would be possible with Ant too
>> ...
>>
>> Andi
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-poi.1045710.n5.nabble.com/Beta-time-yet-tp5722477p5722481.html
>> Sent from the POI - Dev mailing list archive at Nabble.com.
>>
>

Re: Beta time yet?

Posted by Dominik Stadler <do...@gmx.at>.
Ok, I'll try to squeeze in a run of my version of the large regression
test-suite with the latest trunk as it may cover a few different areas not
used in Tika.

Dominik.

On Wed, Mar 30, 2016 at 1:36 PM, kiwiwings <ki...@apache.org> wrote:

> Nick Burch-8 wrote
> > What do people think about doing a beta release in a few weeks time ...?
>
> Last time it took ages between the thread start and the actual release.
> As the general consensus is anyway having short release cycles,
> I'll simply start the releasing trip on the next weekend (Friday evening).
>
> Maybe I should spend some time to script more of the manual work ... -
> as we use Jenkins here at my $dayjob, I'm used to (nearly)
> one-click-releasing
> the artifacts - I guess something like this would be possible with Ant too
> ...
>
> Andi
>
>
>
> --
> View this message in context:
> http://apache-poi.1045710.n5.nabble.com/Beta-time-yet-tp5722477p5722481.html
> Sent from the POI - Dev mailing list archive at Nabble.com.
>

Re: Beta time yet?

Posted by Andreas Beeker <ki...@apache.org>.
Hi,

I'm still fiddling with the release script - and need to try it out, which leads to already changed version-ids.
Currently pinning the documentation (i.e. svn:externals) doesn't work - please be patient with me.

Andi.

On 30.03.2016 23:30, Nick Burch wrote:
> I've had a go at adding some things for this in r1737165. If you run the "init-svn" task, it'll fetch the svn bindings
>
> However... The most recent svnant binding is 1.3.1 as documented at http://subclipse.tigris.org/svnant/svntask.html but the most recent one I can find in Maven (and not even Maven Central) is 1.3.0. So, it looks like we'd need to publish the 1.3.1 release into maven central via the third party oss process, before everything will work. Check the Tika list / find a suitable Tika volunteer if you're new to this - uploading other people's open source jars to Maven is a common Tika need!
>
> Nick
>





Re: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Wed, 30 Mar 2016, kiwiwings wrote:
> This time I'm thinking about scripting the svn changes with Ant, e.g.
> un-/pinning the docs, creating the branch, uploading the artifacts.
>
> I'd like to have something like:
> - ant release-beta <version>
> - ant release-final <version>

I've had a go at adding some things for this in r1737165. If you run the 
"init-svn" task, it'll fetch the svn bindings.

However... The most recent svnant binding is 1.3.1, as documented at 
http://subclipse.tigris.org/svnant/svntask.html, but the most recent one I 
can find in Maven (and not even in Maven Central) is 1.3.0. So it looks like 
we'd need to publish the 1.3.1 release to Maven Central via the third-party 
OSS process before everything will work. Check the Tika list / find a 
suitable Tika volunteer if you're new to this - uploading other people's 
open source jars to Maven is a common Tika need!

Nick



Re: Beta time yet?

Posted by kiwiwings <ki...@apache.org>.
Nick Burch-8 wrote
> Maybe annotate the release guide this time with anything that's tedious /
> needs lots of manual work / etc, and for beta 2 hopefully someone can go
> through and improve those bits? :)

Last time I already changed the naming of the Maven artifacts (now without
the timestamp), the signing process and the upload to Nexus, because this
bugged me: before, I had to rename and move the files to the correct dirs
(and maybe sign them again), and the Nexus upload over a slow internet
connection is also time-consuming.

This time I'm thinking about scripting the svn changes with Ant, e.g.
un-/pinning the docs, creating the branch, uploading the artifacts.

I'd like to have something like:
- ant release-beta <version>
- ant release-final <version>
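
As a rough illustration of the idea above (not the actual POI build file): Ant treats every bare word on the command line as a target name, so the version would most likely be passed as a property, e.g. "ant -Dversion=3.15-beta1 release-beta". In the sketch below the target names come from the wish list; the "version" property, the "check-version" helper target and the echoed placeholder steps are assumptions made only for this example.

```xml
<!-- Hypothetical sketch only: the target names follow the wish list above,
     the "version" property and the echoed steps are assumptions. -->
<project name="release-sketch" default="release-beta" basedir=".">

  <!-- Fail early when no version was supplied on the command line. -->
  <target name="check-version">
    <fail unless="version" message="Pass the version, e.g. -Dversion=3.15-beta1"/>
  </target>

  <!-- Placeholder steps; a real script would call the svn tasks here. -->
  <target name="release-beta" depends="check-version">
    <echo message="Would pin the docs for ${version}"/>
    <echo message="Would create the branch for ${version}"/>
    <echo message="Would upload the artifacts for ${version}"/>
  </target>

  <target name="release-final" depends="check-version">
    <echo message="Would run the same steps for the final release ${version}"/>
  </target>
</project>
```

Keeping the version check in its own target means both release targets share the same guard and fail with a clear message when the property is missing.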

Andi





Re: Beta time yet?

Posted by Nick Burch <ni...@apache.org>.
On Wed, 30 Mar 2016, kiwiwings wrote:
> Maybe I should spend some time scripting more of the manual work. As we 
> use Jenkins here at my $dayjob, I'm used to (nearly) one-click releasing 
> of the artifacts. I guess something like this would be possible with Ant 
> too ...

My vague recollection is that it's about 10 commands or so, but it has 
been a little while since I last ran through it. Maybe annotate the 
release guide this time with anything that's tedious / needs lots of 
manual work / etc, and for beta 2 hopefully someone can go through and 
improve those bits? :)

Nick



Re: Beta time yet?

Posted by kiwiwings <ki...@apache.org>.
Nick Burch-8 wrote
> What do people think about doing a beta release in a few weeks' time ...?

Last time it took ages between the thread start and the actual release.
As the general consensus is to have short release cycles anyway,
I'll simply start the release process next weekend (Friday evening).

Maybe I should spend some time scripting more of the manual work. As we
use Jenkins here at my $dayjob, I'm used to (nearly) one-click releasing
of the artifacts. I guess something like this would be possible with Ant
too ...

Andi


