Posted to dev@poi.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2016/09/09 18:44:29 UTC

RE: [VOTE] Apache POI 3.15-beta3

Thank you, Dominik, for catching these!  3 cheers for mass regression testing!


I'm finally back from break and catching up on emails...

-----Original Message-----
From: Dominik Stadler [mailto:dominik.stadler@gmx.at] 
Sent: Monday, August 15, 2016 6:09 AM
To: POI Developers List <de...@poi.apache.org>
Subject: Re: [VOTE] Apache POI 3.15-beta3

Hi,

Running the regression tests for POI 3.15-beta3 against the CommonCrawl corpus is now finished; initial results are as follows:

* 11966 fail because I did not add commons-collections4; I'll trigger a re-run so that the document counts correctly show the number of regressing documents

* 456 times: ArrayIndexOutOfBoundsException in SprmOperation.getOperand()

java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: *
	at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:317)
	at o.a.p.stress.AbstractFileHandler.handleExtractingInternal(AbstractFileHandler.java:85)
	at o.a.p.stress.AbstractFileHandler.handleExtracting(AbstractFileHandler.java:60)
	at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)

Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
	at o.a.p.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:113)
	at o.a.p.hwpf.sprm.SectionSprmUncompressor.unCompressSEPOperation(SectionSprmUncompressor.java:62)
	at o.a.p.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:44)
	at o.a.p.hwpf.model.SEPX.getSectionProperties(SEPX.java:61)
	at o.a.p.hwpf.usermodel.Section.<init>(Section.java:36)
	at o.a.p.hwpf.usermodel.Range.getSection(Range.java:745)
	at o.a.p.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:721)
	at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:299)
	... 9 more

* 4 times: NullPointerException in XSLFTextParagraph.getDefaultFontSize()

java.lang.NullPointerException
	at o.a.p.xslf.usermodel.XSLFTextParagraph.getDefaultFontSize(XSLFTextParagraph.java:935)
	at o.a.p.sl.draw.DrawTextParagraph.getAttributedString(DrawTextParagraph.java:567)
	at o.a.p.sl.draw.DrawTextParagraph.breakText(DrawTextParagraph.java:235)
	at o.a.p.sl.draw.DrawTextShape.drawParagraphs(DrawTextShape.java:158)
	at o.a.p.sl.draw.DrawTextShape.getTextHeight(DrawTextShape.java:219)
	at o.a.p.sl.draw.DrawTextShape.drawContent(DrawTextShape.java:102)
	at o.a.p.sl.draw.DrawSimpleShape.draw(DrawSimpleShape.java:93)
	at o.a.p.sl.draw.DrawSheet.draw(DrawSheet.java:67)
	at o.a.p.sl.draw.DrawSlide.draw(DrawSlide.java:39)
	at o.a.p.xslf.usermodel.XSLFSlide.draw(XSLFSlide.java:301)
	at o.a.p.stress.SlideShowHandler.renderSlides(SlideShowHandler.java:120)
	at o.a.p.stress.SlideShowHandler.handleSlideShow(SlideShowHandler.java:43)
	at o.a.p.stress.XSLFFileHandler.handleFile(XSLFFileHandler.java:43)
	at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)
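
[Editorial sketch] The first trace above ends with an ArrayIndexOutOfBoundsException at index 4 inside SprmOperation.getOperand(). The trace alone does not show the root cause, but one common way to harden this kind of operand read is to check the buffer length before indexing into it. The class SafeOperandReader and the method readIntLE below are hypothetical names used only for illustration; they are not POI code and not the fix that was applied.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not part of POI: reads little-endian values defensively. */
public class SafeOperandReader {

    /**
     * Reads a 4-byte little-endian int starting at offset.
     * Returns an empty OptionalInt when the buffer is too short,
     * instead of letting an ArrayIndexOutOfBoundsException escape.
     */
    public static OptionalInt readIntLE(byte[] buf, int offset) {
        if (buf == null || offset < 0 || offset > buf.length - 4) {
            return OptionalInt.empty();
        }
        int value = (buf[offset] & 0xFF)
                | (buf[offset + 1] & 0xFF) << 8
                | (buf[offset + 2] & 0xFF) << 16
                | (buf[offset + 3] & 0xFF) << 24;
        return OptionalInt.of(value);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04};
        // Enough bytes from offset 0: a value is present.
        System.out.println(readIntLE(data, 0).isPresent());
        // Only two bytes remain from offset 2: the read is rejected.
        System.out.println(readIntLE(data, 2).isPresent());
    }
}
```

Whether a real parser should skip a truncated operand or reject the whole document is a separate design decision; the sketch only shows the bounds check itself.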



The others are probably flaky cases where files previously caused an OOM or a timeout and thus were not reported with these errors before.


See http://people.apache.org/~centic/poi_regression/reports/ and http://people.apache.org/~centic/poi_regression/reportsAll/ for detailed results.
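
[Editorial sketch] The counts above ("456 times", "4 times") and the "*" in "ArrayIndexOutOfBoundsException: *" suggest that failures are grouped by a normalized exception signature. That is an assumption; the actual report generator is not shown in this thread. A minimal version of such a tally, with the hypothetical class name FailureTally, could look like this:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not the actual regression harness. */
public class FailureTally {

    /** Replaces every run of digits with '*' so that similar failures share one key. */
    public static String normalize(String exceptionLine) {
        return exceptionLine.replaceAll("\\d+", "*");
    }

    /** Counts how often each normalized failure line occurs, in first-seen order. */
    public static Map<String, Integer> tally(List<String> exceptionLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : exceptionLines) {
            counts.merge(normalize(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> failures = Arrays.asList(
                "java.lang.ArrayIndexOutOfBoundsException: 4",
                "java.lang.ArrayIndexOutOfBoundsException: 12",
                "java.lang.NullPointerException");
        tally(failures).forEach(
                (key, count) -> System.out.println(count + " times: " + key));
    }
}
```

Normalizing only the digits is deliberately crude; it merges failures that differ just in an array index, which is what makes a single count per bug possible.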


Thanks... Dominik.


On Mon, Aug 15, 2016 at 4:16 AM, Javen O'Neal <on...@apache.org> wrote:

> Correction: HSLF. This is a ppt/OLE2 file.
>
> On Sun, Aug 14, 2016 at 6:58 PM, Javen O'Neal <on...@apache.org> wrote:
> > Tim,
> >
> > I have extracted the pptx PowerPoint file containing the Prague 
> > footer. I want to write a unit test for POI to find the Prague 
> > string so I can figure out why Prague was not included in the Tika 
> > regression test using POI 3.15 beta 3 but was found by POI 3.15 
> > beta 1.
> >
> > Could you point me to the Tika code that generated the potential 
> > regressions zip file in TIKA-2013, or the POI class/function that is 
> > used to extract the text from a document?
> >
> > Also, is the pptx file shareable and ASL 2.0 licensed so that it can 
> > be included as part of POI's unit test suite?
> >
> > On Fri, Aug 12, 2016 at 6:52 PM, Javen O'Neal <ja...@gmail.com> wrote:
> >> On Aug 12, 2016 11:39, "Allison, Timothy B." <ta...@mitre.org> wrote:
> >>>...the two potential content regressions may be caused by something at the
> >>> Tika level.  If anyone has time to take a look, that'd be great.
> >>
> >> I can take a look this weekend.
> >>
> >> Did you use the same Tika code with different POI versions for these tests
> >> (so that we can attribute the change in behavior to a POI commit, regardless
> >> of whether the bug is in Tika or POI)?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org For additional 
> commands, e-mail: dev-help@poi.apache.org
>
>

RE: [VOTE] Apache POI 3.15-beta3

Posted by Dominik Stadler <do...@gmx.at>.
To be honest I would rather release now, any additional fix carries a small
risk of regressions.

Dominik

On Sep 15, 2016 2:59 AM, "Javen O'Neal" <ja...@gmail.com> wrote:

> I don't mind putting off 3.15 final if it means we have a solid,
> universally usable release. We could do another beta release and wait for a
> final release when it's ready. That would allow us to go back to committing
> changes that we have put off for the last couple months.
>
> Releasing a beta now and postponing a final release would allow us to
> attempt to get API breakage checker software in place.
>
> On Sep 14, 2016 4:48 PM, "Allison, Timothy B." <ta...@mitre.org> wrote:
>
> > Y, thank you for figuring out what is going on there.  I think this is low
> > enough priority to put off until 3.16-beta1.  Unless there are objections...
> >
> > -----Original Message-----
> > From: Javen O'Neal [mailto:onealj@apache.org]
> > Sent: Saturday, September 10, 2016 3:20 AM
> > To: POI Developers List <de...@poi.apache.org>
> > Subject: Re: [VOTE] Apache POI 3.15-beta3
> >
> > Bug 60003 is still open and is a regression if POI should be extracting
> > Prague from the test slideshow.
> >
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=60003
> >

RE: [VOTE] Apache POI 3.15-beta3

Posted by Javen O'Neal <ja...@gmail.com>.
I don't mind putting off 3.15 final if it means we have a solid,
universally usable release. We could do another beta release and wait for a
final release when it's ready. That would allow us to go back to committing
changes that we have put off for the last couple months.

Releasing a beta now and postponing a final release would allow us to
attempt to get API breakage checker software in place.

On Sep 14, 2016 4:48 PM, "Allison, Timothy B." <ta...@mitre.org> wrote:

> Y, thank you for figuring out what is going on there.  I think this is low
> enough priority to put off until 3.16-beta1.  Unless there are objections...
>
> -----Original Message-----
> From: Javen O'Neal [mailto:onealj@apache.org]
> Sent: Saturday, September 10, 2016 3:20 AM
> To: POI Developers List <de...@poi.apache.org>
> Subject: Re: [VOTE] Apache POI 3.15-beta3
>
> Bug 60003 is still open and is a regression if POI should be extracting
> Prague from the test slideshow.
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=60003
>

RE: [VOTE] Apache POI 3.15-beta3

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Y, thank you for figuring out what is going on there.  I think this is low enough priority to put off until 3.16-beta1.  Unless there are objections...

-----Original Message-----
From: Javen O'Neal [mailto:onealj@apache.org] 
Sent: Saturday, September 10, 2016 3:20 AM
To: POI Developers List <de...@poi.apache.org>
Subject: Re: [VOTE] Apache POI 3.15-beta3

Bug 60003 is still open and is a regression if POI should be extracting Prague from the test slideshow.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60003



Re: [VOTE] Apache POI 3.15-beta3

Posted by Javen O'Neal <on...@apache.org>.
Bug 60003 is still open and is a regression if POI should be
extracting Prague from the test slideshow.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60003
