Posted to dev@tika.apache.org by Joe Gallo <jo...@gmail.com> on 2011/10/06 20:45:57 UTC

HSLFExtractor Bug

I ran into a problem with Tika extraction of .ppt files today, and I
think I traced it back to some mistaken code in the HSLFExtractor:

         // Repeat the Notes header, if set
         if (hf != null && hf.isHeaderVisible() && hf.getHeaderText() != null) {
            xhtml.startElement("p", "class", "slide-note-header");
            xhtml.characters( hf.getFooterText() );   // <---------- here
            xhtml.endElement("p");
         }

Shouldn't this be hf.getHeaderText()? The getFooterText() call here
returns null, which causes an NPE in the XHTMLContentHandler.

Joe
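The report comes down to a guard/use mismatch: the `if` checks that the header text is non-null, but the body then writes the footer text, which can still be null. The sketch below reproduces that pattern in isolation. It is a hypothetical, self-contained stand-in, not the real Tika or POI code: `HeaderFooter`, `Output`, `writeNotesHeaderBuggy` and `writeNotesHeaderFixed` are illustrative names, and `Output.characters` simply rejects null the way the report says the XHTMLContentHandler does. The "fixed" version is the correction suggested in the report (use `getHeaderText()`); it is not necessarily the exact change made in r1177313.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the notes-header logic from the report (not real Tika/POI classes).
public class NotesHeaderSketch {

    // Hypothetical stand-in for the headers/footers object the extractor reads.
    static final class HeaderFooter {
        private final boolean headerVisible;
        private final String headerText;
        private final String footerText;

        HeaderFooter(boolean headerVisible, String headerText, String footerText) {
            this.headerVisible = headerVisible;
            this.headerText = headerText;
            this.footerText = footerText;
        }

        boolean isHeaderVisible() { return headerVisible; }
        String getHeaderText() { return headerText; }
        String getFooterText() { return footerText; }
    }

    // Hypothetical output sink; like the handler in the report, it throws an NPE on null text.
    static final class Output {
        final List<String> parts = new ArrayList<>();

        void startParagraph(String cssClass) { parts.add("<p class=\"" + cssClass + "\">"); }

        void characters(String text) { parts.add(Objects.requireNonNull(text, "text")); }

        void endParagraph() { parts.add("</p>"); }

        @Override
        public String toString() { return String.join("", parts); }
    }

    // Mirrors the snippet as reported: the guard checks the header, but the footer is written.
    static void writeNotesHeaderBuggy(HeaderFooter hf, Output out) {
        if (hf != null && hf.isHeaderVisible() && hf.getHeaderText() != null) {
            out.startParagraph("slide-note-header");
            out.characters(hf.getFooterText()); // mismatched getter: may be null
            out.endParagraph();
        }
    }

    // Suggested correction: write the same value the guard just checked for null.
    static void writeNotesHeaderFixed(HeaderFooter hf, Output out) {
        if (hf != null && hf.isHeaderVisible() && hf.getHeaderText() != null) {
            out.startParagraph("slide-note-header");
            out.characters(hf.getHeaderText());
            out.endParagraph();
        }
    }

    public static void main(String[] args) {
        // Header set, footer unset: the situation described in the report.
        HeaderFooter hf = new HeaderFooter(true, "Speaker notes header", null);

        Output fixed = new Output();
        writeNotesHeaderFixed(hf, fixed);
        System.out.println(fixed); // <p class="slide-note-header">Speaker notes header</p>

        try {
            writeNotesHeaderBuggy(hf, new Output());
        } catch (NullPointerException e) {
            System.out.println("buggy version: NullPointerException");
        }
    }
}
```

Keeping the null check and the write on the same getter is the general point: whatever value the guard validates is the one that should reach the content handler.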

HSLFExtractor Bug

Posted by Joe Gallo <jo...@gmail.com>.

> I think this was raised in TIKA-727 and fixed in r1177313. Any chance you
> could check with an svn checkout or a recent nightly build and verify that
> your problem is fixed?

Ah, yes, confirmed -- it is fixed in 1.0-20111006.162438-162.

Re: HSLFExtractor Bug

Posted by Nick Burch <ni...@alfresco.com>.

On Thu, 6 Oct 2011, Joe Gallo wrote:
> I ran into a problem with Tika extraction of .ppt files today, and I
> think I traced it back to some mistaken code in the HSLFExtractor
>
>            xhtml.characters( hf.getFooterText() ); <----------

I think this was raised in TIKA-727 and fixed in r1177313. Any chance you
could check with an svn checkout or a recent nightly build and verify that
your problem is fixed?

Nick