Posted to user@poi.apache.org by Luis Trigueiros <lu...@gmail.com> on 2006/09/08 16:10:58 UTC

Excel 2003 problem

Hi,
I made a utility application that reads an Excel file and outputs XML
using POI (poi-2.5.1-final-20040804.jar).
It all worked fine until my input Excel file was edited with Excel 2003,
at which point funny things started to happen.
As a workaround I tried to save the input file as Microsoft Excel
97(-2002) from Excel and submit it back to the utility.
Unfortunately I found that once the file has been changed with Excel 2003,
the POI lib becomes unable to read it, even
if I save it as Microsoft Excel 97(-2002).
Additionally I found the following:
    - If I convert the input file to Microsoft Excel 97(-2002) using
      OpenOffice and submit it to the utility, it all works fine. But I
      don't have full control of the Excel file and can't ask users to
      use OpenOffice.
    - If I convert the input file to Microsoft Excel 97(-2002) using
      Excel, the generated file is twice the size of the original one.
Has anyone faced a similar problem, and how did you solve it?
Thanks, Luis
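
One cheap check when comparing the Excel-saved and OpenOffice-saved files is whether each one is still an OLE2 compound document, which is the container that the binary .xls format (the one POI's HSSF reads) is stored in. The sketch below only inspects the first 8 bytes for the OLE2 signature. The class and method names are hypothetical, and a passing check says nothing about which records inside the workbook POI might be tripping over.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of POI.
final class Ole2HeaderCheck {
    // 8-byte signature found at the start of every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given bytes begin with the OLE2 signature. */
    public static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (header[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to 8 bytes from the stream; may return fewer at end of file. */
    public static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break;
            }
            off += n;
        }
        return Arrays.copyOf(buf, off);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2HeaderCheck <file.xls>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(looksLikeOle2(readHeader(in))
                    ? "OLE2 container"
                    : "not an OLE2 container");
        }
    }
}
```

If both files pass, the difference between them lies inside the workbook itself, and a stack trace from the failing read would be the next thing to collect.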

Re: Excel 2003 problem

Posted by tryma <tr...@creuna.no>.
Ok, sorry about that, Nick.

Actually, now I see this is happening in the Nutch classes for the MS parse
plugin, possibly not in POI, so I've posted on the Nutch list instead. I just
wanted to reply to your question first, although it no longer seems to be a
problem with POI failing to handle the Excel document.

So unless you're curious, or involved with Nutch and the MS parse plugins,
you don't need to read on below. :)

Here's the trace I get when I print the stacktrace of any exception
occurring as I attempt to call the MSExcelParser's getParse(Content). It
seems I get an NPE in MSBaseParser.getParse().

[#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
java.lang.NullPointerException
	at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
	at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
	at <my_package>.DocumentParser.parseDocument(DocumentParser.java:154)
	...

Looking at the source (MSBaseParser.java) at this line, it says:

****SNIP****
      extractor.extract(new ByteArrayInputStream(raw));
      text = extractor.getText();
      properties = extractor.getProperties();
      outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(), getConf());

    } catch (Exception e) {
      return new ParseStatus(ParseStatus.FAILED,
                             "Can't be handled as micrsosoft document. " + e)
                             .getEmptyParse(this.conf);
    }

    // collect meta data
    Metadata metadata = new Metadata();
    title = properties.getProperty(DublinCore.TITLE);      // <== this is line 94
    properties.remove(DublinCore.TITLE);
****SNIP****

So I can only gather that my properties object is null. As the snippet
from the MSBaseParser class shows, properties is initially null and is then
assigned a value from the ExcelExtractor / MSExtractor (properties =
extractor.getProperties();). I assume that call is returning null, although I
would have expected an empty Properties object instead, which would avoid
the NPE at line 94.
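
To illustrate what I mean by an empty Properties object, here is a minimal sketch of a defensive guard using only java.util.Properties. It is not the Nutch code or a proposed patch: the class name, the helper names and the plain "title" key (standing in for DublinCore.TITLE) are all made up for the example.

```java
import java.util.Properties;

// Hypothetical stand-alone example; not taken from Nutch or POI.
final class NullSafeProperties {

    /** Returns the given Properties, or an empty one if it is null. */
    public static Properties orEmpty(Properties props) {
        return props != null ? props : new Properties();
    }

    /**
     * Mirrors the "get the title, then remove it" step from the snippet,
     * but tolerates a null properties object instead of throwing an NPE.
     */
    public static String extractTitle(Properties maybeNull, String titleKey) {
        Properties properties = orEmpty(maybeNull);
        String title = properties.getProperty(titleKey);
        properties.remove(titleKey);
        return title;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("title", "Quarterly report");

        // Normal case: the title is returned and removed from the properties.
        System.out.println(extractTitle(p, "title"));

        // Null case: no exception, the title is simply null.
        System.out.println(extractTitle(null, "title"));
    }
}
```

With a guard like this the caller still has to cope with a null title, but the failure no longer happens at the getProperty call.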

Hopefully someone on the Nutch list can shed some light on that.


Thanks,
Trym


Nick Burch wrote:
> 
> On Tue, 3 Oct 2006, tryma wrote:
>> Anyone know about any patches or can suggest a work-around for this?
> 
> You'll need to give us more to go on than this.... stack traces, problem 
> files, failing unit tests etc
> 
> (I personally haven't noticed any problems with processing 
> word/excel/powerpoint 2003 files with POI)
> 
> Nick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Excel-2003-problem-tf2241908.html#a6635249
Sent from the POI - User mailing list archive at Nabble.com.


Re: Excel 2003 problem

Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 3 Oct 2006, tryma wrote:
> Anyone know about any patches or can suggest a work-around for this?

You'll need to give us more to go on than this.... stack traces, problem 
files, failing unit tests etc

(I personally haven't noticed any problems with processing 
word/excel/powerpoint 2003 files with POI)

Nick


Re: Excel 2003 problem

Posted by tryma <tr...@creuna.no>.

Hi, we are in need of some pointers on this issue too. We are currently
using the following POI libs in our application, as
part of the Nutch plug-in architecture. What we are finding is that certain
Word and Excel documents fail parsing, and this seems to be connected to
files created with Office 2003.


poi-3.0-alpha1-20050704.jar
poi-scratchpad-3.0-alpha1-20050704.jar

We've also tried getting the latest build (see libs used below) but this
doesn't help either.

poi-3.0-alpha2-20060616.jar
poi-scratchpad-3.0-alpha2-20060616.jar


Anyone know about any patches or can suggest a work-around for this?


Regards,
Trym




-- 
View this message in context: http://www.nabble.com/Excel-2003-problem-tf2241908.html#a6620656
Sent from the POI - User mailing list archive at Nabble.com.


Re: Excel 2003 problem

Posted by Bryan Liu <lj...@gmail.com>.
Hi,
I use Excel 2003 now, with POI 3.0. It works well and I have never faced this
kind of problem, so maybe there is some other problem.
I hope you can resolve it soon.




-- 

Regards!
Bryan.Liu