Posted to user@poi.apache.org by et...@vmd.desjardins.com on 2005/11/09 15:49:38 UTC

got an error when running on UNIX-AIX: illegal block count!


Hi all,

I have a strange problem when I deploy my Word document extraction
application on AIX (Unix). I have run the application many times on Windows
using WSAD and never hit this problem with Word documents. All other
document types are read fine (PDF, Excel, TXT); only Word documents seem to
jam.
I use the textmining library to do the extraction.


This is the error I get:

>>>>
2005-11-08 16:02:21,939 ERROR [P=689750:O=0:CT] (?:?) - Error while parsing
word document java.io.IOException: Illegal block count; minimum count is 1,
got 0 instead
java.io.IOException: Illegal block count; minimum count is 1, got 0 instead
        at
org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java(Compiled
 Code))
        at
org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java(Compiled
 Code))
        at
org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java(Compiled
 Code))
        at
ca.ulaval.bibl.lius.index.MSWord.WordIndexer.parse(WordIndexer.java(Inlined
Compiled Code))
        at
ca.ulaval.bibl.lius.index.MSWord.WordIndexer.getPopulatedCollection(WordIndexer.java(Compiled
 Code))
        at
ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java:87)
        at
ca.ulaval.bibl.lius.index.MSWord.WordIndexer.createLuceneDocument(WordIndexer.java:81)
        at
ca.ulaval.bibl.lius.index.Indexer.createLuceneDocument(Indexer.java(Compiled
 Code))
        at
com.vmd.intranet.research.index.bean.IndexerRamBean.indexFile(IndexerRamBean.java(Compiled
 Code))
        at
com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java(Compiled
 Code))
        at
com.vmd.intranet.research.index.bean.IndexerRamBean.indexFolder(IndexerRamBean.java:153)
        at
com.vmd.intranet.research.index.bean.IndexerRamBean.processIndexing(IndexerRamBean.java:137)
        at
com.vmd.intranet.research.index.IndexFilesLauncher.processIndexing(IndexFilesLauncher.java:123)
        at
com.vmd.intranet.research.index.IndexFilesLauncher.main(IndexFilesLauncher.java:60)
        at java.lang.reflect.Method.invoke(Native Method)
        at
com.ibm.websphere.client.applicationclient.launchClient.createContainerAndLaunchApp(launchClient.java:448)
        at
com.ibm.websphere.client.applicationclient.launchClient.main(launchClient.java:304)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:158)
>>>>

And this is the code I use. Have I made a trivial mistake?

>>>
    public Object parse(Object file) {
        File wordDoc = new File((String) file);

        WordExtractor we = new WordExtractor();
        String fullText = "";
        try {
            FileInputStream fStream = new FileInputStream(wordDoc);
            fullText = we.extractText(fStream); // <---- ERROR HERE
        } catch (FileNotFoundException e1) {
            logger.error("FileNotFound while parsing word document " + e1);
            e1.printStackTrace();
        } catch (Exception e) {
            logger.error("Error while parsing word document " + e);
            e.printStackTrace();
        }
        return fullText;
    }
>>>


Thanks for answering!

Please put my email address in cc!
etienne.laverdiere@vmd.desjardins.com

Etienne
Montreal


Re: got an error when running on UNIX-AIX: illegal block count!

Posted by ac...@apache.org.
Try wrapping the FileInputStream in a BufferedInputStream:

 >
 >   public Object parse(Object file) {
 >             File wordDoc = new File((String) file);
 >
 >             WordExtractor we = new WordExtractor();
 >             String fullText = "";
 >             try {
 >                   // declared as InputStream: a BufferedInputStream is
 >                   // not a FileInputStream
 >                   InputStream fStream = new BufferedInputStream(new
 >                   FileInputStream(wordDoc));
 >                   fullText = we.extractText(fStream); // <---- error was raised here
 >             } catch (FileNotFoundException e1) {
 >                   logger.error("FileNotFound while parsing word document "
 >                   + e1);
 >                   e1.printStackTrace();
 >             } catch (Exception e) {
 >                   logger.error("Error while parsing word document " + e);
 >                   e.printStackTrace();
 >             }
 >             return fullText;
 >       }

You probably don't see it elsewhere because AIX's VM and IO support is 
really slow.  I love AIX because it is a UNIX variant and I love UNIX, 
but it certainly is not the best UNIX, and the IBM VM is frankly 
pathetic and uses a decisively retro garbage collection.  Thus your 
stream is getting behind: a read can come back with fewer bytes than 
were asked for.  Since we don't inherently do the buffering, 
POIFS just pukes unless you use a buffered input stream... (which you're 
naughty for not doing for all files anyhow)
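
Here is a minimal sketch of the buffering idea, using only JDK classes. 
The names BufferedReadSketch, readAll and openInMemory are made up for 
illustration; they are not part of POI or textmining. The point is that 
InputStream.read may hand back fewer bytes than requested, so the code 
loops until end of stream and only then gives the parser a stream that 
is already fully in memory.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of POI or textmining.
final class BufferedReadSketch {

    // Reads the stream to the end. The loop matters: a single read()
    // call is allowed to return fewer bytes than the buffer can hold.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Loads the whole file into memory and returns a stream over the
    // bytes, so later reads never come up short.
    static InputStream openInMemory(File file) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
            return new ByteArrayInputStream(readAll(in));
        } finally {
            in.close();
        }
    }
}
```

Assuming extractText accepts any InputStream, the call in parse would 
become we.extractText(BufferedReadSketch.openInMemory(wordDoc)). This 
trades memory for predictability, so it is only reasonable for files 
that fit comfortably in the heap.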

If that doesn't work, pass -Dfile.encoding=ISO-8859-1 to the JVM (or if 
that doesn't work try 8859_1)
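
Before changing the flag it may help to see what the JVM on the AIX box 
is actually using. A small diagnostic sketch, JDK only; the class and 
method names are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper, not part of POI or textmining.
final class EncodingCheckSketch {

    // Reports the file.encoding system property next to the charset
    // the JVM actually resolved as its default.
    static String describeDefaultEncoding() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // ISO-8859-1 is one of the charsets every Java platform must support.
    static boolean isLatin1Available() {
        return Charset.isSupported("ISO-8859-1");
    }
}
```

Logging describeDefaultEncoding() once on Windows and once on AIX would 
show whether the two runs differ in default encoding at all.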

It could also be that AIX is a red herring and that this DOC is pre-Word 
6 and thus doesn't use the OLE2 CDF format, or is actually truly blank 
(meaning no document in the DOC file, just the surrounding OLE wrapper)
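
One way to test the red-herring theory is to look at the first 8 bytes 
of the file: OLE2 compound documents start with the signature 
D0 CF 11 E0 A1 B1 1A E1. The sketch below checks just that, with JDK 
classes only; Ole2CheckSketch and its methods are made-up names. Note 
that a blank file consisting only of the OLE wrapper would still pass, 
so this only rules out the "not OLE2 at all" case.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical pre-check, not part of POI or textmining.
final class Ole2CheckSketch {

    // Signature found at offset 0 of an OLE2 compound document.
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the stream begins with the OLE2 signature.
    // readFully keeps reading until all 8 bytes have arrived and
    // throws EOFException if the stream is shorter than that.
    static boolean startsWithOle2Magic(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        try {
            new DataInputStream(in).readFully(head);
        } catch (EOFException tooShort) {
            return false;
        }
        for (int i = 0; i < head.length; i++) {
            if ((head[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience wrapper that opens and closes the file.
    static boolean looksLikeOle2(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            return startsWithOle2Magic(in);
        } finally {
            in.close();
        }
    }
}
```

Running looksLikeOle2 over the folder being indexed and logging the 
files that fail would show quickly whether the exception follows 
particular documents rather than the platform.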

-Andy


-- 
Andrew C. Oliver
SuperLink Software, Inc.

Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/