You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@poi.apache.org by bu...@apache.org on 2008/06/17 23:47:06 UTC

DO NOT REPLY [Bug 45223] New: NegativeArraySizeException in WordDocument constructor.

https://issues.apache.org/bugzilla/show_bug.cgi?id=45223

           Summary: NegativeArraySizeException in WordDocument constructor.
           Product: POI
           Version: 3.0
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HDF
        AssignedTo: dev@poi.apache.org
        ReportedBy: maksimov.andrei@gmail.com


DOCParser class: 

package resumecrawler.utils.parser;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.poi.hdf.extractor.WordDocument;

/**
 *
 * @author solaris
 */
public class DOCParser implements Parser {
    File file = null;

    public DOCParser(File f) {
        file = f;
    }

    public DOCParser(String file) {
        this.file = new File(file);
    }

    public void parse(String fieldName, Document doc) {
        String content = "";
        try {            
            WordDocument wd = new WordDocument(file.toString());
            StringWriter docTextWriter = new StringWriter();
            wd.writeAllText(new PrintWriter(docTextWriter));
            docTextWriter.close();
            content = docTextWriter.toString();            
        } catch (IOException ex) {
            Logger.getLogger(PDFParser.class.getName()).log(Level.SEVERE, null,
ex);
        } 
        doc.add(new Field(fieldName, content, Field.Store.YES,
Field.Index.TOKENIZED));
    }
}

Error stack trace:
Exception in thread "main" java.lang.NegativeArraySizeException
        at
org.apache.poi.hdf.extractor.data.ListTables.createLVL(ListTables.java:171)
        at
org.apache.poi.hdf.extractor.data.ListTables.initLFO(ListTables.java:149)
        at
org.apache.poi.hdf.extractor.data.ListTables.<init>(ListTables.java:43)
        at
org.apache.poi.hdf.extractor.WordDocument.createListTables(WordDocument.java:1640)
        at
org.apache.poi.hdf.extractor.WordDocument.findFormatting(WordDocument.java:365)
        at
org.apache.poi.hdf.extractor.WordDocument.processComplexFile(WordDocument.java:292)
        at
org.apache.poi.hdf.extractor.WordDocument.readFIB(WordDocument.java:244)
        at
org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:194)
        at
org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:183)
        at resumecrawler.utils.parser.DOCParser.parse(DOCParser.java:37)
.......

file.toString() returns something like this:
/home/solaris/crawler/StoreDocuments/9cbb0d2ab441c5a900b7e072915ba298.doc


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

DO NOT REPLY [Bug 45223] NegativeArraySizeException in WordDocument constructor.

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=45223


Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX




--- Comment #1 from Nick Burch <ni...@torchbox.com>  2008-06-19 04:43:01 PST ---
You're likely to have much more luck with hwpf than with hdf, hdf is
unsupported

For word text extracting, try org.apache.poi.hwpf.extractor.WordExtractor -
http://poi.apache.org/apidocs/org/apache/poi/hwpf/extractor/WordExtractor.html


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org