Posted to dev@poi.apache.org by "Andrew C. Oliver" <ac...@apache.org> on 2003/02/18 03:10:29 UTC

Re: [POSIBILITY] POI/HDF Donation

Please submit the patches per the instructions here:
http://jakarta.apache.org/poi/getinvolved/index.html

As much as I'd love to say go for it (so that HDF will move forward),
I'm not entirely sure HDF is ready for this project (if it's due soon).
In the past I have used HtDig with one of its recommended solutions for
retrieving text, and this worked like a charm.  (www.htdig.org)

We'll do our best to assist you in working with HDF.  Definitely submit
your Word documents with associated JUnit tests.

-Andy


Leon Messerschmidt wrote:
> Hi,
> 
> I've got a client that is interested in indexing a large number of Word
> .doc files.  The organization was looking at Verity/Retrievalware, but
> they simply can't afford these products.  I suggested that they look at a
> POI/Lucene solution.
> 
> They are sold on the open-source idea and would consider funding some of
> my time to extend HDF, but I have to prove that HDF is a viable base to
> start from.  Disclaimer: I'm not sure about the number of hours they are
> willing to sponsor, and I suspect you should not get excited too soon ;-)
>   I hope it can be enough to move HDF out of scratchpad status.
> 
> I played with HDF over the weekend and the CVS version of the scratchpad
> didn't seem to work at all.  I used the event model for HDF.  Here are
> patches that I created to make it work:
> 
> diff -r1.1 EventBridge.java
> 290c290
> <           sb.append(_mainDocument[y]);
> ---
>  >           sb.append((char)_mainDocument[y]);
> 
> 
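The EventBridge patch above can be illustrated with a minimal sketch. It assumes _mainDocument is a byte[] (the diff does not show its declaration), and the class and method names below are hypothetical, not part of HDF. Without the cast, Java resolves sb.append(byteValue) to append(int) and writes the decimal value of the byte; with the cast it writes a character.

```java
public class ByteAppendDemo {
    // No cast: a byte argument widens to int, so append(int) is chosen
    // and the decimal value of each byte ends up in the buffer.
    public static String appendRaw(byte[] data) {
        StringBuffer sb = new StringBuffer();
        for (int y = 0; y < data.length; y++) {
            sb.append(data[y]);
        }
        return sb.toString();
    }

    // With the cast, as in the patch: append(char) is chosen
    // and each byte is written as a character.
    public static String appendAsChar(byte[] data) {
        StringBuffer sb = new StringBuffer();
        for (int y = 0; y < data.length; y++) {
            sb.append((char) data[y]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 'H', 'i' };
        System.out.println(appendRaw(data));    // prints 72105
        System.out.println(appendAsChar(data)); // prints Hi
    }
}
```

Note that a plain (char) cast sign-extends bytes above 0x7F, so it is only safe for 7-bit ASCII; text outside that range would need masking with 0xFF or decoding with an explicit charset.
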
> diff -r1.10 HDFObjectFactory.java
> 147a148,149
>  >
>  >         _listener.mainDocument(_mainDocument);
> 
> 
> These patches worked for simple documents, but bulleted lists and tables
> still broke.  To fix the tables I added the following hack.  With this
> code in place I can get the text from the table, but it doesn't seem
> like the proper solution.  Maybe someone can help me here?
> 
> diff -r1.4 StyleSheet.java
> 1167a1168,1174
>  >                if (brcTop == null) break;
>  >                if (brcLeft == null) break;
>  >                if (brcBottom == null) break;
>  >                if (brcRight == null) break;
>  >                if (brcVertical == null) break;
>  >                if (brcHorizontal == null) break;
> 
> 
> With the bulleted lists I'm stumped.  I get the following exception, and
> I can't even hack it to work.  Can anybody at least point me in a
> direction to find this problem?
> 
> Exception in thread "main" java.lang.NegativeArraySizeException
>          at org.apache.poi.hdf.model.HDFObjectFactory.createListTables(HDFObjectFactory.java:644)
>          at org.apache.poi.hdf.model.HDFObjectFactory.initFormattingProperties(HDFObjectFactory.java:277)
>          at org.apache.poi.hdf.model.HDFObjectFactory.<init>(HDFObjectFactory.java:155)
>          at org.apache.poi.hdf.model.HDFDocument.<init>(HDFDocument.java:18)
>          at org.apache.poi.hdf.test.DataExtractor.extract(DataExtractor.java:25)
>          at org.apache.poi.hdf.test.DataExtractor.main(DataExtractor.java:98)
> 
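As a general pointer on the exception above: one common way a NegativeArraySizeException arises in binary-format parsers is a length field read from the file being interpreted as a signed value, or read from the wrong offset, and then passed straight to an array allocation. Whether that is what happens at HDFObjectFactory.java:644 is not established here. The sketch below only shows the general pattern, and every name in it is hypothetical rather than taken from HDF.

```java
public class LengthGuardDemo {
    // Reads a little-endian 16-bit field as a signed short.
    // Values of 0x8000 and above come back negative.
    public static int readSigned16(byte[] buf, int offset) {
        return (short) ((buf[offset] & 0xFF) | ((buf[offset + 1] & 0xFF) << 8));
    }

    // Reads the same field as an unsigned value in the range 0..65535.
    public static int readUnsigned16(byte[] buf, int offset) {
        return (buf[offset] & 0xFF) | ((buf[offset + 1] & 0xFF) << 8);
    }

    // Validates a length field before allocating, so a bad value fails
    // with a descriptive message instead of NegativeArraySizeException.
    public static byte[] allocate(int length, int available) {
        if (length < 0 || length > available) {
            throw new IllegalArgumentException(
                "Bad length field: " + length + " (bytes available: " + available + ")");
        }
        return new byte[length];
    }

    public static void main(String[] args) {
        byte[] buf = { (byte) 0xF0, (byte) 0xFF };
        System.out.println(readSigned16(buf, 0));   // prints -16
        System.out.println(readUnsigned16(buf, 0)); // prints 65520
    }
}
```

Printing the computed size and the offset it was read from just before the allocation at line 644 would show whether the value is a sign problem or simply read from the wrong place.
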
> I have also created a [simple] suite of test .doc files and classes to
> test various features.  These classes can easily be incorporated into a
> JUnit test (they are stand-alone at the moment).  The test suite was
> done with MS Word 2000 on a Windows 2000 machine.  If you are interested
> in these I can donate them to the POI project.
> 
> I would *really* like to strengthen my client's open-source drive and
> this project is aimed at TOP management.  If I can show that POI/HDF is
> viable at the end of this week I can probably make the cut-off point.  I
> would appreciate any help - even if it is just some pointers on how to
> start looking for the above-mentioned problems.  Some code/bug-fixes
> would be cool too ;-)
> 
> ~ Leon