Posted to user@poi.apache.org by "Beltran, Justin" <jb...@hitachiconsulting.com> on 2009/07/01 01:24:08 UTC

Use cases for MS Word files

Hi all,

I'm doing initial research on a project and I'm trying to see how mature the capabilities in POI are with regard to the following:


1.       Parsing text in documents (i.e. in paragraphs, tables, etc.)

2.       Merging different word documents

3.       Creating hyperlinks (not to external URLs, but to other places in the document)

4.       Creating a table of contents

If POI currently doesn't have these capabilities, are there any other open source Java packages that can deliver the same functionality?  Thanks in advance!

Justin




This e-mail is intended solely for the person or entity to which it is addressed
and may contain confidential and/or privileged information. Any review, dissemination,
copying, printing or other use of this e-mail by persons or entities other than the 
addressee is prohibited. If you have received this e-mail in error, please contact
the sender immediately and delete the material from any computer.
To unsubscribe send an email to: Unsubscribe@hitachiconsulting.com 
Hitachi Consulting Corporation, 2001 Bryan Street, Suite 3600, Dallas, Texas 75201



Re: Use cases for MS Word files

Posted by MSB <ma...@tiscali.co.uk>.
Sorry Justin, I have to admit to a rather large mistake in my previous post.
David has just reminded me that OpenXML4j is already part of the POI API and
that the project I was actually thinking of is docx4j. Still, I think that
the other comments hold true (!!)

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24293215.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Use cases for MS Word files

Posted by MSB <ma...@tiscali.co.uk>.
Hello Justin,

This morning, I had the opportunity to work on the piece of code that I sent
to you. Sadly, it is still not copying the formatting information between the
source and merged document. I am not at all sure what the cause of the
problem is, but I will continue to look. If I make any progress, I will post
a message to the list. The code will work, however, if all you need to do is
merge text from one document into another and if the default font (Times
New Roman on my PC) is satisfactory. If that is the case, then just
uncomment the line that reads something like this:

newCharRun = range.insertAfter(text);

and instead comment out this line:

newCharRun = range.insertAfter(text, charProps);

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24334849.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Use cases for MS Word files

Posted by MSB <ma...@tiscali.co.uk>.
Justin,

To keep you up to date with progress: I have only been able to spend about
an hour on the code today and it is still very, very far from working
properly. Just to give you an overview, I was simply looking to merge one
paragraph from one Word document into another and, moreover, to be able to
identify which paragraph to merge and where to insert it. Once I had this
working, the plan was to add further methods that would allow me to specify
a list of paragraph numbers, or even a range of them, to merge from one
document into another. As it stands, the code looks like this:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterProperties;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.ParagraphProperties;
import org.apache.poi.hwpf.usermodel.Range;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Experimental: copies a single paragraph from one Word (.doc) document
 * into another using HWPF. Character formatting is not yet carried over.
 *
 * @author win user
 */
public class MergeTest {

    private final HWPFDocument mergeToDocument;
    private final Range mergeToDocRange;
    private final String mergeToDocName;

    public MergeTest(String mergeToDocName) throws IOException {
        // HWPFDocument reads the whole file, so the stream can be closed straight away.
        try (InputStream fis = new FileInputStream(mergeToDocName)) {
            this.mergeToDocument = new HWPFDocument(fis);
        }
        this.mergeToDocRange = this.mergeToDocument.getRange();
        this.mergeToDocName = mergeToDocName;
    }

    public void mergeParaFrom(String mergeFilename, int numParaToMerge,
                              int numParaMergeAfter) throws IOException {
        HWPFDocument mergeFromDoc;
        try (InputStream fis = new FileInputStream(mergeFilename)) {
            mergeFromDoc = new HWPFDocument(fis);
        }
        Range docRange = mergeFromDoc.getRange();
        // Paragraph indices are 0-based, so numParagraphs() itself is out of range.
        if (numParaToMerge < 0 || numParaToMerge >= docRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to numParaToMerge "
                    + "is not a valid paragraph index in the source document.");
        }
        if (numParaMergeAfter < 0
                || numParaMergeAfter >= this.mergeToDocRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to numParaMergeAfter "
                    + "is not a valid paragraph index in the destination document.");
        }
        Paragraph paraToMerge = docRange.getParagraph(numParaToMerge);
        this.mergeParaIntoDoc(
                this.mergeToDocRange.getParagraph(numParaMergeAfter), paraToMerge);
    }

    public void mergeParaIntoDoc(Paragraph mergeAfterPara, Paragraph toMergePara) {
        // Insert a new paragraph carrying the source paragraph's properties...
        ParagraphProperties paraProps = toMergePara.cloneProperties();
        Range range = mergeAfterPara.insertAfter(paraProps, 0);
        System.out.println("Text: " + toMergePara.text());
        // ...then append each character run of the source paragraph to it in turn.
        CharacterRun newCharRun;
        int numCharRuns = toMergePara.numCharacterRuns();
        for (int i = 0; i < numCharRuns; i++) {
            CharacterRun toMergeCharRun = toMergePara.getCharacterRun(i);
            // Strip Word field codes out of the run's text.
            String text = CharacterRun.stripFields(toMergeCharRun.text());
            CharacterProperties charProps = toMergeCharRun.cloneProperties();
            newCharRun = range.insertAfter(text, charProps);
            //newCharRun = range.insertAfter(text);
            range = newCharRun;
        }
    }

    public void saveMergedDocument() throws IOException {
        this.saveMergedDocument(this.mergeToDocName);
    }

    public void saveMergedDocument(String filename) throws IOException {
        // try-with-resources closes the stream even if write() fails.
        try (OutputStream fos = new FileOutputStream(filename)) {
            this.mergeToDocument.write(fos);
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            MergeTest mergeTest = new MergeTest("C:/temp/Merge Document.doc");
            mergeTest.mergeParaFrom("C:/temp/Source Document.doc", 2, 3);
            mergeTest.saveMergedDocument("C:/temp/Merge Results.doc");
        }
        catch (Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows..............");
            ex.printStackTrace(System.out);
        }
    }
}
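Mark's stated next step was to let mergeParaFrom() accept a list of paragraph numbers, or a range of them. A minimal sketch of just the index bookkeeping, using only the standard library, might look like the following. ParagraphSelection, fromList() and fromRange() are hypothetical names, not part of POI, and the sketch assumes paragraph indices are 0-based, as Range.getParagraph(int) uses them above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of POI): turns a choice of paragraphs into a
 * validated list of 0-based indices, to be passed one at a time to
 * Range.getParagraph(int).
 */
final class ParagraphSelection {

    private ParagraphSelection() {
    }

    /** Validates an explicit list of indices, keeping the caller's order. */
    static List<Integer> fromList(List<Integer> indices, int numParagraphs) {
        List<Integer> result = new ArrayList<>();
        for (int index : indices) {
            checkIndex(index, numParagraphs);
            result.add(index);
        }
        return Collections.unmodifiableList(result);
    }

    /** Expands the inclusive range [first, last] into individual indices. */
    static List<Integer> fromRange(int first, int last, int numParagraphs) {
        if (first > last) {
            throw new IllegalArgumentException(
                    "first (" + first + ") is after last (" + last + ")");
        }
        checkIndex(first, numParagraphs);
        checkIndex(last, numParagraphs);
        List<Integer> result = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            result.add(i);
        }
        return Collections.unmodifiableList(result);
    }

    // Indices are 0-based, so the largest valid index is numParagraphs - 1.
    private static void checkIndex(int index, int numParagraphs) {
        if (index < 0 || index >= numParagraphs) {
            throw new IllegalArgumentException("Paragraph index " + index
                    + " is outside 0.." + (numParagraphs - 1));
        }
    }
}
```

Each returned index could then be passed to docRange.getParagraph(i) in turn. Because every insertion adds paragraphs to the destination document, the numParaMergeAfter position would probably need to move forward after each merged paragraph; that part is untested.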

Whilst I think that the basic premise is sound - insert a new paragraph into
the document and add to it each character run from the paragraph that is
being merged - all of the 'style' information (font, size, etc.) is lost
when the paragraph is inserted, and so I think I am looking at writing
methods to deep copy the CharacterProperties and most likely the
ParagraphProperties as well. I will take a look at the source for the
cloneProperties() methods first, though, for clues. Even if it is possible to
get this to work, there are still going to be lots of other problems:
pictures, tables, OLE objects, what happens if the text to be merged is
arranged into columns, and so on.

I will keep playing with the code when I have the time - and once it cools
down a little around here - and let you know what happens; as before, though,
I cannot promise when this will be. I am also going to look into an
alternative approach where paragraphs are extracted from documents and
merged to form a new document; it could be tricky but might work.
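The alternative approach can at least be planned without touching HWPF. The sketch below is a hypothetical, text-only model: each source document is represented by the plain text of its paragraphs (what Paragraph.text() returns in the code above), and a plan records which paragraph to take from which document, in output order. MergePlan, Pick, take() and apply() are illustrative names only; formatting, pictures, tables and OLE objects are deliberately ignored, matching the text-only case described earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, text-only model of building a new document from paragraphs
 * taken out of several source documents. Formatting is ignored entirely.
 */
final class MergePlan {

    /** One step of the plan: take paragraph {@code index} (0-based) from {@code source}. */
    static final class Pick {
        final String source;
        final int index;

        Pick(String source, int index) {
            this.source = source;
            this.index = index;
        }
    }

    private final List<Pick> picks = new ArrayList<>();

    /** Appends a step to the plan; returns this plan so calls can be chained. */
    MergePlan take(String source, int index) {
        picks.add(new Pick(source, index));
        return this;
    }

    /** Applies the plan, returning the chosen paragraphs in the order requested. */
    List<String> apply(Map<String, List<String>> sources) {
        List<String> merged = new ArrayList<>();
        for (Pick pick : picks) {
            List<String> paragraphs = sources.get(pick.source);
            if (paragraphs == null) {
                throw new IllegalArgumentException("Unknown source document: " + pick.source);
            }
            if (pick.index < 0 || pick.index >= paragraphs.size()) {
                throw new IllegalArgumentException("Paragraph " + pick.index
                        + " does not exist in " + pick.source);
            }
            merged.add(paragraphs.get(pick.index));
        }
        return merged;
    }
}
```

Writing the merged list out as a real .doc would still need insertion code like MergeTest above, with all of its formatting caveats.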

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24309490.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Use cases for MS Word files

Posted by MSB <ma...@tiscali.co.uk>.
Hello Justin,

Not to hand, no, I do not. Having said that, I am quite willing to try to put
something together, but I cannot promise a timescale, sorry. If I have any
time today, I will look into writing something. Can I just ask how you want
to perform the merge? Do you want to simply copy text from one document into
an existing document, or do you want to take some text from two or more
documents and merge that into a new document?

Thinking a little bit more overnight, the answer to merging documents ought
to have been 'yes, but with a caveat'; fonts could be an issue, but I am not
at all sure about this and it would require testing. I am thinking here
about a document that could have been created on another machine entirely
and then emailed to you; if it uses an obscure font, then we could face a
problem. However, this is hard to prove until some testing is undertaken.
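One way to start that testing, sketched under assumptions: gather the font names used by a document's character runs (HWPF's CharacterRun.getFontName() is assumed to be available in the POI version in use) and compare them with the font families the local JVM reports through java.awt.GraphicsEnvironment. FontCheck and missingFonts() are hypothetical names; matching is by family name only, ignoring case, which may not line up exactly with how Word names fonts.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of POI): reports which font names used by a
 * document are not installed on the current machine.
 */
final class FontCheck {

    private FontCheck() {
    }

    /** Returns the fonts in usedFonts missing from installedFamilies (case-insensitive). */
    static Set<String> missingFonts(Collection<String> usedFonts,
                                    Collection<String> installedFamilies) {
        Set<String> installed = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        installed.addAll(installedFamilies);
        Set<String> missing = new TreeSet<>();
        for (String font : usedFonts) {
            if (!installed.contains(font)) {
                missing.add(font);
            }
        }
        return missing;
    }

    /** Convenience overload that asks the JVM which font families it can see. */
    static Set<String> missingFonts(Collection<String> usedFonts) {
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames(Locale.ENGLISH);
        return missingFonts(usedFonts, Arrays.asList(families));
    }
}
```

A font being installed where the merge runs says nothing about the machine that later opens the merged document, so a check like this only catches part of the problem.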

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24301974.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Use cases for MS Word files

Posted by "Beltran, Justin" <jb...@hitachiconsulting.com>.
Hi Mark,

Do you have any examples of how to merge different Word documents? I've seen code to parse a Word doc, but not how to merge different documents.

Justin

-----Original Message-----
From: MSB [mailto:markbrdsly@tiscali.co.uk] 
Sent: Tuesday, June 30, 2009 11:56 PM
To: user@poi.apache.org
Subject: Re: Use cases for MS Word files




Re: Use cases for MS Word files

Posted by MSB <ma...@tiscali.co.uk>.
Morning Justin,

I think that the answers to your questions are yes, yes, no and no, in that
order. Do not take this as the final answer, however, as I have not used
HWPF/XWPF for a while now and the project could have advanced since that
time.

As for other open source APIs, there is not another one that I am aware of
which targets both the binary and OpenXML file formats. There is the
OpenXML4j project at Sourceforge
(http://sourceforge.net/projects/openxml4j/), but this is 'limited' to just
the XML-based file format. Also, I have not used that tool, so I cannot speak
to its feature set, sorry. Of course, there are commercial tools; Aspose
is the one that springs to mind.

While OLE automation might have been an option if you were targeting just
Windows platforms, OpenOffice could offer you a cross-platform alternative.
It is open source and platform independent, but quite large to deploy. UNO
is not an easy technique/interface to learn, and I do not have complete
confidence in OpenOffice's ability to accurately render complex documents,
at least in the binary (OLE2 CDF) file format. Further, applications that
use it can be quite slow because you will actually be manipulating an
instance of the application rather than creating a file. Finally, there are
complications if you want to run it in a client/server configuration, as you
will need to create what is termed a 'connection aware' client at the very
least.

If you have the time, it might be worth seeing what would be required to add
the necessary capabilities to HWPF/XWPF. I am certain there are others who
would like to see this sort of functionality and would be delighted if you
could join the development team and contribute patches.

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24285074.html
Sent from the POI - User mailing list archive at Nabble.com.

