Posted to user@poi.apache.org by Baby Periasamy <ba...@gmail.com> on 2011/07/13 10:13:06 UTC

Read Word document and display it with in a textarea of jsp

Hi POI Users,

I want to read a Word document that can contain rich content, images and
tables.

I am able to get the images and text, and I can also get the text from the
tables.

But I cannot display the table structure in the JSP exactly as it looks in
the Word document.
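Once the text of each cell has been extracted (for instance by walking HWPF's Table, TableRow and TableCell classes, which is not shown here), the table layout can be rebuilt by emitting one <tr> per row and one <td> per cell. The sketch below is JDK-only and assumes the cells arrive as a hypothetical String[][]; it uses the same JAXP TransformerHandler mechanism that the Tika code later in this thread uses, so cell text is escaped by the serializer. The border attribute only illustrates carrying a table property across; real properties such as widths or merged cells would have to be read from the document separately.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class TableHtml {
    // cells[r][c] is the plain text of cell c in row r (hypothetical input shape).
    static String toHtml(String[][] cells) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            // An identity TransformerHandler serializes the SAX events it receives.
            TransformerHandler out = factory.newTransformerHandler();
            out.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
            out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            out.setResult(new StreamResult(sw));

            AttributesImpl none = new AttributesImpl();
            AttributesImpl tableAttrs = new AttributesImpl();
            tableAttrs.addAttribute("", "border", "border", "CDATA", "1");

            out.startDocument();
            out.startElement("", "table", "table", tableAttrs);
            for (String[] row : cells) {
                out.startElement("", "tr", "tr", none);
                for (String text : row) {
                    out.startElement("", "td", "td", none);
                    // The serializer escapes characters such as & and < in text.
                    out.characters(text.toCharArray(), 0, text.length());
                    out.endElement("", "td", "td");
                }
                out.endElement("", "tr", "tr");
            }
            out.endElement("", "table", "table");
            out.endDocument();
            return sw.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(new String[][] {{"Item", "Qty"}, {"Nuts & bolts", "2"}}));
    }
}
```

The resulting string can be written into the page as-is; a browser then lays the cells out as a grid instead of a run of text.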

How can I get the table properties from my Word document?

Please help me out.

Thanks in advance.

Baby Periasamy.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4581911.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Read Word document and display it with in a textarea of jsp

Posted by Baby Periasamy <ba...@gmail.com>.
Hi,

When I tried it, I got the following error:

java.lang.VerifyError: (class:
org/apache/poi/hwpf/converter/AbstractWordConverter, method: processField
signature:
(Lorg/apache/poi/hwpf/HWPFDocument;Lorg/apache/poi/hwpf/usermodel/Range;ILorg/apache/poi/hwpf/model/Field;Lorg/w3c/dom/Element;)V)
Incompatible argument to function
Exception in thread "main".

I am also getting errors on the following calls:
    WordToHtmlUtils.isNotEmpty(String)
    WordToHtmlUtils.equals(String)
and on some other methods from WordToHtmlUtils.

Can you please help me out?

Thanks & Regards,
Baby Periasamy.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586158.html
Sent from the POI - User mailing list archive at Nabble.com.



Re: Read Word document and display it with in a textarea of jsp

Posted by Yegor Kozlov <ye...@dinom.ru>.
You may want to play with WordToHtmlConverter:

http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/WordToHtmlConverter.java

This is a brand-new feature that is present only in trunk. Daily
builds can be downloaded from here:
http://encore.torchbox.com/poi-cvs-build/
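A java.lang.VerifyError like the one quoted in this thread often means that classes compiled against one version of a library are running against another, for example a trunk poi-scratchpad jar (which holds org.apache.poi.hwpf) next to an older core poi jar. Since WordToHtmlConverter is trunk-only, all POI jars should come from the same daily build. A minimal JDK-only sketch for checking which jar each class is actually loaded from; the first two class names come from the stack trace, and org.apache.poi.POIDocument is an assumed probe for the core jar. The class name WhichJar is hypothetical.

```java
import java.security.CodeSource;

final class WhichJar {
    // Returns the jar or directory a class was loaded from, or a marker when
    // the JVM reports none (e.g. JDK classes from the bootstrap loader).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    // Loads a class by name without initializing it, so a missing jar is
    // reported as text instead of an exception.
    static String locationOfName(String className) {
        try {
            return locationOf(Class.forName(className, false, WhichJar.class.getClassLoader()));
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        } catch (LinkageError e) {
            return "(failed to load: " + e + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.poi.hwpf.converter.AbstractWordConverter", // from the stack trace (scratchpad)
            "org.apache.poi.hwpf.HWPFDocument",                    // from the stack trace (scratchpad)
            "org.apache.poi.POIDocument"                           // assumed probe for the core poi jar
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOfName(name));
        }
    }
}
```

If the printed locations point at jars with different build dates or versions, replacing them with a matching set is the first thing to try.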

Yegor

On Wed, Jul 13, 2011 at 3:28 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Wed, 13 Jul 2011, Baby Periasamy wrote:
>>
>> How can I get the table properties from my Word document?
>
> The best example I know of for getting the textual formatting properties
> from Word for use elsewhere is within Apache Tika.
>
> For .doc files:
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
>
> For .docx files:
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java
>
> For Tika, the interest is in generating HTML, so I think you should find
> things quite similar for your JSP case.
>
> Nick



Re: Read Word document and display it with in a textarea of jsp

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 15 Jul 2011, Baby Periasamy wrote:
> The problem here is that the font colors and styles are missing in the
> retrieved HTML.

If you want this level of detail, you'll either need to write your own
code based on Tika, or use WordToHtmlConverter instead, which Yegor
pointed you at.

> Also, where will the images be stored? Each image shows up as an "x" in
> the JSP page.

It's up to you to get the image from Tika, and do something with it. This 
was recently discussed on the Tika list:
    http://lucene.472066.n3.nabble.com/Image-Extraction-td3006668.html
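One way to avoid the broken "x" image is to put the picture bytes into the HTML itself as a data: URI, so the JSP page does not need a separate URL to serve them. How the bytes are obtained (via Tika's embedded-document handling, as in the linked discussion, or directly from POI) is not shown; the sketch below assumes they are already in a byte[] and that the MIME type is known. The class name is hypothetical, and java.util.Base64 needs Java 8 or later.

```java
import java.util.Base64;

final class InlineImage {
    // Returns an <img> element whose src is a data: URI carrying the picture
    // bytes, so the browser needs no separate request to display the image.
    static String imgTag(byte[] imageBytes, String mimeType) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + base64 + "\"/>";
    }

    public static void main(String[] args) {
        // Stand-in bytes; real code would pass the image data extracted from the document.
        byte[] bytes = {'d', 'e', 'm', 'o'};
        // Prints: <img src="data:image/png;base64,ZGVtbw=="/>
        System.out.println(imgTag(bytes, "image/png"));
    }
}
```

Base64 inflates the page by about a third and some older browsers cap data URI sizes, so writing each image to a file under the web application and pointing the <img src> at that URL is the usual alternative for large pictures.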

Nick



Re: Read Word document and display it with in a textarea of jsp

Posted by Baby Periasamy <ba...@gmail.com>.
Hi Nick,

Thank you. I have followed that test class, and now I am able to get the
tables, text and images as HTML.

The problem here is that the font colors and styles are missing in the
retrieved HTML. Also, where will the images be stored? Each image shows up
as an "x" in the JSP page.

The content is only plain HTML.

Below is the code I've used:

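    import java.io.InputStream;
    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.microsoft.OfficeParser;
    import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;

    Metadata metadata = new Metadata();
    StringWriter sw = new StringWriter();

    // Identity transformer that serializes Tika's SAX (XHTML) events into sw
    SAXTransformerFactory factory =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));

    // filePath is a classpath resource; OfficeParser handles OLE2 (.doc) files
    InputStream input = OOXMLParser.class.getResourceAsStream(filePath);
    // or: InputStream input = new FileInputStream(fileDirectoryPath);
    try {
        new OfficeParser().parse(TikaInputStream.get(input), handler,
                metadata, new ParseContext());
    } finally {
        input.close();
    }
    String xml = sw.toString();
    System.out.println("xml test " + xml);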
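On the JSP side, note that a <textarea> displays its content as plain text: HTML placed inside it shows up as markup rather than as a rendered table, so the xml string from the code above belongs in an element such as <div> if it should be rendered. If the raw markup really is wanted inside a textarea, it has to be escaped at least for & and <. A minimal JDK-only sketch; the class name TextareaEscape is hypothetical, and in a page JSTL's <c:out value="${xml}"/>, which escapes by default, does the same job.

```java
final class TextareaEscape {
    // Replaces the characters that are significant inside <textarea> content.
    // Each input character is handled once, so the '&' of an emitted entity
    // is never escaped a second time.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;td&gt;Nuts &amp; bolts&lt;/td&gt;
        System.out.println(escape("<td>Nuts & bolts</td>"));
    }
}
```

In a JSP this would be used as <textarea rows="20" cols="80"><%= TextareaEscape.escape(xml) %></textarea>, while <div><%= xml %></div> renders the converted document instead.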




Please help me out.

Regards,
Baby Periasamy.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4590667.html
Sent from the POI - User mailing list archive at Nabble.com.



Re: Read Word document and display it with in a textarea of jsp

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 14 Jul 2011, Baby Periasamy wrote:
> Can you please tell me how I can get the contents after parsing with
> the parse method of the WordExtractor class?

Look at testWordHTML in WordParserTest for an example:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java

Nick



Re: Read Word document and display it with in a textarea of jsp

Posted by Baby Periasamy <ba...@gmail.com>.
Hi Nick,

Can you please tell me how I can get the contents after parsing with the
parse method of the WordExtractor class?

How can I get them as HTML content?

Thank you.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586267.html
Sent from the POI - User mailing list archive at Nabble.com.



Re: Read Word document and display it with in a textarea of jsp

Posted by Baby Periasamy <ba...@gmail.com>.
Thank you, Nick. I will work with that and post the test results.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586056.html
Sent from the POI - User mailing list archive at Nabble.com.



Re: Read Word document and display it with in a textarea of jsp

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 13 Jul 2011, Baby Periasamy wrote:
> How can I get the table properties from my Word document?

The best example I know of for getting the textual formatting properties 
from Word for use elsewhere is within Apache Tika.

For .doc files:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java

For .docx files:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java
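Which of the two extractors applies depends on the file format, and a file extension is not always reliable. .doc files are OLE2 compound documents, which start with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP packages starting with "PK\3\4". A JDK-only sketch that sniffs the leading bytes; WordFormat and its Kind values are hypothetical names, and a ZIP header alone does not prove the file is a Word document. Tika's AutoDetectParser can also make this choice automatically.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

final class WordFormat {
    enum Kind { OLE2_DOC, OOXML_ZIP, UNKNOWN }

    // OLE2 compound-document signature, used by .doc (and .xls/.ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local-file-header signature "PK\3\4"; .docx files are ZIP packages.
    private static final byte[] ZIP = { 'P', 'K', 3, 4 };

    // Classifies a file from its leading bytes (8 are needed for OLE2).
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2)) return Kind.OLE2_DOC;
        if (startsWith(head, ZIP)) return Kind.OOXML_ZIP; // any ZIP, not only .docx
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = in.readNBytes(8); // Java 11+
            System.out.println(args[0] + ": " + sniff(head));
        }
    }
}
```

An OLE2_DOC result points at the WordExtractor route above, an OOXML_ZIP result at the XWPFWordExtractorDecorator route.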

For Tika, the interest is in generating HTML, so I think you should find 
things quite similar for your JSP case.

Nick
