Posted to user@poi.apache.org by Baby Periasamy <ba...@gmail.com> on 2011/07/13 10:13:06 UTC
Read Word document and display it with in a textarea of jsp
Hi POI Users,
I want to read a Word document that can contain rich content, images and
tables.
I am able to get the images and the text, and I can also get the text from
the tables.
However, I cannot display the table structure in the JSP exactly as it looks
in the Word document.
How can I get the table properties from my Word document?
Please help me out.
Thanks in advance.
Baby Periasamy.
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4581911.html
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Read Word document and display it with in a textarea of jsp
Posted by Baby Periasamy <ba...@gmail.com>.
Hi,
When I tried it, I got the following error:
java.lang.VerifyError: (class:
org/apache/poi/hwpf/converter/AbstractWordConverter, method: processField
signature:
(Lorg/apache/poi/hwpf/HWPFDocument;Lorg/apache/poi/hwpf/usermodel/Range;ILorg/apache/poi/hwpf/model/Field;Lorg/w3c/dom/Element;)V)
Incompatible argument to function
Exception in thread "main".
I also got errors on the following calls:
WordToHtmlUtils.isNotEmpty(String)
WordToHtmlUtils.equals(String)
and on some other methods of WordToHtmlUtils.
Can you please help me out?
Thanks & Regards,
Baby Periasamy.
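A java.lang.VerifyError such as the one above often means that classes built against different versions of the same library are being loaded together, for example a released POI jar next to a trunk scratchpad jar. Whether that is the cause here is an assumption. A minimal sketch for checking which jar a class is actually loaded from, using only JDK calls (the class name WhichJar and its helper are made up for illustration):

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns where a class was loaded from, or a note if that is unknown. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (the JDK itself) have no code source.
        if (src == null || src.getLocation() == null) {
            return "(no code source; probably a JDK class)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments, e.g.
        // org.apache.poi.hwpf.converter.AbstractWordConverter
        for (String name : args) {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " -> " + locationOf(c));
        }
    }
}
```

If the POI classes turn out to come from jars of different versions, aligning them to a single build is the first thing to try.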
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586158.html
Re: Read Word document and display it with in a textarea of jsp
Posted by Yegor Kozlov <ye...@dinom.ru>.
You may want to play with WordToHtmlConverter:
http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/WordToHtmlConverter.java
This is a brand new feature which is present only in trunk. Daily
builds can be downloaded from here:
http://encore.torchbox.com/poi-cvs-build/
Yegor
On Wed, Jul 13, 2011 at 3:28 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Wed, 13 Jul 2011, Baby Periasamy wrote:
>>
>> How can i get the table properties from my word document.
>
> The best example I know of for getting the textual formatting properties
> from word for use elsewhere is within Apache Tika
>
> For .doc files:
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
>
> For .docx files:
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java
>
> For Tika the interest is in generating html, so I think you should find
> things quite similar for your jsp case
>
> Nick
Re: Read Word document and display it with in a textarea of jsp
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 15 Jul 2011, Baby Periasamy wrote:
> The problem here is that the font colors and styles are missing in the
> retrieved HTML.
If you want this level of detail, you'll either want to write your own code
based on Tika, or alternatively use WordToHtmlConverter, which Yegor
pointed you at.
> And where will the image be stored? The image shows up as an x in the JSP
> page.
It's up to you to get the image from Tika, and do something with it. This
was recently discussed on the Tika list:
http://lucene.472066.n3.nabble.com/Image-Extraction-td3006668.html
Nick
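One possible way to "do something with it", once the image bytes have been extracted, is to embed them directly in the HTML as data: URIs so that the JSP page needs no separate image URL. This is only a sketch under assumptions: the class InlineImages, its helpers and the src value "embedded:image1.png" are made up for illustration, java.util.Base64 needs Java 8 or later, and older browsers limit or lack data: URI support. Serving the bytes from a servlet is the other obvious option.

```java
import java.util.Base64;

public class InlineImages {
    /** Builds a data: URI from raw image bytes and a MIME type. */
    static String toDataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    /** Replaces every src="srcValue" attribute in the HTML with an inline data: URI. */
    static String inlineImage(String html, String srcValue,
                              String mimeType, byte[] imageBytes) {
        return html.replace("src=\"" + srcValue + "\"",
                "src=\"" + toDataUri(mimeType, imageBytes) + "\"");
    }

    public static void main(String[] args) {
        // Hypothetical input: the HTML fragment and the bytes are placeholders.
        String html = "<p><img src=\"embedded:image1.png\"/></p>";
        byte[] bytes = {1, 2, 3};
        System.out.println(
                inlineImage(html, "embedded:image1.png", "image/png", bytes));
    }
}
```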
Re: Read Word document and display it with in a textarea of jsp
Posted by Baby Periasamy <ba...@gmail.com>.
Hi Nick,
Thank you. I have followed that test class. Now I am able to get the
contents in HTML form for tables, text and images.
The problem is that the font colors and styles are missing in the retrieved
HTML. Also, where will the images be stored? The image shows up as an x in
the JSP page.
The contents are only plain HTML.
Below is the code I've used:
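import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.OfficeParser;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;

Metadata metadata = new Metadata();
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory)
SAXTransformerFactory.newInstance();
// Serialize the SAX events that the parser emits into the StringWriter.
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
// filePath is resolved as a classpath resource here.
InputStream input = OOXMLParser.class.getResourceAsStream(filePath);
//input = new FileInputStream(fileDirectoryPath);
new OfficeParser().parse(TikaInputStream.get(input), handler,
metadata, new ParseContext());
String xml = sw.toString();
System.out.println("xml test "+xml);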
Please help me out.
Regards,
Baby Periasamy.
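The handler set-up in the code above can be tried without Tika by feeding the TransformerHandler a few SAX events by hand, which shows that the resulting string contains exactly the elements and attributes the event source emits and nothing more. A minimal sketch using only JDK classes; the class name SaxToString and the hand-made table are illustrative stand-ins for what a parser would send:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SaxToString {
    /** Serializes a tiny hand-made table through a TransformerHandler. */
    static String render() {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
            handler.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            handler.setResult(new StreamResult(sw));

            // Stand-in for the events a parser would emit for one table cell.
            AttributesImpl none = new AttributesImpl();
            char[] text = "cell".toCharArray();
            handler.startDocument();
            handler.startElement("", "table", "table", none);
            handler.startElement("", "tr", "tr", none);
            handler.startElement("", "td", "td", none);
            handler.characters(text, 0, text.length);
            handler.endElement("", "td", "td");
            handler.endElement("", "tr", "tr");
            handler.endElement("", "table", "table");
            handler.endDocument();
            return sw.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Since the handler only serializes what it receives, styling that the parser never emits as events cannot appear in the output string.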
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4590667.html
Re: Read Word document and display it with in a textarea of jsp
Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 14 Jul 2011, Baby Periasamy wrote:
> Can you please tell me how I can get the contents after parsing the
> document with the parse method of the WordExtractor class?
Look at testWordHTML in WordParserTest for an example:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
Nick
Re: Read Word document and display it with in a textarea of jsp
Posted by Baby Periasamy <ba...@gmail.com>.
Hi Nick,
Can you please tell me how I can get the contents after parsing the
document with the parse method of the WordExtractor class?
How can I get those as HTML content?
Thank you.
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586267.html
Re: Read Word document and display it with in a textarea of jsp
Posted by Baby Periasamy <ba...@gmail.com>.
Thank you Nick. I will work with that and will post the test result.
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Read-Word-document-and-display-it-with-in-a-textarea-of-jsp-tp4581911p4586056.html
Re: Read Word document and display it with in a textarea of jsp
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 13 Jul 2011, Baby Periasamy wrote:
> How can i get the table properties from my word document.
The best example I know of for getting the textual formatting properties
from Word for use elsewhere is within Apache Tika.
For .doc files:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
For .docx files:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java
For Tika the interest is in generating HTML, so I think you should find
things quite similar for your JSP case.
Nick