Posted to user@poi.apache.org by "SnehaUttam.Dudhane" <Sn...@cognizant.com> on 2015/05/23 10:43:19 UTC

Downloading Data in Microsoft Word format using Apache POI.

Hi,

I am using Apache POI to read a file from the internet one paragraph at a
time. I am reading a "*.rtf" file and writing it into a ".docx" file. The
contents are downloaded and stored in the docx file, but the text
formatting information also shows up as visible text in that docx file.

For example, after downloading, my file looks like this -
\p\f0\Arial\12\b\plain Diagnosis of report \tab\f0\Arial\10\plain :
\p\f0\Arial\12\plain C01253320

But, it should look like -
Diagnosis of report : C01253320

Is this a compatibility issue with Apache POI?
The file should not show the formatting codes as text.
Please help me resolve this issue.





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Downloading-Data-in-Microsoft-Word-format-using-Apache-POI-tp5718871.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Downloading Data in Microsoft Word format using Apache POI.

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
As you have found out, the Rich Text Format uses markup in a similar way to
HTML. You will either have to interpret the markup yourself and convert it
into instructions that you can encode using POI (a good place to start is
http://www.biblioscape.com/rtf15_spec.htm), or look at something like
JODConverter (http://www.artofsolving.com/opensource/jodconverter.html),
which leverages the functionality within Open/LibreOffice to perform the
file conversion.
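
As a rough illustration of the "interpret the markup" route, here is a minimal sketch that lets the JDK do the parsing instead of hand-writing a parser. It is not POI code: it assumes the RTFEditorKit class that ships with Swing in the JDK, and the class name RtfToText and the sample string are made up for the example. It only recovers the visible text; writing that text into a .docx with POI would be a separate step.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, not part of POI.
public class RtfToText {

    /** Parses RTF from the stream and returns only the visible text. */
    static String toPlainText(InputStream rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // The kit interprets control words such as \b or \fs24 as attributes,
        // so they never end up in the document text.
        kit.read(rtf, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample; real input would come from link.openStream().
        String sample = "{\\rtf1\\ansi{\\fonttbl{\\f0 Arial;}}"
                + "\\f0\\fs24\\b Diagnosis of report\\b0\\tab : C01253320\\par}";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.US_ASCII));
        System.out.println(toPlainText(in).trim());
    }
}
```

The Swing RTF support covers only part of the RTF specification, so complex documents (tables, images) may lose content; for those, one of the converters mentioned here is probably the safer choice.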

My only concern is that I am not sure whether Open/LibreOffice fully supports
writing the Word XML file format. At the back of my mind, I feel that it may
simply save the file as RTF with the .docx extension. If it does that,
Word will silently open the document for you. This would of course not be a
true file conversion, and it is something that I think merits further
investigation. Sorry to say that I cannot check this myself, as I use
LibreOffice exclusively and do not have access to Word on my machine.

If you are working on a Windows platform /exclusively/, then you could look
at OLE/COM and using something like JACOB to create and control an instance
of Word itself and perform the file conversions that way. This does work
quite well but is slow and a little cumbersome.





Re: Downloading Data in Microsoft Word format using Apache POI.

Posted by "SnehaUttam.Dudhane" <Sn...@cognizant.com>.
You were right, I was reading raw bytes and writing them to a Word file using
XSSF.
I also tried using BufferedReader; it just copies all the formatting codes as
text and writes them as part of the doc file.

Here is my code -

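    // Needs java.io.BufferedReader and java.io.InputStreamReader;
    // link is a java.net.URL pointing at the .rtf file.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(link.openStream()))) {
        String inputLine;
        // Each line is raw RTF source, so the control words are still in it.
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
    }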

inputLine contains all the formatting codes, for example:
"
{\rtf1\ansi\ansicpg1252\paperh15840\paperw12240\margl720\margr720\margt720\margb720\psz1{\colortbl\
"

Please help.





Re: Downloading Data in Microsoft Word format using Apache POI.

Posted by Aram Mirzadeh <aw...@mbcli.com>.
Are you using POI-HMEF to read in the RTF?  Not that it matters, but I
don't believe HMEF and HSSF/XSSF or SS are directly linked.

You'll have to read and store all the formatting information, then create a
new document (XWPF rather than XSSF for a .docx file) and apply the
formatting you want.

From your output I suspect you're just reading raw bytes in and writing
raw bytes out and expecting it to auto format for you, and AFAIK there is no
such function within POI.
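
To make "read and store all the formatting information" a bit more concrete, here is a sketch under some assumptions. It uses the Swing RTFEditorKit from the JDK rather than anything in POI, and the RtfRuns and Run names are invented for the example. It collects each run of text together with its bold flag, font family and font size; copying those values onto POI runs (XWPFRun has setText, setBold, setFontFamily and setFontSize) is left out here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, not part of POI.
public class RtfRuns {

    /** One stretch of text that shares the same character formatting. */
    static final class Run {
        final String text;
        final boolean bold;
        final String fontFamily;
        final int fontSize;

        Run(String text, boolean bold, String fontFamily, int fontSize) {
            this.text = text;
            this.bold = bold;
            this.fontFamily = fontFamily;
            this.fontSize = fontSize;
        }
    }

    /** Parses the RTF bytes and returns the text split into formatted runs. */
    static List<Run> readRuns(byte[] rtf) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(new ByteArrayInputStream(rtf), doc, 0);

        List<Run> runs = new ArrayList<>();
        Element root = doc.getDefaultRootElement();
        for (int p = 0; p < root.getElementCount(); p++) {
            Element paragraph = root.getElement(p);
            for (int r = 0; r < paragraph.getElementCount(); r++) {
                Element leaf = paragraph.getElement(r);
                int start = leaf.getStartOffset();
                // The last leaf can extend one past the real content; clamp it.
                int end = Math.min(leaf.getEndOffset(), doc.getLength());
                if (end <= start) {
                    continue;
                }
                AttributeSet attrs = leaf.getAttributes();
                runs.add(new Run(doc.getText(start, end - start),
                        StyleConstants.isBold(attrs),
                        StyleConstants.getFontFamily(attrs),
                        StyleConstants.getFontSize(attrs)));
            }
        }
        return runs;
    }
}
```

Each Run could then be turned into one run of a paragraph in the new document, which keeps the reading side and the writing side separate.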

On Sat, May 23, 2015 at 4:43 AM, SnehaUttam.Dudhane <
SnehaUttam.Dudhane@cognizant.com> wrote:

> Hi,
>
> I am using Apache POI to read paragraph at a time from internet. I am
> reading "*.rtf" file, and writing it into ".docx" file, It downloads the
> contents and stores it, in the docx file, but it also displays text
> formatting information in that docx file.
>
> For ex. after downloading, my file looks like -
> \p\f0\Arial\12\b\plain Diagnosis of report \tab\f0\Arial\10\plain :
> \p\f0\Arial\12\plain C01253320
>
> But, it should look like -
> Diagnosis of report : C01253320
>
> Is this the compatibility issue with Apache POI?
> It should not show me the formatting of text.
> Please help me resolve this issue.