Posted to user@poi.apache.org by online2 <ha...@yahoo.com> on 2012/07/03 11:04:59 UTC

Re: How to read the value of bookmarks? (docx)

Hi Mark,

thank you for your code. Your work is awesome. It works well in my tests,
except for bookmarks that span multiple cells. Even so, it's really awesome.

I can really put it to work now. Sorry, I'm not good at working
with XML. But thank you, because I can now read almost all bookmarks from docx.

Have a nice day,
online2

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710367.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: How to read the value of bookmarks? (docx)

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Thanks for letting me know and it is good to hear that you have managed to
track the source of the problem down.

One concern I do have, however, is with the .dotm files. I am no expert
but understand that this specific extension indicates to Word that the file
contains macros but that these can be trusted. As a result, Word will not
display a prompt to the user asking them whether or not macro code should be
allowed to run. Changing the type as you are doing may subvert this process,
something which might annoy the document's authors and users. I do not know
that it will, however, and think this conclusion should be tested.

With regard to simple template files - dotx - then this is the perfect
solution to the problem I think and well done for finding it. With regard to
the dotm files, I think it might be worth discussing the repercussions of
the changes - that is if my suspicions are correct - with your users.



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710759.html


Re: How to read the value of bookmarks? (docx)

Posted by Hashem <as...@gmail.com>.
Mark, 

I did what you suggested: 
1 - I created a very simple file and saved it in both dotx and docx formats.
2 - I renamed both files to .zip, unzipped them and compared the
contents.
3 - The contentType in the "[Content_Types].xml" file differed (for
this part: PartName="/word/document.xml"). There were also a few
differences in some other xml files, which I think are not so important.
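
The check in step 3 can also be done without unzipping by hand. What follows is only a minimal sketch using the JDK alone (no POI); the class name ContentTypeProbe and the method contentTypeOf are made up for illustration, and it assumes the part of interest is declared with an Override element in "[Content_Types].xml", as described above.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI.
public class ContentTypeProbe {

    // Returns the ContentType declared for partName in [Content_Types].xml,
    // or null if the package has no matching Override element.
    static String contentTypeOf(String packagePath, String partName) throws Exception {
        try (ZipFile zip = new ZipFile(packagePath)) {
            ZipEntry entry = zip.getEntry("[Content_Types].xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // The default factory is not namespace-aware, so the
                // unprefixed element name "Override" matches directly.
                NodeList overrides = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(in)
                        .getElementsByTagName("Override");
                for (int i = 0; i < overrides.getLength(); i++) {
                    Element override = (Element) overrides.item(i);
                    if (partName.equals(override.getAttribute("PartName"))) {
                        return override.getAttribute("ContentType");
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a .docx, .dotx or .dotm file.
        System.out.println(contentTypeOf(args[0], "/word/document.xml"));
    }
}
```

Running it against the .dotx and the .docx versions of the same file should print the two different content type strings that the comparison turned up.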

So I tried to change the contentType of the generated file. I ran your
program and passed it a template containing a few bookmarks (a .dotx template
file), then I called the following method after calling your save method. Now
Word can open the file without any problem:

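import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;

// Rewrites the main part's content type from "template" to "document"
// so that Word accepts the saved file as a .docx.
private void changeContentType(String filename) {
    try {
        OPCPackage opcPackage;
        // Opening from a stream holds the whole package in memory, so the
        // same file can be overwritten below once the stream is closed.
        try (FileInputStream in = new FileInputStream(filename)) {
            opcPackage = OPCPackage.open(in);
        }
        opcPackage.replaceContentType(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");
        try (FileOutputStream out = new FileOutputStream(filename)) {
            opcPackage.save(out);
        }
    } catch (InvalidFormatException | IOException e) {
        e.printStackTrace();
    }
}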

I tried to change the contentType using the document object itself but it
did not work. That's why I had to create a new method and run a second pass
over the generated file! So the question is: how can we modify the code so
that it works with OPCPackage in the first place? That would make working
with templates pretty easy!

P.S.: I am going to test the same with a .dotm file (which has binary data as
well) and see what happens. I will update this comment and let you know
if it works.


Thank you for your help,
Regards





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710754.html


Re: How to read the value of bookmarks? (docx)

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Sorry, I do not. The key to working the problem out should be quite
straightforward though and I think that you need to do this.

Use Word to create a very simple document template file; it need only
contain maybe one sentence of text.
Re-save that same file using Word again but this time as a .docx.
Use a tool like pkunzip or winRAR to unzip both files so that you can look
at the xml markup.
Dig through the markup to spot what is different between the two files and
it is likely that the changes to the markup between the two otherwise
identical files are the key.
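
The comparison described above can be narrowed down before reading any markup. This is only a rough sketch using the JDK's zip support (Java 9 or later for readAllBytes), not anything taken from POI; the class name PackageDiff and its methods are invented for illustration. It lists the parts that exist in only one of the two files or whose bytes differ, which are then the ones worth opening in an editor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of POI.
public class PackageDiff {

    // Reads every file entry of a zip-based package into memory, keyed by name.
    static Map<String, byte[]> readParts(String path) throws IOException {
        Map<String, byte[]> parts = new TreeMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    parts.put(entry.getName(), in.readAllBytes());
                }
            }
        }
        return parts;
    }

    // Names of parts that are missing from one package or whose bytes differ.
    static Set<String> differingParts(String pathA, String pathB) throws IOException {
        Map<String, byte[]> a = readParts(pathA);
        Map<String, byte[]> b = readParts(pathB);
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        Set<String> differing = new TreeSet<>();
        for (String name : names) {
            // Arrays.equals treats a missing part (null) as different from any content.
            if (!Arrays.equals(a.get(name), b.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1] are the template and the re-saved document.
        for (String name : differingParts(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```

Parts such as docProps/core.xml are likely to differ for unrelated reasons (timestamps, for instance), so the list is a starting point rather than an answer.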

It has been a very long time since I did look, but I think the problems are
caused by a single line in the markup that identifies to Word's parser the
file type. Once the relevant piece of markup is identified, then it is
important to track down just which class within XWPF sets/writes this value.
Once this has been found, it should be quite a straightforward task to amend
it so that the file is written away correctly.

I should have some time today and will take a look into it myself to see
what I can find but I cannot offer any promise as to when this will be.

Yours

Mark B



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710750.html


Re: How to read the value of bookmarks? (docx)

Posted by asbaranjan <as...@gmail.com>.
Mark, 

You are right. I did so and your code worked properly. Do you know of any
approach to open template files (.dotm and .dotx formats) in a Java program,
modify them and then save them in .docx format?

Regards,



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710743.html


Re: How to read the value of bookmarks? (docx)

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Can I just check a single point with you please? Are you opening a template
file using POI, modifying that file and then saving it back again as a
.docx? If this is the case, then POI does not support this sort of operation
at all as far as I am aware. There are other features of the xml markup that
indicate to the Word application when it is working with a template file
rather than a true document file. POI does not modify these features when
you simply change the file extension and that has been one cause of problems
in the past. Obviously, I do not know that this is the cause of the problems
in this case but fully expect that it is. To test my hypothesis, use Word to
open one of your template files and re-save it as a document (.docx). Open
this newly created (converted) document using POI and try the modifications
on it; I suspect that the problems will have gone away.

Yours

Mark B

PS The code I submitted is not a part of the POI project but an example to
help others out and it was never anything other than beta code at best. I do
not offer on-going support for it and, if you want it to operate
differently, that is very much up to you to implement.



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710729.html


Re: How to read the value of bookmarks? (docx)

Posted by asbaranjan <as...@gmail.com>.
Hi Mark, 

Thank you very much for your code. It was very helpful and answered most of
my questions about the docx files. 

I am using .dotm templates. I give a .dotm file as input to your code. I
have put some logging in your code to track what's going on in the program,
and it shows that the code runs properly and sets the bookmark's text. But
when I open the output .docx file in MS Word, it says there is an error in
the file and cannot open it.

P.S.: I changed the output file's extension from .docx to .zip and looked at
the contents of document.xml in the zip archive. The changes/replacements
were there, but I still don't know what's wrong with the file that stops MS
Word opening it. Should I change the code for .dotm files?

Any comment would be appreciated, 
Thanks and regards





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710728.html


Re: How to read the value of bookmarks? (docx)

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Thanks for letting me know and I hope that it serves you well.

There is already partial support for multiple cells in there; well at least
I have defined a constructor that should allow you to handle them. The
intention was to pass an array of references to the cells to the constructor
and then build in the code to process this array. Should be a fairly
straightforward fix.

Currently, I have abandoned that code base as it had other problems; chiefly
that it would not support multi-paragraph bookmarks nor bookmarked
rows/columns in a table. Also, I did not like the fact that all you could do
was insert text. It seems more logical to insert runs, paragraphs, tables etc
at the bookmark and I am thinking about altering the code to support this.
Sadly work has forced me to place the project aside at the moment - we need
to make lots more charcoal to raise funds for a local wildlife trust,
smelly, dirty and very time consuming, but a lot of fun. Still, I will work
on it in the background as and when I have the time and post any updates to
this thread in case it might be of use to someone.

Do not be afraid to dig around in the markup. All I do is use Word to create
the effect I am looking for and then unzip the archive and look at the
markup. Then I know what to aim for and simply test ideas out until I am able
to match the markup that Word produced. Yes, it can be time consuming but it
is most certainly instructive.

Yours

Mark B

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-tp5710184p5710370.html