Posted to dev@poi.apache.org by Charles Gutjahr <me...@charlesgutjahr.com> on 2011/04/07 15:06:35 UTC

Proposed RSID patch for XWPF

Sorry, I completely messed up that last email by mixing up HWPF and XWPF. I meant only XWPF — not HWPF!

Ignore the HWPF references, read them as XWPF instead.
 
 

On 07 Apr, 2011, at 11:01 PM, Charles Gutjahr <or...@charlesgutjahr.com> wrote:


Hi

I'm just starting to use POI for generating HWPF Word documents. There are two features that I need which I think should be included in POI, and since I need them I figure I might as well build them and contribute patches. I'd like to get some advice and a yea or nay before I go ahead.


The first one is about RSIDs. This started with my question at http://stackoverflow.com/questions/4966087/how-to-generate-rsid-attributes-correctly-in-word-docx-files-using-apache-poi

My application is generating HWPF documents which a number of users will download, edit using Word, then upload back into the system. At the moment we simply overwrite the generated document with the user's changes, but it would be more useful if the application could identify what has changed. That's exactly why Word has revision identifiers (RSIDs): they mark all the changes made in one editing session, which makes it easy to tell which content changed in which session. A document created by Word is full of RSIDs, whereas a document created by POI doesn't have any.


I'd like to add the ability for POI to automatically assign RSIDs to HWPF documents. Here's a rough plan of what I'm thinking about implementing:
 * Add properties to XWPFDocument that store the base RSID, all RSIDs in use in the document, and the RSID being used for the current revision
 * Add appropriate methods to get, set and clear those XWPFDocument properties
 * The base RSID and other RSIDs in XWPFDocument will be populated from word/settings.xml when an existing document is loaded
 * The current session RSID will be randomly generated automatically when an XWPFDocument object is constructed. This means that a 'revision' will be defined as the lifetime of that instance.
 * Paragraphs, runs and other content will have appropriate methods to get, set and clear the RSID
 * Add a boolean property (and associated methods) that enables and disables automatic assignment of an RSID to new paragraphs, runs, etc. added by POI. This will probably be disabled by default.
 * When that boolean is enabled, any method that creates new content will automatically assign the current RSID to that content (for example XWPFDocument.createParagraph(), XWPFParagraph.createRun(), XWPFRun.setText() etc.)
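
As a rough illustration of the plan above, here is a minimal sketch in plain Java with no POI dependency. RsidTracker and all of its methods are hypothetical names invented for this example, not existing POI API. It assumes only that an RSID is a four-byte value written as eight hexadecimal digits, which is how the values appear in Word-generated documents.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical helper (not part of POI): RSID bookkeeping for one document instance.
final class RsidTracker {
    private final Set<String> knownRsids = new LinkedHashSet<>();
    private final String sessionRsid;
    private boolean autoAssign = false; // disabled by default, as proposed

    RsidTracker(Random random) {
        // %08X renders the int as an unsigned, zero-padded, eight-digit hex value.
        this.sessionRsid = String.format(Locale.ROOT, "%08X", random.nextInt());
        knownRsids.add(sessionRsid);
    }

    String getSessionRsid() {
        return sessionRsid;
    }

    // RSIDs read from an existing document would be registered here.
    void addKnownRsid(String rsid) {
        knownRsids.add(rsid);
    }

    Set<String> getKnownRsids() {
        return Collections.unmodifiableSet(knownRsids);
    }

    void setAutoAssign(boolean autoAssign) {
        this.autoAssign = autoAssign;
    }

    // RSID that newly created content should carry, or null when auto-assignment is off.
    String rsidForNewContent() {
        return autoAssign ? sessionRsid : null;
    }
}
```

In this sketch a 'revision' is the lifetime of one RsidTracker instance, mirroring the idea of tying the session RSID to the lifetime of the document object.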


Does anyone have any comments on or objections to that plan? And should I put this in bugzilla?


I have another need for POI; I will write that one up in my next email...

Cheers
Charlie


Re: Proposed RSID patch for XWPF

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 7 Apr 2011, Charles Gutjahr wrote:
>  * Add properties to XWPFDocument that store the base RSID, all RSIDs in use 
> in the document, and the RSID being used for the current revision

XWPFDocument will have an XWPFSettings object if there is a settings part. 
This is probably the right place for the RSID generation / fetching / 
listing code. Depending on how the code looks, we can either expose these 
methods through XWPFDocument, or provide public access to the XWPFSettings 
object.

We'd probably also need a bit of code to create an XWPFSettings if the user 
tries to do RSID stuff and there isn't one. Well, if the settings part is 
optional (I'm not sure on this one) - can you check the .docx specification 
and see if the settings part must be there or not?

>  * Add appropriate methods to get, set and clear 
> those XWPFDocument properties
>  * The base RSID and other RSIDs in XWPFDocument will be populated from 
> word/settings.xml when an existing document is loaded

(Covered above)

>  * The current session RSID will be randomly generated automatically when an 
> XWPFDocument object is constructed. This means that a 'revision' will be 
> defined as the lifetime of that instance.

We could maybe do this lazily. It probably depends on whether the 
settings part must be there or not. If it's optional, we'll want to add 
the settings when they are first requested, then generate the RSID. 
Otherwise, maybe just have the XWPFSettings constructor generate a new 
RSID each time, but only add it to the list if it gets used?
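
One way to picture the lazy variant is the following sketch in plain Java. LazyRsidSettings is a hypothetical stand-in invented for this example, not POI's XWPFSettings, and the eight-hex-digit format is the only assumption about RSIDs. The session RSID is generated on first use and is listed only at that moment, so a document that never touches RSIDs gains no new entry.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in for settings-side bookkeeping; not POI's XWPFSettings.
final class LazyRsidSettings {
    private final Random random;
    private final Set<String> listedRsids = new LinkedHashSet<>();
    private String sessionRsid; // stays null until first requested

    LazyRsidSettings(Random random) {
        this.random = random;
    }

    // Generates the session RSID on first call and lists it; later calls reuse it.
    String useSessionRsid() {
        if (sessionRsid == null) {
            sessionRsid = String.format(Locale.ROOT, "%08X", random.nextInt());
            listedRsids.add(sessionRsid);
        }
        return sessionRsid;
    }

    // Defensive copy so callers cannot modify the internal list.
    Set<String> getListedRsids() {
        return new LinkedHashSet<>(listedRsids);
    }
}
```

The same shape would work for creating the settings part itself on demand, if it turns out to be optional.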

>  * Paragraphs, runs and other content will have appropriate methods to get, 
> set and clear the RSID

Yup. Are you able to look at the specs to see what exactly can have an 
RSID set on it? From the example files I've seen, I guess paragraphs and 
character runs can, but what about tables? Can the overall table have one 
set, or only rows/cells, or only the paragraphs within them?

>  * Add a boolean property (and associated methods) that enables and disables 
> automatic assignment of an RSID to new paragraphs, runs, etc. added by POI. This 
> will probably be disabled by default.

This may need a bit of thought, as there are quite a few different ways at 
the moment for a paragraph, run or table to be added. Probably best to 
leave this one towards the end, until the rest of the API is clearer.

>  * When that boolean is enabled, any method that creates new content will 
> automatically assign the current RSID to that content (for 
> example XWPFDocument.createParagraph(), XWPFParagraph.createRun(), 
> XWPFRun.setText() etc.)

(See above)

> Does anyone have any comments on or objections to that plan? And should 
> I put this in bugzilla?

Please open a new bug in bugzilla to track this, and then post patches when 
you've got something ready for review. Please do try to include unit tests 
too, both so we can be sure it works, and so we can check we don't 
accidentally break it later!

Thanks
Nick