You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@corinthia.apache.org by pm...@apache.org on 2015/04/12 15:46:12 UTC

incubator-corinthia git commit: Have DFGet() and DFPut() accept a custom id prefix

Repository: incubator-corinthia
Updated Branches:
  refs/heads/master 557758302 -> 4c4d03a97


Have DFGet() and DFPut() accept a custom id prefix

One of the problems that can occur when updating an existing concrete
document (e.g. a docx file) based on a modified HTML file is that the id
attributes in the latter need to correctly match the sequentially-
assigned node ids in the former. That is, when the put operation goes
through the concrete document working out what to change, it relies on
the mapping from HTML elements to concrete document elements to match
what it was when the original get operation was performed.

It is crucial that this mapping be correct; otherwise, the result can be
corrupted documents due to the wrong elements being moved into the wrong
places, violating the XML document's schema.

A situation that can arise when the conversion library is used for
editing is that a user could open a one document, copy some text to the
clipboard (stored in HTML format, including the id attributes), and then
open a different document and paste that HTML into it. This will result
in the second document containing HTML elements whose id attributes had
meaning in the first document, but not in the latter. When the second
document is later "saved" (that is, its HTML representation is used to
update the originating docx file), these incorrect id attributes can
result in the wrong mapping, causing the problems above.

A solution to this is to use a unique prefix for every conversion, e.g.
a random number. This was previously done, but that capability was taken
out to simplify the API. However, I've received several bug reports from
the latest update to UX Write regarding corrupted documents and/or text
missing on save & reopen, and I suspect the above issue is probably the
cause.

The DFGet() and DFPut() functions now take an optional string to be used
as an id prefix - this can be NULL, in which case the filter will choose
its own (the only current filter this is relevant to is the Word filter,
which chooses the "word" prefix). dfconvert also now allows the user to
optionally specify an id prefix on the command line.


Project: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/commit/4c4d03a9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/tree/4c4d03a9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/diff/4c4d03a9

Branch: refs/heads/master
Commit: 4c4d03a972508feeb2fb245c4dc460f6bee64514
Parents: 5577583
Author: Peter Kelly <pe...@uxproductivity.com>
Authored: Sun Apr 12 20:24:19 2015 +0700
Committer: Peter Kelly <pe...@uxproductivity.com>
Committed: Sun Apr 12 20:37:54 2015 +0700

----------------------------------------------------------------------
 DocFormats/api/headers/DocFormats/Operations.h  |  8 ++++----
 DocFormats/api/src/Operations.c                 | 12 +++++++++--
 DocFormats/filters/odf/src/text/ODFText.c       |  4 ++--
 DocFormats/filters/odf/src/text/ODFText.h       |  4 ++--
 DocFormats/filters/ooxml/src/word/Word.c        | 10 +++++-----
 DocFormats/filters/ooxml/src/word/Word.h        |  4 ++--
 .../filters/ooxml/src/word/WordConverter.c      | 12 +++++------
 .../filters/ooxml/src/word/WordConverter.h      |  4 ++--
 DocFormats/filters/ooxml/tests/word/WordTests.c |  4 ++--
 consumers/dfconvert/src/main.c                  | 21 ++++++++++++--------
 10 files changed, 48 insertions(+), 35 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/api/headers/DocFormats/Operations.h
----------------------------------------------------------------------
diff --git a/DocFormats/api/headers/DocFormats/Operations.h b/DocFormats/api/headers/DocFormats/Operations.h
index 1bb6ee2..c84ad84 100644
--- a/DocFormats/api/headers/DocFormats/Operations.h
+++ b/DocFormats/api/headers/DocFormats/Operations.h
@@ -39,14 +39,14 @@ void DFAbstractDocumentRelease(DFAbstractDocument *abstract);
 DFDocument *DFAbstractDocumentGetHTML(DFAbstractDocument *abstract);
 void DFAbstractDocumentSetHTML(DFAbstractDocument *abstract, DFDocument *htmlDoc);
 
-int DFGet(DFConcreteDocument *concrete, DFAbstractDocument *abstract, DFError **error);
-int DFPut(DFConcreteDocument *concrete, DFAbstractDocument *abstract, DFError **error);
+int DFGet(DFConcreteDocument *concrete, DFAbstractDocument *abstract, const char *idPrefix, DFError **error);
+int DFPut(DFConcreteDocument *concrete, DFAbstractDocument *abstract, const char *idPrefix, DFError **error);
 int DFCreate(DFConcreteDocument *concrete, DFAbstractDocument *abstract, DFError **error);
 
 // Abstraction level 1
 
-int DFGetFile(const char *concrete, const char *abstract, DFError **error);
-int DFPutFile(const char *concrete, const char *abstract, DFError **error);
+int DFGetFile(const char *concrete, const char *abstract, const char *idPrefix, DFError **error);
+int DFPutFile(const char *concrete, const char *abstract, const char *idPrefix, DFError **error);
 int DFCreateFile(const char *concrete, const char *abstract, DFError **error);
 
 #endif

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/api/src/Operations.c
----------------------------------------------------------------------
diff --git a/DocFormats/api/src/Operations.c b/DocFormats/api/src/Operations.c
index 872f4c5..d13efbc 100644
--- a/DocFormats/api/src/Operations.c
+++ b/DocFormats/api/src/Operations.c
@@ -159,6 +159,7 @@ void DFAbstractDocumentSetHTML(DFAbstractDocument *abstract,
 
 int DFGet(DFConcreteDocument *concrete,
           DFAbstractDocument *abstract,
+          const char *idPrefix,
           DFError **error)
 {
     if (DFStorageFormat(abstract->storage) != DFFileFormatHTML) {
@@ -172,11 +173,13 @@ int DFGet(DFConcreteDocument *concrete,
         case DFFileFormatDocx:
             htmlDoc = WordGet(concrete->storage,
                               abstract->storage,
+                              idPrefix,
                               error);
             break;
         case DFFileFormatOdt:
             htmlDoc = ODFTextGet(concrete->storage,
                                  abstract->storage,
+                                 idPrefix,
                                  error);
             break;
         default:
@@ -194,6 +197,7 @@ int DFGet(DFConcreteDocument *concrete,
 
 int DFPut(DFConcreteDocument *concreteDoc,
           DFAbstractDocument *abstractDoc,
+          const char *idPrefix,
           DFError **error)
 {
     if (DFStorageFormat(abstractDoc->storage) != DFFileFormatHTML) {
@@ -208,12 +212,14 @@ int DFPut(DFConcreteDocument *concreteDoc,
             ok = WordPut(concreteDoc->storage,
                          abstractDoc->storage,
                          abstractDoc->htmlDoc,
+                         idPrefix,
                          error);
             break;
         case DFFileFormatOdt:
             ok = ODFTextPut(concreteDoc->storage,
                             abstractDoc->storage,
                             abstractDoc->htmlDoc,
+                            idPrefix,
                             error);
             break;
         default:
@@ -256,6 +262,7 @@ int DFCreate(DFConcreteDocument *concreteDoc,
 
 int DFGetFile(const char *concreteFilename,
               const char *abstractFilename,
+              const char *idPrefix,
               DFError **error)
 {
     int r = 0;
@@ -275,7 +282,7 @@ int DFGetFile(const char *concreteFilename,
 
     abstractDoc = DFAbstractDocumentNew(abstractStorage);
 
-    if (!DFGet(concreteDoc, abstractDoc, error)
+    if (!DFGet(concreteDoc, abstractDoc, idPrefix, error)
         || (abstractDoc->htmlDoc == NULL)) {
         DFErrorFormat(error, "%s: %s",
                       concreteFilename,
@@ -311,6 +318,7 @@ end:
 
 int DFPutFile(const char *concreteFilename,
               const char *abstractFilename,
+              const char *idPrefix,
               DFError **error)
 {
     int ok = 0;
@@ -340,7 +348,7 @@ int DFPutFile(const char *concreteFilename,
     abstractDoc = DFAbstractDocumentNew(abstractStorage2);
     abstractDoc->htmlDoc = DFDocumentRetain(htmlDoc2);
 
-    ok = DFPut(concreteDoc, abstractDoc, error);
+    ok = DFPut(concreteDoc, abstractDoc, idPrefix, error);
 
 end:
     DFDocumentRelease(htmlDoc2);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/odf/src/text/ODFText.c
----------------------------------------------------------------------
diff --git a/DocFormats/filters/odf/src/text/ODFText.c b/DocFormats/filters/odf/src/text/ODFText.c
index 1a7c55f..0e13531 100644
--- a/DocFormats/filters/odf/src/text/ODFText.c
+++ b/DocFormats/filters/odf/src/text/ODFText.c
@@ -18,13 +18,13 @@
 #include "DFPlatform.h"
 #include "ODFText.h"
 
-DFDocument *ODFTextGet(DFStorage *concreteStorage, DFStorage *abstractStorage, DFError **error)
+DFDocument *ODFTextGet(DFStorage *concreteStorage, DFStorage *abstractStorage, const char *idPrefix, DFError **error)
 {
     DFErrorFormat(error,"ODFTextGet: Not yet implemented");
     return NULL;
 }
 
-int ODFTextPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error)
+int ODFTextPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, const char *idPrefix, DFError **error)
 {
     DFErrorFormat(error,"ODFTextPut: Not yet implemented");
     return 0;

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/odf/src/text/ODFText.h
----------------------------------------------------------------------
diff --git a/DocFormats/filters/odf/src/text/ODFText.h b/DocFormats/filters/odf/src/text/ODFText.h
index f1b41ed..70486ab 100644
--- a/DocFormats/filters/odf/src/text/ODFText.h
+++ b/DocFormats/filters/odf/src/text/ODFText.h
@@ -22,8 +22,8 @@
 #include <DocFormats/DFStorage.h>
 #include <DocFormats/DFXMLForward.h>
 
-DFDocument *ODFTextGet(DFStorage *concreteStorage, DFStorage *abstractStorage, DFError **error);
-int ODFTextPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error);
+DFDocument *ODFTextGet(DFStorage *concreteStorage, DFStorage *abstractStorage, const char *idPrefix, DFError **error);
+int ODFTextPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, const char *idPrefix, DFError **error);
 int ODFTextCreate(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error);
 
 #endif

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/ooxml/src/word/Word.c
----------------------------------------------------------------------
diff --git a/DocFormats/filters/ooxml/src/word/Word.c b/DocFormats/filters/ooxml/src/word/Word.c
index e9a5495..d5db0c7 100644
--- a/DocFormats/filters/ooxml/src/word/Word.c
+++ b/DocFormats/filters/ooxml/src/word/Word.c
@@ -26,7 +26,7 @@
 #include "DFZipFile.h"
 #include <stdlib.h>
 
-DFDocument *WordGet(DFStorage *concreteStorage, DFStorage *abstractStorage, DFError **error)
+DFDocument *WordGet(DFStorage *concreteStorage, DFStorage *abstractStorage, const char *idPrefix, DFError **error)
 {
     int ok = 0;
     WordPackage *wordPackage = NULL;
@@ -37,7 +37,7 @@ DFDocument *WordGet(DFStorage *concreteStorage, DFStorage *abstractStorage, DFEr
         goto end;
 
     htmlDoc = DFDocumentNew();
-    if (!WordConverterGet(htmlDoc,abstractStorage,wordPackage,error))
+    if (!WordConverterGet(htmlDoc,abstractStorage,wordPackage,idPrefix,error))
         goto end;
 
     ok = 1;
@@ -53,7 +53,7 @@ end:
     }
 }
 
-int WordPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error)
+int WordPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, const char *idPrefix, DFError **error)
 {
     int ok = 0;
     WordPackage *wordPackage = NULL;
@@ -62,7 +62,7 @@ int WordPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *
     if (wordPackage == NULL)
         goto end;
 
-    if (!WordConverterPut(htmlDoc,abstractStorage,wordPackage,error))
+    if (!WordConverterPut(htmlDoc,abstractStorage,wordPackage,idPrefix,error))
         goto end;
 
     if (!WordPackageSave(wordPackage,error))
@@ -90,7 +90,7 @@ int WordCreate(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocumen
     // a new word file from it.
     HTMLBreakBDTRefs(htmlDoc->docNode,"word");
 
-    if (!WordConverterPut(htmlDoc,abstractStorage,wordPackage,error))
+    if (!WordConverterPut(htmlDoc,abstractStorage,wordPackage,"word",error))
         goto end;
 
     if (!WordPackageSave(wordPackage,error))

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/ooxml/src/word/Word.h
----------------------------------------------------------------------
diff --git a/DocFormats/filters/ooxml/src/word/Word.h b/DocFormats/filters/ooxml/src/word/Word.h
index 3eb8144..98d05e4 100644
--- a/DocFormats/filters/ooxml/src/word/Word.h
+++ b/DocFormats/filters/ooxml/src/word/Word.h
@@ -30,8 +30,8 @@
 
 CSSStyle *WordSetupTableGridStyle(CSSSheet *styleSheet, int *changed);
 
-DFDocument *WordGet(DFStorage *concreteStorage, DFStorage *abstractStorage, DFError **error);
-int WordPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error);
+DFDocument *WordGet(DFStorage *concreteStorage, DFStorage *abstractStorage, const char *idPrefix, DFError **error);
+int WordPut(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, const char *idPrefix, DFError **error);
 int WordCreate(DFStorage *concreteStorage, DFStorage *abstractStorage, DFDocument *htmlDoc, DFError **error);
 int WordCollapseBookmarks(DFStorage *concreteStorage, DFError **error);
 int WordExpandBookmarks(DFStorage *concreteStorage, DFError **error);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/ooxml/src/word/WordConverter.c
----------------------------------------------------------------------
diff --git a/DocFormats/filters/ooxml/src/word/WordConverter.c b/DocFormats/filters/ooxml/src/word/WordConverter.c
index e695c96..1d754f6 100644
--- a/DocFormats/filters/ooxml/src/word/WordConverter.c
+++ b/DocFormats/filters/ooxml/src/word/WordConverter.c
@@ -560,13 +560,13 @@ static void Word_postProcessHTMLDoc(WordConverter *conv)
 //                                                                                                //
 ////////////////////////////////////////////////////////////////////////////////////////////////////
 
-static WordConverter *WordConverterNew(DFDocument *html, DFStorage *abstractStorage, WordPackage *package)
+static WordConverter *WordConverterNew(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, const char *idPrefix)
 {
     WordConverter *converter = (WordConverter *)xcalloc(1,sizeof(WordConverter));
     converter->html = DFDocumentRetain(html);
     converter->abstractStorage = DFStorageRetain(abstractStorage);
     assert(DFStorageFormat(converter->abstractStorage) == DFFileFormatHTML);
-    converter->idPrefix = xstrdup("word");
+    converter->idPrefix = (idPrefix != NULL) ? xstrdup(idPrefix) : xstrdup("word");
     converter->package = WordPackageRetain(package);
     converter->styles = WordSheetNew(converter->package->styles);
     converter->numbering = WordNumberingNew(converter->package);
@@ -676,7 +676,7 @@ DFNode *WordConverterGetConcrete(WordPutData *put, DFNode *abstract)
     return node;
 }
 
-int WordConverterGet(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, DFError **error)
+int WordConverterGet(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, const char *idPrefix, DFError **error)
 {
     if (package->document == NULL) {
         DFErrorFormat(error,"document.xml not found");
@@ -692,7 +692,7 @@ int WordConverterGet(DFDocument *html, DFStorage *abstractStorage, WordPackage *
     int haveFields = Word_simplifyFields(package);
     Word_mergeRuns(package);
 
-    WordConverter *converter = WordConverterNew(html,abstractStorage,package);
+    WordConverter *converter = WordConverterNew(html,abstractStorage,package,idPrefix);
     converter->haveFields = haveFields;
     WordAddNbsps(converter->package->document);
     WordFixLists(converter);
@@ -810,7 +810,7 @@ static void addMissingDefaultStyles(WordConverter *converter)
     }
 }
 
-int WordConverterPut(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, DFError **error)
+int WordConverterPut(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, const char *idPrefix, DFError **error)
 {
     if (package->document == NULL) {
         DFErrorFormat(error,"document.xml not found");
@@ -826,7 +826,7 @@ int WordConverterPut(DFDocument *html, DFStorage *abstractStorage, WordPackage *
     HTML_normalizeDocument(html);
     HTML_pushDownInlineProperties(html->docNode);
 
-    WordConverter *converter = WordConverterNew(html,abstractStorage,package);
+    WordConverter *converter = WordConverterNew(html,abstractStorage,package,idPrefix);
 
     // FIXME: Need a more reliable way of telling whether this is a new document or not - it could be that the
     // document already existed (with styles set up) but did not have any content

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/ooxml/src/word/WordConverter.h
----------------------------------------------------------------------
diff --git a/DocFormats/filters/ooxml/src/word/WordConverter.h b/DocFormats/filters/ooxml/src/word/WordConverter.h
index 5cda316..e255ad2 100644
--- a/DocFormats/filters/ooxml/src/word/WordConverter.h
+++ b/DocFormats/filters/ooxml/src/word/WordConverter.h
@@ -101,8 +101,8 @@ struct WordConverter {
     CSSSheet *styleSheet;
 };
 
-int WordConverterGet(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, DFError **error);
-int WordConverterPut(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, DFError **error);
+int WordConverterGet(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, const char *idPrefix, DFError **error);
+int WordConverterPut(DFDocument *html, DFStorage *abstractStorage, WordPackage *package, const char *idPrefix, DFError **error);
 void WordConverterWarning(WordConverter *converter, const char *format, ...) ATTRIBUTE_FORMAT(printf,2,3);
 
 char *WordStyleIdForStyle(CSSStyle *style);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/DocFormats/filters/ooxml/tests/word/WordTests.c
----------------------------------------------------------------------
diff --git a/DocFormats/filters/ooxml/tests/word/WordTests.c b/DocFormats/filters/ooxml/tests/word/WordTests.c
index 9c6bb7b..5e9b30f 100644
--- a/DocFormats/filters/ooxml/tests/word/WordTests.c
+++ b/DocFormats/filters/ooxml/tests/word/WordTests.c
@@ -100,7 +100,7 @@ static void test_get(void)
     abstractStorage = DFStorageNewMemory(DFFileFormatHTML);
     abstractDoc = DFAbstractDocumentNew(abstractStorage);
 
-    if (!DFGet(concreteDoc,abstractDoc,&error))
+    if (!DFGet(concreteDoc,abstractDoc,NULL,&error))
         goto end;;
 
     DFDocument *htmlDoc = DFAbstractDocumentGetHTML(abstractDoc);
@@ -217,7 +217,7 @@ static void test_put(void)
     DFAbstractDocumentSetHTML(abstractDoc,htmlDoc);
 
     // Update the docx file based on the contents of the HTML file
-    if (!DFPut(concreteDoc,abstractDoc,&error))
+    if (!DFPut(concreteDoc,abstractDoc,NULL,&error))
         goto end;
 
     // Output the updated docx file

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/4c4d03a9/consumers/dfconvert/src/main.c
----------------------------------------------------------------------
diff --git a/consumers/dfconvert/src/main.c b/consumers/dfconvert/src/main.c
index 2d47e40..2790a25 100644
--- a/consumers/dfconvert/src/main.c
+++ b/consumers/dfconvert/src/main.c
@@ -24,16 +24,19 @@ void usage(void)
 {
     printf("Usage:\n"
            "\n"
-           "dfconvert get concrete.docx abstract.html\n"
+           "dfconvert get concrete.docx abstract.html [idprefix]\n"
            "\n"
            "    Create a new HTML file from a Word document. The HTML file must not\n"
-           "    already exist.\n"
+           "    already exist. If idprefix is specified, this will be used for all\n"
+           "    id attributes in the HTML file.\n"
            "\n"
-           "dfconvert put concrete.docx abstract.html\n"
+           "dfconvert put concrete.docx abstract.html [idprefix]\n"
            "\n"
            "    Update an existing Word document based on a modified HTML file. The\n"
            "    Word document must already exist, and must be same document from\n"
-           "    which the HTML file was originally generated.\n"
+           "    which the HTML file was originally generated. If idprefix is specified,\n"
+           "    only those id attributes in the HTML file which have this same prefix\n"
+           "    will be treated as corresponding to existing elements in the docx file.\n"
            "\n"
            "    The put operation cannot be executed twice on the same Word\n"
            "    document, because after the first time, the fact that the document\n"
@@ -51,12 +54,14 @@ void usage(void)
 int main(int argc, const char **argv)
 {
     DFError *error = NULL;
-    if ((argc == 4) && !strcmp(argv[1],"get")) {
-        if (DFGetFile(argv[2],argv[3],&error))
+    if ((argc >= 4) && !strcmp(argv[1],"get")) {
+        const char *idPrefix = (argc >= 5) ? argv[4] : NULL;
+        if (DFGetFile(argv[2],argv[3],idPrefix,&error))
             return 0;
     }
-    else if ((argc == 4) && !strcmp(argv[1],"put")) {
-        if (DFPutFile(argv[2],argv[3],&error))
+    else if ((argc >= 4) && !strcmp(argv[1],"put")) {
+        const char *idPrefix = (argc >= 5) ? argv[4] : NULL;
+        if (DFPutFile(argv[2],argv[3],idPrefix,&error))
             return 0;
     }
     else if ((argc == 4) && !strcmp(argv[1],"create")) {