
Posted to dev@corinthia.apache.org by Peter Kelly <pm...@apache.org> on 2015/03/11 01:25:47 UTC

Re: [Document Model] Initial questions about web-based application

> On 8 Mar 2015, at 10:20 pm, Franz de Copenhague <fr...@outlook.com> wrote:
>
> I agree that HTML5 is a good model to feed into the editing library to support the editing of paragraphs, lists, text, tables, images. But what about sections, headers, footers, fields (author, date, etc.), styles and themes? All of them are document features implemented in either docx or odt, and so far they are not supported by the DocFormats API.

So this is where things get tricky :)

HTML5 does not directly support all the features of OOXML/ODF word processing documents, so we need to decide whether we are going to support these features and, if so, how. With UX Write I’ve always taken the position that it was never intended to be a complete replacement for Word/OO and that it was subject to the inherent limitations of HTML (e.g. no page breaks, tabs, headers/footers etc.). But I got a *lot* of complaints about the lack of those features, which put me in a difficult situation: they can only be added properly (at least in terms of doing the layout calculations) by modifying the web layout engine itself, although some of them can be “faked” using JavaScript. Still, I think we can find a way to support most of them.

Sections: This term is unfortunately ambiguous, as it can mean either a logical part of a document (e.g. “See Section 3.2 for details”) or a part of the document that has its own page layout settings. We could support the latter using a <div> with a custom CSS class, e.g. “corinthia-section”. A browser or any other HTML-supporting program would still be able to make sense of the document, while we would know that class=“corinthia-section” has special semantics, which we handle appropriately in both DocFormats and the editor.

There are actually a few places where I’ve already used custom class names for this purpose; see DocFormats/core/src/common/DFClassNames.h. Currently these use the “uxwrite-” prefix, which should be changed to “corinthia-”. This is a fairly easy task for anyone who wants to start contributing, since it’s largely just find and replace. When the change is made, the tests must be updated as well.
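
The rename is mostly mechanical, but a blind textual replace would also rewrite class names that merely contain “uxwrite-” in the middle. Below is a minimal TypeScript sketch of a token-wise rename, assuming class attributes are whitespace-separated token lists as in HTML. The function name and the example class names are hypothetical illustrations, not taken from DFClassNames.h.

```typescript
// Hypothetical helper: rename legacy "uxwrite-" class tokens to "corinthia-".
// Assumes a class attribute is a whitespace-separated list of tokens.
const LEGACY_PREFIX = "uxwrite-";
const NEW_PREFIX = "corinthia-";

function migrateClassAttribute(classValue: string): string {
  return classValue
    .split(/\s+/)
    // Drop empty tokens produced by leading/trailing/double whitespace.
    .filter((token) => token.length > 0)
    // Only tokens that *start* with the legacy prefix are renamed, so a class
    // such as "my-uxwrite-note" is left untouched.
    .map((token) =>
      token.indexOf(LEGACY_PREFIX) === 0
        ? NEW_PREFIX + token.slice(LEGACY_PREFIX.length)
        : token
    )
    .join(" ");
}

// Illustrative class names only; see DFClassNames.h for the real list.
console.log(migrateClassAttribute("uxwrite-toc  my-uxwrite-note"));
```

The same function could in principle be applied when opening documents saved with the old prefix, if backward compatibility turns out to be wanted.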

For sections, we could alternatively use the <article> element, which is also part of HTML5. Thinking about it, I’d actually favour this over a div, since it avoids relying on a custom class name. There is also a <section> element, but this is for sections in the “see section 3.2” sense (i.e. what appears in the table of contents of a report).

Headers and footers: HTML5 actually has <header> and <footer> elements, but, bizarrely, they don’t seem to be intended for the same purpose as headers and footers in traditional word processing. However, checking the spec again now, it seems they’ve made this a little clearer. Even if browsers won’t necessarily display them as such, due to the non-paginated layout model used on the web, they are at least the closest we can get in terms of how we represent things. We may be able to have the editor use CSS tricks to display the header and footer content at the top and bottom of the screen.
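
To show how the section representations discussed above could coexist with <header> and <footer>, here is a TypeScript sketch over a hypothetical, simplified element model (not the real DOM, so it runs anywhere). It treats both <article> and <div class="corinthia-section"> as page-layout sections and sorts each section’s direct children into header, body and footer; <section> elements stay in the body as logical sections. All type and function names are illustrative assumptions.

```typescript
// Hypothetical simplified element model, for illustration only.
interface SimpleElement {
  tag: string;              // lower-case tag name, e.g. "article", "p"
  classes: string[];        // class attribute tokens
  children: SimpleElement[];
}

interface PageLayoutSection {
  header: SimpleElement[];
  body: SimpleElement[];
  footer: SimpleElement[];
}

function mk(tag: string, classes: string[] = [], children: SimpleElement[] = []): SimpleElement {
  return { tag, classes, children };
}

// Accept either proposed representation of a page-layout section.
function isPageLayoutSection(el: SimpleElement): boolean {
  return el.tag === "article" ||
    (el.tag === "div" && el.classes.indexOf("corinthia-section") >= 0);
}

// Sort a section's direct children into header, body and footer content.
// <section> (a logical section) is ordinary body content here.
function splitSection(el: SimpleElement): PageLayoutSection {
  const result: PageLayoutSection = { header: [], body: [], footer: [] };
  el.children.forEach((child) => {
    if (child.tag === "header") result.header.push(child);
    else if (child.tag === "footer") result.footer.push(child);
    else result.body.push(child);
  });
  return result;
}

// Find page-layout sections, descending into ordinary containers.
function collectSections(root: SimpleElement): PageLayoutSection[] {
  const found: PageLayoutSection[] = [];
  root.children.forEach((child) => {
    if (isPageLayoutSection(child)) {
      found.push(splitSection(child));
    } else {
      collectSections(child).forEach((s) => found.push(s));
    }
  });
  return found;
}
```

Keeping header and footer content inside the section element means a plain browser still shows it, just not at page boundaries.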

Fields: A few of these are handled already, though the set is fairly limited. These are:

- Table of contents
- List of figures
- List of tables
- Cross-reference (to a section, figure or table) - can be text only, label + number, caption text, etc.

See DFClassNames.h for the list of these, and also grep through the JS files in Editor/src and the OOXML filter to see how they’re used. I think using custom CSS class names to identify them, plus data- attributes where we need extra information, would be appropriate.
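
As an illustration of the class-name-plus-data-attribute approach, here is a TypeScript sketch that serializes a cross-reference field to HTML. The class name corinthia-xref, the data-xref-format attribute, the use of an <a href="#id"> element and the three display formats are assumptions made for this example, not names defined in DFClassNames.h. The display text is written into the element so that a program with no knowledge of the custom class still shows something sensible.

```typescript
// Hypothetical display formats for a cross-reference field.
type XrefFormat = "text" | "label-number" | "label-number-text";

interface XrefTarget {
  id: string;          // id attribute of the referenced heading, figure or table
  label: string;       // e.g. "Section", "Figure", "Table"
  numberText: string;  // e.g. "3" or "3.2"
  text: string;        // heading text or caption text
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Text shown to the reader for a given format.
function xrefDisplayText(target: XrefTarget, format: XrefFormat): string {
  if (format === "text") {
    return target.text;
  }
  const labelNumber = target.label + " " + target.numberText;
  return format === "label-number" ? labelNumber : labelNumber + ": " + target.text;
}

// The class marks the element as a field; the data- attribute records
// how it should be re-rendered when the target changes.
function xrefToHtml(target: XrefTarget, format: XrefFormat): string {
  return (
    '<a class="corinthia-xref" href="#' + escapeHtml(target.id) + '"' +
    ' data-xref-format="' + format + '">' +
    escapeHtml(xrefDisplayText(target, format)) +
    "</a>"
  );
}
```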

Incidentally, one nice thing about how these are handled in the Editor is that it updates them automatically, in the same way that a spreadsheet automatically recalculates formulas. Every time you add, remove, or rename a section (<h1> to <h6>), figure, or table (for the latter two, by changing the caption content), the table of contents and all cross-references are updated. This is handled in Outline.js, which also reports changes to the outline structure of these items to callback functions, so the editor can display a “document map” or outline view in the UI.
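
The recalculation idea can be sketched independently of the DOM. The TypeScript below is not the code in Outline.js. It is a simplified model that assumes the document has already been reduced to an ordered list of headings, figures and tables. Each call renumbers everything (hierarchical numbers for <h1> to <h6>, sequential numbers for figures and tables) and then notifies a listener, after which the table of contents and cross-reference text can be regenerated. All names are hypothetical.

```typescript
type ItemKind = "heading" | "figure" | "table";

interface OutlineItem {
  id: string;
  kind: ItemKind;
  level: number;       // 1..6 for <h1>..<h6>; unused for figures/tables
  text: string;        // heading text or caption text
  numberText: string;  // assigned by recalculate()
}

type OutlineListener = (items: OutlineItem[]) => void;

// Renumber all items in document order, then notify the listener.
function recalculate(items: OutlineItem[], listener: OutlineListener): void {
  const headingCounters = [0, 0, 0, 0, 0, 0];
  let figureCount = 0;
  let tableCount = 0;
  items.forEach((item) => {
    if (item.kind === "heading") {
      const depth = item.level - 1;
      headingCounters[depth] += 1;
      // A new heading resets all deeper levels, so the first <h2> after
      // the second <h1> becomes 2.1 rather than continuing from 1.x.
      for (let i = depth + 1; i < headingCounters.length; i++) {
        headingCounters[i] = 0;
      }
      item.numberText = headingCounters.slice(0, depth + 1).join(".");
    } else if (item.kind === "figure") {
      figureCount += 1;
      item.numberText = String(figureCount);
    } else {
      tableCount += 1;
      item.numberText = String(tableCount);
    }
  });
  listener(items);
}

// Table of contents: one line per heading.
function tocLines(items: OutlineItem[]): string[] {
  return items
    .filter((item) => item.kind === "heading")
    .map((item) => item.numberText + " " + item.text);
}

const KIND_LABELS: { [kind: string]: string } = {
  heading: "Section",
  figure: "Figure",
  table: "Table",
};

// Cross-reference text in "label + number" form.
function crossRefText(items: OutlineItem[], targetId: string): string {
  for (let i = 0; i < items.length; i++) {
    if (items[i].id === targetId) {
      return KIND_LABELS[items[i].kind] + " " + items[i].numberText;
    }
  }
  return "[missing reference]";
}
```

Inserting a new <h1> at the start and calling recalculate() again shifts every section number, and crossRefText() picks up the new values. Recomputing from scratch on each change keeps the logic simple, much like a spreadsheet recalculation.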

Styles: Already handled, via CSS. See for example DocFormats/filters/ooxml/src/word/WordStyles.c which is where the translation is done for OOXML Word documents.

In the Editor’s JS code, there are no facilities for manipulating styles directly other than getting and setting the CSS text. DocFormats provides a set of classes for representing CSS stylesheets, styles, and property collections, which can be used from native (C/C++/Objective-C) code. UX Write uses this API, and the Qt editor can do the same. For a web-based editor, we’ll need to create a similar set of data structures for the Web UI.
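
A minimal TypeScript sketch of what such Web UI data structures might look like: a property collection that preserves declaration order, a style keyed by selector, and a stylesheet that serializes back to CSS text, which is the form the Editor’s JS code already accepts. The class and method names are hypothetical and are not a port of the DocFormats API.

```typescript
// Ordered collection of CSS declarations, e.g. "margin: 0".
class CssProperties {
  private order: string[] = [];
  private values: { [property: string]: string } = {};

  get(property: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.values, property)
      ? this.values[property]
      : undefined;
  }

  // Updating an existing property keeps its original position.
  set(property: string, value: string): void {
    if (this.get(property) === undefined) {
      this.order.push(property);
    }
    this.values[property] = value;
  }

  remove(property: string): void {
    const index = this.order.indexOf(property);
    if (index >= 0) {
      this.order.splice(index, 1);
      delete this.values[property];
    }
  }

  toCssText(): string {
    return this.order.map((p) => p + ": " + this.values[p] + ";").join(" ");
  }
}

// One rule: a selector plus its declarations.
class CssStyle {
  readonly selector: string;
  readonly properties: CssProperties;

  constructor(selector: string) {
    this.selector = selector;
    this.properties = new CssProperties();
  }

  toCssText(): string {
    return this.selector + " { " + this.properties.toCssText() + " }";
  }
}

// A stylesheet is an ordered list of styles, looked up by selector.
class CssStylesheet {
  private styles: CssStyle[] = [];

  get(selector: string): CssStyle | undefined {
    for (let i = 0; i < this.styles.length; i++) {
      if (this.styles[i].selector === selector) return this.styles[i];
    }
    return undefined;
  }

  getOrCreate(selector: string): CssStyle {
    const existing = this.get(selector);
    if (existing !== undefined) return existing;
    const created = new CssStyle(selector);
    this.styles.push(created);
    return created;
  }

  // Serialize back to CSS text for the editor.
  toCssText(): string {
    return this.styles.map((s) => s.toCssText()).join("\n");
  }
}
```

Keeping the model as plain objects that round-trip to CSS text means the UI can edit styles structurally while the editor itself keeps working only with CSS text.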

Themes: I’m not sure what the best strategy for this is, but I’d say something along the lines of CSS stylesheets that can be reused among different documents would probably be the way to go. This requires a lot of thought and investigation.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)