Posted to dev@corinthia.apache.org by Franz de Copenhague <fr...@outlook.com> on 2015/03/08 16:20:07 UTC

[Document Model] RE: Initial questions about web-based application

 
> > Is there any Document model defined? I mean for Document model an
> > abstract model that's defined the structure of the document like
> > sections, paragraphs, lists, text, fields, tables, images and styles.
> 
> The document model is HTML5 - that is, it’s identical to what the browser
> uses.
> 
> This gives us the vast majority of what we need for an editor “for free”, in
> that it is provided by the web browser or embedded web view.
> 
> The editing library consists of JS code that conducts all editing operations
> using W3C DOM APIs. It’s basically the same sort of thing as CKEditor and
> various other similar web-based rich text editors commonly used on
> wikis/content management systems/blogging engines.
> 
> —
> Dr Peter M. Kelly

I agree that HTML5 is a good model to feed into the editing library to support the editing of paragraphs, lists, text, tables, and images. But what about sections, headers, footers, fields (author, date, etc.), styles and themes? All of them are document features implemented in either docx or odt, and so far they are not supported by the DocFormats API.

For example, if the section feature is implemented in the web app and the user creates a section with 3 columns, it can be rendered like this example http://www.w3schools.com/css/tryit.asp?filename=trycss3_column-count, but somehow DocFormats needs to know that it is a section in order to translate it to docx or odt.

HTML5 is good for serializing from/to the DocFormats API, and good for editing by the editing library, but the web app will need a model/controller on top of the HTML DOM to support such features. Any thoughts?

-J




Re: Initial questions about web-based application

Posted by jan i <ja...@apache.org>.
On Sunday, March 8, 2015, Franz de Copenhague <fr...@outlook.com>
wrote:

>
>
> Considering that HTML5 is itself the model, the web-app can decorate the
> HTML tags with HTML data- attributes to serialize the model. (See
> http://www.w3schools.com/tags/att_global_data.asp)

I would prefer that the web app use the HTML as DocFormats generates it. If the
generated code is not good enough, we should consider changing DocFormats.


> For example, a section with 3 columns can be defined as:
>
> <section data-web-app-column-count="3"><p>Lorem ipsum dolor sit amet,
> consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut
> laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam,
> quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex
> ea commodo consequat.</p></section>
>
> Now, I see that this approach can be extended to the HTML generated by
> DocFormats: instead of using the HTML id attribute filled with "wordNN",
> we can use <p data-df-tag-index="2">. This is a little enhancement that
> removes the constraint on the id attribute, so any web editor application
> will be free to use the id attribute on its own.
>
> <body><p data-df-tag-index="2">Hello World!</p></body>

The idea of the web app is to be an editor that uses the generated HTML code.

rgds
jan i

>
>
> JD
>
>
>



-- 
Sent from My iPad, sorry for any misspellings.

RE: [Document Model] RE: Initial questions about web-based application

Posted by Franz de Copenhague <fr...@outlook.com>.


Considering that HTML5 is itself the model, the web-app can decorate the HTML tags with HTML data- attributes to serialize the model. (See http://www.w3schools.com/tags/att_global_data.asp)

For example, a section with 3 columns can be defined as:

<section data-web-app-column-count="3"><p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.</p></section>

Now, I see that this approach can be extended to the HTML generated by DocFormats: instead of using the HTML id attribute filled with "wordNN", we can use <p data-df-tag-index="2">. This is a little enhancement that removes the constraint on the id attribute, so any web editor application will be free to use the id attribute on its own.

<body><p data-df-tag-index="2">Hello World!</p></body>
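A minimal JavaScript sketch of that round trip follows. The attribute name is the one from the section example above; the function names (escapeHtml, renderSection, readColumnCount) are hypothetical and are not part of DocFormats or the editing library. The sketch works on strings so that it runs outside a browser; code inside the editor would more likely read the value through the DOM (element.dataset or getAttribute).

```javascript
// Escape text so it can be placed safely inside HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a multi-column section. The data- attribute carries the model
// information; the inline style is only there so a browser renders columns.
function renderSection(columnCount, paragraphs) {
  const body = paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`).join("");
  return (
    `<section data-web-app-column-count="${columnCount}" ` +
    `style="column-count: ${columnCount}">${body}</section>`
  );
}

// Read the column count back from serialized markup. A regular expression
// is used only to keep this sketch free of DOM dependencies.
function readColumnCount(html) {
  const match = /data-web-app-column-count="(\d+)"/.exec(html);
  return match ? Number(match[1]) : null;
}

const sectionHtml = renderSection(3, ["Lorem ipsum dolor sit amet"]);
console.log(sectionHtml);
console.log(readColumnCount(sectionHtml));
```

A converter that does not recognize the attribute can simply ignore it, which is the main appeal of carrying the extra information in data- attributes.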


JD 



Re: [Document Model] Initial questions about web-based application

Posted by Peter Kelly <pm...@apache.org>.
> On 8 Mar 2015, at 10:20 pm, Franz de Copenhague <fr...@outlook.com> wrote:
> 
> I agree that HTML5 is a good model to feed into the editing library to support the editing of paragraphs, lists, text, tables, and images. But what about sections, headers, footers, fields (author, date, etc.), styles and themes? All of them are document features implemented in either docx or odt, and so far they are not supported by the DocFormats API.

So this is where things get tricky :)

HTML5 does not directly support all the features of OOXML/ODF word processing documents, so we need to figure out whether or not we are going to support these features and, if so, how. With UX Write I’ve always taken the position that it was never intended to be a complete replacement for Word/OO and that it was subject to the inherent limitations of HTML (e.g. no page breaks, tabs, headers/footers etc). But I got a *lot* of complaints about the lack of those features, which put me in a difficult situation, as those can only properly be added (at least in terms of doing the layout calculations) by modifying the web layout engine itself - although some of them can be “faked” using javascript. But I think we can find a way to support most of these.

Sections: (This term is unfortunately ambiguous, as it can mean both the different parts of a document, e.g. “See Section 3.2 for details”, and a part of the document that has separate page layout settings). We could support these using a <div> with a custom CSS class, e.g. “corinthia-section”, which means that a browser or any other HTML-supporting program will still be able to make sense of the document, only that we will know that class=“corinthia-section” has special semantics that we handle appropriately in both DocFormats and the editor.

There are actually a few instances already where I’ve used custom class names for this purpose - see DocFormats/core/src/common/DFClassNames.h. Currently these use the “uxwrite-” prefix, which should be changed to “corinthia-” - this is a fairly easy task for someone to take on if they want to start making a contribution, since it’s largely just find and replace. When the change is made, the tests must also be updated.

For sections, we could alternatively use the <article> tag which is also in HTML5, and thinking about it I’d actually favour this more than a div since then we can avoid relying on a custom class name. There is a <section> element also but this is for sections in the “see section 3.2” sense (i.e. what appears in the table of contents of a report).
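The distinction between the two meanings of "section" can be sketched in JavaScript as below. This is only an illustration under stated assumptions: the “corinthia-section” class name is the one proposed above rather than an existing entry in DFClassNames.h, the function name sectionKind is hypothetical, and a plain object stands in for a DOM element so the sketch runs outside a browser.

```javascript
// Class name proposed in this thread for page-layout sections (an assumption).
const LAYOUT_SECTION_CLASS = "corinthia-section";

// `node` is a plain object standing in for a DOM element, for example:
//   { tagName: "div", classNames: ["corinthia-section"] }
// Returns "layout" for a part of the document with its own page layout,
// "outline" for a section in the "see Section 3.2" sense, or "none".
function sectionKind(node) {
  const tag = node.tagName.toLowerCase();
  const classes = node.classNames || [];
  if (tag === "article") return "layout";
  if (tag === "div" && classes.includes(LAYOUT_SECTION_CLASS)) return "layout";
  if (tag === "section") return "outline";
  return "none";
}

console.log(sectionKind({ tagName: "article" }));
console.log(sectionKind({ tagName: "div", classNames: ["corinthia-section"] }));
console.log(sectionKind({ tagName: "section" }));
```

Whichever representation is chosen (<article> or a classed <div>), both DocFormats and the editor would need to apply the same rule, so keeping it in one small predicate like this makes the convention explicit.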

Headers and footers: HTML5 actually has <header> and <footer> elements - but, bizarrely, they don’t seem to be intended for the purpose we associate with them in traditional word processing. However, just checking the spec now, it seems they’ve made it a little clearer. Even if browsers won’t necessarily display them properly as such, due to the non-paginated layout model used on the web, it’s at least the closest we can get in terms of how we represent things. We may be able to have the editor use CSS tricks to display the header and footer content at the top and bottom of the screen.

Fields: There are a few of these that are handled already, though the set is fairly limited. These are:

- Table of contents
- List of figures
- List of tables
- Cross-reference (to a section, figure or table) - can be text only, label + number, caption text, etc.

See DFClassNames.h for the list of these, and also grep through the JS files in Editor/src and the OOXML filter to see how they’re used. I think using custom CSS class names to identify them, and perhaps data- attributes where we need extra information would be appropriate.

Incidentally, one nice thing about how these are handled in the Editor is that it updates them automatically, in the same way that a spreadsheet automatically recalculates formulas. Every time you add, remove, or rename a section (<h1> to <h6>), figure, or table (in the case of the latter two, changing content of the caption), the table of contents and all cross-references are updated. This is handled in Outline.js, which also reports changes to the outline structure of these items to callback functions, so the editor can display a “document map” or outline view in the UI.
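The recalculation idea can be illustrated with a small JavaScript sketch. It is not the code in Outline.js: the function names (numberHeadings, resolveReferences) and the data shapes are assumptions made for the example, and it only covers section headings, not figures or tables.

```javascript
// headings: array of { id, level } in document order, with level 1..6
// corresponding to <h1>..<h6>. Returns a Map from heading id to a section
// number such as "2.1". A heading that skips a level (e.g. an <h2> with no
// preceding <h1>) gets a 0 in the skipped position in this naive version.
function numberHeadings(headings) {
  const counters = [0, 0, 0, 0, 0, 0];
  const numbers = new Map();
  for (const heading of headings) {
    counters[heading.level - 1] += 1;
    // Starting a new section resets the counters of all deeper levels.
    for (let i = heading.level; i < counters.length; i++) counters[i] = 0;
    numbers.set(heading.id, counters.slice(0, heading.level).join("."));
  }
  return numbers;
}

// references: array of { target } objects pointing at heading ids.
// Returns copies with a freshly computed `text`, so calling this again
// after any change to the headings brings every cross-reference up to date.
function resolveReferences(references, headings) {
  const numbers = numberHeadings(headings);
  return references.map((ref) => ({
    ...ref,
    text: numbers.has(ref.target) ? `Section ${numbers.get(ref.target)}` : "?",
  }));
}

const headings = [
  { id: "intro", level: 1 },
  { id: "background", level: 2 },
  { id: "method", level: 1 },
];
const references = [{ target: "method" }];
console.log(resolveReferences(references, headings)[0].text);

// Insert a new top-level section before "method" and recompute.
headings.splice(2, 0, { id: "related", level: 1 });
console.log(resolveReferences(references, headings)[0].text);
```

Recomputing everything from the heading list on each change is the simplest way to get the spreadsheet-like behaviour; an incremental approach would be an optimization on top of the same idea.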

Styles: Already handled, via CSS. See for example DocFormats/filters/ooxml/src/word/WordStyles.c which is where the translation is done for OOXML Word documents.

For the Editor, the JS code has no facilities for manipulating styles directly other than simply getting and setting the CSS text. DocFormats provides a set of classes for representing CSS stylesheets, styles, and property collections, which can be used in native (C/C++/Objective C) code. UX Write uses this API, and the Qt editor can do the same. For the web-based version of an editor, we’ll need to create a similar set of data structures for the Web UI.
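As a rough idea of what such data structures might look like on the JS side, here is a minimal sketch of a style with a property collection. The names (parseDeclarations, serializeDeclarations, Style) are hypothetical and do not mirror the DocFormats C API, and the parser is deliberately naive: it does not handle semicolons inside quoted strings or url() values.

```javascript
// Parse declaration text such as "color: red; font-weight: bold" into a Map
// from property name to value.
function parseDeclarations(cssText) {
  const properties = new Map();
  for (const part of cssText.split(";")) {
    const colon = part.indexOf(":");
    if (colon < 0) continue; // skip empty or malformed fragments
    const name = part.slice(0, colon).trim().toLowerCase();
    const value = part.slice(colon + 1).trim();
    if (name && value) properties.set(name, value);
  }
  return properties;
}

// Turn a property Map back into declaration text.
function serializeDeclarations(properties) {
  return Array.from(properties, ([name, value]) => `${name}: ${value}`).join("; ");
}

// A single style rule: a selector plus its property collection.
class Style {
  constructor(selector, cssText = "") {
    this.selector = selector;
    this.properties = parseDeclarations(cssText);
  }

  get(name) {
    return this.properties.has(name) ? this.properties.get(name) : null;
  }

  // Passing null removes the property.
  set(name, value) {
    if (value === null) this.properties.delete(name);
    else this.properties.set(name, value);
  }

  toString() {
    return `${this.selector} { ${serializeDeclarations(this.properties)} }`;
  }
}

const headingStyle = new Style("h1", "font-size: 24pt; color: navy");
headingStyle.set("color", "black");
console.log(headingStyle.toString());
```

With something like this, the Web UI could edit individual properties and write the result back as CSS text, which is the only interface the Editor's JS code currently exposes.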

Themes: I’m not sure what the best strategy for this is, but I’d say something along the lines of CSS stylesheets that can be reused among different documents would probably be the way to go. This requires a lot of thought and investigation.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)