Posted to dev@openoffice.apache.org by Rory O'Farrell <of...@iol.ie> on 2014/08/01 09:42:37 UTC

OOXML

For information:
http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/

-- 
Rory O'Farrell <of...@iol.ie>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org

RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

In a later note, Jan asks about my statement concerning digital signatures, private content, and covert content:

  "In the other mail you write a quite interesting note about 
   digital signing of artifact the user cannot see. Do you 
   happen to know how microsoft goes around that with the web 
   based offerings ?"

Digital signatures officially entered ODF with the ODF 1.2 specification, although there was an implementation of that capability in versions of OpenOffice.org that extended their ODF 1.0/1.1 support to provide digital signatures.  (The ODF 1.2 version is incompatible and that created some interesting interoperability issues until the implementations sorted it out.)

With regard to Microsoft Office: Microsoft supports ODF 1.2 digital signatures as part of its ODF support in Microsoft Office 2013.  Since Microsoft is careful about what is signed and whether the user knows what is being signed (in terms of what is visible to users), there is no problem.

On receiving digitally signed ODF 1.2 documents, Microsoft verifies those signatures as provided.  Any editing will break the signature (as is true for all Consumers) and if the result is signed, there will be no unsupported features or private/covert content left, so all is well.

I am not certain how this applies to the Office Web Applications.  It appears that the Web Applications notice that a document is signed (whether they check it or not I have not tested) but provide no way to sign a document that is edited in one of the Web Applications.  


PS: Here is what I did.

I downloaded an OpenOffice Calc (.ods) file that I already had in OneDrive, saved it under a new name, and signed it using LibreOffice.  I put that back up on OneDrive.  Now, when I open the .ods, I am warned that there may be features lost because editing is with the on-line Excel application.  The Excel Online Help reports that an existing digital signature will be lost if any attempt to edit is performed.

When I edited the document anyhow, there was no way to sign it on saving it back to OneDrive.  It appears that I have to open it either in AOO or LibO or Excel on the desktop and sign it there.  That's easy to do on Windows 8 because I have a OneDrive virtual folder on my desktop.  (By the way, the making of a copy of the Calc file before editing in the Web Application is no longer automatic.  I can edit the Calc document directly, but there is a warning about it.  The warning links to details of what can be lost when Excel edits the Calc document.  That includes loss of the digital signature.)

I just uploaded a signed Microsoft Word 2013 document.  When I opened it in the Web Application to edit it, I was warned that editing would invalidate the signature.  After editing, I could find no way using the Web Application to sign the document.  I would have to open it in the desktop application in order to do that.


-----Original Message-----
From: Dennis E. Hamilton [mailto:dennis.hamilton@acm.org] 
Sent: Saturday, August 2, 2014 13:05
To: dev@openoffice.apache.org
Subject: RE: OOXML

[ ... ]
There are some tricky cases, including

- Changes that overlap/conflict with tracked changes but tracked changes are not updated/preserved properly
- Accessibility impacts
- Digital signature applying to content not observable by the signer
- Covert content of various kinds
- Breaking of RDF/RDFa connections into the document (along with failure to preserve markers correctly)

The digital signature and covert-content avoidance cases work against preserving material that is not evident in a given application.  In the case of ODF, the damage to tracked changes is survivable (with some loss), because the ODF approach is resilient.  But not knowing about the tracked changes gets into the digital signature problem if the material is preserved while not being visible to the user.

[ ... ]


Change tracking & versioning (was Re: OOXML)

Posted by Peter Kelly <ke...@gmail.com>.

On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton <de...@acm.org> wrote:

> In line with the sketch that Peter Kelly provides below, I am personally very sympathetic to the idea of having an internal model that can tolerate difference in format between input and output while preserving in the output everything from the input format it can, even by leaving markers that will be useful on future input of the produced form.  (There is a well-known case of Microsoft Office doing this for HTML it exports, although the added information for recovery of the MSO rendition led to many complaints about document bloat.)
> 
> There are some conflicts between the desire to do this and the fact that some alterations have non-local consequences and may have other effects.  I still support the idea, but there are some tricky cases, including
> 
> - Changes that overlap/conflict with tracked changes but tracked changes are not updated/preserved properly

I'm probably getting a bit off-topic here, but this issue is one of the reasons I advocate an approach that keeps change tracking information separate from the content itself, rather than part of it. In my mind, Git provides the perfect model for this, although integrating it (or something else based on a similar model) into a word processor or office suite remains, shall we say, a rather significant problem to solve, both in the sense of the theoretical model and how that would be exposed in a user interface.

By itself, keeping the change information separate wouldn't solve the problem of inconsistency when the file is modified by an implementation with no knowledge of change tracking information. However, with a data model based on that of a version control system, that is able to access the previous version of the file as well as the current one, find the differences between the two, and allow the user to apply those differences, this could be addressed.

Let's say, just as a mental exercise, that we were to embed a git repository directly within an ODF file. That is, the .odt file is a zip archive containing the usual content.xml, styles.xml etc and also has a .git directory inside it, which contains the complete revision history of all these separate files. When you save the document in an implementation that does not support any change tracking/versioning, it would just overwrite the XML files in the same way as a text editor writes a file to disk. When you save the document in an implementation that *does* support this however, it overwrites the files and *then* does a git commit.

With this approach, if you were to first create a file in implementation A which supports this versioning, you'd have a zip file with a git repository and one or more commits, and the "working copy" (that is, all the files within the zip archive outside of the .git directory) would be "clean" (up to date). If you then open and save it in implementation B which does not support versioning, it would not touch the repository and leave the .git directory in the zip file untouched, but instead save over the XML files. Then you open it in implementation A again, and you can see that the working directory is not clean, and there are outstanding changes. These could then be displayed in the editor in the same way as is done currently, without the user noticing any difference. And you'd have the benefits of knowing the derivation relationships between versions, so if you get two different versions of a document back that have the same ancestor, you could do a merge.
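To make the mental exercise above a little more concrete, here is a minimal Python sketch of the "is the working copy clean?" check. It is not how git works internally and not a proposal for the ODF package format: the `history/` prefix is a hypothetical stand-in for the embedded `.git` directory, a "commit" is reduced to a dictionary of SHA-256 digests, and the function names are invented for illustration.

```python
import hashlib
import io
import zipfile

# Hypothetical stand-in for the embedded ".git" directory (an assumption).
HISTORY_PREFIX = "history/"


def build_package(members):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def snapshot(package_bytes):
    """Map each content member (outside the history area) to its SHA-256 digest."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if not name.startswith(HISTORY_PREFIX)
        }


def outstanding_changes(last_commit, package_bytes):
    """Return the sorted member names that differ from the last commit."""
    current = snapshot(package_bytes)
    names = set(last_commit) | set(current)
    return sorted(n for n in names if last_commit.get(n) != current.get(n))


# Implementation A saves the document and records a "commit".
original = build_package({"content.xml": b"<doc>v1</doc>", "styles.xml": b"<styles/>"})
committed = snapshot(original)

# Implementation B knows nothing about versioning and just overwrites content.xml.
edited = build_package({"content.xml": b"<doc>v2</doc>", "styles.xml": b"<styles/>"})

print(outstanding_changes(committed, original))  # []
print(outstanding_changes(committed, edited))    # ['content.xml']
```

When implementation A reopens the file, a non-empty result tells it that changes were made outside its knowledge, and it could then diff the old and new versions of those members to present them to the user.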

Now I'm not suggesting that actually storing a git repository inside a .odt archive would be a good way to go - partly for efficiency reasons (duplication of document's entire history in every copy), and partly because its format is pure binary, and is so vastly different from everything else in ODF. Nonetheless, at a theoretical level, the core idea - of storing a version history separate from the content, from which changes can automatically be detected without requiring any extensions to the core part of the standard itself - would I think be worth exploring.

I know this is quite a different approach to what you've previously been considering; what are your thoughts?

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton <de...@acm.org> wrote:

> In line with the sketch that Peter Kelly provides below, I am personally very sympathetic to the idea of having an internal model that can tolerate difference in format between input and output while preserving in the output everything from the input format it can, even by leaving markers that will be useful on future input of the produced form.  (There is a well-known case of Microsoft Office doing this for HTML it exports, although the added information for recovery of the MSO rendition led to many complaints about document bloat.)

On a semi-related note, there's one quite fascinating implementation of ODF I've seen called WebODF (see http://webodf.org; the code is open source). This is an in-browser editor, and actually works by loading the content.xml file from the ODF package into the DOM tree of the browser, thus having it contained within the HTML content of the page. Through clever use of CSS namespaces, it's able to achieve a pretty faithful rendering of the document using the browser's built-in layout engine, even though the content itself is not in HTML.

As I understand their approach, they did this to ensure that the XML structure of the ODF file is preserved exactly, which is much more difficult to achieve if the content is converted into HTML first (as in my implementation). Web browsers are actually very good at handling content in this way, since you can just use the CSS declaration "display: none" to hide any elements that shouldn't be rendered on screen, and this CSS can be kept entirely separate (or even dynamically generated by JavaScript) rather than being part of the XML content itself. So WebODF takes advantage of the fact that a web browser will just preserve information by default, and it works quite well.
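As a rough illustration of the "separate, possibly generated CSS" idea, the Python sketch below builds a stylesheet that hides namespaced ODF elements without touching the XML. This is not WebODF's actual code. The function name is invented, and hiding `text:tracked-changes` is only an assumed example of an element an editor might not want to render.

```python
# The ODF text namespace URI; the choice of prefix is arbitrary.
ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def hiding_stylesheet(hidden_elements, prefix="text", namespace=ODF_TEXT_NS):
    """Build CSS that hides the given namespaced elements (hypothetical helper)."""
    # Declare the namespace prefix, then emit one rule per hidden element.
    lines = [f'@namespace {prefix} url("{namespace}");']
    for local_name in hidden_elements:
        lines.append(f"{prefix}|{local_name} {{ display: none; }}")
    return "\n".join(lines)


css = hiding_stylesheet(["tracked-changes"])
print(css)
```

Because the rules live in a stylesheet rather than in the document, the hidden elements stay in the DOM and are written back unchanged when the content is serialised.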

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

In line with the sketch that Peter Kelly provides below, I am personally very sympathetic to the idea of having an internal model that can tolerate difference in format between input and output while preserving in the output everything from the input format it can, even by leaving markers that will be useful on future input of the produced form.  (There is a well-known case of Microsoft Office doing this for HTML it exports, although the added information for recovery of the MSO rendition led to many complaints about document bloat.)

There are some conflicts between the desire to do this and the fact that some alterations have non-local consequences and may have other effects.  I still support the idea, but there are some tricky cases, including

- Changes that overlap/conflict with tracked changes but tracked changes are not updated/preserved properly
- Accessibility impacts
- Digital signature applying to content not observable by the signer
- Covert content of various kinds
- Breaking of RDF/RDFa connections into the document (along with failure to preserve markers correctly)

The digital signature and covert-content avoidance cases work against preserving material that is not evident in a given application.  In the case of ODF, the damage to tracked changes is survivable (with some loss), because the ODF approach is resilient.  But not knowing about the tracked changes gets into the digital signature problem if the material is preserved while not being visible to the user.

There is also a case of possible confusion between two consumers over how image renditions in ODF are negotiated: a consumer presents the best rendition that it recognizes, which is not necessarily the one the producer preferred among the choices it offered in the document.  This raises digital signature considerations as well.

I don’t think this should stop the kind of exploration Peter Kelly is embarked upon.  At some point, these considerations will surface and it will be interesting to see what a creative accommodation might be.

It's not clear to me that the openoffice.org descendants can do much about format ecumenicalism very quickly, if at all, so I have probably gotten pretty off-topic at this point.


 -- Dennis E. Hamilton
    dennis.hamilton@acm.org    +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail




Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 2 Aug 2014, at 9:24 pm, Alexandro Colorado <jz...@oooes.org> wrote:

> The support that is done is to receive OOXML not to produce them, the
> discussion issue would be to support legacy formats like .doc or .xls.
> 
> I still don't see a point to generate OOXML and most people don't care
> as long as they can send in office native formats.
> 
> I never heard someone saying, please send it on docx, your doc is a
> closed binary format.

I (and I suspect I'm not alone) see the lack of the ability to 1) save OOXML documents and 2) do so while preserving all elements, including unsupported features and Microsoft-only data, as the #1 limitation of OpenOffice today. The fact is, OOXML is in practice extremely widely used (vastly more so than ODF), and I argue that if OpenOffice is to have any relevance going forward it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just about importing files but enabling a period of a number of years during which an organisation can effectively work with a mixture of OOXML and ODF documents. This allows the transition to be done incrementally - a company with 30,000 employees will only migrate if there's a way they can do so bit-by-bit, with some departments sticking with OOXML for longer than others. Because there will be people in different departments that need to work together, those who insist on remaining with OOXML for the time being must be able to collaborate in both directions with those who have switched for all their other documents.

It's the same situation as the transition Microsoft made from the old binary formats to OOXML - Office 2007 (and all later versions) still support the older formats, for both read and write, and I expect they will continue for some time. If Office 2007 had completely dropped support for saving .doc, .xls, and .ppt, it would have been dead-on-arrival, as it took several years before most people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting these formats. There is already an import filter which sort-of works (though I had to direct a customer to LibreOffice the other day as they were having trouble opening a perfectly-valid .docx using OpenOffice). This could be left in place, with fixes where necessary, and a new export filter written for saving. The problem with this however is that import/export is inherently a lossy process; if there is any information within a document that is not supported by OO or the filters, then it will be lost after an open/save. This information could also include proprietary extension data that is supported by Office which there is no way to interpret since its format is not published (macros, I believe, are an example of this).

The approach I took with UX Write was to use bidirectional transformation [1], which ensures updates happen in a non-destructive manner. When you open a .docx file in UX Write, it converts it into HTML, and keeps track of information that allows it to map each HTML element back to the original XML element in the .docx file from which it was generated. When you save the file, instead of overwriting it with a new version, it *updates* the existing version by figuring out what changes have occurred in the HTML document, and applying those changes to the original .docx file. This way, only the parts that the user has actually modified are touched; anything UX Write doesn't know about (e.g. embedded spreadsheets) is left untouched. I'm planning to use the same design for ODF.
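The general shape of such a bidirectional transformation can be sketched in a few lines of Python. This is only a toy illustration of the get/put idea and is not UX Write's implementation: the `<doc>`, `<p>` and `<embedded>` elements are made up (real .docx content is WordprocessingML inside a zip package), the "view" is a plain list of paragraph texts rather than HTML, and only in-place text edits are handled.

```python
import xml.etree.ElementTree as ET


def get_view(source_xml):
    """Forward direction: extract the editable view, a list of paragraph texts."""
    root = ET.fromstring(source_xml)
    return [p.text or "" for p in root.findall("p")]


def put_view(view, source_xml):
    """Backward direction: apply edited texts to the ORIGINAL document.

    Only <p> elements whose text changed are modified; everything else
    (here the unsupported <embedded> element) is left as it was.
    """
    root = ET.fromstring(source_xml)
    paragraphs = root.findall("p")
    if len(view) != len(paragraphs):
        raise ValueError("this sketch only handles in-place text edits")
    for p, text in zip(paragraphs, view):
        if (p.text or "") != text:
            p.text = text
    return ET.tostring(root, encoding="unicode")


source = '<doc><p>Hello</p><embedded kind="sheet">opaque</embedded><p>World</p></doc>'

view = get_view(source)          # ['Hello', 'World']
view[1] = "There"                # the user edits the second paragraph
updated = put_view(view, source)

print(updated)
# <doc><p>Hello</p><embedded kind="sheet">opaque</embedded><p>There</p></doc>
```

Two properties worth checking in any design like this are that putting back an unmodified view leaves the source unchanged, and that getting the view from an updated source returns exactly what was put.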

Crucially, this meant that I was able to implement support for OOXML (well, specifically the WordprocessingML part of it) in an incremental fashion. First there was only support for editing text; then came basic formatting, then lists, tables, styles etc. Even today, my implementation doesn't have support for the complete feature set, but it is nonetheless able to "walk lightly" in editing the document, by not touching anything that isn't supported. This ties back to the migration path I mentioned above: there is a need to be able to interoperate with people using OOXML for some period of time, assuming they're eventually led towards using only ODF.

I'd be keen to hear any thoughts others have on this issue, in the sense of how best to tackle it within OpenOffice.

I recommend having a look at the slides linked to below, which give a great introduction to what bidirectional transformation is and how it works. There's been a ton of research done on this in the past, and I think it's ideal for dealing with different document formats, particularly when a given app treats a particular format as "native" (HTML in the case of UX Write, ODF in the case of OpenOffice). With this approach, we could bypass an entire class of compatibility problems where people complain of losing formatting or other information from their documents (and blame it on OpenOffice, telling their collaborators to use Microsoft Office instead).

[1] http://www.cis.upenn.edu/~bcpierce/papers/icmt-2009-slides.pdf

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

Re: OOXML

Posted by Guy Waterval <wa...@gmail.com>.

+1
-- 
gw


2014-08-03 1:47 GMT+02:00 Andrew Douglas Pitonyak <an...@pitonyak.org>:

>
> I am often required to read and write DOCX files and I know others for
> which this is a need. If I cannot accurately read / write DOCX files (or if
> I suspect that it may not work correctly) then I use Word; I don't like it
> when I have to use Word.

Re: OOXML

Posted by Andrew Douglas Pitonyak <an...@pitonyak.org>.

I am often required to read and write DOCX files and I know others for 
which this is a need. If I cannot accurately read / write DOCX files (or 
if I suspect that it may not work correctly) then I use Word; I don't 
like it when I have to use Word.

On 08/02/2014 10:24 AM, Alexandro Colorado wrote:
> The support that is done is to receive OOXML not to produce them, the
> discussion issue would be to support legacy formats like .doc or .xls.
>
> I still don't see a point to generate OOXML and most people don't care
> as long as they can send in office native formats.
>
> I never heard someone saying, please send it on docx, your doc is a
> closed binary format.
>
> On 8/2/14, Peter Kelly <ke...@gmail.com> wrote:
>> On 1 Aug 2014, at 2:42 pm, Rory O'Farrell <of...@iol.ie> wrote:
>>
>>> For information:
>>> http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/
>> An interesting article. This brings to mind a few issues I've been thinking
>> about for a while:
>>
>> - I think the rather extreme anti-OOXML stance that some take can be
>> counterproductive. I certainly hold the view that ODF is a superior standard
>> in many respects (though not all), however there are circumstances where it
>> makes sense for a given piece of software to support both. For example they
>> cite the lack of support for ODF in Google Docs and iWork; developing
>> software that interoperates with these would require OOXML
>> support.
>>
>> My take on the issue is that it's important to support both, because as much
>> as we might dislike the fact, OOXML is out there and used very widely. With
>> the work I'm currently doing on UX Write, I'm adding to the existing OOXML
>> (specifically .docx) support with support for ODF (.odt) and doing this
>> in a common framework such that the app itself doesn't care which format the
>> file is natively stored in, it will work equally well with both.
>> Additionally, once I have the ODF support in, it will be possible to
>> leverage this support for conversion between the two formats in both
>> directions. I'll be giving a talk on this at ApacheCon EU later this year,
>> and yes this framework will soon be open source - if anyone is interested in
>> collaborating on it, please let me know.
>>
>> - One of the criticisms raised is that there are several different versions
>> of OOXML, not all of which are entirely compatible. However this is also
>> true of ODF (or at least of MS's implementation in Office 2007 and 2010; I'm
>> not sure where the fault lies). One of the big questions I've been asking
>> myself in the work I'm currently doing on ODF is whether I should have my
>> implementation save ODF 1.1 by default, or version 1.2 by default. If I
>> choose the former, it will work with Office 2007 and onwards. The latter,
>> only Office 2013 (I think). For someone such as myself writing a new
>> implementation of (part of) the ODF spec, and desiring compatibility with
>> Office 2007 and 2010, which is the best choice?
>>
>> - I consider the use of proprietary fonts to be a separate issue from the
>> standard itself. The specification is silent on the matter, so this is
>> really a criticism of MS Office rather than OOXML itself. Nonetheless, it's
>> an important one, and one I believe we should address by promoting the use
>> of open source fonts (e.g. https://www.google.com/fonts) independently and
>> in addition to the use of ODF. Perhaps these could be made available as an
>> easily-distributed separate package, so that those who want to stick with MS
>> Office for whatever reason could be encouraged to install & use them, for
>> improved interoperability with other office suites?
>>
>> In an organisation where there are some users on MS and others on OO/LO,
>> these fonts could be deployed by the IT department as part of the standard
>> desktop image, and all templates created by the organisations could use
>> these fonts by default, which would lead to wider usage.
>>
>> - Towards the end of the article, there's a discussion about the lack of
>> support for ODF by some vendors, particularly Google and Apple. The question
>> then is how do we fix that? My view is that there needs to be a migration
>> path - and by that I mean not just a tool to convert documents from OOXML to
>> ODF, but the ability to go both ways, and work with either format for as
>> long as necessary for the migration to complete. Most (all?) successful
>> transitions I've seen have used a similar approach - Microsoft going from
>> DOS to Windows, Apple going from 68k -> PPC -> Intel, and Mac OS classic ->
>> OS X, and so forth.
>>
>> In the case of document formats, for a country whose government currently
>> uses MS Office and OOXML that wants to make the switch to ODF and
>> OpenOffice/LibreOffice/other tools, it's not going to be an overnight
>> change. It could very well take several years, and during that period
>> everyone in the organisation will need to have the capability to work with
>> both formats. New or modified documents would in general be saved in ODF,
>> but older documents as well as documents that need to be exchanged with
>> people running MS Office 2007 or 2010 (which I think don't support ODF 1.2)
>> would need to be in OOXML, until such time as everyone has upgraded to a
>> fully-conformant version of MS Office, or switched to OpenOffice et al.
>>
>> --
>> Dr. Peter M. Kelly
>> Founder, UX Productivity
>> peter@uxproductivity.com
>> http://www.uxproductivity.com/
>> http://www.kellypmk.net/
>>
>> PGP key: http://www.kellypmk.net/pgp-key
>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>>
>>
>

-- 
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
Info:  http://www.pitonyak.org/oo.php



RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

It is important to understand that an XML DOM does not capture all of the constraints and referential requirements within an ODF document.  In particular, content.xml does not have everything and there are references using XLink (relative hrefs) and also special identifiers (not IDREFs) to other files, whether for binary attachments or into other defined parts (styles.xml and meta.xml for two).

There is also considerable internal structuring that is off-hierarchy.  Some of the connections are via fragment IDs (xml:id) and IDREFs, others are by identifiers (not IDs and IDREFs) that are introduced in the ODF specification but which are not modelled in the Relax NG Schema (beyond saying they have string values, for example).

This sort of thing also happens rather heavily in OOXML, where communication among parts uses a unique cross-part relationship model.  There are also many cross references to named components by other than XML IDs and IDREFs, whether or not the components and the references occur in the same part of the OPC package.
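As a rough illustration of the ODF case described above, the following sketch lists the XLink hrefs in a fragment of content.xml that appear to point at other files in the package. Nothing in the XML tree structure itself ties these strings to the files they name. The function name and the sample fragment are hypothetical, and the filter is a simple heuristic rather than the resolution rule from the ODF specification.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

def package_relative_hrefs(xml_text):
    """Return xlink:href values that look like references to other package files."""
    root = ET.fromstring(xml_text)
    attr = f"{{{XLINK}}}href"
    hrefs = [el.get(attr) for el in root.iter() if el.get(attr)]
    # Skip same-document fragments and absolute URLs; what remains is treated
    # as package-relative (a heuristic, not the full ODF rule).
    return [h for h in hrefs if not h.startswith("#") and "://" not in h]

# Hypothetical fragment in the style of content.xml.
SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<draw:image xlink:href="Pictures/logo.png"/>'
    '<text:a xlink:href="http://example.org/">link</text:a>'
    '</text:p>'
)

print(package_relative_hrefs(SAMPLE))
```

A consumer that only walks the tree sees two attributes with string values; knowing that one of them must resolve to a file inside the package is knowledge that lives outside the DOM.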

One could continue the kind of hack that plants that information as benign markers into an internal form of the XML parts (even as a single XML document, although that is tricky when ODF documents are nested as subdocuments of another), so long as they are replaced when the XML document is committed to a saved ODF document file format.

In terms of having a DOM that maps to the external file form and a different internal model, the only time that the internal model needs to update the externally-oriented DOM is as part of a Save operation.  There might be more coupling, but performance and storage issues will doubtless impact the engineering outcome, especially for handling large documents with alacrity.  Copy and paste and undo management will also be factors, along with maintaining pagination, word counts, and such.

On the other hand, it is convenient (practically necessary) to specify the semantics of ODF, or some profile of ODF, as if operations are on the format itself, since it is only the format that is more-or-less well-specified.  It would be interesting to know how much this could be taken literally in an application.  I think there might be forensic tools on ODF documents that might be able to operate that way.  I'm not at all certain about production WYSIWYG consumers and producers, especially ones implemented to harmonize between OOXML, ODF and other interesting formats (EPUB coming to mind).

I will watch Peter Kelly's efforts with great interest to see how much the boundaries can be moved in this area.


 -- Dennis E. Hamilton
    dennis.hamilton@acm.org    +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail


 ----- Original Message ---
From: Peter Kelly [mailto:kellypmk@gmail.com] 
Sent: Monday, August 4, 2014 01:27
To: dev@openoffice.apache.org
Subject: Re: OOXML

On 4 Aug 2014, at 12:16 am, jan i <ja...@apache.org> wrote:


[ ... ]

It's possible in theory, though I'm not familiar enough with the OO codebase to say whether it would work in practice.

The key idea is to maintain two separate data structures - one which is the ODF XML trees, and another which is the internal representation. Any time a change gets made to the former, the implementation must update the latter to reflect the change. Modification operations on the latter would need to go in the other direction.

[ ... ]

In the case of UX Write, there's a few instances where I've used custom extensions to handle certain things. The main ones are:

1. Table of contents/list of tables/list of figures.

When you insert one of these into your document, it inserts a <nav> element with a CSS class name of "tableofcontents", "listoffigures", or "listoftables", which were chosen as these are the same keywords that LaTeX uses for these features. UX Write treats these as having special meaning, in the sense that when opening a document (and when the document is modified), it updates the content of these <nav> elements based on the set of all heading, figure, or table elements in the document (including numbering/captions).

2. OOXML-specific features.

When converting from .docx to .html during the process of opening a document, it assigns certain pre-defined CSS class names to particular types of HTML elements to indicate their purpose. For example, a cross-reference whose display format is supposed to include both the label and caption of a figure will be translated as:

[ ... ]




Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 4 Aug 2014, at 12:16 am, jan i <ja...@apache.org> wrote:

> By painful experience, I found out that our internal (memory) structure is
> a superset of mixed ODF and pre-ODF items. I don't think you can have a pure
> ODF/OOXML memory structure, you need internal pointers as well (like
> start/finish of copy buffer)...but of course those 2 parts should have been
> well separated.

It's possible in theory, though I'm not familiar enough with the OO codebase to say whether it would work in practice.

The key idea is to maintain two separate data structures - one which is the ODF XML trees, and another which is the internal representation. Any time a change gets made to the former, the implementation must update the latter to reflect the change. Modification operations on the latter would need to go in the other direction.

This is how WebKit works (well, at least how it worked last time I touched the code, which was more than 10 years ago...). There is the DOM tree and the rendering tree. The DOM tree stores the HTML structure exactly as it was parsed from the original file; this is accessible to javascript code and can be modified in arbitrary ways. Whenever the DOM tree changes, WebKit updates its rendering tree, based both on the DOM tree and applicable rules from the CSS stylesheet. The rendering tree is the internal model which is used for displaying the content on screen.

Importantly, the DOM tree is also allowed to contain arbitrary XML elements in any namespace. This is how WebODF works; it includes the content.xml from the package directly, and that's the "authoritative" data structure that is manipulated during editing. The CSS rules WebODF uses control rendering of the content.
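A minimal sketch of the two-structure idea, assuming a toy document with only h1 and p elements: the XML tree is authoritative, and a derived view (standing in for the rendering tree) is rebuilt whenever the tree changes. The class and method names are hypothetical and are not taken from WebKit, WebODF, or OpenOffice.

```python
import xml.etree.ElementTree as ET

class Document:
    """Authoritative XML tree plus a derived internal view."""

    def __init__(self, xml_text):
        self.tree = ET.fromstring(xml_text)
        self.view = []
        self._rebuild_view()

    def _rebuild_view(self):
        # The derived model here is just a flat list of (tag, text) pairs.
        # A real editor would update incrementally rather than rebuild.
        self.view = [
            (el.tag, "".join(el.itertext()))
            for el in self.tree.iter()
            if el.tag in ("h1", "p")
        ]

    def append_paragraph(self, text):
        # Every mutation goes through the tree, then refreshes the view.
        p = ET.SubElement(self.tree, "p")
        p.text = text
        self._rebuild_view()

doc = Document("<body><h1>Title</h1><p>Hello</p></body>")
doc.append_paragraph("World")
print(doc.view)
```

The point of the sketch is the direction of the dependency: the view is never edited directly, so saving only ever needs the tree.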

> I wonder, you wrote earlier that UX Write uses HTML internally, that seems
> to me like the lowest common denominator...I would have thought a real
> superset would have been the better choice?

Well a convenient thing about HTML is that you can include your extensions without affecting the rendered output, or risking loss of the data. This includes custom elements, custom attributes, and CSS style names that you may choose to assign special meaning to.

The reasons for this are largely due to the way in which HTML has historically evolved... browsers deliberately allow the presence of "invalid" elements they don't know about, to cater for future versions of the spec which add new elements. The idea is "graceful degradation", such that if you try to view a site that uses some new HTML features your browser doesn't support, it should at least in theory still let you see most of the content, just that you won't be able to use the new features. Depending on the HTML/CSS design, this works better in practice on some sites than on others. Then of course there's JavaScript APIs which can cause compatibility issues, though that's a separate topic, and the browser will usually at least display the content even if it can't do dynamic stuff because the JS code threw an exception.

In the case of UX Write, there's a few instances where I've used custom extensions to handle certain things. The main ones are:

1. Table of contents/list of tables/list of figures.

When you insert one of these into your document, it inserts a <nav> element with a CSS class name of "tableofcontents", "listoffigures", or "listoftables", which were chosen as these are the same keywords that LaTeX uses for these features. UX Write treats these as having special meaning, in the sense that when opening a document (and when the document is modified), it updates the content of these <nav> elements based on the set of all heading, figure, or table elements in the document (including numbering/captions).

2. OOXML-specific features.

When converting from .docx to .html during the process of opening a document, it assigns certain pre-defined CSS class names to particular types of HTML elements to indicate their purpose. For example, a cross-reference whose display format is supposed to include both the label and caption of a figure will be translated as:

<a href="#idN" class="uxwrite-ref-label-num">...</a>

where N is the id of the target. The editing code knows about these class names and uses them to update the text inside the <a> element if the figure number or caption changes. Similarly, where there is an unsupported object, like an embedded spreadsheet, it will translate this as:

<span class="uxwrite-placeholder">[Unsupported object]</span>.

During editing, WebKit preserves these, since they're just CSS class names and don't in any way cause problems with the HTML or rendering. All of the core editing operations are implemented in javascript, and these take the class names into account where appropriate.

3. Element mappings for bidirectional transformation.

For every HTML element that is generated from an OOXML element, it sets the id attribute to a string of the form bdt(N)-(M), where N is a randomly-generated number for each editing session, and M is the sequence number of the element in the OOXML tree. The purpose of the randomly-generated N value is to ensure that there aren't mixups for BDT updates if that HTML content gets copied & pasted into another document within UX Write itself. The number used for the M value is the position of the element in a pre-order traversal of the XML tree of document.xml. In cases where the element corresponds to an XML file in the package that is *not* the main content (currently only for the case of footnotes and endnotes), it is prefixed with a string identifying the file, so it can be properly identified.

When a document is saved, and the BDT update process takes place, it uses these to re-establish the relationship between elements in the HTML file and elements in the OOXML content tree, and figure out where changes have taken place. Given this mapping, it is able to update the OOXML file based on content from the HTML file.

This is all fully conformant with the HTML spec, as it allows you to choose whatever values you want for id attributes. And the editor neither knows nor cares whether the file it's working with was stored as .html or .docx; what happens on save is entirely separate from what happens during editing. In the case of HTML, the file is just saved directly, and in the case of .docx, the BDT process described above occurs. I'll be using exactly this same approach for supporting .odt files.

4. Extra elements to indicate selection

The iOS version of WebKit has a broken selection API (or at least did at the time I began writing the app, which was in the days of iOS 5), so I had to "fake" selections by creating my own <div> and <span> elements with the light-blue background colour. These are just regular HTML elements with CSS styling - nothing special about them. The editor keeps track of which elements in the document are used for faking selections, and these are removed before save; it's a runtime thing only.
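The id scheme from item 3 can be sketched roughly as follows. This assumes that "bdt(N)-(M)" means the literal prefix bdt, then the session number, a hyphen, and the pre-order sequence number; the function name and the sample tree are hypothetical, and this is not the UX Write code.

```python
import random
import xml.etree.ElementTree as ET

def assign_bdt_ids(root, session=None):
    """Map 'bdt<N>-<M>' ids to the elements of a tree, numbered in pre-order."""
    if session is None:
        # A fresh random N per editing session, as described in item 3.
        session = random.randrange(10**9)
    mapping = {}
    # Element.iter() walks the tree in document (pre-order) sequence.
    for seq, el in enumerate(root.iter()):
        mapping[f"bdt{session}-{seq}"] = el
    return mapping

source = ET.fromstring("<doc><p><r/></p><p/></doc>")
ids = assign_bdt_ids(source, session=7)

# At save time, an id found on an HTML element leads back to its source element.
print(ids["bdt7-2"].tag)
```

Because the session number differs between editing sessions, content pasted in from another document carries ids that will not be found in this mapping, which is how the mix-ups mentioned in item 3 would be avoided.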

In addition to all of the above, there are additional data structures maintained by the javascript code for information that isn't possible to represent (or doesn't make sense to represent) in the HTML structure itself. This includes a list of undo/redo operations, event listeners for changes to elements that would affect the table of contents/cross-references, an abstract tree representing the document outline, and so forth. These are all javascript objects; but they are separate from the DOM tree, and as far as opening & saving a file is concerned, have no effect on that. The HTML DOM remains the core data structure used, and WebKit preserves all the information needed.
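To make the table-of-contents handling from item 1 concrete, here is a small sketch that regenerates the contents of a nav element with class "tableofcontents" from the headings in the document. It uses Python's ElementTree on well-formed markup rather than the browser DOM, ignores numbering, and the function name and the choice of one p element per entry are assumptions.

```python
import xml.etree.ElementTree as ET

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

def update_toc(body):
    """Rebuild the <nav class="tableofcontents"> entries from the headings."""
    nav = body.find(".//nav[@class='tableofcontents']")
    if nav is None:
        return
    # Collect headings first so the tree is not modified while iterating it.
    headings = [el for el in body.iter() if el.tag in HEADINGS]
    for stale in list(nav):
        nav.remove(stale)
    for heading in headings:
        entry = ET.SubElement(nav, "p")
        entry.text = "".join(heading.itertext())

body = ET.fromstring(
    '<body><nav class="tableofcontents"><p>stale</p></nav>'
    '<h1>Intro</h1><p>text</p><h2>Detail</h2></body>'
)
update_toc(body)
print([entry.text for entry in body.find("nav")])
```

In an editor this function would be called from the change listeners mentioned above, whenever a heading is added, removed, or edited.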

> Some parts of AOO use the structure directly, others go through the API;
> that is not very clean, and makes it extremely difficult to test changes in
> the internal memory layout. An application like this (and many other
> similar types) should see the memory as a capsule, with a fixed API around
> it.

Agreed; I think it's important to maintain a separation between the internal data structures used by the editor and other code (file format loading/saving, automated tests, and plugins), so that the internal structures can be changed without affecting any of these.

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

Re: OOXML

Posted by jan i <ja...@apache.org>.

On 3 August 2014 18:50, Peter Kelly <ke...@gmail.com> wrote:

> On 3 Aug 2014, at 6:52 pm, Regina Henschel <rb...@t-online.de>
> wrote:
>
> Peter Kelly schrieb:
>
> There's two ways to view a format: (1) as a way of encoding information
> for storage or transmission, and (2) as an in-memory data structure used
> by the editor at runtime. In some programs these are two different
> things, and in others they are the same. The latter is true of web
> browsers - HTML is both the file format and the runtime data model; the
> W3C DOM APIs can be used to manipulate the HTML structure directly. I
> believe this was also true to a large extent with the binary formats
> used by older versions of MS Office, for purposes of efficiency [1].
>
> I'm not familiar with the internals of OpenOffice - one thing I'd be
> very interested to know is does it use ODF for its in-memory
> representation of the document? Or are the runtime data structures used
> different to the XML trees that one finds in an ODF package?
>
>
> No, OpenOffice has a very different in-memory representation than the ODF
> format. And the API is a third version of looking at the document.
>
>
> Interesting.
>
> Given this is the case, what would you suggest would be the best strategy
> for supporting OOXML?
>
> 1) Two-way conversion between OOXML and ODF, with OpenOffice then dealing
> solely with the file as ODF (not even being aware it came from OOXML
> originally)
> 2) Two-way conversion between OOXML and OpenOffice's internal
> representation, bypassing ODF altogether
>
> The second option has the advantage that it would be easier to cater for
> features that are supported in OOXML but not ODF, e.g. table styles.
> However the first option has the advantage that it would keep the core
> entirely separate from the OOXML filter, and could potentially be
> constructed in a general-purpose manner and made usable as a library by
> other software.
>

By painful experience, I found out that our internal (memory) structure is
a superset of mixed ODF and pre-ODF items. I don't think you can have a pure
ODF/OOXML memory structure, you need internal pointers as well (like
start/finish of copy buffer)...but of course those 2 parts should have been
well separated.

I wonder, you wrote earlier that UX Write uses HTML internally, that seems
to me like the lowest common denominator...I would have thought a real
superset would have been the better choice?

Some parts of AOO use the structure directly, others go through the API;
that is not very clean, and makes it extremely difficult to test changes in
the internal memory layout. An application like this (and many other
similar types) should see the memory as a capsule, with a fixed API around
it.

rgds
jan I

>
> --
> Dr. Peter M. Kelly
> Founder, UX Productivity
> peter@uxproductivity.com
> http://www.uxproductivity.com/
> http://www.kellypmk.net/
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>
>

Re: OOXML

Posted by Andrea Pescetti <pe...@apache.org>.

Andrew Douglas Pitonyak wrote:
> On 08/03/2014 12:50 PM, Peter Kelly wrote:
>> features that are supported in OOXML but not ODF, e.g. table
>> styles.
>>
> If AOO does not support Table Styles and a particular file format does,
> the biggest problem is that you lose table styles when you load,
> edit, then save. If AOO does not support Table Styles, then obviously
> that feature will not properly "round trip" from file to memory to file.

Just a minor detail in this discussion, but last time I checked (a few 
years ago) ODF did have support for Table Styles; OpenOffice didn't 
expose a UI for that, but as Andrew wrote this is an editor problem, not 
a format problem.

Regards,
   Andrea.


Re: OOXML

Posted by Andrew Douglas Pitonyak <an...@pitonyak.org>.

On 08/03/2014 12:50 PM, Peter Kelly wrote:
> On 3 Aug 2014, at 6:52 pm, Regina Henschel <rb.henschel@t-online.de 
> <ma...@t-online.de>> wrote:
>
> The second option has the advantage that it would be easier to cater 
> for features that are supported in OOXML but not ODF, e.g. table 
> styles. However the first option has the advantage that it would keep 
> the core entirely separate from the OOXML filter, and could 
> potentially be constructed in a general-purpose manner and made 
> usable as a library by other software.
>
If AOO does not support Table Styles and a particular file format does, 
the biggest problem is that you lose table styles when you load, 
edit, then save. If AOO does not support Table Styles, then obviously 
that feature will not properly "round trip" from file to memory to file.

-- 
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
Info:  http://www.pitonyak.org/oo.php

Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 3 Aug 2014, at 6:52 pm, Regina Henschel <rb...@t-online.de> wrote:

> Peter Kelly schrieb:
>> There's two ways to view a format: (1) as a way of encoding information
>> for storage or transmission, and (2) as an in-memory data structure used
>> by the editor at runtime. In some programs these are two different
>> things, and in others they are the same. The latter is true of web
>> browsers - HTML is both the file format and the runtime data model; the
>> W3C DOM APIs can be used to manipulate the HTML structure directly. I
>> believe this was also true to a large extent with the binary formats
>> used by older versions of MS Office, for purposes of efficiency [1].
>> 
>> I'm not familiar with the internals of OpenOffice - one thing I'd be
>> very interested to know is does it use ODF for its in-memory
>> representation of the document? Or are the runtime data structures used
>> different to the XML trees that one finds in an ODF package?
> 
> No, OpenOffice has a very different in-memory representation than the ODF format. And the API is a third version of looking at the document.

Interesting.

Given this is the case, what would you suggest would be the best strategy for supporting OOXML?

1) Two-way conversion between OOXML and ODF, with OpenOffice then dealing solely with the file as ODF (not even being aware it came from OOXML originally)
2) Two-way conversion between OOXML and OpenOffice's internal representation, bypassing ODF altogether

The second option has the advantage that it would be easier to cater for features that are supported in OOXML but not ODF, e.g. table styles. However the first option has the advantage that it would keep the core entirely separate from the OOXML filter, and could potentially be constructed in a general-purpose manner and made usable as a library by other software.

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

Re: OOXML

Posted by Regina Henschel <rb...@t-online.de>.

Hi Peter,

Peter Kelly schrieb:
> On 3 Aug 2014, at 1:57 am, jan i <jani@apache.org
> <ma...@apache.org>> wrote:
>
>> I too am on peter fast rolling waggon :-) but I am also confused.
>>
>> @peter maybe you could explain a couple of things, for non-document
>> specialists:
>>
>> 1) Following your thought, with bidirectional editors. Why would an editor
>> have a home format ?
>
> There's two ways to view a format: (1) as a way of encoding information
> for storage or transmission, and (2) as an in-memory data structure used
> by the editor at runtime. In some programs these are two different
> things, and in others they are the same. The latter is true of web
> browsers - HTML is both the file format and the runtime data model; the
> W3C DOM APIs can be used to manipulate the HTML structure directly. I
> believe this was also true to a large extent with the binary formats
> used by older versions of MS Office, for purposes of efficiency [1].
>
> I'm not familiar with the internals of OpenOffice - one thing I'd be
> very interested to know is does it use ODF for its in-memory
> representation of the document? Or are the runtime data structures used
> different to the XML trees that one finds in an ODF package?

No, OpenOffice has a very different in-memory representation than the 
ODF format. And the API is a third version of looking at the document.

Kind regards
Regina


Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 3 Aug 2014, at 1:57 am, jan i <ja...@apache.org> wrote:

> I too am on peter fast rolling waggon :-) but I am also confused.
> 
> @peter maybe you could explain a couple of things, for non-document
> specialists:
> 
> 1) Following your thought, with bidirectional editors. Why would an editor
> have a home format ?

There's two ways to view a format: (1) as a way of encoding information for storage or transmission, and (2) as an in-memory data structure used by the editor at runtime. In some programs these are two different things, and in others they are the same. The latter is true of web browsers - HTML is both the file format and the runtime data model; the W3C DOM APIs can be used to manipulate the HTML structure directly. I believe this was also true to a large extent with the binary formats used by older versions of MS Office, for purposes of efficiency [1].

I'm not familiar with the internals of OpenOffice - one thing I'd be very interested to know is whether it uses ODF for its in-memory representation of the document. Or are the runtime data structures used different to the XML trees that one finds in an ODF package?

> Following your thought to the end, the editor would always save/read in the
> format, and things not supported in the format will be saved as private.

The issue of how to handle features not supported by the format is a tricky one. My initial view is that those features are best disabled if the user chooses to save in that format (or alternatively a warning message shown on save), since even if there were private extensions saved in the foreign format, they won't be supported in other apps, and are not guaranteed to be preserved (see further below).

> 2) When editing in format foo, one can expect that not all features are
> supported (like e.g. microsoft macros), these are handled as private
> containers.
> 
> But looking at LO there seems to be huge challenges when doing especially
> copy/paste operations ?

Yes, this is a very tricky problem. Even with a simple bidirectional transformation model, where you have a 1:1 mapping between elements in the concrete document and elements in the abstract document (concrete = original format, abstract = format used by the editor), it's not possible to know what should be done for elements that have been copied & pasted.

One approach would be to make the mapping 1:n, where if an element in the abstract (editable) document is copied & pasted one or more times, then its corresponding element in the concrete document is also duplicated at save time when the file is updated. However, this can potentially violate uniqueness constraints, e.g. if the element being copied is supposed to have a unique identifier, you can't just go making a direct copy of it, as you'd end up with two elements with the same identifier. However, if the implementation was aware of such uniqueness constraints for specific elements it could ensure these are still respected, even if it doesn't support any other aspects of the element (e.g. editing or rendering).

Cut & paste is much easier to handle though as it's equivalent to a move operation, which doesn't have any implications for uniqueness constraints.
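The uniqueness problem for copy & paste can be illustrated with a short sketch. It assumes a single attribute named id carries the uniqueness constraint, which is a simplification; real formats have several such attributes, and references to the copied identifier are not handled here. The function name and the "-copyN" suffix are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def paste_copy(root, parent, element, id_attr="id"):
    """Insert a deep copy of element after it, giving the copy fresh ids."""
    used = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    clone = copy.deepcopy(element)
    for el in clone.iter():
        old = el.get(id_attr)
        if old is None:
            continue
        # Pick the first unused suffix so no two elements share an identifier.
        n = 1
        while f"{old}-copy{n}" in used:
            n += 1
        fresh = f"{old}-copy{n}"
        el.set(id_attr, fresh)
        used.add(fresh)
    parent.insert(list(parent).index(element) + 1, clone)
    return clone

doc = ET.fromstring('<doc><fig id="f1"/><p/></doc>')
paste_copy(doc, doc, doc[0])
paste_copy(doc, doc, doc[0])
print(sorted(el.get("id") for el in doc.iter("fig")))
```

Note that the implementation needs to know only which attribute must be unique; it does not need to understand, edit, or render the element itself.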

> 3) If we save private info in .docx, how can we be sure that a microsoft
> editor does not destroy it ?
> 
> Does the standard contain some rules about keeping private information ?

Well, we can never be *completely* sure that a microsoft editor won't destroy something ;)

Having said that though, there are a couple of provisions for this. One is simply the ability to include extra files in the package, labeled with a particular namespace. Each OOXML package contains a "relationship graph", which is a separate data structure from the zip file's directory hierarchy, and is what OOXML uses to identify "parts" (files) within the package. In principle, there should be no problem with simply adding an extra part with whatever namespace you like, and that being preserved. However, this isn't guaranteed if an implementation does an import/export, since usually any extra information gets lost on import and is no longer there by the time export occurs.

I've just done a test on this in fact, to see how different implementations handle it. I added an extra XML file to a package, and referenced it from the relationships graph. Under Word 2011 and Word 2013, this file was preserved after modification. Under LibreOffice Writer however, the file disappeared from the package after a save. I suspect this is due to the file being imported into either ODF or LibreOffice's own internal data model, and thus the extra information being missing on save (if any of the LO developers are reading this... perhaps you can comment here).

Ironically the warning message LO displayed when I tried to save the file was 'This document may contain formatting or content that cannot be saved in the currently selected file format "Microsoft Word 2007/2010 XML". Use the default ODF file format to be sure that the document is saved correctly'. In fact, in this instance, the exact opposite is the case - the information *could* be saved in OOXML (if it were not previously lost on import), but could *not* be saved in ODF. I think this is a good example of why bidirectional transformation is so important for achieving true compatibility - since it means you *don't* lose information on save. The fact that it works in MS Office is possibly more luck than anything else, since it wouldn't need to do an import.

The second way in which OOXML caters for foreign extensions is a set of XML elements which can be used to indicate how a consumer should treat content it doesn't know about. This is described in part 3 of the spec, "Markup Compatibility and Extensibility (MCE)". Essentially this provides a way of saying to a consumer "hey, I've got this extra info in a custom format, and you should use that if you support the particular namespace; otherwise, here's some fallback content you can use instead". It also lets you say to the consumer "just ignore elements in this namespace if you don't support it".
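
To make the "use this if you support it, otherwise fall back" idea concrete, here is a minimal sketch in Python of how a consumer might choose between the alternatives in an mc:AlternateContent element. The "ex" namespace and the element names inside the choices are hypothetical. As a simplification, the sketch compares the prefixes listed in the Requires attribute against a caller-supplied set, whereas a real consumer has to resolve those prefixes to namespace names first.

```python
import xml.etree.ElementTree as ET

# Markup Compatibility namespace from part 3 of the spec.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical sample: "ex" stands in for some extension namespace.
SAMPLE = """<mc:AlternateContent
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:ex="http://example.org/hypothetical-extension">
  <mc:Choice Requires="ex"><ex:fancyCircle/></mc:Choice>
  <mc:Fallback><legacyCircle/></mc:Fallback>
</mc:AlternateContent>"""


def select_alternate(alt, supported_prefixes):
    """Return the content of the first mc:Choice whose required prefixes
    are all supported, otherwise the content of mc:Fallback (if any)."""
    for choice in alt.findall(f"{{{MC_NS}}}Choice"):
        required = choice.get("Requires", "").split()
        if required and all(p in supported_prefixes for p in required):
            return list(choice)
    fallback = alt.find(f"{{{MC_NS}}}Fallback")
    return list(fallback) if fallback is not None else []


alt = ET.fromstring(SAMPLE)
print([e.tag for e in select_alternate(alt, {"ex"})])   # extension content
print([e.tag for e in select_alternate(alt, set())])    # fallback content
```

A consumer that wants to be bidirectional would keep the alternatives it did not select rather than discarding them.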

Unfortunately, however, I don't believe there's any guarantee that these are preserved either. In the case of UX Write, where there is a piece of content stored in multiple formats, it just throws away the ones it doesn't support (one of the few cases in which UX Write's .docx support is not fully bidirectional). This is something I should arguably fix, as potentially there may be useful information lost. The only instance I've seen it used in practice though is where there's a new, proprietary feature introduced in a later version of Office; e.g. in Word 2010 or later if you draw a circle in your document, it will (and I'm not making this up) store two versions of the circle - one in a special Word 2010 namespace which is not defined in the OOXML spec, and another representation of the circle in the older VML format (which for some reason mainly consists of an "o:gfxdata" attribute containing binary data encoded in base 64 - but hey, at least it's in XML, right? ;)

To summarise, I think that storing private/extension information in a foreign file format should be considered unreliable, since implementations differ a lot in their support for this. Therefore, one should do so only if there's no major consequence to losing that information. It also kind of goes against the idea of having a standard in the first place.

[1] http://www.joelonsoftware.com/items/2008/02/19.html

--
Dr. Peter M. Kelly
kellypmk@gmail.com
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

Re: OOXML

Posted by jan i <ja...@apache.org>.

On 3 August 2014 19:56, Dennis E. Hamilton <de...@acm.org> wrote:

> Below, Jan asks
>
>   “Does a consumer normally have some sort of conformance sheet
>     (like we have for communication protocols) or is it solely the user
>     that painfully finds the lack of support ?”
>
> I think this is easy to answer.  Where have you found an ODF conformance
> sheet for Apache OpenOffice?  LibreOffice?
>

I have not, of course, and I have always wondered about that. My background
is communication protocols, and in broad terms ODF can be seen as such, so to
me a statement of conformance is natural. But given your explanation that
many parts are left implementation-dependent (unlike "real" communication
protocols) I understand why.

In simple words, it's a wonder it works; we don't know why, but it's a lot
better than nothing :-)

thanks
jan.



RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

Below, Jan asks

  “Does a consumer normally have some sort of conformance sheet 
    (like we have for communication protocols) or is it solely the user 
    that painfully finds the lack of support ?”

I think this is easy to answer.  Where have you found an ODF conformance sheet for Apache OpenOffice?  LibreOffice?

Many choices of what to implement and also deviations of the way features are implemented are left implementation-dependent.  In ODF 1.2 there are more cases where *implementation-defined* is a requirement.  I am not aware how any of those have come up for AOO and LibO and how the implementation-based choices are defined, if any.

Here is a serious conformance statement I have found: 
<http://technet.microsoft.com/en-us/library/ff852100(v=office.14).aspx>

Here are some about ODF (scroll down to [MS-OODF], [MS-OODF2], and [MS-OODF3]):
<http://msdn.microsoft.com/en-us/library/gg548604.aspx>.

Here’s the on-line version of the one for ODF 1.2 support: 
<http://msdn.microsoft.com/en-us/library/hh695327.aspx>.  

It is instructive to expand the sidebar section 2 Standards Support Statements and 2.1 Normative Variations.  (I never know what it means to say something is not supported.  I believe it is clear that such features are not produced, but I have no idea what happens when a not-supported provision is encountered in an input document.  All in all, I think this is, compared to other implementations, a “glass-half-full” condition.)

In the past there was an on-line database that you could use to review compliance with ODF feature by feature, like chapter and verse.  It provided for user comments and questions at that level.  It was ill-maintained and I can no longer find it.  It looks like the [MS-OODFn] documents have taken on that task.  The statements in those documents are very much what was to be found in the database.

Cynics will point out that the EUC required Microsoft to describe all deviations in its support of ODF.  It is unfortunate that the EUC did not consider that such statements would be important from other sources of ODF Consumers as well.

 -- Dennis E. Hamilton
    dennis.hamilton@acm.org    +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail

-----Original Message-----
From: jan i [mailto:jani@apache.org] 
Sent: Sunday, August 3, 2014 00:57
To: dev; Dennis Hamilton
Subject: Re: OOXML

On 2 August 2014 22:31, Dennis E. Hamilton <de...@acm.org> wrote:
> [ ... ] There is no strict minimum Conforming OpenDocument
> Consumer.  A consumer must not object to anything in the document file that
> conforms to the ODF specification, but it is not required to "interpret"
> all or even any minimum set of features.  There is no producer that I am
> aware of that produces all features provided for in the ODF specification,
> and most implementations only interpret those features that they are
> designed to produce (sometimes incorrectly) themselves.  This doesn't
> matter too much if you use implementations with a common genealogy, but
> across independent implementations not having any common code base there
> tend to be unexpected surprises.  There are also many places where a
> provision of ODF is not rigorously defined and implementation-dependent
> variation is the result, whether explicitly called out (e.g., for macros
> and scripts) or not (e.g., for supported image formats).
>

Does a consumer normally have some sort of conformance sheet (like we have
for communication protocols) or is it solely the user that painfully finds
the lack of support ?

[ ... ]

Re: OOXML

Posted by jan i <ja...@apache.org>.

On 2 August 2014 22:31, Dennis E. Hamilton <de...@acm.org> wrote:

> Below, Jan asks
>
>   "Does the standard contain some rules about keeping private information
> ?"
>
> There are two cases for ODF 1.2.
>
> First there is the case for foreign elements/attributes/attribute values.
>  This would be the case for some sort of extended material incorporated in
> the ODF document.  This makes a Conforming OpenDocument Document into an
> Extended OpenDocument Document.  A Conforming OpenDocument Consumer is
> permitted to ignore all of that, based on some rules about whether or not
> it occurs in (technically-defined) paragraph content or elsewhere in the
> format.  There can also be foreign content in the XML package of the
> document, where there is no recognized relationship of that content to
> anything in the document as seen by an ODF Consumer.
>
> There are places where the preservation of such foreign material is
> recommended but not required.  Most implementations lose all content that
> they are not implemented to interpret.  Microsoft Office very definitely
> does that in its acceptance of OpenDocument Document files.  This happens
> mainly because the typical internal model doesn't preserve the original XML
> parts and it doesn't work by manipulation of the XML parts.  I suspect that
> Microsoft concerns about document security are also a factor, in addition
> to unwillingness to support features that are not part of the ODF
> specification.  (The position, as I understand it, is that they will
> support the standard, not OpenOffice's particular implementation around it,
> and I don't know how much flexibility there is in that respect.  That
> OpenOffice *is* the standard is a popular view that happens to be
> inconsistent with the principles of ISO or any standards-development
> organization that are committed to the ideal of independently-implemented
> interoperable implementations.)
>
> The second case has to do with features of ODF that a particular
> implementation does not support.  In general, these do not survive in
> current implementations, since import into the internal model loses that
> material and there is consequently no provision for exporting it.  Here,
> there is the fact that there is no strict minimum Conforming OpenDocument
> Consumer.  A consumer must not object to anything in the document file that
> conforms to the ODF specification, but it is not required to "interpret"
> all or even any minimum set of features.  There is no producer that I am
> aware of that produces all features provided for in the ODF specification,
> and most implementations only interpret those features that they are
> designed to produce (sometimes incorrectly) themselves.  This doesn't
> matter too much if you use implementations with a common genealogy, but
> across independent implementations not having any common code base there
> tend to be unexpected surprises.  There are also many places where a
> provision of ODF is not rigorously defined and implementation-dependent
> variation is the result, whether explicitly called out (e.g., for macros
> and scripts) or not (e.g., for supported image formats).
>

Does a consumer normally have some sort of conformance sheet (like we have
for communication protocols) or is it solely the user that painfully finds
the lack of support ?


In the other mail you wrote a quite interesting note about digitally signing
artifacts the user cannot see. Do you happen to know how Microsoft gets
around that with the web-based offerings ?

Thanks for some very interesting input.
rgds
jan I.


RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

Below, Jan asks

  "Does the standard contain some rules about keeping private information ?"

There are two cases for ODF 1.2.

First there is the case for foreign elements/attributes/attribute values.  This would be the case for some sort of extended material incorporated in the ODF document.  This makes a Conforming OpenDocument Document into an Extended OpenDocument Document.  A Conforming OpenDocument Consumer is permitted to ignore all of that, based on some rules about whether or not it occurs in (technically-defined) paragraph content or elsewhere in the format.  There can also be foreign content in the XML package of the document, where there is no recognized relationship of that content to anything in the document as seen by an ODF Consumer.  
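
As a loose illustration of the first case, here is a minimal sketch in Python that lists the elements a consumer might treat as foreign. It simply reports any element whose namespace is not in a caller-supplied set of known namespaces. The sample document, the "ex" namespace and the function name are hypothetical, and the actual ODF 1.2 rules (including the distinction between paragraph content and the rest of the format) are more involved than this.

```python
import xml.etree.ElementTree as ET

# Two of the namespaces defined by ODF; a real consumer would list them all.
KNOWN = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical fragment containing one element from an extension namespace.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:ex="http://example.org/hypothetical">
  <office:body><office:text>
    <text:p>Hello <ex:note>private</ex:note></text:p>
  </office:text></office:body>
</office:document-content>"""


def foreign_elements(xml_text, known_namespaces):
    """Return the tags of all elements outside the known namespaces."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        tag = el.tag
        # ElementTree writes qualified names as "{namespace}local".
        ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if ns not in known_namespaces:
            found.append(tag)
    return found


print(foreign_elements(SAMPLE, KNOWN))
```

An implementation that wanted to preserve such material, rather than ignore it, would need to carry these elements through its internal model so that they are still there on export.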

There are places where the preservation of such foreign material is recommended but not required.  Most implementations lose all content that they are not implemented to interpret.  Microsoft Office very definitely does that in its acceptance of OpenDocument Document files.  This happens mainly because the typical internal model doesn't preserve the original XML parts and it doesn't work by manipulation of the XML parts.  I suspect that Microsoft concerns about document security are also a factor, in addition to unwillingness to support features that are not part of the ODF specification.  (The position, as I understand it, is that they will support the standard, not OpenOffice's particular implementation around it, and I don't know how much flexibility there is in that respect.  That OpenOffice *is* the standard is a popular view that happens to be inconsistent with the principles of ISO or any standards-development organization that are committed to the ideal of independently-implemented interoperable implementations.)

The second case has to do with features of ODF that a particular implementation does not support.  In general, these do not survive in current implementations, since import into the internal model loses that material and there is consequently no provision for exporting it.  Here, there is the fact that there is no strict minimum Conforming OpenDocument Consumer.  A consumer must not object to anything in the document file that conforms to the ODF specification, but it is not required to "interpret" all or even any minimum set of features.  There is no producer that I am aware of that produces all features provided for in the ODF specification, and most implementations only interpret those features that they are designed to produce (sometimes incorrectly) themselves.  This doesn't matter too much if you use implementations with a common genealogy, but across independent implementations not having any common code base there tend to be unexpected surprises.  There are also many places where a provision of ODF is not rigorously defined and implementation-dependent variation is the result, whether explicitly called out (e.g., for macros and scripts) or not (e.g., for supported image formats).

 -- Dennis E. Hamilton
    dennis.hamilton@acm.org    +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail

-----Original Message-----
From: jan i [mailto:jani@apache.org] 
Sent: Saturday, August 2, 2014 11:58
To: dev; Dennis Hamilton
Subject: Re: OOXML

On 2 August 2014 20:27, Dennis E. Hamilton <de...@acm.org> wrote:

> <orcnote>s below.
>
>
> -----Original Message-----
> From: jan i [mailto:jani@apache.org]
> Sent: Saturday, August 2, 2014 08:57
> To: dev
> Subject: Re: OOXML
>
> On 2 August 2014 17:06, Louis Suárez-Potts <lu...@gmail.com> wrote:
>
> >
> > > On 2014-08-02, at 10:24, Alexandro Colorado <jz...@oooes.org> wrote:
> > >
> > > The support that exists is for receiving OOXML, not producing it; the
> > > discussion issue would be to support legacy formats like .doc or .xls.
> > >
> > > I still don't see the point of generating OOXML, and most people don't
> > > care as long as they can send in Office native formats.
> > >
> > > I have never heard anyone say, "Please send it as docx; your doc is a
> > > closed binary format."
> >
> > Actually, I have. But it also matters on mobile, as well as, I'd guess,
> > for some developing processes for batch conversion of documents. Finally,
> > it's not evident to me that refusing to develop to what is likely to
> > become the major desktop document format globally, alas, is a good
> > strategy that would lead to the adoption of OO. Rather, it seems it would
> > only help those applications that do (express) both ODF *and* .docx well.
> >
>
> Please don't forget, the computer business has always had two types of
> standard: the official one and the de facto one.
>
> For those too young to remember, TCP/IP is not an official standard (OSI
> was) but something a number of companies decided to promote. I see docx in
> the same light.
>
> <orcnote>
>    I think this has it backwards.  For ages, .doc was the de facto standard,
>    and de jure ISO/W3C standards like SGML, ODA, and even XML did not do
>    anything to dent that.  That is now .doc and .docx, however de facto
>    you consider them to be (although they are both now open formats).
>
>    I am squarely in the same camp as Peter Kelly and Louis Suárez-Potts
>    with regard to the pragmatic situation that exists.  One-way
>    movement to ODF is simply going to be unacceptable, possibly forever,
>    if you are determined to have "there must be only one" in a niche of
>    like-minded followers.
>
>    This is unfortunate for one particular reason -- ODF is the only
>    well-established multi-platform document format, thanks to the wider
>    platform support of LibreOffice and Apache OpenOffice.  (Those also
>    introduce de facto and monoculture factors that are omitted in the
>    marketing speak.)
>
>    But without a dramatic increase in Linux penetration, this may not dent
>    the state of affairs much.  The bigger penetration opportunity is iOS
>    and Android, not Linux.  And you may have noticed that Microsoft has
>    figured that out and is moving dramatically to provide OOXML
>    interoperability via the cloud (especially Sky-/One-Drive and Office Web
>    Apps) and via phone/phablet/tablet presence on Windows 8, WindowsPhone8,
>    Android (including the Amazon flavor), and iOS.  There are even
>    provisions for concurrent collaboration, already strong in the
>    flag-carrying application, Microsoft OneNote, which uses an
>    openly-documented but not-standardized format.
>
>    The last time I checked, the OneDrive free in-browser Office Web Apps
>    also support ODF 1.2 documents, although they will convert them to an
>    MSO-compatible cloud subset form if you want to edit them there, even
>    though retrievable in ODF 1.2.  Viewing works out of the box.  My
>    impression of the editing pre-conversion is that it is a safety measure
>    in case any ODF feature loss is unacceptable, so that you still have an
>    intact original there.
> </orcnote>
>

I too am on Peter's fast-rolling wagon :-) but I am also confused.

@peter maybe you could explain a couple of things, for non-document
specialists:

1) Following your thought, with bidirectional editors, why would an editor
have a home format ?

Following your thought to the end, the editor would always save/read in the
format, and things not supported in the format would be saved as private.

2) When editing in format foo, one can expect that not all features are
supported (e.g. Microsoft macros); these are handled as private
containers.

But looking at LO, there seem to be huge challenges, especially when doing
copy/paste operations ?

3) If we save private info in .docx, how can we be sure that a Microsoft
editor does not destroy it ?

Does the standard contain some rules about keeping private information ?

thanks in advance
jan I.

Re: OOXML

Posted by jan i <ja...@apache.org>.

On 2 August 2014 20:27, Dennis E. Hamilton <de...@acm.org> wrote:

> <orcnote>s below.
>
>
> -----Original Message-----
> From: jan i [mailto:jani@apache.org]
> Sent: Saturday, August 2, 2014 08:57
> To: dev
> Subject: Re: OOXML
>
> On 2 August 2014 17:06, Louis Suárez-Potts <lu...@gmail.com> wrote:
>
> >
> > > On 2014-08-02, at 10:24, Alexandro Colorado <jz...@oooes.org> wrote:
> > >
> > > The support that exists is for receiving OOXML, not producing it; the
> > > discussion issue would be to support legacy formats like .doc or .xls.
> > >
> > > I still don't see the point of generating OOXML, and most people don't
> > > care as long as they can send in Office native formats.
> > >
> > > I have never heard anyone say, "Please send it as docx; your doc is a
> > > closed binary format."
> >
> > Actually, I have. But it also matters on mobile, as well as, I'd guess,
> > for some developing processes for batch conversion of documents. Finally,
> > it's not evident to me that refusing to develop to what is likely to
> > become the major desktop document format globally, alas, is a good
> > strategy that would lead to the adoption of OO. Rather, it seems it would
> > only help those applications that do (express) both ODF *and* .docx well.
> >
>
> Please don't forget, the computer business has always had two types of
> standard: the official one and the de facto one.
>
> For those too young to remember, TCP/IP is not an official standard (OSI
> was) but something a number of companies decided to promote. I see docx in
> the same light.
>
> <orcnote>
>    I think this has it backwards.  For ages, .doc was the de facto standard,
>    and de jure ISO/W3C standards like SGML, ODA, and even XML did not do
>    anything to dent that.  That is now .doc and .docx, however de facto
>    you consider them to be (although they are both now open formats).
>
>    I am squarely in the same camp as Peter Kelly and Louis Suárez-Potts
>    with regard to the pragmatic situation that exists.  One-way
>    movement to ODF is simply going to be unacceptable, possibly forever,
>    if you are determined to have "there must be only one" in a niche of
>    like-minded followers.
>
>    This is unfortunate for one particular reason -- ODF is the only
>    well-established multi-platform document format, thanks to the wider
>    platform support of LibreOffice and Apache OpenOffice.  (Those also
>    introduce de facto and monoculture factors that are omitted in the
>    marketing speak.)
>
>    But without a dramatic increase in Linux penetration, this may not dent
>    the state of affairs much.  The bigger penetration opportunity is iOS
>    and Android, not Linux.  And you may have noticed that Microsoft has
>    figured that out and is moving dramatically to provide OOXML
>    interoperability via the cloud (especially Sky-/One-Drive and Office Web
>    Apps) and via phone/phablet/tablet presence on Windows 8, WindowsPhone8,
>    Android (including the Amazon flavor), and iOS.  There are even
>    provisions for concurrent collaboration, already strong in the
>    flag-carrying application, Microsoft OneNote, which uses an
>    openly-documented but not-standardized format.
>
>    The last time I checked, the OneDrive free in-browser Office Web Apps
>    also support ODF 1.2 documents, although they will convert them to an
>    MSO-compatible cloud subset form if you want to edit them there, even
>    though retrievable in ODF 1.2.  Viewing works out of the box.  My
>    impression of the editing pre-conversion is that it is a safety measure
>    in case any ODF feature loss is unacceptable, so that you still have an
>    intact original there.
> </orcnote>
>

I too am on Peter's fast-rolling waggon :-) but I am also confused.

@peter maybe you could explain a couple of things, for non-document
specialists:

1) Following your thought, with bidirectional editors, why would an editor
have a home format?

Following your thought to the end, the editor would always save/read in the
format, and things not supported in the format will be saved as private.

2) When editing in format foo, one can expect that not all features are
supported (e.g. Microsoft macros); these are handled as private
containers.

But looking at LO, there seem to be huge challenges, especially when doing
copy/paste operations?

3) If we save private info in .docx, how can we be sure that a Microsoft
editor does not destroy it?

Does the standard contain some rules about keeping private information?

thanks in advance
jan I.


RE: OOXML

Posted by "Dennis E. Hamilton" <de...@acm.org>.

<orcnote>s below.

-----Original Message-----
From: jan i [mailto:jani@apache.org] 
Sent: Saturday, August 2, 2014 08:57
To: dev
Subject: Re: OOXML

On 2 August 2014 17:06, Louis Suárez-Potts <lu...@gmail.com> wrote:

>
> > On 2014-08-02, at 10:24, Alexandro Colorado <jz...@oooes.org> wrote:
> >
> > The Support that is done is to receieve OOXML not to produce them, the
> > discussion issue would be to support legacy formats like .doc or .xls.
> >
> > I still dont see a point to generate OOXML and most people dont care
> > as long as they can send in office native formats.
> >
> > I never heard someone saying, please send it on docx, your doc is a
> > closed binary format.
>
> Actually, I have. But it also matters on mobile, as well as, I'd guess,
> for some developing processes for batch conversion of documents. Finally,
> it's not evident to me that refusing to develop to what is likely to become
> the major desktop document format globally—alas—is a good strategy that
> would lead to the adoption of OO. Rather, it seems it would only help those
> applications that do (express) both ODF *and* .docx well.
>

Please don't forget, the computer business has always had 2 types of
standard: the official one and the de facto one.

For those too young to remember, TCP/IP is not an official standard (OSI
was) but something a number of companies decided to promote. I see docx in
the same light.

<orcnote>
   I think this has it backwards.  For ages, .doc was the de facto standard,
   and de jure ISO/W3C standards like SGML, ODA, and even XML did not do
   anything to dent that.  That is now .doc and .docx, however de facto
   you consider them to be (although they are both now open formats).

   I am squarely in the same camp as Peter Kelly and Louis
   Suárez-Potts with regard to the pragmatic situation that exists.  One-way
   movement to ODF is simply going to be unacceptable, possibly forever,
   if you are determined to have "there must be only one" in a niche of
   like-minded followers.

   This is unfortunate for one particular reason -- ODF is the only well-
   established multi-platform document format, thanks to the wider platform
   support of LibreOffice and Apache OpenOffice.  (Those also introduce
   de facto and monoculture factors that are omitted in the marketing 
   speak.)

   But without a dramatic increase in Linux penetration, this may not dent
   the state of affairs much.  The bigger penetration opportunity is iOS
   and Android, not Linux.  And you may have noticed that Microsoft has
   figured that out and is moving dramatically to provide OOXML
   interoperability via the cloud (especially SkyDrive/OneDrive and Office
   Web Apps) and via phone/phablet/tablet presence on Windows 8, Windows
   Phone 8, Android (including the Amazon flavor), and iOS.  There are even
   provisions for concurrent collaboration, already strong in the
   flag-carrying application, Microsoft OneNote, which uses an
   openly-documented but not-standardized format.

   The last time I checked, the OneDrive free in-browser Office Web Apps
   also support ODF 1.2 documents, although they will convert them to an
   MSO-compatible cloud subset form if you want to edit them there, even
   though they remain retrievable in ODF 1.2.  Viewing works out of the
   box.  My impression of the editing pre-conversion is that it is a safety
   measure in case any ODF feature loss is unacceptable, so you still have
   an intact original there.
</orcnote>


Re: OOXML

Posted by jan i <ja...@apache.org>.

On 2 August 2014 17:06, Louis Suárez-Potts <lu...@gmail.com> wrote:

>
> > On 2014-08-02, at 10:24, Alexandro Colorado <jz...@oooes.org> wrote:
> >
> > The Support that is done is to receieve OOXML not to produce them, the
> > discussion issue would be to support legacy formats like .doc or .xls.
> >
> > I still dont see a point to generate OOXML and most people dont care
> > as long as they can send in office native formats.
> >
> > I never heard someone saying, please send it on docx, your doc is a
> > closed binary format.
>
> Actually, I have. But it also matters on mobile, as well as, I'd guess,
> for some developing processes for batch conversion of documents. Finally,
> it's not evident to me that refusing to develop to what is likely to become
> the major desktop document format globally—alas—is a good strategy that
> would lead to the adoption of OO. Rather, it seems it would only help those
> applications that do (express) both ODF *and* .docx well.
>

Please don't forget, the computer business has always had 2 types of
standard: the official one and the de facto one.

For those too young to remember, TCP/IP is not an official standard (OSI
was) but something a number of companies decided to promote. I see docx in
the same light.

rgds
jan I

>
> louis
>
>

Re: OOXML

Posted by Louis Suárez-Potts <lu...@gmail.com>.

> On 2014-08-02, at 10:24, Alexandro Colorado <jz...@oooes.org> wrote:
> 
> The Support that is done is to receieve OOXML not to produce them, the
> discussion issue would be to support legacy formats like .doc or .xls.
> 
> I still dont see a point to generate OOXML and most people dont care
> as long as they can send in office native formats.
> 
> I never heard someone saying, please send it on docx, your doc is a
> closed binary format.

Actually, I have. But it also matters on mobile, as well as, I'd guess, for some developing processes for batch conversion of documents. Finally, it's not evident to me that refusing to develop to what is likely to become the major desktop document format globally—alas—is a good strategy that would lead to the adoption of OO. Rather, it seems it would only help those applications that do (express) both ODF *and* .docx well.

louis

Re: OOXML

Posted by Alexandro Colorado <jz...@oooes.org>.

The support that is done is to receive OOXML, not to produce it; the
discussion issue would be to support legacy formats like .doc or .xls.

I still don't see a point in generating OOXML, and most people don't care
as long as they can send in Office native formats.

I never heard someone saying, please send it in docx, your doc is a
closed binary format.

On 8/2/14, Peter Kelly <ke...@gmail.com> wrote:
> On 1 Aug 2014, at 2:42 pm, Rory O'Farrell <of...@iol.ie> wrote:
>
>> For information:
>> http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/
>
> An interesting article. This brings to mind a few issues I've been thinking
> about for a while:
>
> - I think the rather extreme anti-OOXML stance that some take can be
> counterproductive. I certainly hold the view that ODF is a superior standard
> in many respects (though not all), however there are circumstances where it
> makes sense for a given piece of software to support both. For example they
> cite the lack of support for ODF in Google Docs and iWork; if one wants to
> develop software that will interoperate with these would require OOXML
> support.
>
> My take on the issue is that it's important to support both, because as much
> as we might dislike the fact, OOXML is out there and used very widely. With
> the work I'm currently doing on UX Write, I'm adding to the existing OOXML
> (specifically .docx) support with support for for ODF (.odt) and doing this
> in a common framework such that the app itself doesn't care which format the
> file is natively stored in, it will work equally well with both.
> Additionally, once I have the ODF support in, it will be possible to
> leverage this support for conversion between the two formats in both
> directions. I'll be giving a talk on this at ApacheCon EU later this year,
> and yes this framework will soon be open source - if anyone is interested in
> collaborating on it, please let me know.
>
> - One of the criticisms raised is that there are several different versions
> of OOXML, not all of which are entirely compatible. However this is also
> true of ODF (or at least of MS's implementation in Office 2007 and 2010; I'm
> not sure where the fault lies). One of the big questions I've been asking
> myself in the work I'm currently on ODF is whether I should have my
> implementation it save ODF 1.1 by default, or version 1.2 by default. If I
> choose the former, it will work with Office 2007 and onwards. The latter,
> only Office 2013 (I think). For someone such as myself writing a new
> implementation of the (prat of) ODF spec, and desiring compatibility with
> Office 2007 and 2010, which is the best choice?
>
> - I consider the use of proprietary fonts to be a separate issue from the
> standard itself. The specification is silent on the matter, so this is
> really a criticism of MS Office rather than OOXML itself. Nonetheless, it's
> an important one, and one I believe we should address by promoting the use
> of open source fonts (e.g. https://www.google.com/fonts) independently and
> in addition to the use of ODF. Perhaps these could be made available as an
> easily-distributed separate package, so that those who want to stick with MS
> Office for whatever reason could be encouraged to install & use them, for
> improved interoperability with other office suites?
>
> In an organisation where there are some users on MS and others on OO/LO,
> these fonts could be deployed by the IT department as part of the standard
> desktop image, and all templates created by the organisations could use
> these fonts by default, which would lead to wider usage.
>
> - Towards the end of the article, there's a discussion about the lack of
> support for ODF by some vendors, particularly Google and Apple. The question
> then is how do we fix that? My view is that there needs to be a migration
> path - and by that I mean not just a tool to convert documents from OOMXL to
> ODF, but the ability to go both ways, and work with either format for as
> long as necessary for the migration to complete. Most (all?) successful
> transitions I've seen have used a similar approach - Microsoft going from
> DOS to Windows, Apple going from 68k -> PPC -> Intel, and Mac OS classic ->
> OS X, and so forth.
>
> In the case of document formats, for a country whose government currently
> uses MS Office and OOXML that wants to make the switch to ODF and
> OpenOffice/LibreOffice/other tools, it's not going to be an overnight
> change. It could very well take several years, and during that period
> everyone in the organisation will need to have the capability to work with
> both formats. New or modified documents would in general be saved in ODF,
> but older documents as well as documents that need to be exchanged with
> people running MS Office 2007 or 2010 (which I think don't support ODF 1.2)
> would need to be in OOXML, until such time as everyone has upgraded to a
> fully-conformant version of MS Office, or switched to OpenOffice et al.
>
> --
> Dr. Peter M. Kelly
> Founder, UX Productivity
> peter@uxproductivity.com
> http://www.uxproductivity.com/
> http://www.kellypmk.net/
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>
>


-- 
Alexandro Colorado
Apache OpenOffice Contributor
882C 4389 3C27 E8DF 41B9  5C4C 1DB7 9D1C 7F4C 2614


Re: OOXML

Posted by Peter Kelly <ke...@gmail.com>.

On 1 Aug 2014, at 2:42 pm, Rory O'Farrell <of...@iol.ie> wrote:

> For information:
> http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/

An interesting article. This brings to mind a few issues I've been thinking about for a while:

- I think the rather extreme anti-OOXML stance that some take can be counterproductive. I certainly hold the view that ODF is a superior standard in many respects (though not all); however, there are circumstances where it makes sense for a given piece of software to support both. For example, they cite the lack of support for ODF in Google Docs and iWork; developing software that will interoperate with these would require OOXML support.

My take on the issue is that it's important to support both, because as much as we might dislike the fact, OOXML is out there and used very widely. With the work I'm currently doing on UX Write, I'm adding to the existing OOXML (specifically .docx) support with support for ODF (.odt) and doing this in a common framework such that the app itself doesn't care which format the file is natively stored in; it will work equally well with both. Additionally, once I have the ODF support in, it will be possible to leverage this support for conversion between the two formats in both directions. I'll be giving a talk on this at ApacheCon EU later this year, and yes this framework will soon be open source - if anyone is interested in collaborating on it, please let me know.

- One of the criticisms raised is that there are several different versions of OOXML, not all of which are entirely compatible. However this is also true of ODF (or at least of MS's implementation in Office 2007 and 2010; I'm not sure where the fault lies). One of the big questions I've been asking myself in the work I'm currently doing on ODF is whether I should have my implementation save ODF 1.1 by default, or version 1.2 by default. If I choose the former, it will work with Office 2007 and onwards. The latter, only Office 2013 (I think). For someone such as myself writing a new implementation of (part of) the ODF spec, and desiring compatibility with Office 2007 and 2010, which is the best choice?

- I consider the use of proprietary fonts to be a separate issue from the standard itself. The specification is silent on the matter, so this is really a criticism of MS Office rather than OOXML itself. Nonetheless, it's an important one, and one I believe we should address by promoting the use of open source fonts (e.g. https://www.google.com/fonts) independently and in addition to the use of ODF. Perhaps these could be made available as an easily-distributed separate package, so that those who want to stick with MS Office for whatever reason could be encouraged to install & use them, for improved interoperability with other office suites?

In an organisation where there are some users on MS and others on OO/LO, these fonts could be deployed by the IT department as part of the standard desktop image, and all templates created by the organisations could use these fonts by default, which would lead to wider usage.

- Towards the end of the article, there's a discussion about the lack of support for ODF by some vendors, particularly Google and Apple. The question then is how do we fix that? My view is that there needs to be a migration path - and by that I mean not just a tool to convert documents from OOXML to ODF, but the ability to go both ways, and work with either format for as long as necessary for the migration to complete. Most (all?) successful transitions I've seen have used a similar approach - Microsoft going from DOS to Windows, Apple going from 68k -> PPC -> Intel, and Mac OS classic -> OS X, and so forth.

In the case of document formats, for a country whose government currently uses MS Office and OOXML that wants to make the switch to ODF and OpenOffice/LibreOffice/other tools, it's not going to be an overnight change. It could very well take several years, and during that period everyone in the organisation will need to have the capability to work with both formats. New or modified documents would in general be saved in ODF, but older documents as well as documents that need to be exchanged with people running MS Office 2007 or 2010 (which I think don't support ODF 1.2) would need to be in OOXML, until such time as everyone has upgraded to a fully-conformant version of MS Office, or switched to OpenOffice et al.

--
Dr. Peter M. Kelly
Founder, UX Productivity
peter@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)