Posted to dev@openoffice.apache.org by "Dennis E. Hamilton" <de...@acm.org> on 2011/06/22 04:20:13 UTC

Consequences of Working in Office Documents Here

BACK STORY

On a different list, not just here on ooo-dev, there has been some surprise to see us putting binaries (ODF documents) into some SVN locations used by the PPMC. 

My impression is that the experienced hands here in ASF are expecting to see DIFFs in commit messages on SVN, but binaries don't get DIFFed since the result is usually unintelligible and almost always uninteresting.  For some, it is news that ODF packages are not XML files.

Someone suggested that one could unpack the Zip of these documents and then do diffs of the respective XML parts and that could serve as a DIFF on what the changes are.  They also noticed they'd never seen that done.
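
A minimal sketch of that suggestion, assuming Python and only its standard library. The function names (make_package, diff_odf_parts) are hypothetical and exist only for illustration; nothing like this was in use on the project as far as this thread says.

```python
import difflib
import io
import zipfile


def make_package(parts: dict[str, str]) -> bytes:
    """Build a tiny Zip package in memory (stand-in for a real ODF file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        for name, text in parts.items():
            package.writestr(name, text)
    return buf.getvalue()


def diff_odf_parts(old_bytes: bytes, new_bytes: bytes) -> dict[str, list[str]]:
    """Return a unified diff for every XML part that differs between two packages."""
    diffs = {}
    with zipfile.ZipFile(io.BytesIO(old_bytes)) as old, \
            zipfile.ZipFile(io.BytesIO(new_bytes)) as new:
        old_names = {n for n in old.namelist() if n.endswith(".xml")}
        new_names = {n for n in new.namelist() if n.endswith(".xml")}
        for name in sorted(old_names | new_names):
            # A part missing on one side is treated as empty on that side.
            a = old.read(name).decode("utf-8").splitlines() if name in old_names else []
            b = new.read(name).decode("utf-8").splitlines() if name in new_names else []
            delta = list(difflib.unified_diff(a, b, "old/" + name, "new/" + name, lineterm=""))
            if delta:
                diffs[name] = delta
    return diffs
```

For a real document one would pass the bytes of two .odt revisions; binary parts such as embedded images are skipped by the ".xml" filter.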

THE INSIGHT

On seeing that suggestion (clearly the kind of thing developers think of, it being what we do), it struck me that we have a "geeks are from Mars, users are from Venus" situation here.

I think the clash of expectations has to do with the differences in tools that are applicable at the level we work at, and how we see what it is we are at work on.

We need to understand that we really have different experience sets, and they all are important in the context of the OpenOffice.org project.

A GEEKY LOOK

Here is a geeky explanation of why it does no good to figure out a better way to show DIFFs of the XML inside an ODF package if you want to know what an author contributor/committer changed.  (You might want that as a forensics tool, but not for knowing what someone changed in the course of their work on a document.)

My (updated) explanation:

The problem is that diff-ing the XML is not what's wanted.  That's like decompiling two programs and posting a diff of the assembly language.  (There are also binary blobs -- I said blogs by mistake in another post -- in the Zipped ODF package.)

The level of abstraction that one cares about for accounting for changes in a document in one of these formats is at the presentation or print-preview level.  There are document compare utilities that provide such functions.  It's like the comparison you get between two wiki pages.  Anywhere I've looked, that is shown as a comparison of the resulting presentation, not of the WikiText.  (I know that on Apache we have a production process where we use SVN as a publishing location and see diffs of Markdown, a kind of plaintext markup.  I know that fits beautifully into the source-code revision developer toolcraft model, but you wouldn't want to know about changes in an ODF document that way, BECAUSE IT IS NOT WHAT IS AUTHORED.)
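
As a rough illustration of comparing closer to what is authored, here is a sketch that diffs only the paragraph and heading text of two content.xml parts. It assumes Python's standard library; the function names are hypothetical, and it is far cruder than a real document-compare utility (it ignores formatting, tables, and layout, and nested paragraphs would be reported more than once).

```python
import difflib
import xml.etree.ElementTree as ET

# The ODF text namespace, as it appears in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraphs(content_xml: bytes) -> list[str]:
    """Return the plain text of each paragraph and heading, in document order."""
    root = ET.fromstring(content_xml)
    return ["".join(el.itertext()) for el in root.iter() if el.tag in PARAGRAPH_TAGS]


def text_diff(old_xml: bytes, new_xml: bytes) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') paragraphs."""
    delta = difflib.ndiff(paragraphs(old_xml), paragraphs(new_xml))
    return [line for line in delta if line[:2] in ("- ", "+ ")]
```

The result reads like a wiki page comparison: whole paragraphs removed and added, with no XML in sight.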

There are also change-tracking (historically called red-lining in my experience) provisions in the ODF Format and the software products handle it to varying degrees of reliability.  This is like showing a kind of merge with the removed text and the inserted text all shown in the document and distinguished by highlighting and strikethroughs of various forms.  A reviewer can agree to accept a change or can reject a change, make more changes, etc.
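
To make the change-tracking provisions a little more concrete, the sketch below lists the tracked changes recorded in a content.xml part. It assumes Python's standard library and the element names I understand the ODF text schema to use (text:tracked-changes, text:changed-region, office:change-info); treat those names as assumptions to check against the specification. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def tracked_changes(content_xml: bytes) -> list[tuple[str, str, str]]:
    """Return (kind, author, date) for each tracked change found."""
    root = ET.fromstring(content_xml)
    found = []
    for region in root.iterfind(".//text:tracked-changes/text:changed-region", NS):
        for change in region:
            # Strip the namespace, leaving e.g. "insertion" or "deletion".
            kind = change.tag.rsplit("}", 1)[-1]
            author = change.findtext("office:change-info/dc:creator", default="", namespaces=NS)
            date = change.findtext("office:change-info/dc:date", default="", namespaces=NS)
            found.append((kind, author, date))
    return found
```

This only reports that changes exist and who made them; accepting or rejecting them is still the job of the office application.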

So there are (at least) two different levels of envisioning, of toolcraft and of work practices among us.  At one level, there is the world of SVN, compiler and build processes, and source code in simply-formatted text.  For ODF (and OOXML and more of these), the XML in the Zip is object code, not the source code.  The source code counterpart is at quite another level.

Worlds are colliding here on Apache OpenOffice.org.  It is going to be very interesting what we learn from each other and how we manage to function in some kind of shared culture within the Apache Way.

Some of us navigate both levels with some fluency.  That is not the case for most of us and, I am learning, not natural for me either: OpenOffice is not my tool of choice apart from using it as an ODF forensic tool, and my development toolcraft is not SVN, LAMP, etc.

It is very important to grasp this, because if we don't recognize it, the authors of documentation and people working at the user-issues level are going to be left with no way to fit in and not much that feels like it is appropriate for their specialized activities.

 - Dennis



Re: Consequences of Working in Office Documents Here

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Jun 22, 2011 at 06:21, Frank Peters <fp...@googlemail.com> wrote:
>...
>>>> don't know where the answer is, other than "learn text". I hope that
>>>> with examining what our true outputs are, we can focus on those, and
>>>> find a path that works for the community.
>>>
>>> What does that mean? What is a "true output" and how do "we"
>>> differ from "the community"?
>>
>> Euh... just what I said: what are we producing? I tried to list some
>
> Well I guess I had difficulties understanding why you would talk
> about "true" output (vs what? perceived outputs?)

Yes, vs "perceived". I'm primarily a developer, and look at things
through a certain lens. There are a lot of things that OOo is expected
to deliver. What are they, and how do they fit into the typical
developer workflow that we understand here at the ASF? What outputs
are needed, yet don't fit the normal process? And when we really step
back and look at what we're trying to deliver... what is that? Are we
focusing on the right things?

Cheers,
-g

Re: Consequences of Working in Office Documents Here

Posted by Frank Peters <fp...@googlemail.com>.
>>> don't know where the answer is, other than "learn text". I hope that
>>> with examining what our true outputs are, we can focus on those, and
>>> find a path that works for the community.
>>
>>
>> What does that mean? What is a "true output" and how do "we"
>> differ from "the community"?
>
> Euh... just what I said: what are we producing? I tried to list some

Well I guess I had difficulties understanding why you would talk
about "true" output (vs what? perceived outputs?)

> concrete outputs. And "we" certainly means community. Not sure why you would
> think I define "we" any other way??

That was probably another misunderstanding from your wording.
My bad.

Thanks for the clarification
Frank

Re: Consequences of Working in Office Documents Here

Posted by Greg Stein <gs...@gmail.com>.
On Jun 22, 2011 5:28 AM, "Frank Peters" <fp...@googlemail.com> wrote:
>
>
>> don't know where the answer is, other than "learn text". I hope that
>> with examining what our true outputs are, we can focus on those, and
>> find a path that works for the community.
>
>
> What does that mean? What is a "true output" and how do "we"
> differ from "the community"?

Euh... just what I said: what are we producing? I tried to list some
concrete outputs. And "we" certainly means community. Not sure why you would
think I define "we" any other way??

Cheers,
-g

Re: Consequences of Working in Office Documents Here

Posted by Frank Peters <fp...@googlemail.com>.
> don't know where the answer is, other than "learn text". I hope that
> with examining what our true outputs are, we can focus on those, and
> find a path that works for the community.

What does that mean? What is a "true output" and how do "we"
differ from "the community"?

Frank

Re: Consequences of Working in Office Documents Here

Posted by Jean Hollis Weber <je...@gmail.com>.
On Wed, 2011-06-22 at 04:34 -0400, Greg Stein wrote:
> Well... what are different ways for people to contribute, other than
> code? Let me throw out some:
> 
> * work with users (forums, email, etc)
> * write documentation
> * issue tracker triage and management
> * outreach: marketing, meetups, etc
> 
> I'm sure that some aspects are missing, but most of that work is
> interaction. Where documents need to be produced, they are most likely
> to be wiki pages or HTML, for easiest and broadest consumption.
> 
> This is just a stab at the issue. I recognize the underlying point:
> how to bring people who are comfortable working with binary-based
> documents (hey! OO.o!!) into an ecosystem that has text in its DNA? I
> don't know where the answer is, other than "learn text". I hope that
> with examining what our true outputs are, we can focus on those, and
> find a path that works for the community.

At OOo, the source documents for the *user guides* (in English) are
produced and maintained in ODT, with both ODT and PDF versions provided
to users through the wiki. In the past, the user guides were also
provided in wiki format (generated from the ODT source), but that hasn't
been updated since OOo3.2. Some other user docs (faqs, some howtos, etc)
are in wiki format only. And three books (Developers Guide, Programmers
Guide, Admin Guide) have their source on the wiki, with ODT and PDF
versions generated occasionally from the source.

--Jean


Re: Consequences of Working in Office Documents Here

Posted by Greg Stein <gs...@gmail.com>.
Well... what are different ways for people to contribute, other than
code? Let me throw out some:

* work with users (forums, email, etc)
* write documentation
* issue tracker triage and management
* outreach: marketing, meetups, etc

I'm sure that some aspects are missing, but most of that work is
interaction. Where documents need to be produced, they are most likely
to be wiki pages or HTML, for easiest and broadest consumption.

This is just a stab at the issue. I recognize the underlying point:
how to bring people who are comfortable working with binary-based
documents (hey! OO.o!!) into an ecosystem that has text in its DNA? I
don't know where the answer is, other than "learn text". I hope that
with examining what our true outputs are, we can focus on those, and
find a path that works for the community.

Cheers,
-g


On Wed, Jun 22, 2011 at 02:06, Dennis E. Hamilton
<de...@acm.org> wrote:
> It's true that we could bring things around and shoe-horn them into the SVN DIFF model, like using a CSV, or not minding that HTML diffs aren't that illuminating but we get them for what they are worth, etc.
>
> Of course, using a CSV loses a lot of information and anything that was done in the design of the spreadsheet to facilitate its use, in my chosen example.
>
> Now, in this case, the spreadsheet was from a committer. And other committers knew how to retrieve it, update it, and resubmit to SVN with an informative enough commit message.  This is not a complex case, it was just illustrative of the different level.
>
> The question I did not answer, because I do not know the answer:  What is a straightforward way for someone who was not raised as a Martian to contribute without being compelled to commit unnatural (for a Venusian) acts?  What is a way to contribute that does not require an unnatural change in already-successful ways of working?  And what is the cutover where the contribution is substantial enough that an iCLA is required anyhow?
>
> It seems to me there is an impedance mismatch for non-developer contributions of content that becomes part of an Apache deliverable.  I don't question policies that are involved.  I am wondering about the logistics and the friction of shoe-horning contributors into a practice that is designed around submission of patches and requires arcane Martian technology.
>
> Perhaps this is too hypothetical.
>
> I would like to hear from non-developer members of ooo-Dev who want to contribute, and what the nature of the envisioned contribution is.  Maybe some concrete use cases can clear this up for all of us.
>
>  - Dennis

Re: Consequences of Working in Office Documents Here

Posted by Dick Groskamp <th...@quicknet.nl>.
On 22-6-2011 8:06, Dennis E. Hamilton wrote:
> Perhaps this is too hypothetical.
>
> I would like to hear from non-developer members of ooo-Dev who want to contribute, and what the nature of the envisioned contribution is.  Maybe some concrete use cases can clear this up for all of us.
>
>   - Dennis
OK, I'll give it a shot, hoping it might help shine some light on things.
I'm already a bit lost in this world (I'm probably on Mars now, but want to be on Venus).

I was involved in translating the English online Help of OOo into Dutch.
We started in about 2002 and had a 100% level of translation throughout 2011.

Even for OOo 3.4 we had a 100% translation.
We worked through the Pootle server, where the translated files were processed by developers into the build.

OOo was built in about 8 languages that were supported by Sun/Oracle. Dutch was one of them.

Besides that, I and others translated the work from OOoAuthors (now ODFAuthors) from English to Dutch and placed it on wiki.services.openoffice.org.

I also translated online Help subjects (for instance the functions in Calc) and placed them on the Dutch wiki of OOo.
Summary of Calc functions in Dutch:
<http://wiki.services.openoffice.org/wiki/NL/Documentation/How_Tos/Calc:_Functies_gesorteerd_per_categorie>

Other translations also made it to the wiki.

We worked in the Documentation branch of the OOo wiki, where Documentation was separated into branches for each language:
http://wiki.services.openoffice.org/wiki/Documentation
http://wiki.services.openoffice.org/wiki/NL/Documentation
http://wiki.services.openoffice.org/wiki/NL/Documentation/How_Tos

So these (online Help and wiki) will be my main contribution to OOo, but right now I am totally lost.
I have no idea how to proceed. I am NOT and will NEVER be a developer of code to make OOo run, probably not even in the smallest way.
I do not have the skills for that.

The only thing that will make it into OOo is the translation of the online Help, IF Dutch is built in, as it was before.

-- 
DiGro

Windows 7 and OpenOffice.org 3.3


RE: Consequences of Working in Office Documents Here

Posted by "Dennis E. Hamilton" <de...@acm.org>.
It's true that we could bring things around and shoe-horn them into the SVN DIFF model, like using a CSV, or not minding that HTML diffs aren't that illuminating but we get them for what they are worth, etc.

Of course, using a CSV loses a lot of information and anything that was done in the design of the spreadsheet to facilitate its use, in my chosen example.

Now, in this case, the spreadsheet was from a committer. And other committers knew how to retrieve it, update it, and resubmit to SVN with an informative enough commit message.  This is not a complex case, it was just illustrative of the different level.

The question I did not answer, because I do not know the answer:  What is a straightforward way for someone who was not raised as a Martian to contribute without being compelled to commit unnatural (for a Venusian) acts?  What is a way to contribute that does not require an unnatural change in already-successful ways of working?  And what is the cutover where the contribution is substantial enough that an iCLA is required anyhow?

It seems to me there is an impedance mismatch for non-developer contributions of content that becomes part of an Apache deliverable.  I don't question policies that are involved.  I am wondering about the logistics and the friction of shoe-horning contributors into a practice that is designed around submission of patches and requires arcane Martian technology.

Perhaps this is too hypothetical.

I would like to hear from non-developer members of ooo-Dev who want to contribute, and what the nature of the envisioned contribution is.  Maybe some concrete use cases can clear this up for all of us.

 - Dennis

-----Original Message-----
From: Dave Fisher [mailto:dave2wave@comcast.net] 
Sent: Tuesday, June 21, 2011 22:30
To: ooo-dev@incubator.apache.org
Cc: Dennis E. Hamilton
Subject: Re: Consequences of Working in Office Documents Here



Re: Consequences of Working in Office Documents Here

Posted by Greg Stein <gs...@gmail.com>.
I think that the important part here is that others can review the
work being done. When that work is encapsulated behind binary formats,
then it makes it *very* difficult to perform that review.

Sure, some artifacts in the repository *need* to be binary. Nobody
will dispute that.

But when the primary work of this PMC can be done in a reviewable
format, then it helps all of us to make that happen.

Cheers,
-g

On Wed, Jun 22, 2011 at 01:29, Dave Fisher <da...@comcast.net> wrote:
>...

Re: Consequences of Working in Office Documents Here

Posted by Dave Fisher <da...@comcast.net>.
On Jun 21, 2011, at 8:58 PM, Daniel Shahaf wrote:

> Dennis E. Hamilton wrote on Tue, Jun 21, 2011 at 19:20:13 -0700:
>> BACK STORY
>> 
>> On a different list, not just here on ooo-dev, there has been some
>> surprise to see us putting binaries (ODF documents) into some SVN
>> locations used by the PPMC. 
>> 
>> My impression is that the experienced hands here in ASF are expecting
>> to see DIFFs in commit messages on SVN, but binaries don't get DIFFed
>> since it is usually unintelligible and almost always uninteresting.
>> For some, it is new news that ODF packages are not XML files.
>> 
>> Someone suggested that one could unpack the Zip of these documents and
>> then do diffs of the respective XML parts and that could serve as
>> a DIFF on what the changes are.  They also noticed they'd never seen
>> that done.
>> 
>> THE INSIGHT
>> 
>> On seeing that suggestion (clearly the kinds of things developers
>> think of, it being what we do), it struck me that we have a geeks are
>> from Mars, users are from Venus situation here.
>> 
>> I think the clash of expectations has to do with the differences in
>> tools that are applicable at the level we work at, and how we see what
>> it is we are at work on.
>> 
>> We need to understand that we really have different experience sets,
>> and they all are important in the context of the OpenOffice.org
>> project.
>> 
>> A GEEKY LOOK
>> 
>> Here is a geeky explanation of why it does no good to figure out
>> a better way to show DIFFs of the XML inside an ODF package if you
>> want to know what an author contributor/committer changed.  (You might
>> want that as a forensics tool, but not for knowing what someone
>> changed in the course of their work on a document.)
>> 
>> My (updated) explanation:
>> 
> 
> Long email.  In the end, the expectation is for commit mails to contain
> reviewable diffs, I don't think you've addressed how that might be done?

As far as I know binary files are acceptable elsewhere in SVN.

> 
> (as opposed to how it shouldn't be done)

Generally ODF files will be documentation and testcases, and generally consistent, like PNGs, JPEGs, etc. No one complains about PDFs or any of the MS Office formats in SVN. We haven't seemed to care about that in the Apache POI project; I can't answer for PDFBox.

I unzipped an ODF zip, and each part is a huge set of verbose XML on two lines: header and data. For example, content.xml:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns: ....

Diff won't work easily. Maybe SVN needs to provide "zip" storage and then "xml" diff within. Could the Subversion project whip that out now? We'll wait until they do before we proceed. I'm being sarcastic here. But if it is available now, that would be pretty cool.
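
One way around the two-line problem, sketched here with Python's standard library, is to re-indent each XML part so that every element sits on its own line before running a line-based diff. The function names are hypothetical. Note the assumption that this is for review only: re-indenting inserts whitespace, which can matter in mixed-content XML, so the reformatted text should not be written back into the package.

```python
import difflib
from xml.dom import minidom


def pretty_lines(xml_bytes: bytes) -> list[str]:
    """Re-indent an XML part so each element lands on its own line."""
    pretty = minidom.parseString(xml_bytes).toprettyxml(indent="  ")
    # Drop blank lines so they do not clutter the diff.
    return [line for line in pretty.splitlines() if line.strip()]


def readable_xml_diff(old_xml: bytes, new_xml: bytes) -> list[str]:
    """Line-based unified diff of two XML parts after re-indenting them."""
    return list(difflib.unified_diff(
        pretty_lines(old_xml), pretty_lines(new_xml), "old", "new", lineterm=""))
```

With the parts reformatted this way, a one-cell change shows up as a one-line change instead of a replacement of the entire data line.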

The real issue is that a binary document was used to update a table where everyone made changes. Changes that were important to those viewing the commit messages. I know we all love office documents around here, but ...

Maybe we should be exchanging that particular file as a CSV.
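
As a sketch of why CSV would be friendlier to reviewers, the following compares two versions of a table row by row. It assumes Python's standard library; the function name and the column names used in the usage note ("name", "status") are hypothetical, since the thread does not say what the table contained.

```python
import csv
import io


def csv_changes(old_text: str, new_text: str, key: str) -> list[str]:
    """Describe rows added, removed, or changed, matching rows on the `key` column."""
    def index(text: str) -> dict[str, dict[str, str]]:
        return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

    old, new = index(old_text), index(new_text)
    changes = []
    for k in sorted(old.keys() | new.keys()):
        if k not in new:
            changes.append(f"removed {k}")
        elif k not in old:
            changes.append(f"added {k}")
        elif old[k] != new[k]:
            fields = [f for f in new[k] if old[k].get(f) != new[k][f]]
            detail = ", ".join(f"{f}: {old[k].get(f)!r} -> {new[k][f]!r}" for f in fields)
            changes.append(f"changed {k}: {detail}")
    return changes
```

Called with key="name" on two revisions of a file with "name" and "status" columns, it returns one short line per affected row, which is the kind of summary a commit mail reader could review.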

(BTW - I notice that Calc's save options don't include XLSX, etc.)

Best Regards,
Dave


Re: Consequences of Working in Office Documents Here

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Dennis E. Hamilton wrote on Tue, Jun 21, 2011 at 19:20:13 -0700:
> BACK STORY
> 
> On a different list, not just here on ooo-dev, there has been some
> surprise to see us putting binaries (ODF documents) into some SVN
> locations used by the PPMC. 
> 
> My impression is that the experienced hands here in ASF are expecting
> to see DIFFs in commit messages on SVN, but binaries don't get DIFFed
> since it is usually unintelligible and almost always uninteresting.
> For some, it is new news that ODF packages are not XML files.
> 
> Someone suggested that one could unpack the Zip of these documents and
> then do diffs of the respective XML parts and that could serve as
> a DIFF on what the changes are.  They also noticed they'd never seen
> that done.
> 
> THE INSIGHT
> 
> On seeing that suggestion (clearly the kinds of things developers
> think of, it being what we do), it struck me that we have a geeks are
> from Mars, users are from Venus situation here.
> 
> I think the clash of expectations has to do with the differences in
> tools that are applicable at the level we work at, and how we see what
> it is we are at work on.
> 
> We need to understand that we really have different experience sets,
> and they all are important in the context of the OpenOffice.org
> project.
> 
> A GEEKY LOOK
> 
> Here is a geeky explanation of why it does no good to figure out
> a better way to show DIFFs of the XML inside an ODF package if you
> want to know what an author contributor/committer changed.  (You might
> want that as a forensics tool, but not for knowing what someone
> changed in the course of their work on a document.)
> 
> My (updated) explanation:
> 

Long email.  In the end, the expectation is for commit mails to contain
reviewable diffs; I don't think you've addressed how that might be done?

(as opposed to how it shouldn't be done)