Posted to dev@openoffice.apache.org by Wolf Halton <wo...@gmail.com> on 2012/01/18 14:16:13 UTC

Has docs converter been created?

The department I work for has just standardized on MS Office 2010 and I am
constantly getting broken-xml docx files that I cannot look at in either
OOo or LibreOffice 3.4.  I was reading a thread on the forum from 2007 when
docx first came out and we were having to suggest people use MS Office for
docx.  Are we any closer to being able to read and convert docx?

-- 
This Apt Has Super Cow Powers - http://sourcefreedom.com
Advancing Libraries Together - http://LYRASIS.org

RE: Has docs converter been created?

Posted by "Dennis E. Hamilton" <de...@acm.org>.
But what about it is broken?  I get that some products don't open it or claim there is something defective about it, but what makes it broken as a .docx?  Are you saying that the document content is defective, or that its Zip (OPC) package is broken?

Can you make a version that has no confidential/proprietary content so that it is possible to do some forensic work on it to see what disturbs non-Office consumers about it?

(PS: Do you routinely save in compatibility mode or are you doing pure Word 2010 docx when you create it?  Try one of each to see if that narrows the situation.  Also, does it open in WordPad on Windows 7?  If you save that as a docx, can non-Office applications open it then?)

It takes a variety of back-and-forth tests and some inspection of the package itself to determine where the breakage is isolated.  The breakage can also show up in different ways and places on different interchange paths.

 - Dennis

PS: I still don't understand "broken-xml" (not my term), as if there is an incorrect use of XML in the docx.  Who says?
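
The two possibilities raised above (a damaged Zip/OPC package versus XML that is not well-formed) can be told apart with a short script. What follows is only a minimal sketch in Python, using nothing beyond the standard library; the function name inspect_docx is made up for illustration, and the check that [Content_Types].xml is present is a basic OPC sanity check, not a full conformance test. A clean result does not prove that a given application will import the file.

```python
import zipfile
import xml.etree.ElementTree as ET


def inspect_docx(source):
    """Return a list of (part_name, problem) tuples; an empty list means
    none of the checks below found a problem.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper, written for illustration only.)
    """
    try:
        package = zipfile.ZipFile(source)
    except zipfile.BadZipFile as exc:
        # The container itself is unreadable, so no XML part can be examined.
        return [("<package>", f"not a readable Zip archive: {exc}")]

    problems = []
    with package:
        names = package.namelist()

        # An OPC package carries a content-types part at the package root.
        if "[Content_Types].xml" not in names:
            problems.append(("[Content_Types].xml", "part is missing"))

        for name in names:
            try:
                data = package.read(name)  # raises BadZipFile on a CRC mismatch
            except zipfile.BadZipFile as exc:
                problems.append((name, f"cannot be extracted: {exc}"))
                continue

            # Only XML parts and relationship parts are parsed.
            if name.endswith((".xml", ".rels")):
                try:
                    ET.fromstring(data)
                except ET.ParseError as exc:
                    problems.append((name, f"XML is not well-formed: {exc}"))

    return problems
```

Calling inspect_docx on the path of a suspect file returns an empty list when the package opens and every XML part parses. A single "<package>" entry points at the Zip container (for example a truncated transfer), while entries naming individual parts point at the XML inside it.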



Re: Has docs converter been created?

Posted by Wolf Halton <wo...@gmail.com>.
On Wed, Jan 18, 2012 at 3:28 PM, Dennis E. Hamilton <dennis.hamilton@acm.org
> wrote:

> It's not clear to me that the DOCX files are broken.  It is that import
> into OOo and LO is unsuccessful.  (People will say that is because DOCX is
> not OOXML but I've never seen actual evidence for that, simply the claim as
> a weak justification for why DOC should be used instead.  There's no
> question that DOC import/export tends to succeed more often.)
>
> Perhaps Wolf can clarify the complete use case and what he means by
> "broken-xml docx."
>
Perhaps I can bring a more complete use case.  I have a document written
with MS Office 2010 that I started at work.  It is an internal document so
I cannot attach it.  When I get home, I try to work on the document, and
when I double-click it, the Unity archive manager is called to open it but
says the document is corrupt.  Attempting to open it through LibreOffice
tells me it is corrupt and cannot be opened.  When I start a Windows 7 VM
and open the document, it opens perfectly.  I save it as a rich text
document in WordPad, open the rich text document in LibreOffice, and it
displays perfectly.  This is what I mean when I say broken xml.  It works
on the Microsoft stack, even without Office on the second Microsoft machine.


RE: Has docs converter been created?

Posted by "Dennis E. Hamilton" <de...@acm.org>.
It's not clear to me that the DOCX files are broken.  It is that import into OOo and LO is unsuccessful.  (People will say that is because DOCX is not OOXML but I've never seen actual evidence for that, simply the claim as a weak justification for why DOC should be used instead.  There's no question that DOC import/export tends to succeed more often.)

Perhaps Wolf can clarify the complete use case and what he means by "broken-xml docx."



Re: Has docs converter been created?

Posted by Dave Fisher <da...@comcast.net>.
Hi Wolf,

On Jan 18, 2012, at 5:16 AM, Wolf Halton wrote:

> The department I work for has just standardized on MS Office 2010 and I am
> constantly getting broken-xml docx files that I cannot look at in either
> OOo or LibreOffice 3.4.  I was reading a thread on the forum from 2007 when
> docx first came out and we were having to suggest people use MS Office for
> docx.  Are we any closer to being able to read and convert docx?

I don't know about here at AOO, but if you can use Java then you will find support for docx, xlsx, and pptx in Apache POI.

Nick Burch and Yegor Kozlov are mentors for Apache ODF Toolkit (incubating) and on the POI PMC. I am also on the POI PMC.

You might want to take your questions about broken docx files to the POI developers list.

http://poi.apache.org/mailinglists.html

Regards,
Dave

