Posted to issues@openoffice.apache.org by bu...@apache.org on 2018/04/01 18:02:38 UTC

[Issue 127745] New: Read Error: Format error discovered ... at n,nnnn (row,col)

https://bz.apache.org/ooo/show_bug.cgi?id=127745

          Issue ID: 127745
        Issue Type: DEFECT
           Summary: Read Error:  Format error discovered ... at n,nnnn
                    (row,col)
           Product: Writer
           Version: 4.1.5
          Hardware: PC
                OS: Windows 7
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: ui
          Assignee: issues@openoffice.apache.org
          Reporter: john.ha24@yahoo.co.uk
  Target Milestone: ---

Created attachment 86380
  --> https://bz.apache.org/ooo/attachment.cgi?id=86380&action=edit
File giving Read Error

See uploaded files Sammy Russel 1draft.odt and ARTTRANNIE WITH NOTES Ruth 29
03.odt.

Both files fail to open giving "Read Error:  Format error discovered ... at
n,nnnn (row,col)"

Analysis of content.xml shows that the first style definition in each file has
been corrupted with multiple redundant office:name definitions.  In one file
the first style definition was for P1; in the other the first was for Table1.

The fix to repair the files is to delete this redundant data.
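
A repeated attribute inside one tag makes content.xml not well-formed, so a
standard XML parser will refuse it. The following is a minimal sketch, not an
AOO tool, that scans the raw text for tags carrying the same attribute more
than once. The function names are hypothetical, and the regular expressions
assume ordinary quoted attribute values.

```python
import re
import zipfile

# One start tag (or empty-element tag): captures the tag name and its raw attribute text.
TAG_RE = re.compile(r"""<([A-Za-z_][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*/?>""")
# One attribute inside that raw text: captures the attribute name.
ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"[^"]*"|'[^']*')""")


def find_duplicate_attributes(xml_text):
    """Return (tag_name, attribute_name, count) for each attribute repeated within one tag."""
    findings = []
    for tag in TAG_RE.finditer(xml_text):
        counts = {}
        for attr in ATTR_RE.finditer(tag.group(2)):
            counts[attr.group(1)] = counts.get(attr.group(1), 0) + 1
        for name, n in counts.items():
            if n > 1:
                findings.append((tag.group(1), name, n))
    return findings


def check_odt(path):
    """Read content.xml from an .odt file (a zip archive) and report repeated attributes."""
    with zipfile.ZipFile(path) as odt:
        xml_text = odt.read("content.xml").decode("utf-8")
    return find_duplicate_attributes(xml_text)
```

Calling check_odt() on a damaged file should list the style tag and the
repeated office:name attribute; an empty list means no tag repeats an
attribute.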

Notes:

1  We often get posts about this problem in the forum.
2  They always seem to be files where comments have been added to a range of
characters.
3  We suspect, but have not confirmed, that the problem is caused by MS Word
being used to edit the file.  Record Changes may be switched on, and a comment
is attached to a range of characters.
4  See [Solved] Read-Error at
https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93024#p442216 for
first file.
5  See Format error discovered at
https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93036 for second
file.

Question:  Is it possible that AOO does the corruption?

Why is this important?  

Students exchange .odt files written with AOO with supervisors who use MS Word,
where the supervisor adds comments and records changes.  If MS Word is
corrupting the file we need to get Microsoft to fix it.

AOO 4.1.5 Windows 7

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #7 from John <jo...@yahoo.co.uk> ---
See also Issue 128356, "Track Changes and Annotations on text range can cause
corruption. Applies to 4.x (all versions?)", which appears to be very similar.

https://bz.apache.org/ooo/show_bug.cgi?id=128356


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #3 from John <jo...@yahoo.co.uk> ---
Unfortunately we are only sent the files to repair, and it is very difficult to
get a full history.  I am attempting to get access to MS Word so I can do some
tests.

Note that the number in each annotation name seems to have multiple "1" digits
appended to it.  The values in the Sammy Russel file are:

Annotation__414_24419901911
Annotation__401_244199019111
Annotation__158_2441990191111
Annotation__248_244199019111111
Annotation__153_24419901911111111
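
The number of appended digits can be counted with a short sketch like the one
below. It assumes the unmodified identifier is 244199019, the value shared by
the shorter annotation names listed elsewhere in this report. That base value
and the function name are assumptions made for illustration only.

```python
import re

# Assumption: the unmodified identifier, taken from the shorter names in this report.
BASE_ID = "244199019"

NAME_RE = re.compile(r"Annotation__(\d+)_(\d+)$")


def appended_ones(name, base_id=BASE_ID):
    """Return how many '1' digits follow base_id in the name, or None if it does not fit."""
    match = NAME_RE.search(name)
    if match is None:
        return None
    number = match.group(2)
    if not number.startswith(base_id):
        return None
    suffix = number[len(base_id):]
    if suffix.strip("1"):
        return None  # the suffix contains something other than '1'
    return len(suffix)


names = [
    "Annotation__414_24419901911",
    "Annotation__401_244199019111",
    "Annotation__158_2441990191111",
    "Annotation__248_244199019111111",
    "Annotation__153_24419901911111111",
]
for name in names:
    print(name, appended_ones(name))
```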

We have noticed this "added 1" problem in other files which were edited by MS
Word, so we are wondering:  Is the problem caused by MS Word or is it caused by
AOO?

Could you answer that question by confirming: "It is not possible for AOO to
add the "1" digits as shown above"?


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

Arrigo Marchiori <ar...@yahoo.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |CONFIRMED
     Ever confirmed|0                           |1
                 CC|                            |ardovm@yahoo.it

--- Comment #8 from Arrigo Marchiori <ar...@yahoo.it> ---
Following up from bug #128356.

On bug #128356 we seem to have fixed the corruption of ODT documents containing
a certain type of comment or annotation.

Apparently, this also fixes the data _corruption_ you report here: the repeated
XML attribute office:name is not added any more, and so it is never repeated.
Editing "Sammy Russel 1draft - CORRECTED.odt" and saving it gives a properly
constructed ODT document that can be reopened with no problems.

You are also reporting something else here: the "office:name" of each annotation
entry is being _changed_ every time the document is re-saved.

This could be considered a bug... or not, depending on its effect.
In any case, I confirm it as I am also seeing it in test cases for bug #128356.

As a '1' always seems to be appended, we could argue that after the document
has been edited and saved many times, the name will eventually become "too
long". How long is "too long" is hard to tell.

I am not considering this issue urgent -- but this is only my humble opinion. I
am open to discussion on this topic.


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

oooforum (fr) <oo...@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oooforum@free.fr

--- Comment #2 from oooforum (fr) <oo...@free.fr> ---
(In reply to John from comment #0)
> Question:  Is it possible that AOO does the corruption?
It would be best to prevent this error from happening.

So, to investigate in this direction, we need step-by-step instructions to
reproduce it.


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

Arrigo Marchiori <ar...@yahoo.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #9 from Arrigo Marchiori <ar...@yahoo.it> ---
I am flagging this bug as a duplicate of bug #128356 because the proposed
solution to the data corruption problem is there.

John, if you believe that the '1' being added to the office:name attribute is a
bug, please open a new report.

*** This issue has been marked as a duplicate of issue 128356 ***


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #6 from John <jo...@yahoo.co.uk> ---
Created attachment 86389
  --> https://bz.apache.org/ooo/attachment.cgi?id=86389&action=edit
Location of problem.GIF - image of content.xml

If Recorded change 72 is accepted the problem disappears and Writer does not
corrupt content.XML. 

See Location of problem.GIF - Recorded change 72 is the deletion of the
commented text "(report 2)".

Steps to reproduce removal of problem:

1  Open Sammy Russel 1draft.odt.
2  Un-tick Edit > Changes.
3  Edit > Accept/Reject changes.  Scroll to change 72 and accept it.  See image
- change 72 is  the second 15:12 change, and is the deletion of the Commented
text "(report 2)"
4  Add a space before Case summary at top
5  File > Save.

Expected result:  See first post where making a trivial change by adding a
space caused the Read Error problem when the file was then opened.

Actual result:  The file now opens successfully without a Read Error message.

Conclusion.  Accepting Recorded change 72 has "removed" the problem causing the
Read Error.


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

John <jo...@yahoo.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |john.ha24@yahoo.co.uk

--- Comment #1 from John <jo...@yahoo.co.uk> ---
Created attachment 86381
  --> https://bz.apache.org/ooo/attachment.cgi?id=86381&action=edit
File with Read Error.


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #4 from oooforum (fr) <oo...@free.fr> ---
You mention MS Word being used to read ODT files.
But remember that Microsoft uses ODF 1.1, while OpenOffice uses the ODF 1.2
format.


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #5 from John <jo...@yahoo.co.uk> ---
Created attachment 86388
  --> https://bz.apache.org/ooo/attachment.cgi?id=86388&action=edit
Sammy Russel 1draft - CORRECTED.odt file

I can confirm that Writer is adding this corruption to the .odt file.  It is
repeatable - see the attached Sammy Russel 1draft - CORRECTED.odt file.

Steps to cause Writer to corrupt the .odt file.

1  Download Sammy Russel 1draft.odt from [Solved] Read-Error at
https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93024#p442216 or it
is the first file attached to this report (36kB).

2  Extract content.XML.  Note that the P1 Style definition has been corrupted:
the following redundant and incorrect attributes have been inserted into it.

office:name="__Annotation__153_24419901911111111"
office:name="__Annotation__158_2441990191111"
office:name="__Annotation__248_244199019111111"
office:name="__Annotation__401_244199019111"
office:name="__Annotation__414_24419901911"

3  Delete these redundant items and re-insert content.XML to get the attached
file Sammy Russel 1draft - CORRECTED.odt.  At this stage it is thought that the
.odt file is OK.

4 Open Sammy Russel 1draft - CORRECTED.odt.  The file opens without problem.

5  Make a trivial edit (add a space in front of Case Summary) and save the
file.

Expected result:  File should not be corrupted when saved.

Actual result:  Writer corrupts the P1 Style definition by inserting one or
more office:name definitions into the P1 Style definition.
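
The manual repair in steps 2 and 3 could be automated along the following
lines. This is only a sketch under stated assumptions: it assumes the damaged
definitions are style:style start tags, that office:name never belongs on such
a tag, and that attribute values contain no ">" character. The function names
are hypothetical. It does not address the cause of the corruption, so Writer
may still damage the file again on the next save, as described above.

```python
import re
import zipfile

# Assumption: the corrupted definitions are <style:style ...> start tags.
STYLE_TAG_RE = re.compile(r"<style:style\b[^>]*>")
OFFICE_NAME_RE = re.compile(r"""\s+office:name\s*=\s*(?:"[^"]*"|'[^']*')""")


def strip_office_names_from_styles(xml_text):
    """Remove every office:name attribute found inside a style:style start tag."""
    def clean(match):
        return OFFICE_NAME_RE.sub("", match.group(0))
    return STYLE_TAG_RE.sub(clean, xml_text)


def repair_odt(src_path, dst_path):
    """Copy an .odt archive, rewriting only content.xml.

    Entries are written in their original order with their original
    compression settings, so the mimetype entry stays first and uncompressed.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = strip_office_names_from_styles(data.decode("utf-8"))
                data = text.encode("utf-8")
            dst.writestr(info, data)
```

Writing to a new file rather than overwriting the original keeps the damaged
copy available for comparison. The office:name attributes on the
office:annotation elements themselves are left untouched.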

Notes:

1.  It appears that the file was created by author SN using AOO Writer.  The
file was sent to reviewer SD who used MS Word and recorded changes on 20 Mar
2018.  Some changes were "Comments attached to a range of characters" and it is
these Comments which use the office:name definitions.

2.  Author SN then recorded more changes to the file using AOO on 22 Mar. 
Record Changes is still ON. 

3.  At some stage, the file became corrupted.  This probably happened when
author SN edited and saved the file after it had been edited with MS Word (and
as described in Step 5 above).

4.  Analysis of the time stamps of the edits shows that each change is timed at
nn:nn:00.0n seconds.  It seems strange to me that the time is always set to
00.0n seconds.  The times are shown below where 20 = date 20th. 

The first five office:name ... appear in the file, and also corrupt the P1
Style definition.  The sixth, seventh and eighth appear in the file but do NOT
corrupt the P1 Style definition.  The sixth was the first, recorded at
09:51:00.02.  

The other twenty times are recorded changes which were not Comments added to a
range of characters.  Note that the same 12:18:00.06 time is recorded for two
different changes.

Note the multiple adding of digits "111...".  

Note how the decimal component of the seconds increments throughout - I would
expect it to be more random.

The times below are in the order in which they appear, from start to end, in
content.xml.

office:name="__Annotation__153_24419901911111111" line  200  20  9:56:00.04  SD
office:name="__Annotation__158_2441990191111"     line  220  20  9:57:00.04  SD
office:name="__Annotation__248_244199019111111"   line  351  20 10:39:00.04  SD 
office:name="__Annotation__401_244199019111"      line  859  20 12:18:00.06  SD 
office:name="__Annotation__414_24419901911"       line  958  20 12:20:00.06  SD

office:name="__Annotation__3_244199019"           line 1260  20  9:51:00.02  SD
office:name="__Annotation__396_244199019"         line 1522  20 12:18:00.06  SD 
office:name="__Annotation__551_244199019"         line 1636  20 12:50:00.08  SD 

09:54:00.04
11:50:00.04 
10:43:00.05 
10:41:00.05
12:21:00.05 
11:40:00.05 
11:52:00.05
11:56:00.06
12:43:00.06 
12:18:00.06  line 816 
12:27:00.06 
12:29:00.07 
12:28:00.07  
12:39:00.07 
12:40:00.07 
12:42:00.08
12:42:00.08 
12:44:00.08 
12:46:00.08
12:50:00.08
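
One way to check the two observations about the times (seconds always 00, and
a fractional part that grows through the day) is sketched below. The values
are copied from the listing above. The sketch assumes all entries fall on the
same day, which the report states only for the annotation entries. The
function names are hypothetical.

```python
import re

# Times copied from the listing above, in the order they appear in content.xml:
# the eight annotation entries first, then the twenty other recorded changes.
TIMES = """
9:56:00.04 9:57:00.04 10:39:00.04 12:18:00.06 12:20:00.06
9:51:00.02 12:18:00.06 12:50:00.08
09:54:00.04 11:50:00.04 10:43:00.05 10:41:00.05 12:21:00.05
11:40:00.05 11:52:00.05 11:56:00.06 12:43:00.06 12:18:00.06
12:27:00.06 12:29:00.07 12:28:00.07 12:39:00.07 12:40:00.07
12:42:00.08 12:42:00.08 12:44:00.08 12:46:00.08 12:50:00.08
""".split()

# Accepts either '.' or ':' before the two-digit fractional part.
TIME_RE = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.:](\d{2})")


def parse_time(text):
    """Return (hour, minute, second, hundredths) as integers."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised time: " + text)
    return tuple(int(group) for group in match.groups())


def summarize(times):
    """Sort chronologically and report the whole-second values and fraction decreases."""
    parsed = sorted(parse_time(t) for t in times)
    whole_seconds = {p[2] for p in parsed}
    fractions = [p[3] for p in parsed]
    decreases = sum(1 for a, b in zip(fractions, fractions[1:]) if b < a)
    return parsed, whole_seconds, decreases


parsed, whole_seconds, decreases = summarize(TIMES)
print("entries:", len(parsed))
print("distinct whole-second values:", sorted(whole_seconds))
print("chronological steps where the fraction decreases:", decreases)
```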


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

Keith N. McKenna <kn...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bz.apache.org/ooo/s
                   |                            |how_bug.cgi?id=128356


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

Marcus <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED


[Issue 127745] Read Error: Format error discovered ... at n,nnnn (row,col)

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=127745

--- Comment #10 from John <jo...@yahoo.co.uk> ---
(In reply to Arrigo Marchiori from comment #9)

> John, if you believe that the '1' being added to the office:name attribute
> is a bug, please open a new report.
> 
Arrigo

Thanks.  

I have no evidence that AOO adds the multiple "1111" sequences to the
office:name attribute as in "<office:annotation
office:name="__Annotation__401_2441990191111">".

In fact, I have an unproven hunch that MS Word adds these "1111" when MS Word
edits a .odt file.

My evidence was solely that AOO corrupted the file when a trivial edit was made
by bringing existing office:name attribute definitions  into the P1 Style
definition and corrupting the file, something now fixed in bug #128356.

I concur that this bug report should be closed.
