Posted to dev@poi.apache.org by bu...@apache.org on 2008/04/15 22:14:00 UTC

DO NOT REPLY [Bug 44827] New: corrupt xls file after modifying one column

https://issues.apache.org/bugzilla/show_bug.cgi?id=44827

           Summary: corrupt xls file after modifying one column
           Product: POI
           Version: 3.0
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: john.rodriguez@gmail.com


I am using HSSF to read a 14000+ line xls file and save it to a copy, with one
modification: stripping RTF control chars from one column.  When I try to
open the output xls, I get "Excel found unreadable content in [filename]."  If
I allow Excel (v 2003 SP3) to repair the file, it changes my date and currency
columns to general cell format.

Code:

Iterator rit = sheet.rowIterator();
rit.next();
for(int i = 1; rit.hasNext(); i++) {
  row = (HSSFRow)rit.next();
  HSSFCell commentsCell = row.getCell(commentsIdx);
  if(commentsCell != null) {
    rtfComments = commentsCell.getRichStringCellValue().getString();

    RTFEditorKit kit = new RTFEditorKit();
    Document doc = kit.createDefaultDocument();
    kit.read(new StringReader(rtfComments), doc, 0);
    txtComments = doc.getText(0, doc.getLength());

    commentsCell.setCellValue(new HSSFRichTextString(txtComments));
  }
}

// write converted workbook to file
FileOutputStream fileOut = new FileOutputStream(outfile);
wb.write(fileOut);
fileOut.close();


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 44827] corrupt xls file after modifying one column

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=44827


David Fisher <df...@jmlafferty.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




--- Comment #4 from David Fisher <df...@jmlafferty.com>  2008-12-29 16:51:24 PST ---
Hard to know how important this is until the OP replies to Josh.




[Bug 44827] corrupt xls file after modifying one column

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=44827

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #5 from Dominik Stadler <do...@gmx.at> ---
No response in a long time, so resolving for now. Please reopen with more
information if this is still an issue.



DO NOT REPLY [Bug 44827] corrupt xls file after modifying one column

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=44827





--- Comment #3 from Josh Micich <jo...@gildedtree.com>  2008-04-18 10:39:06 PST ---
I took a look at the example and see the same behaviour you pointed out.  When
Excel saves the original file it goes from 24M down to 12M.  POI can manipulate
the second file OK.

So there is still the problem with the original file.  POI is taking a file
that Excel can read, and transforming it into one that Excel can't read.  In
the example code, if you stop iteration at row 611 everything is OK. Row 612 is
the first row where the cell F text does not begin with "{\rtf". The code you
provided probably has a small bug because it transforms this non-RTF string
into an empty string.

So, an easy fix to your specific problem might be to handle that situation
properly.
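
One way to handle that situation is sketched below, assuming that a leading
"{\rtf" is a good enough test for RTF content in this column.  The class and
method names (RtfStripper, stripRtf) are hypothetical and are not part of POI
or of the original program.  Text that does not look like RTF is returned
unchanged instead of being run through RTFEditorKit.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper for illustration; not part of POI or the reporter's code.
class RtfStripper {

    // Returns the plain text of an RTF string.  Input that does not start
    // with "{\rtf" (including null) is assumed to be plain text already and
    // is returned unchanged.
    static String stripRtf(String text) {
        if (text == null || !text.startsWith("{\\rtf")) {
            return text;
        }
        try {
            RTFEditorKit kit = new RTFEditorKit();
            Document doc = kit.createDefaultDocument();
            kit.read(new StringReader(text), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // Wrap checked exceptions so callers inside a loop stay simple.
            throw new IllegalArgumentException("Could not parse RTF text", e);
        }
    }
}
```

In the loop from the original report, the conversion would then become
commentsCell.setCellValue(new HSSFRichTextString(RtfStripper.stripRtf(rtfComments))).
This only avoids writing an empty string for non-RTF cells; an RTF value with
no text in it would still come out empty.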

I re-wrote the test code to just replace one cell (F3) text with empty string. 
The output file has the same problem.  This is the simplest way I have found to
reproduce the bug.  If I replace the cell text with "a", everything works fine.

I'm not sure what's so special about empty string.  I took a look at the
various files (24M before + after, 12M before + after) using BiffView, and
could not see anything weird that POI is doing.  My suspicion is that there is
something wrong with the rest of the file (that POI doesn't touch) that only
causes Excel to fail when the SST record gets an extra entry of empty string. 
Perhaps it has something to do with (not) purging unused values from the shared
string table.


For the record, which utility are you using to create this original Excel file?

Please post back if you still need resolution to this bug.  If you are OK with
not replacing the cell text with empty string, this bug may be of much lower
priority.




DO NOT REPLY [Bug 44827] corrupt xls file after modifying one column

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=44827


John Rodriguez <jo...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #2 from John Rodriguez <jo...@gmail.com>  2008-04-16 11:44:56 PST ---
I debugged the problem a little further.  When I run the program and break the
loop after a given number of iterations, I noticed that the output file was
valid for all rows until the row in which txtComments is "", i.e.,  

commentsCell.setCellValue(new HSSFRichTextString(""));

I tried to trim the file to a few thousand rows where this case occurs, but I
couldn't duplicate the problem.  Perhaps the large file size contributes to the
problem.

Actually, I tried another solution.  I saved the original file under a
different name in Excel, same content, and the file size decreased
significantly.  I think the export tool we use bloats the Excel file and HSSF
has trouble resaving it?

The original excel file is here:
http://www.columbia.edu/~jr534/dbo_contractproducts.original.zip




DO NOT REPLY [Bug 44827] corrupt xls file after modifying one column

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=44827


Josh Micich <jo...@gildedtree.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




--- Comment #1 from Josh Micich <jo...@gildedtree.com>  2008-04-15 14:57:11 PST ---
You'll probably need to upload an example XLS file to help diagnose this bug. 
Please cut down the XLS file as much as possible, while still reproducing the
error.

BTW does the code you have below still cause a corrupt file if you exit the
loop after modifying only one cell?

