You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2014/05/26 09:54:00 UTC
[Bug 56563] New: Multithreading bug when reading 2 similar files
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
Bug ID: 56563
Summary: Multithreading bug when reading 2 similar files
Product: POI
Version: 3.10
Hardware: All
OS: All
Status: NEW
Severity: major
Priority: P2
Component: HSSF
Assignee: dev@poi.apache.org
Reporter: urishe@gmail.com
Created attachment 31660
--> https://issues.apache.org/bugzilla/attachment.cgi?id=31660&action=edit
A file for which the problem reproduces
When reading two copies of the same file (doesn't necessarily need to be the
exact same file, but do need to contain the same styles or something along
those lines),
from two threads simultaneously (each thread processes its own file), the
format string of cell styles get mixed up.
The following code demonstrates this, printing to System.out every time a non
date cell is mistakenly recognized as date. Note that starting only one of the
threads yields no System.out messages.
public class CellFormatBugExample implements Runnable {
public static void main(String[] args) {
new Thread(new CellFormatBugExample("C:/temp/file_1.xls")).start();
new Thread(new CellFormatBugExample("C:/temp/file_2.xls")).start();
}
String filePath;
public CellFormatBugExample(String filePath) {
this.filePath = filePath;
}
@Override
public void run() {
File inputFile = new File(filePath);
try (FileInputStream stream = new FileInputStream(inputFile)) {
Workbook wb = WorkbookFactory.create(stream);
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Integer idxCell = 0; idxCell < row.getLastCellNum();
idxCell++) {
Cell cell = row.getCell(idxCell);
cell.getCellStyle().getDataFormatString();
if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
boolean isDate =
HSSFDateUtil.isCellDateFormatted(cell);
if (idxCell > 0 && isDate) {
System.out.println("cell " + idxCell + " is not a
date!");
}
}
}
}
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Make another copy of the attached file - "file_2.xls" and run the code to
reproduce.
Digging around a bit, seems the cause for this is a caching bug in the
HSSFCellStyle.getDataFormatString() method.
As far as I could understand a simple resolution would be to synchronize access
to that method.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 56563] Multithreading bug when reading 2 similar files
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
--- Comment #5 from Dominik Stadler <do...@gmx.at> ---
Thanks for the note, I have moved this into a separate Bug 56595 as this one is
resolved and the note concerns a different code-location.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 56563] Multithreading bug when reading 2 similar files
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |seebass@gmx.ch
--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
*** Bug 56453 has been marked as a duplicate of this bug. ***
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 56563] Multithreading bug when reading 2 similar files
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
--- Comment #1 from Dominik Stadler <do...@gmx.at> ---
This was introduced in Bug 55612, unfortunately only a lock on a static object
will suffice unless we replace it with something a bit more sophisticated, e.g.
a thread-local.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 56563] Multithreading bug when reading 2 similar files
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
Fixed via r1597637, we now use ThreadLocals to keep the cache and thus avoid
multi-threading issues.
There should not be a huge overhead per Thread as we keep a string, a short and
a list of (int+boolean+string), which is replaced for every cache-entry, so it
cannot grow unbounded.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 56563] Multithreading bug when reading 2 similar files
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563
--- Comment #4 from Uri Sherman <ur...@gmail.com> ---
A short suggestion - I saw you are trying to avoid synchronization on a static
variable here.
Note that every call to DateUtil.isCellDateFormatted(a-numeric-cell) ends up in
a call to DateUtil.isADateFormat() which itself also maintains a static cache
of the last result and synchronizes on DateUtil.class.
In my case for example, the issue with the formatString arises by a call to
DateUtil.isCellDateFormatted(cell), so I still get monitor locks upon each call
even though you use thread locals for the style formatString cache.
Seems like it would be a good idea to use the same strategy and switch the
DateUtil.isADateFormat() cache to be ThreadLocal based rather than
synchronization based.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org