Posted to dev@poi.apache.org by bu...@apache.org on 2014/05/26 09:54:00 UTC

[Bug 56563] New: Multithreading bug when reading 2 similar files

https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

            Bug ID: 56563
           Summary: Multithreading bug when reading 2 similar files
           Product: POI
           Version: 3.10
          Hardware: All
                OS: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSSF
          Assignee: dev@poi.apache.org
          Reporter: urishe@gmail.com

Created attachment 31660
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=31660&action=edit
A file for which the problem reproduces

When two copies of the same file are read from two threads simultaneously
(each thread processing its own file), the format strings of the cell styles
get mixed up. The files do not need to be exactly identical, but they do need
to contain the same styles or something along those lines.

The following code demonstrates this, printing to System.out every time a
non-date cell is mistakenly recognized as a date. Note that starting only one
of the threads yields no System.out messages.

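import java.io.File;
import java.io.FileInputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class CellFormatBugExample implements Runnable {

    public static void main(String[] args) {
        // Each thread reads its own copy of the attached file.
        new Thread(new CellFormatBugExample("C:/temp/file_1.xls")).start();
        new Thread(new CellFormatBugExample("C:/temp/file_2.xls")).start();
    }

    private final String filePath;

    public CellFormatBugExample(String filePath) {
        this.filePath = filePath;
    }

    @Override
    public void run() {
        File inputFile = new File(filePath);
        try (FileInputStream stream = new FileInputStream(inputFile)) {

            Workbook wb = WorkbookFactory.create(stream);
            Sheet sheet = wb.getSheetAt(0);

            for (Row row : sheet) {
                for (int idxCell = 0; idxCell < row.getLastCellNum(); idxCell++) {

                    Cell cell = row.getCell(idxCell);
                    if (cell == null) {
                        continue; // getCell() returns null for missing cells
                    }
                    // Touch the format string cache of the cell style.
                    cell.getCellStyle().getDataFormatString();
                    if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
                        boolean isDate = HSSFDateUtil.isCellDateFormatted(cell);
                        // Column 0 is skipped; a date in any later column is reported.
                        if (idxCell > 0 && isDate) {
                            System.out.println("cell " + idxCell + " is not a date!");
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}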

To reproduce, make another copy of the attached file ("file_2.xls") and run
the code.
After digging around a bit, the cause seems to be a caching bug in the
HSSFCellStyle.getDataFormatString() method.
As far as I can tell, a simple resolution would be to synchronize access to
that method.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 56563] Multithreading bug when reading 2 similar files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

--- Comment #5 from Dominik Stadler <do...@gmx.at> ---
Thanks for the note. I have moved it into a separate Bug 56595, as this one is
resolved and the note concerns a different code location.



[Bug 56563] Multithreading bug when reading 2 similar files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seebass@gmx.ch

--- Comment #3 from Dominik Stadler <do...@gmx.at> ---
*** Bug 56453 has been marked as a duplicate of this bug. ***



[Bug 56563] Multithreading bug when reading 2 similar files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

--- Comment #1 from Dominik Stadler <do...@gmx.at> ---
This was introduced in Bug 55612. Unfortunately, only a lock on a static
object will suffice, unless we replace the cache with something a bit more
sophisticated, e.g. a thread-local.
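
To illustrate why the lock would have to be on a static object, here is a
minimal sketch. It assumes the cache lives in static fields shared by all
style instances; the Style class, its lookup() method and the CACHE_LOCK name
are hypothetical stand-ins and not POI code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticLockCacheSketch {

    // Hypothetical stand-in for a cell style; this is not a POI class.
    static final class Style {
        // One lock and one cache entry shared by ALL Style instances.
        private static final Object CACHE_LOCK = new Object();
        private static int cachedIndex = -1;
        private static String cachedFormat = null;

        private final int formatIndex;

        Style(int formatIndex) {
            this.formatIndex = formatIndex;
        }

        String getDataFormatString() {
            // Locking on "this" would not help: two threads usually hold
            // different Style instances but still share the static fields.
            synchronized (CACHE_LOCK) {
                if (cachedFormat == null || cachedIndex != formatIndex) {
                    cachedIndex = formatIndex;
                    cachedFormat = lookup(formatIndex);
                }
                return cachedFormat;
            }
        }

        // Placeholder lookup; a real workbook would consult its format records.
        private static String lookup(int index) {
            return index == 14 ? "m/d/yy" : "General";
        }
    }

    /** Reads two different styles from two threads; returns the number of wrong answers. */
    public static int countMismatches(int iterations) {
        AtomicInteger mismatches = new AtomicInteger();
        Thread a = reader(new Style(14), "m/d/yy", iterations, mismatches);
        Thread b = reader(new Style(0), "General", iterations, mismatches);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mismatches.get();
    }

    private static Thread reader(Style style, String expected, int iterations,
                                 AtomicInteger mismatches) {
        return new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (!expected.equals(style.getDataFormatString())) {
                    mismatches.incrementAndGet();
                }
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(100_000));
    }
}
```

With the shared lock in place, the check and the update of the cache happen
as one step, so neither thread can observe the other's entry. The cost is
that every call from every thread contends for the same monitor.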



[Bug 56563] Multithreading bug when reading 2 similar files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Dominik Stadler <do...@gmx.at> ---
Fixed via r1597637: we now use ThreadLocals to keep the cache and thus avoid
multi-threading issues.

There should not be a huge overhead per thread, as we keep only a string, a
short and a list of (int+boolean+string). These are replaced for every
cache entry, so the cache cannot grow unbounded.
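
The idea can be sketched roughly as follows. This is not the code from
r1597637: the class, the Entry holder and the lookup logic are hypothetical,
and a plain list of strings stands in for the list of (int+boolean+string)
format records.

```java
import java.util.Arrays;
import java.util.List;

public class ThreadLocalCacheSketch {

    // The single cache entry a thread holds: a short, a list and a string.
    private static final class Entry {
        final short formatIndex;
        final List<String> formats;
        final String formatString;

        Entry(short formatIndex, List<String> formats, String formatString) {
            this.formatIndex = formatIndex;
            this.formats = formats;
            this.formatString = formatString;
        }
    }

    // Each thread sees only its own Entry, so no locking is needed.
    private static final ThreadLocal<Entry> LAST = new ThreadLocal<>();

    public static String getDataFormatString(short formatIndex, List<String> formats) {
        Entry last = LAST.get();
        if (last != null && last.formatIndex == formatIndex
                && last.formats.equals(formats)) {
            return last.formatString; // cache hit for this thread
        }
        // Placeholder resolution: index into the list, else fall back to "General".
        String resolved = (formatIndex >= 0 && formatIndex < formats.size())
                ? formats.get(formatIndex)
                : "General";
        // Overwrite the previous entry, so each thread keeps at most one.
        LAST.set(new Entry(formatIndex, formats, resolved));
        return resolved;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> fileOne = Arrays.asList("General", "m/d/yy");
        List<String> fileTwo = Arrays.asList("General", "0.00");
        // Same index, different workbooks: each thread resolves against its own list.
        Thread a = new Thread(() ->
                System.out.println("thread A: " + getDataFormatString((short) 1, fileOne)));
        Thread b = new Thread(() ->
                System.out.println("thread B: " + getDataFormatString((short) 1, fileTwo)));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Because a miss overwrites the thread's only entry, memory use per thread stays
at one entry regardless of how many styles are read.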



[Bug 56563] Multithreading bug when reading 2 similar files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=56563

--- Comment #4 from Uri Sherman <ur...@gmail.com> ---
A short suggestion: I saw you are trying to avoid synchronization on a static
variable here.

Note that every call to DateUtil.isCellDateFormatted(a-numeric-cell) ends up
in a call to DateUtil.isADateFormat(), which itself also maintains a static
cache of the last result and synchronizes on DateUtil.class.

In my case, for example, the issue with the formatString arises from a call to
DateUtil.isCellDateFormatted(cell), so I still get monitor locks upon each
call even though you use thread locals for the style formatString cache.

It seems like a good idea to use the same strategy and switch the
DateUtil.isADateFormat() cache to be ThreadLocal based rather than
synchronization based.
