Posted to dev@poi.apache.org by bu...@apache.org on 2016/06/03 20:04:05 UTC

[Bug 59663] New: Please make READING a workbook thread safe

https://bz.apache.org/bugzilla/show_bug.cgi?id=59663

            Bug ID: 59663
           Summary: Please make READING a workbook thread safe
           Product: POI
           Version: 3.14-FINAL
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: m.kurz@irregular.at

Created attachment 33910
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33910&action=edit
example

Preface:
I did read the FAQ - where it says "Accessing the same document in multiple
threads will not work."
Also I read the linked discussion
(https://mail-archives.apache.org/mod_mbox/poi-user/201109.mbox/%3C1314859350817-4757295.post@n5.nabble.com%3E).
---

In most of the discussions I read about thread safety in POI, people talk about
creating/writing the same document via different threads. I completely
understand that making WRITING thread safe isn't trivial and probably has many
pitfalls (apart from performance implications), which is why it isn't
implemented in POI (right now).

However, what I am wondering is whether it would be possible to make POI thread
safe when reading the same worksheet from multiple threads in parallel.
We have quite large Excel files which we only need to read (including
evaluating a lot of cell formulas, etc.).
Being able to read a workbook with multiple concurrent threads would speed
things up a lot for us - and probably for other people as well.

For me - as someone who doesn't know the codebase and its architecture - my
first thought was that all that needs to be done is to make some caches thread
safe (by using ConcurrentHashMaps instead of normal HashMaps) and maybe some
other minor tweaks...

E.g. today, I implemented a small app which tries to read a worksheet via
multiple threads concurrently - and of course it failed with this exception:
----
Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
    at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1819)
    at java.util.HashMap$TreeNode.treeify(HashMap.java:1936)
    at java.util.HashMap.treeifyBin(HashMap.java:771)
    at java.util.HashMap.putVal(HashMap.java:643)
    at java.util.HashMap.put(HashMap.java:611)
    at org.apache.poi.ss.formula.PlainCellCache.put(PlainCellCache.java:84)
    at org.apache.poi.ss.formula.EvaluationCache.getPlainValueEntry(EvaluationCache.java:136)
    at org.apache.poi.ss.formula.EvaluationTracker.acceptPlainValueDependency(EvaluationTracker.java:145)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:242)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
    at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
    at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
    at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:51)
    at org.apache.poi.ss.formula.eval.AreaEvalBase.getValue(AreaEvalBase.java:131)
    at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.collectValues(MultiOperandNumericFunction.java:151)
    at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.getNumberArray(MultiOperandNumericFunction.java:128)
    at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.evaluate(MultiOperandNumericFunction.java:90)
    at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:132)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
    at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
    at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
    at org.apache.poi.ss.formula.LazyRefEval.getInnerValueEval(LazyRefEval.java:43)
    at org.apache.poi.ss.formula.eval.OperandResolver.chooseSingleElementFromRef(OperandResolver.java:179)
    at org.apache.poi.ss.formula.eval.OperandResolver.getSingleValue(OperandResolver.java:62)
    at org.apache.poi.ss.formula.eval.TwoOperandNumericOperation.singleOperandEvaluate(TwoOperandNumericOperation.java:29)
    at org.apache.poi.ss.formula.eval.TwoOperandNumericOperation.evaluate(TwoOperandNumericOperation.java:36)
    at org.apache.poi.ss.formula.functions.Fixed2ArgFunction.evaluate(Fixed2ArgFunction.java:33)
    at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:119)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
    at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
    at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
    at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:51)
    at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:45)
    at org.apache.poi.ss.formula.eval.AreaEvalBase.getValue(AreaEvalBase.java:128)
    at org.apache.poi.ss.formula.functions.LookupUtils$ColumnVector.getItem(LookupUtils.java:100)
    at org.apache.poi.ss.formula.functions.Vlookup.evaluate(Vlookup.java:59)
    at org.apache.poi.ss.formula.functions.Var3or4ArgFunction.evaluate(Var3or4ArgFunction.java:36)
    at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:132)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
    at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:205)
    at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluateFormulaCellValue(HSSFFormulaEvaluator.java:374)
    at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluate(HSSFFormulaEvaluator.java:202)
----
This exception could be fixed by making _plainValueEntriesByLoc a
ConcurrentHashMap in
https://github.com/apache/poi/blob/REL_3_14_FINAL/src/java/org/apache/poi/ss/formula/PlainCellCache.java#L81
I had a quick look at the codebase and it looks like there are some more caches
which could probably just be changed to a ConcurrentHashMap...
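
A minimal sketch of that idea, under assumptions: CellCache below is a
hypothetical stand-in for a location-keyed cache, not POI's actual
PlainCellCache, and the long keys and String values are made up for
illustration. It only shows that concurrent put() calls on a ConcurrentHashMap
do not corrupt the map the way unsynchronized HashMap access can:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentCacheSketch {

    /** Hypothetical location-keyed cache (assumption; not POI's PlainCellCache). */
    static final class CellCache {
        // ConcurrentHashMap tolerates concurrent put/get without external locking.
        private final Map<Long, String> entriesByLoc = new ConcurrentHashMap<>();

        void put(long loc, String value) {
            entriesByLoc.put(loc, value);
        }

        String get(long loc) {
            return entriesByLoc.get(loc);
        }

        int size() {
            return entriesByLoc.size();
        }
    }

    /** Fills one shared cache from several threads and returns its final size. */
    static int fillConcurrently(int threads, int entriesPerThread) {
        CellCache cache = new CellCache();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t * entriesPerThread; // disjoint key range per thread
            pool.execute(() -> {
                for (int i = 0; i < entriesPerThread; i++) {
                    cache.put(offset + i, "v" + (offset + i));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // 8 threads x 10,000 distinct keys each
        System.out.println(fillConcurrently(8, 10_000)); // prints 80000
    }
}
```

Note that this only makes the map itself safe. Any logic that reads several
entries and then updates them based on what it saw would still need its own
coordination.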


What do you think?
Is there a chance to make this work? Or am I completely wrong?

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 59663] Please make READING a workbook thread safe

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59663

m.kurz@irregular.at changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m.kurz@irregular.at



[Bug 59663] Please make READING a workbook thread safe

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=59663

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|NEW                         |RESOLVED

--- Comment #1 from Dominik Stadler <do...@gmx.at> ---
I don't think there are any plans to do this. 

The sample exception that you list shows the types of problems that we would
run into. Any lazily-initialized data structure would be a potential source of
errors.

Sometimes the error message will be much harder to interpret, or there might
even be very strange incorrect behavior without any indication that
multi-threaded access is the culprit.

Therefore I am closing this as WONTFIX for now. Even if someone would come up
with some initial patches, I would vote against changing the official stance on
multi-threading as such a guarantee would come with an additional maintenance
burden that very likely no-one is willing to provide.

Lastly, any such change would likely have some performance impact on current
single-threaded usages of the code, thus creating other problems for cases
where large documents are already being processed today.
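
To illustrate the lazily-initialized case mentioned above, here is a small
sketch. The UnsafeLazy and SafeLazy classes are hypothetical and not taken from
POI. The unsafe variant has a check-then-act race in which two threads may both
run the initializer; the safe variant serializes access with a lock, and that
lock is taken on every call, including calls from single-threaded code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyInitSketch {

    /** Unsafe: two threads can both observe null and both run the initializer. */
    static final class UnsafeLazy<T> {
        private final Supplier<T> init;
        private T value;

        UnsafeLazy(Supplier<T> init) {
            this.init = init;
        }

        T get() {
            if (value == null) {      // check ...
                value = init.get();   // ... then act, with no synchronization
            }
            return value;
        }
    }

    /** Safe: the check and the assignment happen under one lock. */
    static final class SafeLazy<T> {
        private final Supplier<T> init;
        private T value;

        SafeLazy(Supplier<T> init) {
            this.init = init;
        }

        synchronized T get() {
            if (value == null) {
                value = init.get();
            }
            return value;
        }
    }

    /** Calls SafeLazy.get() from many threads; returns how often init ran. */
    static int countInitializations(int threads) {
        AtomicInteger calls = new AtomicInteger();
        SafeLazy<int[]> lazy = new SafeLazy<int[]>(() -> {
            calls.incrementAndGet();
            return new int[16];
        });
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                lazy.get();
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countInitializations(16)); // prints 1
    }
}
```

Swapping SafeLazy for UnsafeLazy in countInitializations may report more than
one initialization on some runs and exactly one on others, which is the kind of
intermittent behavior that makes such bugs hard to trace back to threading.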
