Posted to dev@poi.apache.org by Nick Burch <ni...@alfresco.com> on 2010/07/08 16:41:12 UTC

Commons Math dependency?

Hi All

We now have two proposed patches that need commons math. Firstly the
POISSON function (bug #49538), and another (number I forget) which was for 
fractions when formatting numbers.

The latest version of commons math is an 800kb jar file. That is larger 
than all the existing non-ooxml dependencies put together, and half the 
size of the core POI jar. That said, it's only 20% of the size of the 
minimal ooxml schemas jar file, and a third of the size of xmlbeans.

What do people think? Would it be ok to include this as a dependency? 
Should we require it for these features, but let everything else work 
without it (which could mean you run poi fine for ages, then suddenly one 
day it blows up saying "hey, I need commons math now!")? Should we decline 
the patches that need commons math, and do without those features?

Thoughts?

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


RE: Commons Math dependency?

Posted by Kyle Ray <kr...@DIRTT.net>.
Hello,

It depends on how much of Commons Math you use. If it's only a small portion of the library, I would just implement your own equivalent methods (terrible, I know). But if you expect to use more than a handful of methods out of Commons Math, I would include the whole library as a dependency. If you allow some people to use it and others to run without it, you have to make sure everyone who uses the library handles exceptions properly when the library isn't available. This is likely to be a major pain in your ass, which is why I suggest including it by default. How hard is it to make these features work without the Commons Math library?

For what 'we' use POI for, including commons math as a dependency is no problem: we just use POI for processing data, and we don't include POI as a dependency of our application.

Thank you,
Kyle Ray
d 403-450-3620
c 403-607-3346  



Re: Commons Math dependency?

Posted by Yegor Kozlov <ye...@dinom.ru>.
> Speaking of which, anyone mind if I roll another 3.7 beta release 
> later this week / early next week?
>
OK to me.



Re: Commons Math dependency?

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 26 Jul 2010, Yegor Kozlov wrote:
> It agrees with Excel, all unit tests in bug #49538 pass, and I don't see 
> why we should use commons-math's implementation instead of writing a few 
> lines of code.

Great news

> Nevertheless, I'm for including commons-math in POI dependencies if we 
> need more stat distributions.

I'm tempted to say we'll add it as a dependency for 3.8, and that'll let 
us do the extra stat functions + better number formatting (eg fractions). 
Well, add it as a dependency as soon after 3.7 final as someone wants to 
add the code that needs it anyway :)

Speaking of which, anyone mind if I roll another 3.7 beta release later 
this week / early next week?

Nick



Re: Commons Math dependency?

Posted by Yegor Kozlov <ye...@dinom.ru>.
I didn't realize how simple the implementation of POISSON is until I 
chatted with Josh :) It really is a few lines of code:

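         // Probability mass function: P(X = k) for mean lambda.
         // factorial(k) is assumed to be a helper defined elsewhere (not shown).
         private double probability(int k, double lambda) {
             return Math.pow(lambda, k) * Math.exp(-lambda) / factorial(k);
         }

         // Cumulative distribution: P(X <= x), summed term by term.
         private double cumulativeProbability(int x, double lambda) {
             double result = 0;
             for (int k = 0; k <= x; k++) {
                 result += probability(k, lambda);
             }
             return result;
         }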

It agrees with Excel, all unit tests in bug #49538 pass, and I don't see 
why we should use commons-math's implementation instead of writing a few 
lines of code.
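
For reference, a self-contained version of the snippet might look like the 
sketch below. The class name, the factorial helper and the poisson entry 
point are assumptions made for illustration (the snippet does not show how 
factorial is defined); the entry point simply mirrors the Excel signature 
POISSON(x, mean, cumulative). This is not the code from bug #49538.

```java
class PoissonSketch {

    // Hypothetical helper: k! as a double. Using double means a large k
    // overflows to Infinity rather than wrapping around like int or long.
    static double factorial(int k) {
        double result = 1.0;
        for (int i = 2; i <= k; i++) {
            result *= i;
        }
        return result;
    }

    // P(X = k) for a Poisson distribution with mean lambda.
    static double probability(int k, double lambda) {
        return Math.pow(lambda, k) * Math.exp(-lambda) / factorial(k);
    }

    // P(X <= x): the sum of the point probabilities for 0..x.
    static double cumulativeProbability(int x, double lambda) {
        double result = 0;
        for (int k = 0; k <= x; k++) {
            result += probability(k, lambda);
        }
        return result;
    }

    // Assumed entry point shaped like Excel's POISSON(x, mean, cumulative).
    static double poisson(int x, double mean, boolean cumulative) {
        return cumulative ? cumulativeProbability(x, mean) : probability(x, mean);
    }

    public static void main(String[] args) {
        System.out.println(poisson(2, 5.0, false));
        System.out.println(poisson(2, 5.0, true));
    }
}
```

Note that k! exceeds the range of a double beyond k = 170, so this direct 
form is only suitable for modest values of k.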

Nevertheless, I'm for including commons-math in POI dependencies if we 
need more stat distributions.

Yegor



Re: Commons Math dependency?

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 16 Jul 2010, Yegor Kozlov wrote:
> Perhaps for now, the best is to copy the implementation of POISSON from 
> Commons-Math to POI, but if we want to take more stuff I would seriously 
> consider a jar dependency.

There's the fraction stuff too that I'm interested in. From the look of 
your list of other functions, I think maybe a jar is the way to go

> I gave a quick look at what can be useful for POI. The left column is Excel 
> function, the right column is Math's implementation.
>
* snip 13 functions *

Looks like we will probably fairly soon want the fraction stuff, and 14 
stats functions. Based on everyone's investigations, that does look like 
probably too large a group to maintain our own hacked, minimal copies 
of...

Nick



Re: Commons Math dependency?

Posted by Yegor Kozlov <ye...@dinom.ru>.
Perhaps for now the best option is to copy the implementation of POISSON 
from Commons-Math to POI, but if we want to take more stuff I would 
seriously consider a jar dependency.

I took a quick look at what could be useful for POI. The left column is 
the Excel function, the right column is the Commons Math implementation.

BETADIST         org.apache.commons.math.distribution.BetaDistributionImpl
BINOMDIST        org.apache.commons.math.distribution.BinomialDistributionImpl
CHIDIST          org.apache.commons.math.distribution.ChiSquaredDistributionImpl
EXPONDIST        org.apache.commons.math.distribution.ExponentialDistributionImpl
FDIST            org.apache.commons.math.distribution.FDistributionImpl
GAMMADIST        org.apache.commons.math.distribution.GammaDistributionImpl
GEOMEAN          org.apache.commons.math.stat.StatUtils#geometricMean
KURT             org.apache.commons.math.stat.descriptive.moment.Kurtosis
NORMDIST         org.apache.commons.math.distribution.NormalDistributionImpl
PEARSON          org.apache.commons.math.stat.correlation.PearsonsCorrelation
PERCENTILE       org.apache.commons.math.stat.descriptive.rank.Percentile
TDIST            org.apache.commons.math.distribution.TDistributionImpl
VAR              org.apache.commons.math.stat.StatUtils#variance

There can be more.
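
Not every entry in the list needs the same amount of machinery. As a rough 
illustration, the two StatUtils-backed entries can be sketched in plain 
Java, assuming the usual definitions: GEOMEAN as the exponential of the 
mean of the logs, and VAR as the sample variance with an (n - 1) divisor. 
The class and method names here are hypothetical, and this is not the 
commons-math code.

```java
class SimpleStats {

    // GEOMEAN: exp of the arithmetic mean of the natural logs.
    // Assumes a non-empty array of strictly positive values.
    static double geometricMean(double[] values) {
        double sumOfLogs = 0;
        for (double v : values) {
            sumOfLogs += Math.log(v);
        }
        return Math.exp(sumOfLogs / values.length);
    }

    // VAR: sample variance, dividing by (n - 1).
    // Assumes at least two values.
    static double variance(double[] values) {
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumOfSquares = 0;
        for (double v : values) {
            double d = v - mean;
            sumOfSquares += d * d;
        }
        return sumOfSquares / (values.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(geometricMean(new double[] {2, 8}));
        System.out.println(variance(new double[] {1, 2, 3, 4}));
    }
}
```

The distribution entries (BETADIST, GAMMADIST and so on) are a different 
matter, since they rely on special functions rather than simple sums.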


Yegor





Re: Commons Math dependency?

Posted by Josh Micich <jo...@gmail.com>.
The code for POISSON should be stable over time.  The Poisson
distribution has a very simple definition that should be easy to test
(and we should).  I guess the complexity in the commons-math
version is there for optimisation reasons.  Unless we have better data
on how the Excel version of POISSON is generally used, we should
probably assume that the commons-math optimisations are appropriate.

In case it's not clear, I'm still in favour of copying over this small
amount of code from commons-math and maintaining it independently in
poi.
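
As an illustration of how easy the definition is to test, the sketch below 
checks two values that can be worked out by hand for a mean of 5, plus the 
property that the point probabilities sum to 1. The class and method names 
are hypothetical. The recurrence p(k) = p(k-1) * lambda / k is just one way 
to avoid an explicit factorial; it is not taken from commons-math.

```java
class PoissonCheck {

    // P(X = k), built up with p(k) = p(k-1) * lambda / k. Assumes k >= 0.
    static double probability(int k, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    // P(X <= x), accumulated with the same recurrence. Assumes x >= 0.
    static double cumulativeProbability(int x, double lambda) {
        double term = Math.exp(-lambda);
        double sum = term;
        for (int i = 1; i <= x; i++) {
            term *= lambda / i;
            sum += term;
        }
        return sum;
    }

    static void check(String label, double expected, double actual, double tolerance) {
        if (Math.abs(expected - actual) > tolerance) {
            throw new AssertionError(label + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Hand-computed from the definition with lambda = 5:
        // e^-5 * 5^2 / 2! and e^-5 * (1 + 5 + 12.5).
        check("pmf", 0.0842243, probability(2, 5.0), 1e-6);
        check("cdf", 0.1246520, cumulativeProbability(2, 5.0), 1e-6);
        // Far into the tail, the cumulative probability should be very close to 1.
        check("total", 1.0, cumulativeProbability(100, 5.0), 1e-9);
        System.out.println("all checks passed");
    }
}
```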



Re: Commons Math dependency?

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 9 Jul 2010, Josh Micich wrote:
> Next, I tried pruning the classes as I added them (with some educated 
> guesses as to what functionality POI doesn't need).  For example, for 
> our use case, the class NormalDistributionImpl is not needed.  With this 
> approach, I ended up with around 1000 lines of code to copy across. 
> This represents about 1% of commons-math.

How hard is it to keep up with bug fixes in Commons Math if we do this 
though?

(If keeping up to date isn't too hard, and someone's willing to do the 
work, I don't see why we shouldn't do this. Yegor's confirmed that the 
code needed for fractions isn't too much either)

Nick



Re: Commons Math dependency?

Posted by Josh Micich <jo...@gmail.com>.
My initial preference was to avoid creating a dependency on another
project (in other words, copying / rewriting instead).

I took a look at copying over PoissonDistributionImpl and everything
it depends on.  Unfortunately, the code is heavily coupled (for
example, many convenience factory methods on the exception classes).
I abandoned this approach when I'd copied a few dozen classes with no
end in sight.

Next, I tried pruning the classes as I added them (with some educated
guesses as to what functionality POI doesn't need).  For example, for
our use case, the class NormalDistributionImpl is not needed.  With
this approach, I ended up with around 1000 lines of code to copy
across.  This represents about 1% of commons-math.

Overall, I think this might still be the best solution.  There are
many other examples of code which have been written anew in POI in
order to avoid creating another dependency.  If we need more code from
commons-math in future, we can re-evaluate at that time.



Re: Commons Math dependency?

Posted by Yegor Kozlov <ye...@dinom.ru>.
I was about to ask this question :)

I'm OK with including Commons Math as a POI dependency. There is lots of 
stuff we can use in POI: bug #49538 uses Math's implementation of 
Poisson, and we may want to include other statistical distributions like 
beta, gamma and binomial.

Previously I copied into POI a snippet of code that calculates 
fractional numbers. It was easy because that part is isolated. However, 
this trick will not work for statistical distributions - there are lots 
of dependencies and we will have to copy whole packages.

>
> What do people think? Would it be ok to include this as a dependency? 
> Should we require it for these features, but let everything else work 
> without it (which could mean you run poi fine for ages, then suddenly 
> one day it blows up saying "hey, I need commons math now!")? Should we 
> decline the patches that need commons math, and do without those 
> features?

We should certainly handle the case when commons math is not present. At 
the very least, we should wrap the default ClassNotFoundException.
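
One way such a guard could look is sketched below. The class and method 
names are hypothetical and this is not existing POI code; the probed class 
name is assumed from the package names mentioned elsewhere in this thread. 
Class.forName reports a missing class as ClassNotFoundException, while code 
compiled directly against a missing class typically fails with 
NoClassDefFoundError (a LinkageError), so the sketch treats both as "not 
available".

```java
class OptionalDependency {

    // Returns true if the named class can be found on the classpath.
    // The class is located but not initialized.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Fails with a descriptive message instead of a bare class-loading error.
    static void require(String className, String feature) {
        if (!isAvailable(className)) {
            throw new IllegalStateException(
                    feature + " needs " + className + " on the classpath, but it was not found");
        }
    }

    public static void main(String[] args) {
        // Assumed marker class for commons-math; adjust to whatever class
        // the feature actually uses.
        String marker = "org.apache.commons.math.distribution.PoissonDistributionImpl";
        System.out.println("commons-math available: " + isAvailable(marker));
    }
}
```

A function implementation would call require(...) before touching any 
commons-math class, so the user sees which feature needs the jar.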

Yegor

