Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2022/11/14 11:23:00 UTC

[jira] [Commented] (STATISTICS-57) Add a trapezoidal distribution

    [ https://issues.apache.org/jira/browse/STATISTICS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633707#comment-17633707 ] 

Alex Herbert commented on STATISTICS-57:
----------------------------------------

I have created an implementation for this. Note that the uniform distribution and the triangular distribution are simplifications of the trapezoidal distribution:
{noformat}
trapezoidal(a, b, c, d) has density:

   b----------c
  /            \
 /              \
a----------------d

trapezoidal(a, b, b, d) == triangular(a, b, d)     (b == mode)
trapezoidal(a, a, d, d) == uniform(a, d)
{noformat}
To pass all the edge cases for these related distributions, it is simplest to create an instance of the TrapezoidalDistribution that delegates all computations to the TriangularDistribution or UniformDistribution. This avoids any inconsistency in behaviour between the trapezoidal distribution and the distribution it reduces to.
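
A minimal sketch of the delegation idea, for illustration only. The class and method names here (TrapezoidalFactorySketch, density, uniform, triangular, trapezoidal) are hypothetical and are not taken from the PR; only the density is shown, and the parameter check is an assumption about what a factory would validate:

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: hypothetical names, not the API from the PR. */
final class TrapezoidalFactorySketch {
    private TrapezoidalFactorySketch() {}

    /** Returns the density of trapezoidal(a, b, c, d), delegating the special cases. */
    static DoubleUnaryOperator density(double a, double b, double c, double d) {
        // Assumed validation; written so that NaN parameters are also rejected.
        if (!(a <= b && b <= c && c <= d && a < d)) {
            throw new IllegalArgumentException("Require a <= b <= c <= d and a < d");
        }
        if (b == c) {
            // No constant region: delegate to the triangular density with mode b.
            return triangular(a, b, d);
        }
        if (a == b && c == d) {
            // No slopes: delegate to the uniform density.
            return uniform(a, d);
        }
        return trapezoidal(a, b, c, d);
    }

    static DoubleUnaryOperator uniform(double a, double d) {
        return x -> (x >= a && x <= d) ? 1 / (d - a) : 0;
    }

    static DoubleUnaryOperator triangular(double a, double mode, double d) {
        return x -> {
            if (x < a || x > d) {
                return 0;
            }
            if (x < mode) {
                return 2 * (x - a) / ((d - a) * (mode - a));
            }
            if (x == mode) {
                return 2 / (d - a);
            }
            return 2 * (d - x) / ((d - a) * (d - mode));
        };
    }

    static DoubleUnaryOperator trapezoidal(double a, double b, double c, double d) {
        final double height = 2 / (d + c - a - b);
        return x -> {
            if (x < a || x > d) {
                return 0;
            }
            if (x < b) {
                return height * (x - a) / (b - a);
            }
            if (x <= c) {
                return height;
            }
            return height * (d - x) / (d - c);
        };
    }
}
```

Because the special cases return the same function objects that the related distributions would use, any edge-case behaviour (for example at the bounds) is identical by construction.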

The edge cases:
{noformat}
trapezoidal(a, a, c, d)    (no up-slope)
trapezoidal(a, b, d, d)    (no down-slope)
{noformat}
are handled implicitly by the computations for the regular trapezoidal distribution and do not require specialisation.
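
As an illustration of why no specialisation is needed, here is a sketch of the cumulative probability using the standard piecewise form. The class name and the branch ordering are my assumptions, not code from the PR. The point is that with this ordering a slope branch is only entered when its width is non-zero, so (b - a) or (d - c) is never a zero divisor:

```java
/** Illustrative only: hypothetical class, assumes a <= b <= c <= d and a < d. */
final class TrapezoidalCdfSketch {
    private TrapezoidalCdfSketch() {}

    static double cdf(double x, double a, double b, double c, double d) {
        final double divisor = d + c - a - b;
        if (x <= a) {
            return 0;
        }
        if (x < b) {
            // Up-slope: unreachable when a == b, because then x > a implies x >= b.
            final double dx = x - a;
            return dx * dx / ((b - a) * divisor);
        }
        if (x <= c) {
            // Constant region of the density; includes x == c.
            return (2 * x - a - b) / divisor;
        }
        if (x < d) {
            // Down-slope: unreachable when c == d, because then x > c implies x >= d.
            final double dx = d - x;
            return 1 - dx * dx / ((d - c) * divisor);
        }
        return 1;
    }
}
```

For example, cdf(0.5, 0, 0, 1, 2) has no up-slope and evaluates the constant-region branch directly, and cdf(2, 0, 1, 2, 2) has no down-slope and reaches 1 through the same branch.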

Computation of the k-th non-central moment fails when a=b or c=d for the formula provided on Wikipedia, because the corresponding ratio evaluates to 0/0:
{noformat}
                 2               1       ( d^(k+2) - c^(k+2)   b^(k+2) - a^(k+2) )
 E[X^k] = --------------- -------------- ( ----------------- - ----------------- )
          (d + c - b - a) (k + 1)(k + 2) (       d - c               b - a       )
{noformat}
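
To make the failure concrete, here is a direct transcription of that formula into floating point. This is a sketch with a hypothetical class name, not code from the PR:

```java
/** Illustrative only: naive transcription of the formula above. */
final class NaiveMomentSketch {
    private NaiveMomentSketch() {}

    /** k-th non-central moment of trapezoidal(a, b, c, d), computed naively. */
    static double moment(int k, double a, double b, double c, double d) {
        // Each ratio is 0/0 (NaN in IEEE 754 arithmetic) when its slope has zero width.
        final double down = (Math.pow(d, k + 2) - Math.pow(c, k + 2)) / (d - c);
        final double up = (Math.pow(b, k + 2) - Math.pow(a, k + 2)) / (b - a);
        return 2 / (d + c - b - a) / ((k + 1.0) * (k + 2.0)) * (down - up);
    }
}
```

With this transcription, moment(1, 0, 1, 2, 3) gives the mean of the symmetric trapezoid (about 1.5), while moment(1, 0, 0, 2, 3) and moment(1, 0, 1, 3, 3) are both NaN.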
If the distribution is normalized to location=0 and scale=1 by subtracting a and dividing by (d - a), then a=0 and d=1, b and c become (b - a) / (d - a) and (c - a) / (d - a), and the computation simplifies to:
{noformat}
               2             1       ( 1 - c^(k+2)           )
 E[X^k] = ----------- -------------- ( ----------- - b^(k+1) )
          (1 + c - b) (k + 1)(k + 2) (    1 - c              )
{noformat}
This allows computation of the mean and variance for the edge cases. Note: This is the internal representation used by SciPy stats (which has a location and scale for all distributions), so it must be used when creating test data with that library. This representation also simplifies the density and probability computations. However, I have used this representation only for the moments.
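
A sketch of how the mean and variance could be computed from the normalized moments and mapped back to the original location and scale. The class and method names are hypothetical and this is not necessarily how the PR does it. In particular, I am assuming here that the ratio (1 - c^(k+2)) / (1 - c) is evaluated as the equivalent finite sum 1 + c + ... + c^(k+1), which stays finite when the normalized c equals 1:

```java
/** Illustrative only: hypothetical class, assumes a <= b <= c <= d and a < d. */
final class NormalizedMomentSketch {
    private NormalizedMomentSketch() {}

    /** k-th non-central moment for a=0, d=1 with normalized shapes b and c in [0, 1]. */
    static double normalizedMoment(int k, double b, double c) {
        // Geometric sum 1 + c + ... + c^(k+1), equal to (1 - c^(k+2)) / (1 - c) for c != 1.
        double sum = 0;
        double term = 1;
        for (int i = 0; i <= k + 1; i++) {
            sum += term;
            term *= c;
        }
        return 2 / (1 + c - b) / ((k + 1.0) * (k + 2.0)) * (sum - Math.pow(b, k + 1));
    }

    static double mean(double a, double b, double c, double d) {
        final double scale = d - a;
        // Undo the normalization: E[X] = a + scale * E[Y].
        return a + scale * normalizedMoment(1, (b - a) / scale, (c - a) / scale);
    }

    static double variance(double a, double b, double c, double d) {
        final double scale = d - a;
        final double nb = (b - a) / scale;
        final double nc = (c - a) / scale;
        final double m1 = normalizedMoment(1, nb, nc);
        final double m2 = normalizedMoment(2, nb, nc);
        // Variance is unaffected by the shift and scales with scale^2.
        return scale * scale * (m2 - m1 * m1);
    }
}
```

As a check against the related distributions, mean(0, 0, 1, 1) and variance(0, 0, 1, 1) should match uniform(0, 1), and mean(2, 3, 3, 6) and variance(2, 3, 3, 6) should match triangular(2, 3, 6).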
h2. Implementation note

The distribution is named in the literature as the trapezoidal distribution. Note that SciPy stats and the R library use the class/function name trapezoid. I have named the class TrapezoidalDistribution (not TrapezoidDistribution), which follows from the naming of the triangular distribution as TriangularDistribution.

The literature does not have a consistent name for the shape parameters b and c. These points mark the start and end of the constant region of the density function. R uses mode1 and mode2; SciPy simply labels these as shape parameters. The literature labels these as b and c, so I have used these terms for the constructor and the parameter getters (getB and getC).

See [PR 39|https://github.com/apache/commons-statistics/pull/39]

> Add a trapezoidal distribution
> ------------------------------
>
>                 Key: STATISTICS-57
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-57
>             Project: Commons Statistics
>          Issue Type: New Feature
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a [trapezoidal|https://en.wikipedia.org/wiki/Trapezoidal_distribution] distribution.
> This distribution is in other libraries such as SciPy stats and R.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)