Posted to commits@couchdb.apache.org by da...@apache.org on 2014/07/31 23:09:52 UTC
[01/30] bear commit: updated refs/heads/import-master to 5f99806
Repository: couchdb-bear
Updated Branches:
refs/heads/import-master [created] 5f998064d
initial commit
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/eb22734b
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/eb22734b
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/eb22734b
Branch: refs/heads/import-master
Commit: eb22734b85f857900bffd3a856626629c0f15019
Parents:
Author: joewilliams <wi...@gmail.com>
Authored: Fri Mar 30 15:22:30 2012 -0700
Committer: joewilliams <wi...@gmail.com>
Committed: Fri Mar 30 15:22:30 2012 -0700
----------------------------------------------------------------------
LICENSE | 201 +++++++++++++++++++++++++
README.md | 1 +
rebar | Bin 0 -> 101515 bytes
rebar.config | 3 +
src/bear.erl | 372 +++++++++++++++++++++++++++++++++++++++++++++++
src/bear_scutil.erl | 75 ++++++++++
6 files changed, 652 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/LICENSE
----------------------------------------------------------------------
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..11069ed
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5b941da
--- /dev/null
+++ b/README.md
@@ -0,0 +1 @@
+### bear : a set of statistics functions for erlang
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/rebar
----------------------------------------------------------------------
diff --git a/rebar b/rebar
new file mode 100755
index 0000000..77abae6
Binary files /dev/null and b/rebar differ
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/rebar.config
----------------------------------------------------------------------
diff --git a/rebar.config b/rebar.config
new file mode 100644
index 0000000..3ee1ec7
--- /dev/null
+++ b/rebar.config
@@ -0,0 +1,3 @@
+{deps, []}.
+{erl_opts, [debug_info]}.
+{cover_enabled, true}.
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
new file mode 100644
index 0000000..97bac0f
--- /dev/null
+++ b/src/bear.erl
@@ -0,0 +1,372 @@
+%%%
+%%% Copyright 2011, Boundary
+%%%
+%%% Licensed under the Apache License, Version 2.0 (the "License");
+%%% you may not use this file except in compliance with the License.
+%%% You may obtain a copy of the License at
+%%%
+%%% http://www.apache.org/licenses/LICENSE-2.0
+%%%
+%%% Unless required by applicable law or agreed to in writing, software
+%%% distributed under the License is distributed on an "AS IS" BASIS,
+%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%%% See the License for the specific language governing permissions and
+%%% limitations under the License.
+%%%
+
+
+%%%-------------------------------------------------------------------
+%%% File: bear.erl
+%%% @author joe williams <j...@boundary.com>
+%%% @doc
+%%% statistics functions for calculating over a list of values
+%%% @end
+%%%------------------------------------------------------------------
+
+-module(bear).
+
+-compile([export_all]).
+
+-export([
+ get_statistics/1,
+ get_statistics/2
+ ]).
+
+-define(HIST_BINS, 10).
+
+-define(STATS_MIN, 5).
+
+-record(scan_result, {n=0, sumX=0, sumXX=0, sumInv=0, sumLog, max, min}).
+-record(scan_result2, {x2=0, x3=0, x4=0}).
+
+-compile([native]).
+
+get_statistics(Values) when length(Values) < ?STATS_MIN ->
+ [
+ {min, 0.0},
+ {max, 0.0},
+ {arithmetic_mean, 0.0},
+ {geometric_mean, 0.0},
+ {harmonic_mean, 0.0},
+ {median, 0.0},
+ {variance, 0.0},
+ {standard_deviation, 0.0},
+ {skewness, 0.0},
+ {kurtosis, 0.0},
+ {percentile,
+ [
+ {75, 0.0},
+ {95, 0.0},
+ {99, 0.0},
+ {999, 0.0}
+ ]
+ },
+ {histogram, [{0, 0}]}
+ ];
+get_statistics(Values) ->
+ Scan_res = scan_values(Values),
+ Scan_res2 = scan_values2(Values, Scan_res),
+ Variance = variance(Scan_res, Scan_res2),
+ SortedValues = lists:sort(Values),
+ [
+ {min, Scan_res#scan_result.min},
+ {max, Scan_res#scan_result.max},
+ {arithmetic_mean, arithmetic_mean(Scan_res)},
+ {geometric_mean, geometric_mean(Scan_res)},
+ {harmonic_mean, harmonic_mean(Scan_res)},
+ {median, percentile(SortedValues, Scan_res, 0.5)},
+ {variance, Variance},
+ {standard_deviation, std_deviation(Scan_res, Scan_res2)},
+ {skewness, skewness(Scan_res, Scan_res2)},
+ {kurtosis, kurtosis(Scan_res, Scan_res2)},
+ {percentile,
+ [
+ {75, percentile(SortedValues, Scan_res, 0.75)},
+ {95, percentile(SortedValues, Scan_res, 0.95)},
+ {99, percentile(SortedValues, Scan_res, 0.99)},
+ {999, percentile(SortedValues, Scan_res, 0.999)}
+ ]
+ },
+ {histogram, get_histogram(Values, Scan_res, Scan_res2)}
+ ].
+
+get_statistics(Values, _) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_statistics(_, Values) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_statistics(Values1, Values2) when length(Values1) /= length(Values2) ->
+ 0.0;
+get_statistics(Values1, Values2) ->
+ [
+ {covariance, get_covariance(Values1, Values2)},
+ {tau, get_kendall_correlation(Values1, Values2)},
+ {rho, get_pearson_correlation(Values1, Values2)},
+ {r, get_spearman_correlation(Values1, Values2)}
+ ].
+
+%%%===================================================================
+%%% Internal functions
+%%%===================================================================
+
+scan_values([X|Values]) ->
+ scan_values(Values, #scan_result{n=1, sumX=X, sumXX=X*X,
+ sumLog=math_log(X),
+ max=X, min=X, sumInv=inverse(X)}).
+
+scan_values([X|Values],
+ #scan_result{n=N, sumX=SumX, sumXX=SumXX, sumLog=SumLog,
+ max=Max, min=Min, sumInv=SumInv}=Acc) ->
+ scan_values(Values,
+ Acc#scan_result{n=N+1, sumX=SumX+X, sumXX=SumXX+X*X,
+ sumLog=SumLog+math_log(X),
+ max=max(X,Max), min=min(X,Min),
+ sumInv=SumInv+inverse(X)});
+scan_values([], Acc) ->
+ Acc.
+
+scan_values2(Values, #scan_result{n=N, sumX=SumX}) ->
+ scan_values2(Values, SumX/N, #scan_result2{}).
+
+scan_values2([X|Values], Mean, #scan_result2{x2=X2, x3=X3, x4=X4}=Acc) ->
+ Diff = X-Mean,
+ Diff2 = Diff*Diff,
+ Diff3 = Diff2*Diff,
+ Diff4 = Diff2*Diff2,
+ scan_values2(Values, Mean, Acc#scan_result2{x2=X2+Diff2, x3=X3+Diff3,
+ x4=X4+Diff4});
+scan_values2([], _, Acc) ->
+ Acc.
+
+
+arithmetic_mean(#scan_result{n=N, sumX=Sum}) ->
+ Sum/N.
+
+geometric_mean(#scan_result{n=N, sumLog=SumLog}) ->
+ math:exp(SumLog/N).
+
+harmonic_mean(#scan_result{n=N, sumInv=Sum}) ->
+ N/Sum.
+
+percentile(SortedValues, #scan_result{n=N}, Percentile)
+ when is_list(SortedValues) ->
+ Element = round(Percentile * N),
+ lists:nth(Element, SortedValues).
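percentile/3 above is a nearest-rank lookup: round the percentile times N to a 1-based rank and take that element of the sorted list. The same idea sketched in Python (an editorial illustration, not part of this commit; note that Erlang's round/1 rounds halves away from zero while Python's round/1 rounds halves to even, so results can differ exactly at .5 boundaries):

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile, mirroring percentile/3 above."""
    n = len(sorted_values)
    rank = round(p * n)                 # 1-based rank, as lists:nth/2 expects
    return sorted_values[max(rank, 1) - 1]

print(percentile([1, 2, 3, 4, 5], 0.75))    # 4
```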
+
+%% Two pass variance
+%% Results match those given by the 'var' function in R
+variance(#scan_result{n=N}, #scan_result2{x2=X2}) ->
+ X2/(N-1).
+
+std_deviation(Scan_res, Scan_res2) ->
+ math:sqrt(variance(Scan_res, Scan_res2)).
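As a cross-check of the two-pass variance above (sum of squared deviations divided by N-1), the same formula in Python; this is an illustration, not part of the commit, and it should agree with R's 'var' and Python's statistics.variance:

```python
import statistics

def two_pass_variance(values):
    """Sample variance in two passes: mean first, then squared deviations."""
    n = len(values)
    mean = sum(values) / n
    x2 = sum((x - mean) ** 2 for x in values)
    return x2 / (n - 1)

vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(two_pass_variance(vals))           # 2.5
print(statistics.variance(vals))         # 2.5, same result
```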
+
+%% http://en.wikipedia.org/wiki/Skewness
+%%
+%% skewness results should match this R function:
+%% skewness <- function(x) {
+%% m3 <- mean((x - mean(x))^3)
+%% skew <- m3 / (sd(x)^3)
+%% skew
+%% }
+skewness(#scan_result{n=N}=Scan_res, #scan_result2{x3=X3}=Scan_res2) ->
+ case math:pow(std_deviation(Scan_res,Scan_res2), 3) of
+ 0.0 ->
+ 0.0; %% Is this really the correct thing to do here?
+ Else ->
+ (X3/N)/Else
+ end.
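The R function quoted in the comment translates directly: m3 / sd(x)^3 with the sample standard deviation. A Python sketch of the same computation (illustrative, not from the commit):

```python
import statistics

def skewness(values):
    """m3 / sd^3, as in the quoted R function; returns 0.0 when sd is 0."""
    n = len(values)
    mean = sum(values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    sd = statistics.stdev(values)       # sample sd, like R's sd()
    return 0.0 if sd == 0 else m3 / sd ** 3

print(skewness([1, 2, 3, 4, 5]))        # 0.0 for symmetric data
```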
+
+%% http://en.wikipedia.org/wiki/Kurtosis
+%%
+%% results should match this R function:
+%% kurtosis <- function(x) {
+%% m4 <- mean((x - mean(x))^4)
+%% kurt <- m4 / (sd(x)^4) - 3
+%% kurt
+%% }
+kurtosis(#scan_result{n=N}=Scan_res, #scan_result2{x4=X4}=Scan_res2) ->
+ case math:pow(std_deviation(Scan_res,Scan_res2), 4) of
+ 0.0 ->
+ 0.0; %% Is this really the correct thing to do here?
+ Else ->
+ ((X4/N)/Else) - 3
+ end.
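Likewise for kurtosis, the quoted R function is m4 / sd(x)^4 - 3, i.e. excess kurtosis relative to the normal distribution. A Python sketch (illustrative, not part of the commit):

```python
import statistics

def excess_kurtosis(values):
    """m4 / sd^4 - 3, as in the quoted R function; returns 0.0 when sd is 0."""
    n = len(values)
    mean = sum(values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    sd = statistics.stdev(values)       # sample sd, like R's sd()
    return 0.0 if sd == 0 else m4 / sd ** 4 - 3

print(excess_kurtosis([1, 2, 3, 4, 5]))   # about -1.912, flatter than normal
```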
+
+get_histogram(Values, Scan_res, Scan_res2) ->
+ Bins = get_hist_bins(Scan_res#scan_result.min,
+ Scan_res#scan_result.max,
+ std_deviation(Scan_res, Scan_res2),
+ length(Values)
+ ),
+
+ Dict = lists:foldl(fun (Value, Dict) ->
+ update_bin(Value, Bins, Dict)
+ end,
+ dict:from_list([{Bin, 0} || Bin <- Bins]),
+ Values),
+
+ lists:sort(dict:to_list(Dict)).
+
+update_bin(Value, [Bin|_Bins], Dict) when Value =< Bin ->
+ dict:update_counter(Bin, 1, Dict);
+update_bin(Value, [_Bin|Bins], Dict) ->
+ update_bin(Value, Bins, Dict).
+
+%% two pass covariance
+%% (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance)
+%% matches results given by excel's 'covar' function
+get_covariance(Values, _) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_covariance(_, Values) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_covariance(Values1, Values2) when length(Values1) /= length(Values2) ->
+ 0.0;
+get_covariance(Values1, Values2) ->
+ {SumX, SumY, N} = foldl2(fun (X, Y, {SumX, SumY, N}) ->
+ {SumX+X, SumY+Y, N+1}
+ end, {0,0,0}, Values1, Values2),
+ MeanX = SumX/N,
+ MeanY = SumY/N,
+ Sum = foldl2(fun (X, Y, Sum) ->
+ Sum + ((X - MeanX) * (Y - MeanY))
+ end,
+ 0, Values1, Values2),
+ Sum/N.
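The two-pass covariance above divides by N, i.e. the population covariance that Excel's COVAR reports. The same computation in Python (a sketch, not part of the commit):

```python
def population_covariance(xs, ys):
    """Two-pass covariance over N pairs: means first, then cross-deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

print(population_covariance([1, 2, 3], [2, 4, 6]))   # 1.333...
```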
+
+get_kendall_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_kendall_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_kendall_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+ 0.0;
+get_kendall_correlation(Values1, Values2) ->
+ bear_scutil:kendall_correlation(Values1, Values2).
+
+get_spearman_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_spearman_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_spearman_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+ 0.0;
+get_spearman_correlation(Values1, Values2) ->
+ TR1 = ranks_of(Values1),
+ TR2 = ranks_of(Values2),
+ Numerator = 6 * foldl2(fun (X, Y, Acc) ->
+ Diff = X-Y,
+ Acc + Diff*Diff
+ end, 0, TR1,TR2),
+ N = length(Values1),
+ Denominator = math:pow(N,3)-N,
+ 1-(Numerator/Denominator).
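get_spearman_correlation/2 applies the closed form 1 - 6*sum(d^2)/(n^3 - n) to the rank differences, with ranks_of/1 averaging tied ranks. A simplified Python sketch that skips tie handling, so it only agrees with the above on tie-free input (illustrative, not part of the commit):

```python
def spearman_rho(xs, ys):
    """1 - 6*sum(d^2)/(n^3 - n) over simple 1..n ranks (no tie averaging)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n ** 3 - n)

print(spearman_rho([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))   # 1.0
```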
+
+ranks_of(Values) when is_list(Values) ->
+ [Fst|Rest] = revsort(Values),
+ TRs = ranks_of(Rest, [], 2, Fst, 1),
+ Dict = gb_trees:from_orddict(TRs),
+ L = lists:foldl(fun (Val, Acc) ->
+ Rank = gb_trees:get(Val, Dict),
+ [Rank|Acc]
+ end, [], Values),
+ lists:reverse(L).
+
+ranks_of([E|Es],Acc, N, E, S) ->
+ ranks_of(Es, Acc, N+1, E, S);
+ranks_of([E|Es], Acc, N, P, S) ->
+ ranks_of(Es,[{P,(S+N-1)/2}|Acc], N+1, E, N);
+ranks_of([], Acc, N, P, S) ->
+ [{P,(S+N-1)/2}|Acc].
+
+
+get_pearson_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_pearson_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+ 0.0;
+get_pearson_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+ 0.0;
+get_pearson_correlation(Values1, Values2) ->
+ {SumX, SumY, SumXX, SumYY, SumXY, N} =
+ foldl2(fun (X,Y,{SX, SY, SXX, SYY, SXY, N}) ->
+ {SX+X, SY+Y, SXX+X*X, SYY+Y*Y, SXY+X*Y, N+1}
+ end, {0,0,0,0,0,0}, Values1, Values2),
+ Numer = (N*SumXY) - (SumX * SumY),
+ case math:sqrt(((N*SumXX)-(SumX*SumX)) * ((N*SumYY)-(SumY*SumY))) of
+ 0.0 ->
+ 0.0; %% Is this really the correct thing to do here?
+ Denom ->
+ Numer/Denom
+ end.
+
+revsort(L) ->
+ lists:reverse(lists:sort(L)).
+
+%% Foldl over two lists
+foldl2(F, Acc, [I1|L1], [I2|L2]) when is_function(F,3) ->
+ foldl2(F, F(I1, I2, Acc), L1, L2);
+foldl2(_F, Acc, [], []) ->
+ Acc.
+
+%% wrapper for math:log/1 to avoid taking the log of zero
+math_log(0) ->
+ 1;
+math_log(X) ->
+ math:log(X).
+
+%% wrapper for calculating inverse to avoid dividing by zero
+inverse(0) ->
+ 0;
+inverse(X) ->
+ 1/X.
+
+get_hist_bins(Min, Max, StdDev, Count) ->
+ BinWidth = get_bin_width(StdDev, Count),
+ BinCount = get_bin_count(Min, Max, BinWidth),
+ case get_bin_list(BinWidth, BinCount, []) of
+ List when length(List) =< 1 ->
+ [Max];
+ Bins ->
+ %% add Min to Bins
+ [Bin + Min || Bin <- Bins]
+ end.
+
+get_bin_list(Width, Bins, Acc) when Bins > length(Acc) ->
+ Bin = ((length(Acc) + 1) * Width ),
+ get_bin_list(Width, Bins, [round_bin(Bin)| Acc]);
+get_bin_list(_, _, Acc) ->
+ lists:usort(Acc).
+
+round_bin(Bin) ->
+ Base = case erlang:trunc(math:pow(10, round(math:log10(Bin) - 1))) of
+ 0 ->
+ 1;
+ Else ->
+ Else
+ end,
+ %io:format("bin ~p, base ~p~n", [Bin, Base]),
+ round_bin(Bin, Base).
+
+round_bin(Bin, Base) when Bin rem Base == 0 ->
+ Bin;
+round_bin(Bin, Base) ->
+ Bin + Base - (Bin rem Base).
+
+% the following is up for debate as far as the best method of
+% choosing bin counts and widths goes; these seem to work *well enough*
+% in my testing
+
+% bin width based on Scott's normal reference rule
+% http://www.jstor.org/pss/2965501
+get_bin_width(StdDev, Count) ->
+ %io:format("stddev: ~p, count: ~p~n", [StdDev, Count]),
+ case round((3.5 * StdDev) / math:pow(Count, 0.3333333)) of
+ 0 ->
+ 1;
+ Else ->
+ Else
+ end.
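get_bin_width/2 is the h = 3.5*sigma*n^(-1/3) normal-reference rule, rounded to an integer and floored at 1. An illustrative Python sketch (not part of the commit; it uses an exact 1/3 exponent where the Erlang code uses 0.3333333, so the two can differ at rounding boundaries):

```python
def bin_width(stddev, count):
    """3.5 * sigma / n^(1/3), rounded to an integer, with a floor of 1."""
    w = round((3.5 * stddev) / count ** (1 / 3))
    return w if w != 0 else 1

print(bin_width(0.0, 100))    # 1: the floor kicks in for zero spread
```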
+
+% based on the simple ceiling function at
+% http://en.wikipedia.org/wiki/Histograms#Number_of_bins_and_width
+% with a modification that attempts to get one bin beyond the max value
+get_bin_count(Min, Max, Width) ->
+ %io:format("min: ~p, max: ~p, width ~p~n", [Min, Max, Width]),
+ round((Max - Min) / Width) + 1.
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/eb22734b/src/bear_scutil.erl
----------------------------------------------------------------------
diff --git a/src/bear_scutil.erl b/src/bear_scutil.erl
new file mode 100644
index 0000000..e684fb7
--- /dev/null
+++ b/src/bear_scutil.erl
@@ -0,0 +1,75 @@
+%% taken from http://crunchyd.com/scutil/
+%% All code here is MIT Licensed
+%% http://scutil.com/license.html
+
+-module(bear_scutil).
+
+-export([
+ kendall_correlation/2
+ ]).
+-compile([export_all]).
+-compile([native]).
+
+% seems to match the value returned by the 'cor' (method="kendall") R function
+% http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
+kendall_correlation(List1, List2) when is_list(List1), is_list(List2) ->
+ {RA,_} = lists:unzip(tied_ordered_ranking(List1)),
+ {RB,_} = lists:unzip(tied_ordered_ranking(List2)),
+
+ Ordering = lists:keysort(1, lists:zip(RA,RB)),
+ {_,OrdB} = lists:unzip(Ordering),
+
+ N = length(List1),
+ P = lists:sum(kendall_right_of(OrdB, [])),
+
+ -(( (4*P) / (N * (N - 1))) - 1).
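kendall_correlation/2 computes tau-a: P counts pairs ordered one way along the second ranking, and -((4P/(N(N-1))) - 1) rescales that count into (C - D) / (N(N-1)/2). A brute-force Python sketch by direct pair counting, which should agree with the above on tie-free input (illustrative, not part of the commit):

```python
def kendall_tau_a(xs, ys):
    """Tau-a = (concordant - discordant) / (n*(n-1)/2); assumes no ties."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1      # pair ordered the same way in both lists
            elif s < 0:
                discordant += 1      # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))   # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))   # -1.0
```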
+
+%%%===================================================================
+%%% Internal functions
+%%%===================================================================
+
+simple_ranking(List) when is_list(List) ->
+ lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).
+
+tied_ranking(List) ->
+ tied_rank_worker(simple_ranking(List), [], no_prev_value).
+
+tied_ordered_ranking(List) when is_list(List) ->
+ tied_ordered_ranking(List, tied_ranking(List), []).
+
+tied_ordered_ranking([], [], Work) ->
+ lists:reverse(Work);
+
+tied_ordered_ranking([Front|Rem], Ranks, Work) ->
+ {value,Item} = lists:keysearch(Front,2,Ranks),
+ {IRank,Front} = Item,
+ tied_ordered_ranking(Rem, Ranks--[Item], [{IRank,Front}]++Work).
+
+kendall_right_of([], Work) ->
+ lists:reverse(Work);
+kendall_right_of([F|R], Work) ->
+ kendall_right_of(R, [kendall_right_of_item(F,R)]++Work).
+
+kendall_right_of_item(B, Rem) ->
+ length([R || R <- Rem, R < B]).
+
+tied_add_prev(Work, {FoundAt, NewValue}) ->
+ lists:duplicate( length(FoundAt), {lists:sum(FoundAt)/length(FoundAt), NewValue} ) ++ Work.
+
+tied_rank_worker([], Work, PrevValue) ->
+ lists:reverse(tied_add_prev(Work, PrevValue));
+
+tied_rank_worker([Item|Remainder], Work, PrevValue) ->
+ case PrevValue of
+ no_prev_value ->
+ {BaseRank,BaseVal} = Item,
+ tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
+ {FoundAt,OldVal} ->
+ case Item of
+ {Id,OldVal} ->
+ tied_rank_worker(Remainder, Work, {[Id]++FoundAt,OldVal});
+ {Id,NewVal} ->
+ tied_rank_worker(Remainder, tied_add_prev(Work, PrevValue), {[Id],NewVal})
+
+ end
+ end.
[26/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #15 from rodo/master
Add unit test on uncovered function, move test data from src/ to test/
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/7d1ee8e0
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/7d1ee8e0
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/7d1ee8e0
Branch: refs/heads/import-master
Commit: 7d1ee8e0072341c5d61034560d889450a5bf468f
Parents: 5ed737e 3994adf
Author: Joe Williams <wi...@gmail.com>
Authored: Wed Nov 6 07:41:46 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Wed Nov 6 07:41:46 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 12 ----
test/bear_test.erl | 151 +++++++++++++++++++++++++++++++++++++-----------
2 files changed, 118 insertions(+), 45 deletions(-)
----------------------------------------------------------------------
[19/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Added unit tests for stats_subset
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/d278aae0
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/d278aae0
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/d278aae0
Branch: refs/heads/import-master
Commit: d278aae0be7f1288bb94e7bc8edae92af1ec2071
Parents: 9a61504
Author: Ulf Wiger <ul...@feuerlabs.com>
Authored: Mon Nov 4 20:37:42 2013 +0100
Committer: Ulf Wiger <ul...@feuerlabs.com>
Committed: Mon Nov 4 20:56:26 2013 +0100
----------------------------------------------------------------------
src/bear.erl | 1 +
test/bear_test.erl | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/d278aae0/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 4f5baba..b211a54 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -532,3 +532,4 @@ test_values() ->
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9].
+
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/d278aae0/test/bear_test.erl
----------------------------------------------------------------------
diff --git a/test/bear_test.erl b/test/bear_test.erl
index 2cca076..fc37c6b 100644
--- a/test/bear_test.erl
+++ b/test/bear_test.erl
@@ -235,3 +235,28 @@ tied_rank_worker_test() ->
?assertEqual([{2.0,5},{2.0,5},{2.0,5},{2.0,5}], bear:tied_rank_worker([], [{2.0,5}], {[1,2,3], 5})),
?assertEqual([{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5}],
bear:tied_rank_worker([{2.0,5},{2.0,5}], [{2.0,5}], {[1,2,3], 5})).
+
+subset_test() ->
+ Stats = bear:get_statistics(bear:test_values()),
+ match_values(Stats).
+
+full_subset_test() ->
+ Stats = bear:get_statistics(bear:test_values()),
+ match_values2(Stats).
+
+match_values([H|T]) ->
+ Res = bear:get_statistics_subset(bear:test_values(), [mk_item(H)]),
+ Res = [H],
+ match_values(T);
+match_values([]) ->
+ ok.
+
+mk_item({percentile, Ps}) ->
+ {percentile, [P || {P,_} <- Ps]};
+mk_item({K, _}) ->
+ K.
+
+match_values2(Stats) ->
+ Items = [mk_item(I) || I <- Stats],
+ Stats = bear:get_statistics_subset(bear:test_values(), Items),
+ ok.
[05/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add "n" value to stats proplist to determine number of observations
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/7cb6a632
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/7cb6a632
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/7cb6a632
Branch: refs/heads/import-master
Commit: 7cb6a632a2d1f01fd05c55f692dcd338bf8c89d2
Parents: 7ef9a7b
Author: Dave Smith <di...@dizzyd.com>
Authored: Fri Aug 24 09:23:40 2012 -0600
Committer: Dave Smith <di...@dizzyd.com>
Committed: Fri Aug 24 09:23:40 2012 -0600
----------------------------------------------------------------------
src/bear.erl | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/7cb6a632/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 0afbe61..33138ee 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -61,7 +61,8 @@ get_statistics(Values) when length(Values) < ?STATS_MIN ->
{999, 0.0}
]
},
- {histogram, [{0, 0}]}
+ {histogram, [{0, 0}]},
+ {n, 0}
];
get_statistics(Values) ->
Scan_res = scan_values(Values),
@@ -87,7 +88,8 @@ get_statistics(Values) ->
{999, percentile(SortedValues, Scan_res, 0.999)}
]
},
- {histogram, get_histogram(Values, Scan_res, Scan_res2)}
+ {histogram, get_histogram(Values, Scan_res, Scan_res2)},
+ {n, Scan_res#scan_result.n}
].
get_statistics(Values, _) when length(Values) < ?STATS_MIN ->
[06/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #3 from dizzyd/dss-observations-counter
Add "n" value to stats proplist to determine number of observations
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/b1882d7e
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/b1882d7e
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/b1882d7e
Branch: refs/heads/import-master
Commit: b1882d7ee88e775d961a2678e4fafecef7f77705
Parents: 7ef9a7b 7cb6a63
Author: Joe Williams <wi...@gmail.com>
Authored: Fri Aug 24 09:35:51 2012 -0700
Committer: Joe Williams <wi...@gmail.com>
Committed: Fri Aug 24 09:35:51 2012 -0700
----------------------------------------------------------------------
src/bear.erl | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
[18/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
bear:statistics_subset/2
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/9a615049
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/9a615049
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/9a615049
Branch: refs/heads/import-master
Commit: 9a6150495941128d026757486d21602c45b125e0
Parents: b9feed8
Author: Ulf Wiger <ul...@feuerlabs.com>
Authored: Thu Sep 12 15:45:50 2013 +0200
Committer: Ulf Wiger <ul...@feuerlabs.com>
Committed: Mon Nov 4 20:49:47 2013 +0100
----------------------------------------------------------------------
src/bear.erl | 140 +++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 113 insertions(+), 27 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/9a615049/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 04593e7..4f5baba 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -41,32 +41,7 @@
-compile([native]).
-get_statistics(Values) when length(Values) < ?STATS_MIN ->
- [
- {min, 0.0},
- {max, 0.0},
- {arithmetic_mean, 0.0},
- {geometric_mean, 0.0},
- {harmonic_mean, 0.0},
- {median, 0.0},
- {variance, 0.0},
- {standard_deviation, 0.0},
- {skewness, 0.0},
- {kurtosis, 0.0},
- {percentile,
- [
- {50, 0.0},
- {75, 0.0},
- {90, 0.0},
- {95, 0.0},
- {99, 0.0},
- {999, 0.0}
- ]
- },
- {histogram, [{0, 0}]},
- {n, 0}
- ];
-get_statistics(Values) ->
+get_statistics([_,_,_,_,_|_] = Values) ->
Scan_res = scan_values(Values),
Scan_res2 = scan_values2(Values, Scan_res),
Variance = variance(Scan_res, Scan_res2),
@@ -94,7 +69,86 @@ get_statistics(Values) ->
},
{histogram, get_histogram(Values, Scan_res, Scan_res2)},
{n, Scan_res#scan_result.n}
- ].
+ ];
+get_statistics(Values) when is_list(Values) ->
+ [
+ {min, 0.0},
+ {max, 0.0},
+ {arithmetic_mean, 0.0},
+ {geometric_mean, 0.0},
+ {harmonic_mean, 0.0},
+ {median, 0.0},
+ {variance, 0.0},
+ {standard_deviation, 0.0},
+ {skewness, 0.0},
+ {kurtosis, 0.0},
+ {percentile,
+ [
+ {50, 0.0},
+ {75, 0.0},
+ {90, 0.0},
+ {95, 0.0},
+ {99, 0.0},
+ {999, 0.0}
+ ]
+ },
+ {histogram, [{0, 0}]},
+ {n, 0}
+ ].
+
+get_statistics_subset(Values, Items) ->
+ Length = length(Values),
+ if Length < ?STATS_MIN ->
+ [I || {K,_} = I <- get_statistics([]),
+ lists:member(K, Items) orelse K==percentiles];
+ true ->
+ SortedValues = lists:sort(Values),
+ Steps = calc_steps(Items),
+ Scan_res = if Steps > 1 -> scan_values(Values);
+ true -> []
+ end,
+ Scan_res2 = if Steps > 2 -> scan_values2(Values, Scan_res);
+ true -> []
+ end,
+ report_subset(Items, Length,
+ SortedValues, Scan_res, Scan_res2)
+ end.
+
+calc_steps(Items) ->
+ lists:foldl(fun({I,_},Acc) ->
+ erlang:max(level(I), Acc);
+ (I,Acc) ->
+ erlang:max(level(I), Acc)
+ end, 1, Items).
+
+level(standard_deviation) -> 3;
+level(variance ) -> 3;
+level(skewness ) -> 3;
+level(kurtosis ) -> 3;
+level(histogram ) -> 3;
+level(arithmetic_mean ) -> 2;
+level(geometric_mean ) -> 2;
+level(harmonic_mean ) -> 2;
+level(_) -> 1.
+
+report_subset(Items, N, SortedValues, Scan_res, Scan_res2) ->
+ lists:map(
+ fun(min) -> {min, hd(SortedValues)};
+ (max) -> {max, lists:last(SortedValues)};
+ (arithmetic_mean) -> {arithmetic_mean, arithmetic_mean(Scan_res)};
+ (harmonic_mean) -> {harmonic_mean, harmonic_mean(Scan_res)};
+ (geometric_mean) -> {geometric_mean, geometric_mean(Scan_res)};
+ (median) -> {median, percentile(SortedValues,
+ #scan_result{n = N}, 0.5)};
+ (variance) -> {variance, variance(Scan_res, Scan_res2)};
+ (standard_deviation=I) -> {I, std_deviation(Scan_res, Scan_res2)};
+ (skewness) -> {skewness, skewness(Scan_res, Scan_res2)};
+ (kurtosis) -> {kurtosis, kurtosis(Scan_res, Scan_res2)};
+ ({percentile,Ps}) -> {percentile, percentiles(Ps, N, SortedValues)};
+ (histogram) ->
+ {histogram, get_histogram(SortedValues, Scan_res, Scan_res2)};
+ (n) -> {n, N}
+ end, Items).
get_statistics(Values, _) when length(Values) < ?STATS_MIN ->
0.0;
@@ -446,3 +500,35 @@ tied_rank_worker([Item|Remainder], Work, PrevValue) ->
end
end.
+
+percentiles(Ps, N, Values) ->
+ Items = [{P, perc(P, N)} || P <- Ps],
+ pick_items(Values, 1, Items).
+
+pick_items([H|_] = L, P, [{Tag,P}|Ps]) ->
+ [{Tag,H} | pick_items(L, P, Ps)];
+pick_items([_|T], P, Ps) ->
+ pick_items(T, P+1, Ps);
+pick_items([], _, Ps) ->
+ [{Tag,undefined} || {Tag,_} <- Ps].
+
+perc(P, Len) when is_integer(P), 0 =< P, P =< 100 ->
+ V = round(P * Len / 100),
+ erlang:max(1, V);
+perc(P, Len) when is_integer(P), 100 =< P, P =< 1000 ->
+ V = round(P * Len / 1000),
+ erlang:max(1, V);
+perc(P, Len) when is_float(P), 0 =< P, P =< 1 ->
+ erlang:max(1, round(P * Len)).
+
+
+test_values() ->
+ [1,1,1,1,1,1,1,
+ 2,2,2,2,2,2,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,
+ 4,4,4,4,4,4,4,4,4,4,4,4,4,4,
+ 5,5,5,5,5,5,5,5,5,5,5,5,5,5,
+ 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
+ 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 9,9,9,9,9,9,9].
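The `perc/2` helper above turns a percentile spec into a 1-based index into the sorted sample: integers 0..100 are percent, integers 100..1000 are per-mille (so `999` means the 99.9th percentile), and floats 0..1 are fractions. A Python sketch of the same index arithmetic, assuming that reading of the clauses (`floor(x + 0.5)` stands in for Erlang's `round/1`, which rounds .5 away from zero):

```python
import math

def perc_index(p, length):
    """1-based index into a sorted sample for percentile spec `p`,
    mirroring bear's perc/2 clauses."""
    if isinstance(p, int) and 0 <= p <= 100:
        v = math.floor(p * length / 100 + 0.5)    # percent
    elif isinstance(p, int) and 100 <= p <= 1000:
        v = math.floor(p * length / 1000 + 0.5)   # per-mille, e.g. 999
    elif isinstance(p, float) and 0 <= p <= 1:
        v = math.floor(p * length + 0.5)          # fraction
    else:
        raise ValueError("bad percentile spec")
    return max(1, v)  # clamp so tiny percentiles still pick the first element
```

`pick_items/3` then walks the sorted list once, emitting values at those precomputed indices.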
[08/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #6 from jamesc/master
Fix #5 - harmonic mean throws exception when given a set of zero values
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/0da736b0
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/0da736b0
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/0da736b0
Branch: refs/heads/import-master
Commit: 0da736b0e9bef2c7150cd6e6c4a9fa1854deedf9
Parents: b1882d7 79782d2
Author: Joe Williams <wi...@gmail.com>
Authored: Fri Nov 30 13:41:17 2012 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Fri Nov 30 13:41:17 2012 -0800
----------------------------------------------------------------------
src/bear.erl | 3 +++
1 file changed, 3 insertions(+)
----------------------------------------------------------------------
[04/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
app file
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/7ef9a7b9
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/7ef9a7b9
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/7ef9a7b9
Branch: refs/heads/import-master
Commit: 7ef9a7b90ffb6866cadb0167635910670923626b
Parents: 5b87b46
Author: joewilliams <wi...@gmail.com>
Authored: Fri Mar 30 15:34:07 2012 -0700
Committer: joewilliams <wi...@gmail.com>
Committed: Fri Mar 30 15:34:07 2012 -0700
----------------------------------------------------------------------
src/bear.app.src | 8 ++++++++
1 file changed, 8 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/7ef9a7b9/src/bear.app.src
----------------------------------------------------------------------
diff --git a/src/bear.app.src b/src/bear.app.src
new file mode 100644
index 0000000..3aac9fd
--- /dev/null
+++ b/src/bear.app.src
@@ -0,0 +1,8 @@
+{application, bear,
+ [
+ {description, ""},
+ {vsn, git},
+ {registered, []},
+ {applications, []},
+ {env, []}
+ ]}.
[11/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add math_log/1 and inverse/1 patterns to catch 0.0
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/49cec9a2
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/49cec9a2
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/49cec9a2
Branch: refs/heads/import-master
Commit: 49cec9a27a4e425a1fbe28c925589acdda200769
Parents: 3fd09d1
Author: James Kelly <ji...@adroll.com>
Authored: Mon Jul 1 09:04:00 2013 -0700
Committer: James Kelly <ji...@adroll.com>
Committed: Mon Jul 1 09:04:00 2013 -0700
----------------------------------------------------------------------
src/bear.erl | 4 ++++
1 file changed, 4 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/49cec9a2/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 7039910..ffc9025 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -317,12 +317,16 @@ foldl2(_F, Acc, [], []) ->
%% wrapper for math:log/1 to avoid dividing by zero
math_log(0) ->
1;
+math_log(0.0) ->
+ 1.0;
math_log(X) ->
math:log(X).
%% wrapper for calculating inverse to avoid dividing by zero
inverse(0) ->
0;
+inverse(0.0) ->
+ 0.0;
inverse(X) ->
1/X.
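These wrappers map zero to a harmless value so that the sum-of-logs and sum-of-inverses accumulators never raise; the new clauses extend the same guard to float zero. A Python sketch of the guards (note that returning 1 for `math_log(0)` deliberately biases the geometric mean rather than crashing):

```python
import math

def math_log(x):
    # bear substitutes 1 (1.0 for floats) for log(0), which is undefined
    if x == 0:
        return 1.0 if isinstance(x, float) else 1
    return math.log(x)

def inverse(x):
    # 1/0 is replaced by 0 so the sumInv accumulator stays finite
    if x == 0:
        return 0.0 if isinstance(x, float) else 0
    return 1 / x
```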
[03/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
readme update
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/5b87b46e
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/5b87b46e
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/5b87b46e
Branch: refs/heads/import-master
Commit: 5b87b46e3efc5fc91c5c30b18e05994583db5a87
Parents: 7c9fc5f
Author: joewilliams <wi...@gmail.com>
Authored: Fri Mar 30 15:30:04 2012 -0700
Committer: joewilliams <wi...@gmail.com>
Committed: Fri Mar 30 15:30:04 2012 -0700
----------------------------------------------------------------------
README.md | 6 ++++++
1 file changed, 6 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/5b87b46e/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 5b941da..10a2f82 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,7 @@
### bear : a set of statistics functions for erlang
+
+Currently bear is focused on use inside the Folsom Erlang metrics library but all of these functions are generic and useful in other situations.
+
+Pull requests accepted!
+
+#### Available under the Apache 2.0 License
[22/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
empty list case of get_statistics_subset
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/bb739b29
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/bb739b29
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/bb739b29
Branch: refs/heads/import-master
Commit: bb739b294c0892933978b7ab5d7e316d6855cdcb
Parents: 926a486
Author: Joe Williams <jo...@github.com>
Authored: Tue Nov 5 12:51:06 2013 -0800
Committer: Joe Williams <jo...@github.com>
Committed: Tue Nov 5 12:51:06 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/bb739b29/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 39111f5..d46aa8a 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -96,7 +96,7 @@ get_statistics(Values) when is_list(Values) ->
{n, 0}
].
-get_statistics_subset(Values, Items) ->
+get_statistics_subset([_,_,_,_,_|_] = Values, Items) ->
Length = length(Values),
if Length < ?STATS_MIN ->
[I || {K,_} = I <- get_statistics([]),
@@ -112,7 +112,9 @@ get_statistics_subset(Values, Items) ->
end,
report_subset(Items, Length,
SortedValues, Scan_res, Scan_res2)
- end.
+ end;
+get_statistics_subset(Values, Items) when is_list(Values) ->
+ [{Item, 0.0} || Item <- Items].
calc_steps(Items) ->
lists:foldl(fun({I,_},Acc) ->
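The `[_,_,_,_,_|_] = Values` head matches only lists of at least five elements; anything shorter now falls through to the new clause that returns a zeroed result per requested item instead of crashing inside the scan. A Python sketch of that control flow (the `STATS_MIN` value and the `compute` dispatcher are assumptions standing in for the macro and `report_subset/5`):

```python
STATS_MIN = 5  # assumed value of bear's ?STATS_MIN macro

def get_statistics_subset(values, items):
    """Inputs shorter than STATS_MIN short-circuit to zeroed results."""
    if len(values) < STATS_MIN:
        return [(item, 0.0) for item in items]
    sorted_values = sorted(values)
    return [(item, compute(item, sorted_values)) for item in items]

def compute(item, sorted_values):
    # hypothetical dispatcher standing in for report_subset/5
    if item == "min":
        return float(sorted_values[0])
    if item == "max":
        return float(sorted_values[-1])
    raise NotImplementedError(item)
```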
[02/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
merge bear and bear_scutil
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/7c9fc5f1
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/7c9fc5f1
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/7c9fc5f1
Branch: refs/heads/import-master
Commit: 7c9fc5f16315b6e23543817c10ba641f85c10e64
Parents: eb22734
Author: joewilliams <wi...@gmail.com>
Authored: Fri Mar 30 15:26:34 2012 -0700
Committer: joewilliams <wi...@gmail.com>
Committed: Fri Mar 30 15:26:34 2012 -0700
----------------------------------------------------------------------
src/bear.erl | 67 +++++++++++++++++++++++++++++++++++++++++-
src/bear_scutil.erl | 75 ------------------------------------------------
2 files changed, 66 insertions(+), 76 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/7c9fc5f1/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 97bac0f..0afbe61 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -240,7 +240,7 @@ get_kendall_correlation(_, Values) when length(Values) < ?STATS_MIN ->
get_kendall_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
0.0;
get_kendall_correlation(Values1, Values2) ->
- bear_scutil:kendall_correlation(Values1, Values2).
+ bear:kendall_correlation(Values1, Values2).
get_spearman_correlation(Values, _) when length(Values) < ?STATS_MIN ->
0.0;
@@ -370,3 +370,68 @@ get_bin_width(StdDev, Count) ->
get_bin_count(Min, Max, Width) ->
%io:format("min: ~p, max: ~p, width ~p~n", [Min, Max, Width]),
round((Max - Min) / Width) + 1.
+
+%% taken from http://crunchyd.com/scutil/
+%% All code here is MIT Licensed
+%% http://scutil.com/license.html
+
+% seems to match the value returned by the 'cor' (method="kendal") R function
+% http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
+kendall_correlation(List1, List2) when is_list(List1), is_list(List2) ->
+ {RA,_} = lists:unzip(tied_ordered_ranking(List1)),
+ {RB,_} = lists:unzip(tied_ordered_ranking(List2)),
+
+ Ordering = lists:keysort(1, lists:zip(RA,RB)),
+ {_,OrdB} = lists:unzip(Ordering),
+
+ N = length(List1),
+ P = lists:sum(kendall_right_of(OrdB, [])),
+
+ -(( (4*P) / (N * (N - 1))) - 1).
+
+simple_ranking(List) when is_list(List) ->
+ lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).
+
+tied_ranking(List) ->
+ tied_rank_worker(simple_ranking(List), [], no_prev_value).
+
+tied_ordered_ranking(List) when is_list(List) ->
+ tied_ordered_ranking(List, tied_ranking(List), []).
+
+tied_ordered_ranking([], [], Work) ->
+ lists:reverse(Work);
+
+tied_ordered_ranking([Front|Rem], Ranks, Work) ->
+ {value,Item} = lists:keysearch(Front,2,Ranks),
+ {IRank,Front} = Item,
+ tied_ordered_ranking(Rem, Ranks--[Item], [{IRank,Front}]++Work).
+
+kendall_right_of([], Work) ->
+ lists:reverse(Work);
+kendall_right_of([F|R], Work) ->
+ kendall_right_of(R, [kendall_right_of_item(F,R)]++Work).
+
+kendall_right_of_item(B, Rem) ->
+ length([R || R <- Rem, R < B]).
+
+tied_add_prev(Work, {FoundAt, NewValue}) ->
+ lists:duplicate( length(FoundAt), {lists:sum(FoundAt)/length(FoundAt), NewValue} ) ++ Work.
+
+tied_rank_worker([], Work, PrevValue) ->
+ lists:reverse(tied_add_prev(Work, PrevValue));
+
+tied_rank_worker([Item|Remainder], Work, PrevValue) ->
+ case PrevValue of
+ no_prev_value ->
+ {BaseRank,BaseVal} = Item,
+ tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
+ {FoundAt,OldVal} ->
+ case Item of
+ {Id,OldVal} ->
+ tied_rank_worker(Remainder, Work, {[Id]++FoundAt,OldVal});
+ {Id,NewVal} ->
+ tied_rank_worker(Remainder, tied_add_prev(Work, PrevValue), {[Id],NewVal})
+
+ end
+ end.
+
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/7c9fc5f1/src/bear_scutil.erl
----------------------------------------------------------------------
diff --git a/src/bear_scutil.erl b/src/bear_scutil.erl
deleted file mode 100644
index e684fb7..0000000
--- a/src/bear_scutil.erl
+++ /dev/null
@@ -1,75 +0,0 @@
-%% taken from http://crunchyd.com/scutil/
-%% All code here is MIT Licensed
-%% http://scutil.com/license.html
-
--module(bear_scutil).
-
--export([
- kendall_correlation/2
- ]).
--compile([export_all]).
--compile([native]).
-
-% seems to match the value returned by the 'cor' (method="kendal") R function
-% http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
-kendall_correlation(List1, List2) when is_list(List1), is_list(List2) ->
- {RA,_} = lists:unzip(tied_ordered_ranking(List1)),
- {RB,_} = lists:unzip(tied_ordered_ranking(List2)),
-
- Ordering = lists:keysort(1, lists:zip(RA,RB)),
- {_,OrdB} = lists:unzip(Ordering),
-
- N = length(List1),
- P = lists:sum(kendall_right_of(OrdB, [])),
-
- -(( (4*P) / (N * (N - 1))) - 1).
-
-%%%===================================================================
-%%% Internal functions
-%%%==================================================================
-
-simple_ranking(List) when is_list(List) ->
- lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).
-
-tied_ranking(List) ->
- tied_rank_worker(simple_ranking(List), [], no_prev_value).
-
-tied_ordered_ranking(List) when is_list(List) ->
- tied_ordered_ranking(List, tied_ranking(List), []).
-
-tied_ordered_ranking([], [], Work) ->
- lists:reverse(Work);
-
-tied_ordered_ranking([Front|Rem], Ranks, Work) ->
- {value,Item} = lists:keysearch(Front,2,Ranks),
- {IRank,Front} = Item,
- tied_ordered_ranking(Rem, Ranks--[Item], [{IRank,Front}]++Work).
-
-kendall_right_of([], Work) ->
- lists:reverse(Work);
-kendall_right_of([F|R], Work) ->
- kendall_right_of(R, [kendall_right_of_item(F,R)]++Work).
-
-kendall_right_of_item(B, Rem) ->
- length([R || R <- Rem, R < B]).
-
-tied_add_prev(Work, {FoundAt, NewValue}) ->
- lists:duplicate( length(FoundAt), {lists:sum(FoundAt)/length(FoundAt), NewValue} ) ++ Work.
-
-tied_rank_worker([], Work, PrevValue) ->
- lists:reverse(tied_add_prev(Work, PrevValue));
-
-tied_rank_worker([Item|Remainder], Work, PrevValue) ->
- case PrevValue of
- no_prev_value ->
- {BaseRank,BaseVal} = Item,
- tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
- {FoundAt,OldVal} ->
- case Item of
- {Id,OldVal} ->
- tied_rank_worker(Remainder, Work, {[Id]++FoundAt,OldVal});
- {Id,NewVal} ->
- tied_rank_worker(Remainder, tied_add_prev(Work, PrevValue), {[Id],NewVal})
-
- end
- end.
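The moved-in `kendall_correlation/2` ranks both lists (rank 1 = largest), orders the rank pairs by the first list's ranks, counts for each remaining rank of the second list how many ranks to its right are smaller (P), and applies `tau = -((4P / (N(N-1))) - 1)`. A Python sketch of that computation for tie-free inputs (a simplification: the Erlang code also handles ties via `tied_rank_worker`):

```python
def kendall_tau(xs, ys):
    """Kendall tau for tie-free samples, following bear's formula."""
    def ranks(vals):
        # 1-based position of each value in descending order
        order = sorted(vals, reverse=True)
        return [order.index(v) + 1 for v in vals]

    ra, rb = ranks(xs), ranks(ys)
    # order the second list's ranks by the first list's ranks
    ord_b = [b for _, b in sorted(zip(ra, rb))]
    # P: for each rank, how many ranks to its right are smaller
    p = sum(sum(1 for r in ord_b[i + 1:] if r < b)
            for i, b in enumerate(ord_b))
    n = len(xs)
    return -((4 * p / (n * (n - 1))) - 1)
```

Perfectly concordant inputs give `1.0`, perfectly discordant inputs `-1.0`, matching R's `cor(method = "kendall")` in the tie-free case.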
[07/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Correct harmonic_mean behaviour when all values are 0
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/79782d28
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/79782d28
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/79782d28
Branch: refs/heads/import-master
Commit: 79782d28084a7c3463fcd8d5d267b350d822565d
Parents: b1882d7
Author: jamesc <ja...@opscode.com>
Authored: Fri Nov 30 13:35:27 2012 -0800
Committer: jamesc <ja...@opscode.com>
Committed: Fri Nov 30 13:35:27 2012 -0800
----------------------------------------------------------------------
src/bear.erl | 3 +++
1 file changed, 3 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/79782d28/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 33138ee..65b2831 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -146,6 +146,9 @@ arithmetic_mean(#scan_result{n=N, sumX=Sum}) ->
geometric_mean(#scan_result{n=N, sumLog=SumLog}) ->
math:exp(SumLog/N).
+harmonic_mean(#scan_result{sumInv=0}) ->
+ %% Protect against divide by 0 if we have all 0 values
+ 0;
harmonic_mean(#scan_result{n=N, sumInv=Sum}) ->
N/Sum.
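The harmonic mean is N divided by the sum of inverses, and since `inverse/1` maps 0 to 0, an all-zero sample accumulates `sumInv = 0`; the new clause returns 0 in that case instead of dividing by zero. A Python sketch of the guard:

```python
def harmonic_mean(n, sum_inv):
    # an all-zero sample makes sum_inv 0 (inverse(0) is 0), so guard first
    if sum_inv == 0:
        return 0
    return n / sum_inv
```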
[28/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
get_statistics_subset should return well-formatted null results
The existing code for handling lists of values that don't meet the minimum
length causes problems when the `percentiles` key is used; e.g., rather
than the expected
[{percentile, [{0.5, 0.0}, ... ]}]
the user is presented with
[{{percentile, [0.5, ... ]}, 0.0}]
which doesn't match the formatting for other subset keys. This patch
special-cases the `percentile` key to return the expected result.
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/1a902e8c
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/1a902e8c
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/1a902e8c
Branch: refs/heads/import-master
Commit: 1a902e8c37f5f894a6b3203ee8a0279ff543eb82
Parents: 7d1ee8e
Author: Benjamin Anderson <b...@banjiewen.net>
Authored: Sat Nov 16 18:19:37 2013 -0800
Committer: Benjamin Anderson <b...@banjiewen.net>
Committed: Sat Nov 16 18:24:35 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/1a902e8c/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 7d9eed9..67f4139 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -114,7 +114,14 @@ get_statistics_subset([_,_,_,_,_|_] = Values, Items) ->
SortedValues, Scan_res, Scan_res2)
end;
get_statistics_subset(Values, Items) when is_list(Values) ->
- [{Item, 0.0} || Item <- Items].
+ get_null_statistics_subset(Items, []).
+
+get_null_statistics_subset([{percentile, Ps}|Items], Acc) ->
+ get_null_statistics_subset(Items, [{percentile, [{P, 0.0} || P <- Ps]}|Acc]);
+get_null_statistics_subset([I|Items], Acc) ->
+ get_null_statistics_subset(Items, [{I, 0.0}|Acc]);
+get_null_statistics_subset([], Acc) ->
+ lists:reverse(Acc).
calc_steps(Items) ->
lists:foldl(fun({I,_},Acc) ->
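The fix keeps each null entry shaped like a real result: plain keys map straight to `0.0`, while a `{percentile, Ps}` request expands to a nested list of `{P, 0.0}` pairs. A Python sketch of the same special-casing, with tuples standing in for Erlang terms:

```python
def get_null_statistics_subset(items):
    """Well-formed zero results: ('percentile', ps) expands to
    per-percentile pairs; every other key maps to 0.0."""
    out = []
    for item in items:
        if isinstance(item, tuple) and item[0] == "percentile":
            out.append(("percentile", [(p, 0.0) for p in item[1]]))
        else:
            out.append((item, 0.0))
    return out
```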
[13/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add unit tests
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/c65276d0
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/c65276d0
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/c65276d0
Branch: refs/heads/import-master
Commit: c65276d0c511687b3e02749089bea5633af4afc5
Parents: 96cbfae
Author: Rodolphe Quiédeville <ro...@quiedeville.org>
Authored: Wed Oct 23 23:04:29 2013 +0200
Committer: Rodolphe Quiédeville <ro...@quiedeville.org>
Committed: Thu Oct 31 19:33:37 2013 +0100
----------------------------------------------------------------------
test/bear_test.erl | 232 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 232 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/c65276d0/test/bear_test.erl
----------------------------------------------------------------------
diff --git a/test/bear_test.erl b/test/bear_test.erl
new file mode 100644
index 0000000..10d447c
--- /dev/null
+++ b/test/bear_test.erl
@@ -0,0 +1,232 @@
+%%%
+%%% Copyright 2013, Rodolphe Quiedeville <ro...@quiedeville.org>
+%%%
+%%% Licensed under the Apache License, Version 2.0 (the "License");
+%%% you may not use this file except in compliance with the License.
+%%% You may obtain a copy of the License at
+%%%
+%%% http://www.apache.org/licenses/LICENSE-2.0
+%%%
+%%% Unless required by applicable law or agreed to in writing, software
+%%% distributed under the License is distributed on an "AS IS" BASIS,
+%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%%% See the License for the specific language governing permissions and
+%%% limitations under the License.
+%%%
+
+%%% ====================================================================
+%%% file : bear_test.erl
+%%% @author : Rodolphe Quiedeville <ro...@quiedeville.org>
+%%% @doc
+%%% Unit test for functions defined in bear.erl
+%%% @end
+%%% ====================================================================
+-module(bear_test).
+
+-compile(export_all).
+
+-record(scan_result, {n=0, sumX=0, sumXX=0, sumInv=0, sumLog, max, min}).
+-record(scan_result2, {x2=0, x3=0, x4=0}).
+
+-include_lib("eunit/include/eunit.hrl").
+
+-define(PRECISION, 1.0e15).
+
+get_statistics_1_empty_test() ->
+ %% get_statistics/1
+ %% Empty set of values
+ Percentile = [{50, 0.0},{75, 0.0},{90, 0.0},{95, 0.0},{99, 0.0},{999, 0.0}],
+ Stats = bear:get_statistics([]),
+ ?assertEqual({min, 0.0}, lists:keyfind(min, 1, Stats)),
+ ?assertEqual({max, 0.0}, lists:keyfind(max, 1, Stats)),
+ ?assertEqual({arithmetic_mean, 0.0}, lists:keyfind(arithmetic_mean, 1, Stats)),
+ ?assertEqual({geometric_mean, 0.0}, lists:keyfind(geometric_mean, 1, Stats)),
+ ?assertEqual({harmonic_mean, 0.0}, lists:keyfind(harmonic_mean, 1, Stats)),
+ ?assertEqual({median, 0.0}, lists:keyfind(median, 1, Stats)),
+ ?assertEqual({variance, 0.0}, lists:keyfind(variance, 1, Stats)),
+ ?assertEqual({standard_deviation, 0.0}, lists:keyfind(standard_deviation, 1, Stats)),
+ ?assertEqual({skewness, 0.0}, lists:keyfind(skewness, 1, Stats)),
+ ?assertEqual({kurtosis, 0.0}, lists:keyfind(kurtosis, 1, Stats)),
+ ?assertEqual({percentile, Percentile}, lists:keyfind(percentile, 1, Stats)),
+ ?assertEqual({histogram, [{0,0}]}, lists:keyfind(histogram, 1, Stats)),
+ ?assertEqual({n, 0}, lists:keyfind(n, 1, Stats)).
+
+get_statistics_1_regular_test() ->
+ %% get_statistics/1
+ %% Non empty set of values
+ Percentile = [{50, 5},{75, 8},{90, 9},{95, 10},{99, 10},{999, 10}],
+ Stats = bear:get_statistics(lists:seq(1,10)),
+
+ {geometric_mean, Geometric} = lists:keyfind(geometric_mean, 1, Stats),
+ {harmonic_mean, Harmonic} = lists:keyfind(harmonic_mean, 1, Stats),
+ {variance, Variance} = lists:keyfind(variance, 1, Stats),
+ {standard_deviation, StandardDeviation} = lists:keyfind(standard_deviation, 1, Stats),
+ {kurtosis, Kurtosis} = lists:keyfind(kurtosis, 1, Stats),
+
+ ?assertEqual({min, 1}, lists:keyfind(min, 1, Stats)),
+ ?assertEqual({max, 10}, lists:keyfind(max, 1, Stats)),
+ ?assertEqual({arithmetic_mean, 5.5}, lists:keyfind(arithmetic_mean, 1, Stats)),
+ ?assertEqual(4528728688116766, erlang:trunc(?PRECISION * Geometric)),
+ ?assertEqual(3414171521474055, erlang:trunc(?PRECISION * Harmonic)),
+ ?assertEqual({median, 5}, lists:keyfind(median, 1, Stats)),
+ ?assertEqual(9166666666666666, erlang:trunc(?PRECISION * Variance)),
+ ?assertEqual(3027650354097491, erlang:trunc(?PRECISION * StandardDeviation)),
+ ?assertEqual({skewness, 0.0}, lists:keyfind(skewness, 1, Stats)),
+ ?assertEqual(-1561636363636363, erlang:trunc(?PRECISION * Kurtosis)),
+ ?assertEqual({percentile, Percentile}, lists:keyfind(percentile, 1, Stats)),
+ ?assertEqual({histogram, [{6,6},{11,4},{16,0}]}, lists:keyfind(histogram, 1, Stats)),
+ ?assertEqual({n, 10}, lists:keyfind(n, 1, Stats)).
+
+get_statistics_2_1_test() ->
+ %% get_statistics/2
+ %% First set of values is empty
+ Stats = bear:get_statistics(lists:seq(1,10), []),
+ ?assertEqual(0.0, Stats).
+
+get_statistics_3_test() ->
+ %% get_statistics/2
+ %% Second set of values is empty
+ Stats = bear:get_statistics([], lists:seq(1,10)),
+ ?assertEqual(0.0, Stats).
+
+get_statistics_4_test() ->
+ %% get_statistics/2
+ %% Two sets of values with different sizes
+ Stats = bear:get_statistics(lists:seq(1,10),lists:seq(1,20)),
+ ?assertEqual(0.0, Stats).
+
+get_statistics_5_test() ->
+ %% get_statistics/2
+ %% Both sets of values are valid
+ Stats = bear:get_statistics(lists:seq(0,10),lists:seq(4,24,2)),
+ ?assertEqual({covariance, 20.0}, lists:keyfind(covariance, 1, Stats)),
+ ?assertEqual({tau, 1.0}, lists:keyfind(tau, 1, Stats)),
+ ?assertEqual({rho, 1.0}, lists:keyfind(rho, 1, Stats)),
+ ?assertEqual({r, 1.0}, lists:keyfind(r, 1, Stats)).
+
+scan_values_test() ->
+ ?assertEqual(#scan_result{n=8}, bear:scan_values([], #scan_result{n=8})),
+ ?assertEqual(#scan_result{n=1,sumX=1,sumXX=1,sumInv=1.0,sumLog=0.0,max=1,min=1}, bear:scan_values([1])),
+ ?assertEqual(#scan_result{n=4,sumX=10,sumXX=30,sumInv=2.083333333333333,sumLog=3.1780538303479453,max=4,min=1},
+ bear:scan_values([1,3,2,4])).
+
+scan_values2_test() ->
+ ?assertEqual(#scan_result{n=8}, bear:scan_values2([], 3, #scan_result{n=8})),
+ ?assertEqual(#scan_result2{x2=6.6875,x3=-13.359375,x4=28.07421875}, bear:scan_values2([4,3,5], #scan_result{n=8,sumX=42})).
+
+revsort_test() ->
+ ?assertEqual([], bear:revsort([])),
+ ?assertEqual([4,3,2], bear:revsort([3,2,4])).
+
+arithmetic_mean_test() ->
+ ?assertEqual(10.0, bear:arithmetic_mean(#scan_result{n=4, sumX=40})).
+
+geometric_mean_test() ->
+ ?assertEqual(25.790339917193062, bear:geometric_mean(#scan_result{n=4, sumLog=13})).
+
+harmonic_mean_test() ->
+ ?assertEqual(0, bear:harmonic_mean(#scan_result{n=100, sumInv=0})),
+ ?assertEqual(10.0, bear:harmonic_mean(#scan_result{n=100, sumInv=10})).
+
+percentile_test() ->
+ ?assertEqual(3, bear:percentile([1,2,3,4,5], #scan_result{n=5},0.5)),
+ ?assertEqual(5, bear:percentile([1,2,3,4,5], #scan_result{n=5},0.95)).
+
+variance_test() ->
+ ?assertEqual(7.0, bear:variance(#scan_result{n=7},#scan_result2{x2=42})).
+
+std_deviation_test() ->
+ ?assertEqual(3.0, bear:std_deviation(#scan_result{n=10},#scan_result2{x2=81})).
+
+skewness_test() ->
+ ?assertEqual(0.0, bear:skewness(#scan_result{n=10},#scan_result2{x2=0,x3=81})),
+ ?assertEqual(3.0, bear:skewness(#scan_result{n=10},#scan_result2{x2=81,x3=810})).
+
+kurtosis_test() ->
+ ?assertEqual(0.0, bear:kurtosis(#scan_result{n=10},#scan_result2{x2=0,x4=81})),
+ ?assertEqual(-2.0, bear:kurtosis(#scan_result{n=10},#scan_result2{x2=81,x4=810})).
+
+update_bin_1_test() ->
+ %% with empty dict
+ Dict = dict:new(),
+ C = bear:update_bin(4, [4], Dict),
+ ?assertEqual(1, dict:fetch(4, C)).
+
+get_covariance_test() ->
+ %% Array 1 is too short
+ ?assertEqual(0.0, bear:get_covariance([], [2,1,2,3,4,5,6])),
+ %% Array 2 is too short
+ ?assertEqual(0.0, bear:get_covariance([1,2,3,4,5,6], [])),
+ %% different array lengths
+ ?assertEqual(0.0, bear:get_covariance([1,2,3,4,5,6], [1,2,3,4,5,6,7])),
+ %% Usual case
+ ?assertEqual(-30944444444444444, erlang:trunc(?PRECISION * bear:get_covariance([11,2,3,41,5,9], [34,2,23,4,5,6]))).
+
+ranks_of_test() ->
+ ?assertEqual([4.0,3.0,1.0,2.0], bear:ranks_of([3,4,15,6])).
+
+get_pearson_correlation_test() ->
+ ?assertEqual(0.0, bear:get_pearson_correlation([], 42)),
+ ?assertEqual(0.0, bear:get_pearson_correlation(42, [])),
+ ?assertEqual(0.0, bear:get_pearson_correlation(lists:seq(1,10), lists:seq(1,11))),
+ ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(1,10), lists:seq(1,10))),
+ ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(0,10), lists:seq(5,15))),
+ ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(40,60,2), lists:seq(10,20))).
+
+
+round_bin_test() ->
+ ?assertEqual(10, bear:round_bin(10)),
+ ?assertEqual(10, bear:round_bin(10, 5)),
+ ?assertEqual(42, bear:round_bin(15, 42)),
+ ?assertEqual(45, bear:round_bin(42, 15)).
+
+get_bin_width_test() ->
+ ?assertEqual(1, bear:get_bin_width(0, 10)),
+ ?assertEqual(22, bear:get_bin_width(10.0, 4.0)).
+
+get_bin_count_test() ->
+ ?assertEqual(3, bear:get_bin_count(9, 15, 3)),
+ ?assertEqual(4, bear:get_bin_count(10.2, 20.2, 4)).
+
+get_kendall_correlation_test()->
+ ?assertEqual(0.0, bear:get_kendall_correlation([], [])),
+ ?assertEqual(0.0, bear:get_kendall_correlation([], [1,2,3,4,5,6,7])),
+ ?assertEqual(0.0, bear:get_kendall_correlation([1,2,3,4,5,6,7],[])),
+ ?assertEqual(0.0, bear:get_kendall_correlation(lists:seq(1,10),lists:seq(1,11))),
+ ?assertEqual(1.0, bear:get_kendall_correlation([1,2,3,4,5,6,7], [2,3,4,5,6,7,9])).
+
+get_spearman_correlation_test()->
+ ?assertEqual(0.0, bear:get_spearman_correlation([], [])),
+ ?assertEqual(0.0, bear:get_spearman_correlation([], [1,2,3,4,5,6,7])),
+ ?assertEqual(0.0, bear:get_spearman_correlation([1,2,3,4,5,6,7],[])),
+ ?assertEqual(0.0, bear:get_spearman_correlation(lists:seq(1,10),lists:seq(1,11))),
+ ?assertEqual(1.0, bear:get_spearman_correlation([1,2,3,4,5,6,7], [2,3,4,5,6,7,9])).
+
+
+math_log_test() ->
+ ?assertEqual(1, bear:math_log(0)),
+ ?assertEqual(1.0, bear:math_log(0.0)),
+ ?assertEqual(3737669618283368, erlang:trunc(?PRECISION * bear:math_log(42))).
+
+inverse_test() ->
+ ?assertEqual(0, bear:inverse(0)),
+ ?assertEqual(0.0, bear:inverse(0.0)),
+ ?assertEqual(0.5, bear:inverse(2)).
+
+get_hist_bins_test() ->
+ ?assertEqual([4], bear:get_hist_bins(1, 4, 5, 10)).
+
+tied_ordered_ranking_test() ->
+ ?assertEqual([3,2,1], bear:tied_ordered_ranking([], [], [1,2,3])).
+
+kendall_right_off_test() ->
+ %% empty array
+ ?assertEqual("654321", bear:kendall_right_of([],"123456")).
+
+tied_add_prev_test() ->
+ ?assertEqual([{2.5,5},{2.5,5},{2.5,5},{2.5,5},{2,3}], bear:tied_add_prev([{2, 3}], {[1,2,3,4], 5})).
+
+tied_rank_worker_test() ->
+ ?assertEqual([{2.0,5},{2.0,5},{2.0,5},{2.0,5}], bear:tied_rank_worker([], [{2.0,5}], {[1,2,3], 5})),
+ ?assertEqual([{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5}],
+ bear:tied_rank_worker([{2.0,5},{2.0,5}], [{2.0,5}], {[1,2,3], 5})).
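
The variance/skewness/kurtosis assertions above pin down the formulas bear implements from its scan-result records. A minimal Python cross-check (not bear's code; it assumes `n` is the sample count and `x2`/`x3`/`x4` are the sums of squared/cubed/fourth-power deviations, which is consistent with the expected values in these tests):

```python
import math

def variance(n, x2):
    # sample variance from the sum of squared deviations
    return x2 / (n - 1)

def std_deviation(n, x2):
    return math.sqrt(variance(n, x2))

def skewness(n, x2, x3):
    s = std_deviation(n, x2)
    return 0.0 if s == 0 else x3 / (n * s ** 3)

def kurtosis(n, x2, x4):
    v = variance(n, x2)
    # excess kurtosis, hence the trailing -3
    return 0.0 if v == 0 else x4 / (n * v ** 2) - 3

print(variance(7, 42))         # 7.0
print(std_deviation(10, 81))   # 3.0
print(skewness(10, 81, 810))   # 3.0
print(kurtosis(10, 81, 810))   # -2.0
```

Each printed value matches the corresponding `?assertEqual` above, including the zero-spread guards (`x2=0` yields `0.0` for both skewness and kurtosis).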
[14/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #10 from rodo/master
Add unit tests for some functions
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/3cb96e4c
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/3cb96e4c
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/3cb96e4c
Branch: refs/heads/import-master
Commit: 3cb96e4c148680e1eba0d5fe93e82db534fd00ec
Parents: 96cbfae c65276d
Author: Joe Williams <wi...@gmail.com>
Authored: Thu Oct 31 11:38:52 2013 -0700
Committer: Joe Williams <wi...@gmail.com>
Committed: Thu Oct 31 11:38:52 2013 -0700
----------------------------------------------------------------------
test/bear_test.erl | 232 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 232 insertions(+)
----------------------------------------------------------------------
[09/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add more calculations to percentiles proplist.
The 90th percentile is obvious; the 50th has been included to simplify
access for end-users.
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/6263a557
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/6263a557
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/6263a557
Branch: refs/heads/import-master
Commit: 6263a5579eab998f034d9221d662df1e8da6a6ba
Parents: 0da736b
Author: Benjamin Anderson <b...@banjiewen.net>
Authored: Fri Feb 22 11:20:00 2013 -0800
Committer: Benjamin Anderson <b...@banjiewen.net>
Committed: Fri Feb 22 11:50:50 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 4 ++++
1 file changed, 4 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/6263a557/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 65b2831..7039910 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -55,7 +55,9 @@ get_statistics(Values) when length(Values) < ?STATS_MIN ->
{kurtosis, 0.0},
{percentile,
[
+ {50, 0.0},
{75, 0.0},
+ {90, 0.0},
{95, 0.0},
{99, 0.0},
{999, 0.0}
@@ -82,7 +84,9 @@ get_statistics(Values) ->
{kurtosis, kurtosis(Scan_res, Scan_res2)},
{percentile,
[
+ {50, percentile(SortedValues, Scan_res, 0.50)},
{75, percentile(SortedValues, Scan_res, 0.75)},
+ {90, percentile(SortedValues, Scan_res, 0.90)},
{95, percentile(SortedValues, Scan_res, 0.95)},
{99, percentile(SortedValues, Scan_res, 0.99)},
{999, percentile(SortedValues, Scan_res, 0.999)}
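
The new 50th/90th entries follow the same nearest-rank scheme as the existing ones: take element `max(1, round(P * N))` of the sorted list, 1-based. A Python sketch of that rule (the rounding behaviour is inferred from bear's test expectations elsewhere in this thread, so treat it as an assumption rather than a transcription of `perc/2`):

```python
import math

def percentile(sorted_values, p):
    # nearest-rank: pick the max(1, round(p * n))-th element (1-based).
    # Erlang's round/1 rounds halves away from zero, so emulate that
    # instead of Python's banker's rounding.
    n = len(sorted_values)
    idx = max(1, math.floor(p * n + 0.5))
    return sorted_values[idx - 1]

print(percentile([1, 2, 3, 4, 5], 0.50))  # 3
print(percentile([1, 2, 3, 4, 5], 0.90))  # 5
print(percentile([1, 2, 3, 4, 5], 0.95))  # 5
```

The first two prints match `percentile_test` earlier in this thread; `0.90` lands on the last element of a 5-item list because `round(4.5)` is 5 under away-from-zero rounding.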
[12/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #9 from SemanticSugar/scan-values-float-fix
Add math_log/1 and inverse/1 patterns to catch 0.0
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/96cbfae6
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/96cbfae6
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/96cbfae6
Branch: refs/heads/import-master
Commit: 96cbfae62406e311bcfeb819ea46610eb38c5aff
Parents: 3fd09d1 49cec9a
Author: Joe Williams <wi...@gmail.com>
Authored: Mon Jul 1 10:25:29 2013 -0700
Committer: Joe Williams <wi...@gmail.com>
Committed: Mon Jul 1 10:25:29 2013 -0700
----------------------------------------------------------------------
src/bear.erl | 4 ++++
1 file changed, 4 insertions(+)
----------------------------------------------------------------------
[23/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
dont take the log of a negative number dummy
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/6c19d6a2
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/6c19d6a2
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/6c19d6a2
Branch: refs/heads/import-master
Commit: 6c19d6a2ee031512fca9916dd7cca2fc7ea2a38e
Parents: bb739b2
Author: Joe Williams <jo...@github.com>
Authored: Tue Nov 5 13:03:57 2013 -0800
Committer: Joe Williams <jo...@github.com>
Committed: Tue Nov 5 13:03:57 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 2 ++
test/bear_test.erl | 10 ++++++++++
2 files changed, 12 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/6c19d6a2/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index d46aa8a..3feb0bc 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -373,6 +373,8 @@ math_log(0) ->
1;
math_log(0.0) ->
1.0;
+math_log(X) when X < 0 ->
+ 0; % it's not possible to take a log of a negative number, return 0
math_log(X) ->
math:log(X).
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/6c19d6a2/test/bear_test.erl
----------------------------------------------------------------------
diff --git a/test/bear_test.erl b/test/bear_test.erl
index fc37c6b..5b2b4e1 100644
--- a/test/bear_test.erl
+++ b/test/bear_test.erl
@@ -244,6 +244,16 @@ full_subset_test() ->
Stats = bear:get_statistics(bear:test_values()),
match_values2(Stats).
+negative_test() ->
+ %% make sure things don't blow up with a negative value
+ Values = [1,-1,-2,3,3,4,5,6,7],
+ [{min, -2}] = bear:get_statistics_subset(Values, [min]).
+
+negative2_test() ->
+ %% make sure things don't blow up with a negative value
+ Values = [-1,-1,-2,-2,-3,-5,-6,-10],
+ [{min, -10}] = bear:get_statistics_subset(Values, [min]).
+
match_values([H|T]) ->
Res = bear:get_statistics_subset(bear:test_values(), [mk_item(H)]),
Res = [H],
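
The new `math_log/1` clause and the two negative-value tests can be mirrored in a few lines of Python (a sketch of the guard logic only, not the surrounding statistics machinery):

```python
import math

def math_log(x):
    # mirror the patched clauses: 0 maps to 1 (1.0 for floats), and since
    # the log of a negative number is undefined, negatives now map to 0
    if x == 0:
        return 1.0 if isinstance(x, float) else 1
    if x < 0:
        return 0
    return math.log(x)

print(math_log(0))   # 1
print(math_log(-5))  # 0
```

With this guard in place, feeding negative samples into the geometric-mean path no longer raises, which is exactly what `negative_test` and `negative2_test` exercise.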
[10/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #8 from banjiewen/additional-percentiles
Additional percentiles
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/3fd09d1b
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/3fd09d1b
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/3fd09d1b
Branch: refs/heads/import-master
Commit: 3fd09d1b7bbd9de5b2d29f46df04a93fca9ce85e
Parents: 0da736b 6263a55
Author: Joe Williams <wi...@gmail.com>
Authored: Fri Feb 22 11:57:09 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Fri Feb 22 11:57:09 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 4 ++++
1 file changed, 4 insertions(+)
----------------------------------------------------------------------
[21/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
some formating
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/926a4861
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/926a4861
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/926a4861
Branch: refs/heads/import-master
Commit: 926a48615d21594e8c8dd618e2e7d9f3785a7980
Parents: 9ff5fd0
Author: Joe Williams <jo...@github.com>
Authored: Tue Nov 5 10:53:07 2013 -0800
Committer: Joe Williams <jo...@github.com>
Committed: Tue Nov 5 10:53:07 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/926a4861/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index b211a54..39111f5 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -134,20 +134,20 @@ level(_) -> 1.
report_subset(Items, N, SortedValues, Scan_res, Scan_res2) ->
lists:map(
fun(min) -> {min, hd(SortedValues)};
- (max) -> {max, lists:last(SortedValues)};
- (arithmetic_mean) -> {arithmetic_mean, arithmetic_mean(Scan_res)};
- (harmonic_mean) -> {harmonic_mean, harmonic_mean(Scan_res)};
- (geometric_mean) -> {geometric_mean, geometric_mean(Scan_res)};
- (median) -> {median, percentile(SortedValues,
- #scan_result{n = N}, 0.5)};
- (variance) -> {variance, variance(Scan_res, Scan_res2)};
- (standard_deviation=I) -> {I, std_deviation(Scan_res, Scan_res2)};
- (skewness) -> {skewness, skewness(Scan_res, Scan_res2)};
- (kurtosis) -> {kurtosis, kurtosis(Scan_res, Scan_res2)};
- ({percentile,Ps}) -> {percentile, percentiles(Ps, N, SortedValues)};
- (histogram) ->
- {histogram, get_histogram(SortedValues, Scan_res, Scan_res2)};
- (n) -> {n, N}
+ (max) -> {max, lists:last(SortedValues)};
+ (arithmetic_mean) -> {arithmetic_mean, arithmetic_mean(Scan_res)};
+ (harmonic_mean) -> {harmonic_mean, harmonic_mean(Scan_res)};
+ (geometric_mean) -> {geometric_mean, geometric_mean(Scan_res)};
+ (median) -> {median, percentile(SortedValues,
+ #scan_result{n = N}, 0.5)};
+ (variance) -> {variance, variance(Scan_res, Scan_res2)};
+ (standard_deviation=I) -> {I, std_deviation(Scan_res, Scan_res2)};
+ (skewness) -> {skewness, skewness(Scan_res, Scan_res2)};
+ (kurtosis) -> {kurtosis, kurtosis(Scan_res, Scan_res2)};
+ ({percentile,Ps}) -> {percentile, percentiles(Ps, N, SortedValues)};
+ (histogram) ->
+ {histogram, get_histogram(SortedValues, Scan_res, Scan_res2)};
+ (n) -> {n, N}
end, Items).
get_statistics(Values, _) when length(Values) < ?STATS_MIN ->
@@ -532,4 +532,3 @@ test_values() ->
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9].
-
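
Beyond the whitespace change, the shape of `report_subset/5` is a dispatch over the requested item atoms, each mapped to the matching stat computation. A hypothetical Python rendering of that shape (the handler names and the handful of items shown are illustrative only, not bear's full set):

```python
def report_subset(items, values):
    # dispatch each requested item to its handler, in request order
    sorted_values = sorted(values)
    n = len(sorted_values)
    handlers = {
        "min": lambda: sorted_values[0],
        "max": lambda: sorted_values[-1],
        "n": lambda: n,
    }
    return [(item, handlers[item]()) for item in items]

print(report_subset(["min", "max", "n"], [3, 1, 4, 1, 5]))
# [('min', 1), ('max', 5), ('n', 5)]
```

The Erlang version achieves the same thing with a multi-clause fun inside `lists:map/2`, which is why the reindentation above touches every clause.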
[25/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add unit test on uncovered function, move test data from src/ to test/
Use new approx method for floating numbers
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/3994adfe
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/3994adfe
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/3994adfe
Branch: refs/heads/import-master
Commit: 3994adfe334fe29f95d516c483c7323a70b89719
Parents: 5ed737e
Author: Rodolphe Quiédeville <ro...@quiedeville.org>
Authored: Tue Nov 5 09:48:17 2013 +0100
Committer: Rodolphe Quiédeville <ro...@quiedeville.org>
Committed: Wed Nov 6 11:11:56 2013 +0100
----------------------------------------------------------------------
src/bear.erl | 12 ----
test/bear_test.erl | 151 +++++++++++++++++++++++++++++++++++++-----------
2 files changed, 118 insertions(+), 45 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/3994adfe/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 3feb0bc..7d9eed9 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -524,15 +524,3 @@ perc(P, Len) when is_integer(P), 100 =< P, P =< 1000 ->
erlang:max(1, V);
perc(P, Len) when is_float(P), 0 =< P, P =< 1 ->
erlang:max(1, round(P * Len)).
-
-
-test_values() ->
- [1,1,1,1,1,1,1,
- 2,2,2,2,2,2,2,
- 3,3,3,3,3,3,3,3,3,3,3,3,3,3,
- 4,4,4,4,4,4,4,4,4,4,4,4,4,4,
- 5,5,5,5,5,5,5,5,5,5,5,5,5,5,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
- 9,9,9,9,9,9,9].
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/3994adfe/test/bear_test.erl
----------------------------------------------------------------------
diff --git a/test/bear_test.erl b/test/bear_test.erl
index 5b2b4e1..ce35ea3 100644
--- a/test/bear_test.erl
+++ b/test/bear_test.erl
@@ -30,7 +30,7 @@
-include_lib("eunit/include/eunit.hrl").
--define(PRECISION, 1.0e15).
+-define(PRECISION_DIGIT, 6).
get_statistics_1_empty_test() ->
%% get_statistics/1
@@ -54,28 +54,29 @@ get_statistics_1_empty_test() ->
get_statistics_1_regular_test() ->
%% get_statistics/1
%% Non empty set of values
- Percentile = [{50, 5},{75, 8},{90, 9},{95, 10},{99, 10},{999, 10}],
- Stats = bear:get_statistics(lists:seq(1,10)),
+ Percentile = [{50, -10},{75, 23},{90, 43},{95, 46},{99, 50},{999, 50}],
+ Stats = bear:get_statistics(sample1()),
{geometric_mean, Geometric} = lists:keyfind(geometric_mean, 1, Stats),
{harmonic_mean, Harmonic} = lists:keyfind(harmonic_mean, 1, Stats),
{variance, Variance} = lists:keyfind(variance, 1, Stats),
{standard_deviation, StandardDeviation} = lists:keyfind(standard_deviation, 1, Stats),
{kurtosis, Kurtosis} = lists:keyfind(kurtosis, 1, Stats),
-
- ?assertEqual({min, 1}, lists:keyfind(min, 1, Stats)),
- ?assertEqual({max, 10}, lists:keyfind(max, 1, Stats)),
- ?assertEqual({arithmetic_mean, 5.5}, lists:keyfind(arithmetic_mean, 1, Stats)),
- ?assertEqual(4528728688116766, erlang:trunc(?PRECISION * Geometric)),
- ?assertEqual(3414171521474055, erlang:trunc(?PRECISION * Harmonic)),
- ?assertEqual({median, 5}, lists:keyfind(median, 1, Stats)),
- ?assertEqual(9166666666666666, erlang:trunc(?PRECISION * Variance)),
- ?assertEqual(3027650354097491, erlang:trunc(?PRECISION * StandardDeviation)),
- ?assertEqual({skewness, 0.0}, lists:keyfind(skewness, 1, Stats)),
- ?assertEqual(-1561636363636363, erlang:trunc(?PRECISION * Kurtosis)),
+ {skewness, Skewness} = lists:keyfind(skewness, 1, Stats),
+
+ ?assertEqual({min, -49}, lists:keyfind(min, 1, Stats)),
+ ?assertEqual({max, 50}, lists:keyfind(max, 1, Stats)),
+ ?assertEqual({arithmetic_mean, -1.66}, lists:keyfind(arithmetic_mean, 1, Stats)),
+ ?assertEqual(true, approx(4.08326, Geometric)),
+ ?assertEqual(true, approx(54.255629738, Harmonic)),
+ ?assertEqual({median, -10}, lists:keyfind(median, 1, Stats)),
+ ?assertEqual(true, approx(921.0453061, Variance)),
+ ?assertEqual(true, approx(30.348728, StandardDeviation)),
+ ?assertEqual(true, approx(0.148722, Skewness)),
+ ?assertEqual(true, approx(-1.2651687, Kurtosis)),
?assertEqual({percentile, Percentile}, lists:keyfind(percentile, 1, Stats)),
- ?assertEqual({histogram, [{6,6},{11,4},{16,0}]}, lists:keyfind(histogram, 1, Stats)),
- ?assertEqual({n, 10}, lists:keyfind(n, 1, Stats)).
+ ?assertEqual({histogram, [{-20,16},{11,16},{41,12},{71,6}]}, lists:keyfind(histogram, 1, Stats)),
+ ?assertEqual({n, 50}, lists:keyfind(n, 1, Stats)).
get_statistics_2_1_test() ->
%% get_statistics/2
@@ -152,26 +153,33 @@ update_bin_1_test() ->
C = bear:update_bin(4, [4], Dict),
?assertEqual(1, dict:fetch(4, C)).
-get_covariance_test() ->
+get_covariance_exceptions_test() ->
%% Array 1 is too short
?assertEqual(0.0, bear:get_covariance([], [2,1,2,3,4,5,6])),
%% Array 2 is too short
?assertEqual(0.0, bear:get_covariance([1,2,3,4,5,6], [])),
 %% different array length
- ?assertEqual(0.0, bear:get_covariance([1,2,3,4,5,6], [1,2,3,4,5,6,7])),
+ ?assertEqual(0.0, bear:get_covariance([1,2,3,4,5,6], [1,2,3,4,5,6,7])).
+
+get_covariance_regular_test() ->
%% Usual case
- ?assertEqual(-30944444444444444, erlang:trunc(?PRECISION * bear:get_covariance([11,2,3,41,5,9], [34,2,23,4,5,6]))).
+ %% Result is not the same as what R computes; R uses an unbiased estimate
+ %% http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance
+ ?assertEqual(true, approx(170.813599, bear:get_covariance(sample1(),sample2()))).
ranks_of_test() ->
?assertEqual([4.0,3.0,1.0,2.0], bear:ranks_of([3,4,15,6])).
-get_pearson_correlation_test() ->
+get_pearson_correlation_exceptions_test() ->
?assertEqual(0.0, bear:get_pearson_correlation([], 42)),
?assertEqual(0.0, bear:get_pearson_correlation(42, [])),
?assertEqual(0.0, bear:get_pearson_correlation(lists:seq(1,10), lists:seq(1,11))),
?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(1,10), lists:seq(1,10))),
- ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(0,10), lists:seq(5,15))),
- ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(40,60,2), lists:seq(10,20))).
+ ?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(0,10), lists:seq(5,15))).
+
+get_pearson_correlation_regular_test() ->
+ %% Target is calculated by R
+ ?assertEqual(true, approx(0.2068785, bear:get_pearson_correlation(sample1(), sample2()))).
get_pearson_correlation_nullresult_test() ->
%% The two series do not correlate
@@ -193,25 +201,33 @@ get_bin_count_test() ->
?assertEqual(3, bear:get_bin_count(9, 15, 3)),
?assertEqual(4, bear:get_bin_count(10.2, 20.2, 4)).
-get_kendall_correlation_test()->
+get_kendall_correlation_exceptions_test()->
?assertEqual(0.0, bear:get_kendall_correlation([], [])),
?assertEqual(0.0, bear:get_kendall_correlation([], [1,2,3,4,5,6,7])),
?assertEqual(0.0, bear:get_kendall_correlation([1,2,3,4,5,6,7],[])),
- ?assertEqual(0.0, bear:get_kendall_correlation(lists:seq(1,10),lists:seq(1,11))),
- ?assertEqual(1.0, bear:get_kendall_correlation([1,2,3,4,5,6,7], [2,3,4,5,6,7,9])).
+ ?assertEqual(0.0, bear:get_kendall_correlation(lists:seq(1,10),lists:seq(1,11))).
+
+get_kendall_correlation_regular_test()->
+ Kendall = bear:get_kendall_correlation(sample1(order), sample2(order)),
+ ?assertEqual(true, approx(0.9787755, Kendall)).
-get_spearman_correlation_test()->
+kendall_correlation_test()->
+ Kendall = bear:kendall_correlation(sample1(order), sample2(order)),
+ ?assertEqual(true, approx(0.9787755, Kendall)).
+
+get_spearman_correlation_exceptions_test()->
?assertEqual(0.0, bear:get_spearman_correlation([], [])),
?assertEqual(0.0, bear:get_spearman_correlation([], [1,2,3,4,5,6,7])),
?assertEqual(0.0, bear:get_spearman_correlation([1,2,3,4,5,6,7],[])),
- ?assertEqual(0.0, bear:get_spearman_correlation(lists:seq(1,10),lists:seq(1,11))),
- ?assertEqual(1.0, bear:get_spearman_correlation([1,2,3,4,5,6,7], [2,3,4,5,6,7,9])).
+ ?assertEqual(0.0, bear:get_spearman_correlation(lists:seq(1,10),lists:seq(1,11))).
+get_spearman_correlation_regular_test()->
+ ?assertEqual(true, approx(0.997888, bear:get_spearman_correlation(sample1(order), sample2(order)))).
math_log_test() ->
?assertEqual(1, bear:math_log(0)),
?assertEqual(1.0, bear:math_log(0.0)),
- ?assertEqual(3737669618283368, erlang:trunc(?PRECISION * bear:math_log(42))).
+ ?assertEqual(true, approx(3.737669618283368, bear:math_log(42))).
inverse_test() ->
?assertEqual(0, bear:inverse(0)),
@@ -236,12 +252,25 @@ tied_rank_worker_test() ->
?assertEqual([{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5},{2.0,5}],
bear:tied_rank_worker([{2.0,5},{2.0,5}], [{2.0,5}], {[1,2,3], 5})).
+perc_test() ->
+ ?assertEqual(14, bear:perc(36, 40)),
+ ?assertEqual(5, bear:perc(900, 5)),
+ ?assertEqual(5, bear:perc(0.9, 5)).
+
+get_statistics_subset_nev_test() ->
+ %% Not enough values case
+ ?assertEqual([], bear:get_statistics_subset([1,2], [])).
+
+get_statistics_subset_regular_test() ->
+ %% Regular case
+ ?assertEqual([{max, 50},{min, -49}], bear:get_statistics_subset(sample1(), [max,min])).
+
subset_test() ->
- Stats = bear:get_statistics(bear:test_values()),
+ Stats = bear:get_statistics(test_values()),
match_values(Stats).
full_subset_test() ->
- Stats = bear:get_statistics(bear:test_values()),
+ Stats = bear:get_statistics(test_values()),
match_values2(Stats).
negative_test() ->
@@ -255,7 +284,7 @@ negative2_test() ->
[{min, -10}] = bear:get_statistics_subset(Values, [min]).
match_values([H|T]) ->
- Res = bear:get_statistics_subset(bear:test_values(), [mk_item(H)]),
+ Res = bear:get_statistics_subset(test_values(), [mk_item(H)]),
Res = [H],
match_values(T);
match_values([]) ->
@@ -268,5 +297,61 @@ mk_item({K, _}) ->
match_values2(Stats) ->
Items = [mk_item(I) || I <- Stats],
- Stats = bear:get_statistics_subset(bear:test_values(), Items),
+ Stats = bear:get_statistics_subset(test_values(), Items),
ok.
+
+test_values() ->
+ [1,1,1,1,1,1,1,
+ 2,2,2,2,2,2,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,
+ 4,4,4,4,4,4,4,4,4,4,4,4,4,4,
+ 5,5,5,5,5,5,5,5,5,5,5,5,5,5,
+ 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
+ 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 9,9,9,9,9,9,9].
+
+negative_values() ->
+ %% All values are negative
+ [-1,-1,-1,-1,-1,-1,-1,
+ -2,-2,-2,-2,-2,-2,-2,
+ -3,-3,-3,-3,-3,-3,-3,-3,-3,-3,-3,-3,-3,-3,
+ -4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,
+ -5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,
+ -6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,
+ -7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,-7,
+ -8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,
+ -9,-9,-9,-9,-9,-9,-9].
+
+between(Value, Low, High) ->
+ (Value >= Low) and (Value =< High).
+
+approx(Target, Value) ->
+ High = Target + math:pow(10, - ?PRECISION_DIGIT),
+ Low = Target - math:pow(10, - ?PRECISION_DIGIT),
+ case (Value > Low) and (Value < High) of
+ true -> true;
+ _ -> Value
+ end.
+
+check_sample_test() ->
+ ?assertEqual(50, length(sample1())),
+ ?assertEqual(50, length(sample1(order))),
+ ?assertEqual(50, length(sample2())),
+ ?assertEqual(50, length(sample2(order))).
+
+sample1(X) when X == order ->
+ lists:sort(sample1()).
+
+sample2(X) when X == order ->
+ lists:sort(sample2()).
+
+sample1() ->
+ %% data from file bear/samples/data.csv
+ %% first column X
+ [-16,-18,-47,22,-18,36,25,49,-24,15,36,-10,-21,43,-35,1,-24,10,33,-21,-18,-36,-36,-43,-37,-10,23,50,31,-49,43,46,22,-43,12,-47,15,-14,6,-31,46,-8,0,-46,-16,-22,6,10,38,-11].
+
+sample2() ->
+ %% data from file bear/samples/data.csv
+ %% second column Y
+ [33,20,-35,16,-19,8,25,3,4,10,36,-20,-41,43,28,39,-30,3,-47,-23,17,-6,-50,16,-26,-49,8,-31,24,16,32,27,-19,-32,-17,1,-37,25,-50,-32,-42,-22,25,18,-34,-37,7,-13,16,10].
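
The `approx/2` helper replaces the old truncate-at-`?PRECISION` comparisons with a tolerance window of 10^-`?PRECISION_DIGIT` around the target. A Python sketch of the same check (returning a plain Boolean is a simplification; the Erlang original returns the value itself on mismatch so a failed `?assertEqual` prints the offending number):

```python
import math

PRECISION_DIGIT = 6  # mirrors the -define this commit introduces

def approx(target, value, digits=PRECISION_DIGIT):
    # true when value lies strictly within 10^-digits of target
    eps = math.pow(10, -digits)
    return (target - eps) < value < (target + eps)

print(approx(3.737669618283368, math.log(42)))  # True
print(approx(3.737669618283368, 3.8))           # False
```

This is why the rewritten tests can assert human-readable targets like `approx(30.348728, StandardDeviation)` instead of 15-digit truncated integers.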
[24/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #14 from boundary/drop_negatives
negatives, dont math:log/1 'em
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/5ed737e9
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/5ed737e9
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/5ed737e9
Branch: refs/heads/import-master
Commit: 5ed737e9e805771d47fabc6bc733f02f8421a635
Parents: bb739b2 6c19d6a
Author: Joe Williams <wi...@gmail.com>
Authored: Tue Nov 5 13:04:37 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Tue Nov 5 13:04:37 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 2 ++
test/bear_test.erl | 10 ++++++++++
2 files changed, 12 insertions(+)
----------------------------------------------------------------------
[30/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #18 from banjiewen/null-subsets
get_statistics_subset should return well-formatted null results
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/5f998064
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/5f998064
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/5f998064
Branch: refs/heads/import-master
Commit: 5f998064d178b1b8d01ed90c228d50d8097b12d3
Parents: 7d1ee8e 0a1d531
Author: Joe Williams <wi...@gmail.com>
Authored: Thu Dec 12 14:18:21 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Thu Dec 12 14:18:21 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 44 +++++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 21 deletions(-)
----------------------------------------------------------------------
[27/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Remove un-necessary if statement
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/f5e777d7
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/f5e777d7
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/f5e777d7
Branch: refs/heads/import-master
Commit: f5e777d711008068ca15a48f95c20b44946995ef
Parents: 1a902e8
Author: Benjamin Anderson <b...@banjiewen.net>
Authored: Sat Nov 16 18:03:58 2013 -0800
Committer: Benjamin Anderson <b...@banjiewen.net>
Committed: Sat Nov 16 18:24:35 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/f5e777d7/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index 67f4139..fe79fae 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -98,21 +98,15 @@ get_statistics(Values) when is_list(Values) ->
get_statistics_subset([_,_,_,_,_|_] = Values, Items) ->
Length = length(Values),
- if Length < ?STATS_MIN ->
- [I || {K,_} = I <- get_statistics([]),
- lists:member(K, Items) orelse K==percentiles];
- true ->
- SortedValues = lists:sort(Values),
- Steps = calc_steps(Items),
- Scan_res = if Steps > 1 -> scan_values(Values);
- true -> []
- end,
- Scan_res2 = if Steps > 2 -> scan_values2(Values, Scan_res);
- true -> []
- end,
- report_subset(Items, Length,
- SortedValues, Scan_res, Scan_res2)
- end;
+ SortedValues = lists:sort(Values),
+ Steps = calc_steps(Items),
+ Scan_res = if Steps > 1 -> scan_values(Values);
+ true -> []
+ end,
+ Scan_res2 = if Steps > 2 -> scan_values2(Values, Scan_res);
+ true -> []
+ end,
+ report_subset(Items, Length, SortedValues, Scan_res, Scan_res2);
get_statistics_subset(Values, Items) when is_list(Values) ->
get_null_statistics_subset(Items, []).
[29/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Clean up whitespace
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/0a1d5318
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/0a1d5318
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/0a1d5318
Branch: refs/heads/import-master
Commit: 0a1d531802cc589138301347701c40502de03edd
Parents: f5e777d
Author: Benjamin Anderson <b...@banjiewen.net>
Authored: Sat Nov 16 18:24:02 2013 -0800
Committer: Benjamin Anderson <b...@banjiewen.net>
Committed: Sat Nov 16 18:24:35 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/0a1d5318/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index fe79fae..3a7898f 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -118,11 +118,12 @@ get_null_statistics_subset([], Acc) ->
lists:reverse(Acc).
calc_steps(Items) ->
- lists:foldl(fun({I,_},Acc) ->
- erlang:max(level(I), Acc);
- (I,Acc) ->
- erlang:max(level(I), Acc)
- end, 1, Items).
+ lists:foldl(
+ fun({I,_},Acc) ->
+ erlang:max(level(I), Acc);
+ (I,Acc) ->
+ erlang:max(level(I), Acc)
+ end, 1, Items).
level(standard_deviation) -> 3;
level(variance ) -> 3;
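
`calc_steps/1` folds over the requested items and keeps the highest "level", i.e. how many scan passes the requested statistics need. A Python sketch under the assumption that only the two level-3 clauses visible here matter and everything else defaults to level 1 (the real `level/1` has more clauses than this digest shows):

```python
# assumed level table: only the clauses visible in this diff are listed,
# with a default of 1 for everything else
LEVEL = {"standard_deviation": 3, "variance": 3}

def calc_steps(items):
    # fold over the requested items, keeping the highest scan level seen
    steps = 1
    for item in items:
        # items may be bare atoms or {atom, Args} tuples, as in Erlang
        key = item[0] if isinstance(item, tuple) else item
        steps = max(LEVEL.get(key, 1), steps)
    return steps

print(calc_steps(["min", "max"]))       # 1
print(calc_steps(["min", "variance"]))  # 3
```

The payoff is in `get_statistics_subset/2` above: the second and third scans over the values are skipped entirely when no requested item needs them.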
[20/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #13 from Feuerlabs/uw-boundary-stats-subset
adjust get_statistics to allow for requesting specific stats to calculate
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/9ff5fd09
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/9ff5fd09
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/9ff5fd09
Branch: refs/heads/import-master
Commit: 9ff5fd09f2ba1a5fbb20bb2238e786c4404c8387
Parents: b9feed8 d278aae
Author: Joe Williams <wi...@gmail.com>
Authored: Mon Nov 4 12:32:41 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Mon Nov 4 12:32:41 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 141 ++++++++++++++++++++++++++++++++++++++----------
test/bear_test.erl | 25 +++++++++
2 files changed, 139 insertions(+), 27 deletions(-)
----------------------------------------------------------------------
[16/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Add a new test on Pearson correlation
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/69a9cf00
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/69a9cf00
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/69a9cf00
Branch: refs/heads/import-master
Commit: 69a9cf0097b802caae34db16681773482a8d61d8
Parents: bd20bd5
Author: Rodolphe Quiédeville <ro...@quiedeville.org>
Authored: Fri Nov 1 12:23:26 2013 +0100
Committer: Rodolphe Quiédeville <ro...@quiedeville.org>
Committed: Fri Nov 1 12:23:26 2013 +0100
----------------------------------------------------------------------
test/bear_test.erl | 5 +++++
1 file changed, 5 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/69a9cf00/test/bear_test.erl
----------------------------------------------------------------------
diff --git a/test/bear_test.erl b/test/bear_test.erl
index 10d447c..2cca076 100644
--- a/test/bear_test.erl
+++ b/test/bear_test.erl
@@ -173,6 +173,11 @@ get_pearson_correlation_test() ->
?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(0,10), lists:seq(5,15))),
?assertEqual(1.0, bear:get_pearson_correlation(lists:seq(40,60,2), lists:seq(10,20))).
+get_pearson_correlation_nullresult_test() ->
+ %% The two series do not correlate
+ A = [-1,-0.5,0,0.5,1],
+ B = [1,0.25,0,0.25,1],
+ ?assertEqual(0.0, bear:get_pearson_correlation(A, B)).
round_bin_test() ->
?assertEqual(10, bear:round_bin(10)),
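The new test asserts a correlation of exactly 0.0 for a symmetric pair of series. A minimal sketch of the same check in Python (the `pearson` helper below is illustrative only, not bear's implementation): because B is an even function of A around A's mean, every product in the covariance sum cancels with its mirror term.

```python
def pearson(xs, ys):
    # Pearson r = cov(X, Y) / (std(X) * std(Y)), computed from raw sums.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# The series from the new test case: B mirrors around A's mean of 0,
# so the covariance terms cancel pairwise and r is exactly 0.0.
A = [-1, -0.5, 0, 0.5, 1]
B = [1, 0.25, 0, 0.25, 1]
print(pearson(A, B))  # 0.0

# The earlier assertions in the same test: shifted linear series
# correlate perfectly (r = 1.0 up to floating-point rounding).
print(pearson(list(range(0, 11)), list(range(5, 16))))
```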
[15/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Remove unneeded clause in ranks_of/5
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/bd20bd5c
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/bd20bd5c
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/bd20bd5c
Branch: refs/heads/import-master
Commit: bd20bd5cec51cd14c9becb7b3f360c6b09adf549
Parents: c65276d
Author: Rodolphe Quiédeville <ro...@quiedeville.org>
Authored: Thu Oct 31 21:50:54 2013 +0100
Committer: Rodolphe Quiédeville <ro...@quiedeville.org>
Committed: Thu Oct 31 21:50:54 2013 +0100
----------------------------------------------------------------------
src/bear.erl | 2 --
1 file changed, 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/couchdb-bear/blob/bd20bd5c/src/bear.erl
----------------------------------------------------------------------
diff --git a/src/bear.erl b/src/bear.erl
index ffc9025..04593e7 100644
--- a/src/bear.erl
+++ b/src/bear.erl
@@ -278,8 +278,6 @@ ranks_of(Values) when is_list(Values) ->
end, [], Values),
lists:reverse(L).
-ranks_of([E|Es],Acc, N, E, S) ->
- ranks_of(Es, Acc, N+1, E, S);
ranks_of([E|Es], Acc, N, P, S) ->
ranks_of(Es,[{P,(S+N-1)/2}|Acc], N+1, E, N);
ranks_of([], Acc, N, P, S) ->
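For context, `ranks_of` implements fractional ranking: tied values share the average of the positions they occupy, which is what the accumulator pair `{P,(S+N-1)/2}` tracks. A small Python sketch of that idea (the sort direction and function name here are assumptions for illustration; the diff excerpt above does not show bear's full API):

```python
def fractional_ranks(values):
    # Fractional ranking: equal values receive the average of the
    # 1-based positions they would occupy in sorted order.
    ordered = sorted(values)  # ascending order assumed for this sketch
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j (1-based) are tied; they share the average.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

print(fractional_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The removed Erlang clause matched the case where the current element equals the previous one; the remaining clauses already cover that path, so dropping it changes nothing observable.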
[17/30] bear commit: updated refs/heads/import-master to 5f99806
Posted by da...@apache.org.
Merge pull request #12 from rodo/master
Remove unneeded clause in ranks_of/5
Project: http://git-wip-us.apache.org/repos/asf/couchdb-bear/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-bear/commit/b9feed84
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-bear/tree/b9feed84
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-bear/diff/b9feed84
Branch: refs/heads/import-master
Commit: b9feed8400db6ff923a67f77b476e22fd3a193fd
Parents: 3cb96e4 69a9cf0
Author: Joe Williams <wi...@gmail.com>
Authored: Mon Nov 4 10:11:34 2013 -0800
Committer: Joe Williams <wi...@gmail.com>
Committed: Mon Nov 4 10:11:34 2013 -0800
----------------------------------------------------------------------
src/bear.erl | 2 --
test/bear_test.erl | 5 +++++
2 files changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------