Posted to dev@commons.apache.org by Shubham Jindal <sh...@appdynamics.com> on 2017/11/08 15:50:19 UTC

MATH cKMeans Implementation

Hello,

I have written a full-fledged, efficient implementation of cKMeans in Java. See
https://cran.r-project.org/web/packages/Ckmeans.1d.dp/index.html and
https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf

The algorithm described in the linked paper is *O(kn^2)*, where *k* is the
number of clusters and *n* is the number of 1D points. However, later versions
of cKMeans have a more efficient implementation that is *O(kn log(n))*.
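
The quadratic dynamic program from the linked paper can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
implementation proposed in this message: the class and method names
(CkmeansSketch, cluster, sse) are hypothetical, the input is assumed to be
sorted in ascending order, and prefix sums are an added convenience so that
each within-cluster cost is computed in O(1).

```java
import java.util.Arrays;

// Hypothetical sketch of the O(k n^2) dynamic program for optimal 1D k-means.
final class CkmeansSketch {

    /**
     * Returns the cluster index (0..k-1) of each point.
     * Assumption: "sorted" is in ascending order and 1 <= k <= sorted.length.
     */
    static int[] cluster(double[] sorted, int k) {
        int n = sorted.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= n");
        }
        // Prefix sums of x and x^2 give any segment's squared error in O(1).
        double[] s1 = new double[n + 1];
        double[] s2 = new double[n + 1];
        for (int i = 0; i < n; i++) {
            s1[i + 1] = s1[i] + sorted[i];
            s2[i + 1] = s2[i] + sorted[i] * sorted[i];
        }
        // cost[m][i]: minimal total squared error for the first i points in m clusters.
        // start[m][i]: first index of the last cluster in that optimal solution.
        double[][] cost = new double[k + 1][n + 1];
        int[][] start = new int[k + 1][n + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int m = 1; m <= k; m++) {
            for (int i = m; i <= n; i++) {
                // The last cluster covers sorted[j .. i-1].
                for (int j = m - 1; j < i; j++) {
                    double c = cost[m - 1][j] + sse(s1, s2, j, i);
                    if (c < cost[m][i]) {
                        cost[m][i] = c;
                        start[m][i] = j;
                    }
                }
            }
        }
        // Walk back through the recorded start indices to label the points.
        int[] labels = new int[n];
        int end = n;
        for (int m = k; m >= 1; m--) {
            int begin = start[m][end];
            Arrays.fill(labels, begin, end, m - 1);
            end = begin;
        }
        return labels;
    }

    /** Sum of squared deviations from the mean for sorted[from .. to-1]. */
    static double sse(double[] s1, double[] s2, int from, int to) {
        double sum = s1[to] - s1[from];
        double v = (s2[to] - s2[from]) - sum * sum / (to - from);
        return Math.max(v, 0.0); // guard against tiny negative rounding errors
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.1, 1.2, 5.0, 5.1, 9.0};
        // prints [0, 0, 0, 1, 1, 2]
        System.out.println(Arrays.toString(cluster(points, 3)));
    }
}
```

Because the inner loop keeps the first minimum it finds, ties are broken
toward the smallest start index, so the same input always yields the same
clustering.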

cKMeans is faster than kMeans and is also deterministic. Can I submit a patch
for a cKMeans implementation in the Apache Commons Math3 ML Clustering package
<https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/ml/clustering/package-summary.html>
as a contribution?

Thanks
Shubham Jindal

Re: MATH cKMeans Implementation

Posted by Shubham Jindal <sh...@appdynamics.com>.
Hello,
Thanks, Gilles, for the steps to follow. I have created a feature ticket on
JIRA: https://issues.apache.org/jira/browse/MATH-1435. Please assign the
ticket to me. I will now create a feature branch and work on it there.

Thanks
Shubham Jindal


Re: MATH cKMeans Implementation

Posted by Gilles <gi...@harfang.homelinux.org>.
Hello.

On Thu, 9 Nov 2017 10:15:11 +0530, Shubham Jindal wrote:
> Hello,
> Thanks for the update Gilles. How should I proceed with the patch request?
> Shall I go ahead and create a feature request on JIRA and then have a patch
> request tied to that feature?

Yes, that is fine.
However, to minimize rounds of updates, be sure to
  * check out the git "master" branch,
  * create a "feature branch" (see file in "doc/development"),
  * run "mvn site" in the project directory,
  * ensure that all the reports (that will be located in the
    "target/site" directory) are clean, and
  * cover the provided code with unit tests.

Thanks for your interest in Commons Math.

Best regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: MATH cKMeans Implementation

Posted by Shubham Jindal <sh...@appdynamics.com>.
Hello,
Thanks for the update, Gilles. How should I proceed with the patch request?
Should I create a feature request on JIRA and then tie a patch request to
that feature? Please let me know the procedure to follow.

Thanks
Shubham Jindal


Re: MATH cKMeans Implementation

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi.

On Wed, 8 Nov 2017 21:20:19 +0530, Shubham Jindal wrote:
> Hello,
>
> I have written a full fledged efficient implementation of cKMeans in Java
> https://cran.r-project.org/web/packages/Ckmeans.1d.dp/index.html and
> https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf
>
> The algorithm described here is *O(kn^2)* where *k*: number of clusters and
> *n*: number of 1D points. But, there exists an efficient implementation in
> later versions of cKMeans which is *O(knlog(n))*
>
> cKMeans is faster than kMeans and also deterministic in nature. Can I
> submit a patch request for cKMeans implementation in Apache Commons Math3
> ML Clustering
> <https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/ml/clustering/package-summary.html>
> package
> as a contribution?

Thanks for your proposal and interest in contributing.

The current development branch (git "master") targets
version 4.0 of the library:
   http://commons.apache.org/proper/commons-math/source-repository.html

Also, you should be aware that Commons Math has had very few
contributors for the past year.  A lot of work was done since
the last official release, but a lot is still needed in order
to be able to release the next one, due to the many open issues
and lack of human resources:
   https://issues.apache.org/jira/browse/MATH

It has been proposed to break the library into more manageable
standalone components (the contents of the "o.a.c.math4.ml"
package were among the likely candidates), but there was no
agreement within the Commons project management committee on
this attempt to revive development.[1]

Best regards,
Gilles

[1] Full story is in the "dev" ML archives:
        http://markmail.org/list/org.apache.commons.dev/


> Thanks
> Shubham Jindal

