
Posted to dev@spark.apache.org by Dženan Softić <dz...@gmail.com> on 2015/12/14 16:56:58 UTC

BIRCH clustering algorithm

Hi,

As part of our project, we are trying to create a parallel implementation
of the BIRCH clustering algorithm [1]. Our approach is mostly based on
paper [2], which used CUDA to parallelize BIRCH. ([2] is a short paper;
only Section 4 is relevant.)

We would like to implement BIRCH on Spark. Would this be an interesting
contribution for MLlib? Has anyone already tried to implement BIRCH on
Spark?

Any suggestions for the implementation itself would be very much appreciated!
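
For context, here is a minimal sketch of the property that makes BIRCH a
plausible candidate for a parallel implementation: the clustering feature
(CF) triple (N, LS, SS) from [1] is additive, so summaries built
independently on separate chunks of data can be merged afterwards. The
names below (`CF`, `of_point`, `merge`, `summarize`) are illustrative
assumptions, not part of Spark or MLlib, and plain Python lists stand in
for data partitions. The sketch does not cover building or merging the CF
tree itself.

```python
from dataclasses import dataclass
from functools import reduce
from math import sqrt
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CF:
    """Clustering feature (N, LS, SS) as described in the BIRCH paper [1]."""
    n: int                    # number of points summarized
    ls: Tuple[float, ...]     # linear sum of the points, per dimension
    ss: float                 # sum of squared norms of the points

    @staticmethod
    def of_point(p: Iterable[float]) -> "CF":
        p = tuple(p)
        return CF(1, p, sum(x * x for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF additivity: summaries of disjoint point sets add component-wise.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self) -> Tuple[float, ...]:
        return tuple(x / self.n for x in self.ls)

    def radius(self) -> float:
        # Root mean squared distance to the centroid:
        # sqrt(SS / N - ||LS / N||^2), clamped at 0 against rounding error.
        c2 = sum(x * x for x in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


def summarize(points) -> CF:
    """Fold a non-empty collection of points into a single CF."""
    return reduce(CF.merge, (CF.of_point(p) for p in points))


# Two hypothetical "partitions" of the same data set.
part_a = [(0.0, 0.0), (2.0, 0.0)]
part_b = [(0.0, 2.0), (2.0, 2.0)]

# Summarizing each partition separately and merging gives the same CF
# as summarizing all points in one pass.
merged = summarize(part_a).merge(summarize(part_b))
whole = summarize(part_a + part_b)
```

Because `merge` is associative and commutative, it could in principle serve
as the combine step of a distributed aggregation over partitions. Whether
that is the right decomposition for a full BIRCH implementation on Spark is
an open question here.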


[1] http://www.cs.sfu.ca/CourseCentral/459/han/papers/zhang96.pdf
[2] http://boyuan.global-optimization.com/Mypaper/IDEAL2013-88.pdf


Best,
Dzeno

Re: BIRCH clustering algorithm

Posted by Dzeno <dz...@gmail.com>.

Hi Joseph,

Thank you for your tips. 

Thanks,
Dzeno




Re: BIRCH clustering algorithm

Posted by Joseph Bradley <jo...@databricks.com>.

Hi Dzeno,

I'm not familiar with the algorithm myself, but if you have an important
use case for it, you could open a JIRA to discuss it. However, if it is a
less common algorithm, I'd recommend first submitting it as a Spark package
and publicizing the package on the user list. If it gains traction, it
could then become a higher-priority item for MLlib.

Thanks,
Joseph
