Posted to dev@spark.apache.org by "DEVAN M.S." <ms...@gmail.com> on 2015/01/21 06:55:24 UTC

KNN for large data set

Hi all,

Please help me find the best way to do K-nearest neighbor search using Spark
on large data sets.

Re: KNN for large data set

Posted by Sudipta Banerjee <as...@gmail.com>.

Hi Devan and Xiangrui,

Can you please explain the cost and optimization function of the KNN
algorithm that is being used?

Thanks and regards,
Sudipta

On Thu, Jan 22, 2015 at 6:59 PM, DEVAN M.S. <ms...@gmail.com> wrote:

> Thanks Xiangrui Meng will try this.
>
> And, found this https://github.com/kaushikranjan/knnJoin also.
> Will this work with double data ? Can we find out z value of
> *Vector(10.3,4.5,3,5)* ?
>
> On Thu, Jan 22, 2015 at 12:25 AM, Xiangrui Meng <me...@gmail.com> wrote:
>
>> For large datasets, you need hashing in order to compute k-nearest
>> neighbors locally. You can start with LSH + k-nearest in Google
>> scholar: http://scholar.google.com/scholar?q=lsh+k+nearest -Xiangrui
>>
>> On Tue, Jan 20, 2015 at 9:55 PM, DEVAN M.S. <ms...@gmail.com> wrote:
>> > Hi all,
>> >
>> > Please help me to find out best way for K-nearest neighbor using spark
>> for
>> > large data sets.
>> >
>>
>
>


-- 
Sudipta Banerjee
Consultant, Business Analytics and Cloud Based Architecture
Call me +919019578099

Re: KNN for large data set

Posted by "DEVAN M.S." <ms...@gmail.com>.

Thanks, Xiangrui Meng. I will try this.

I also found this: https://github.com/kaushikranjan/knnJoin
Will this work with double data? Can we find the z-value of
*Vector(10.3,4.5,3,5)*?
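
On the z-value question, here is a minimal sketch of one common workaround, not
a description of what knnJoin itself does: scale each double into a fixed
integer range, then interleave the bits of the resulting integers (a Morton
code). The value range [0, 16], the 8 bits per dimension, and the helper names
quantize and z_value are all assumptions made for illustration.

```python
def quantize(values, lo, hi, bits):
    """Map each float in [lo, hi] to an integer in [0, 2**bits - 1].

    lo, hi and bits are assumed parameters; values outside the range are clamped.
    """
    levels = (1 << bits) - 1
    out = []
    for v in values:
        clamped = min(max(v, lo), hi)
        out.append(int(round((clamped - lo) / (hi - lo) * levels)))
    return out


def z_value(coords, bits):
    """Interleave the low `bits` bits of non-negative integers into one code."""
    dims = len(coords)
    z = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


# The vector from the question, with an assumed value range of [0, 16].
vector = [10.3, 4.5, 3.0, 5.0]
cells = quantize(vector, 0.0, 16.0, 8)
print(cells, z_value(cells, 8))
```

Quantization loses precision, so two nearby doubles can share a z-value; more
bits per dimension reduce that at the cost of a longer code.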

On Thu, Jan 22, 2015 at 12:25 AM, Xiangrui Meng <me...@gmail.com> wrote:

> For large datasets, you need hashing in order to compute k-nearest
> neighbors locally. You can start with LSH + k-nearest in Google
> scholar: http://scholar.google.com/scholar?q=lsh+k+nearest -Xiangrui
>
> On Tue, Jan 20, 2015 at 9:55 PM, DEVAN M.S. <ms...@gmail.com> wrote:
> > Hi all,
> >
> > Please help me to find out best way for K-nearest neighbor using spark
> for
> > large data sets.
> >
>

Re: KNN for large data set

Posted by Xiangrui Meng <me...@gmail.com>.

For large datasets, you need hashing in order to compute k-nearest
neighbors locally. You can start with LSH + k-nearest in Google
scholar: http://scholar.google.com/scholar?q=lsh+k+nearest -Xiangrui
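
To make the hashing idea concrete, here is a minimal single-machine sketch. It
assumes random-hyperplane (sign) LSH, which is only one of several LSH schemes
and is not named in this thread; the function names, the number of planes, and
the sample points are all hypothetical. Points with the same signature share a
bucket, and the exact k-nearest search runs only inside the query's bucket, so
the result is approximate.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(point, planes):
    """One bit per hyperplane: which side of the plane the point falls on."""
    return tuple(
        1 if sum(p * w for p, w in zip(point, plane)) >= 0 else 0
        for plane in planes
    )


def build_index(points, planes):
    """Group point indices by signature; each group can be searched locally."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[signature(point, planes)].append(idx)
    return buckets


def knn_in_bucket(query, points, buckets, planes, k):
    """Exact k-nearest (Euclidean) among the candidates in the query's bucket."""
    candidates = buckets.get(signature(query, planes), [])
    ranked = sorted(candidates, key=lambda i: math.dist(query, points[i]))
    return ranked[:k]


points = [[1.0, 0.1], [0.9, 0.2], [-1.0, -0.1], [-0.8, -0.3]]
planes = make_hyperplanes(num_planes=4, dim=2, seed=0)
index = build_index(points, planes)
print(knn_in_bucket([1.0, 0.15], points, index, planes, k=2))
```

In a distributed setting the signature would play the role of the partition
key, so each bucket can be processed on one worker. True neighbors that hash to
a different bucket are missed; using several independent hash tables is the
usual way to reduce that.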

On Tue, Jan 20, 2015 at 9:55 PM, DEVAN M.S. <ms...@gmail.com> wrote:
> Hi all,
>
> Please help me to find out best way for K-nearest neighbor using spark for
> large data sets.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
