Posted to user@spark.apache.org by Scott Reynolds <sr...@twilio.com.INVALID> on 2017/12/13 21:33:32 UTC

Re: Apache Spark documentation on mllib's Kmeans doesn't jibe.

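The train method is defined on the KMeans companion object, not on the
class, so it is documented on the object's page (note the trailing $ in the URL):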
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.clustering.KMeans$

Here is a decent resource on companion object usage:
https://docs.scala-lang.org/tour/singleton-objects.html
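
To illustrate why a call such as KMeans.train(...) resolves even though the
class page does not list train, here is a minimal sketch of the class/companion
pattern. ToyKMeans, ToyModel, and Demo are hypothetical names made up for this
example; this is not Spark's implementation, only the same shape.

```scala
// Hypothetical stand-ins for illustration only; not Spark's actual code.
final case class ToyModel(center: Double)

// The class holds instance members. These are what the "class" page
// of the Scaladoc lists.
class ToyKMeans(val maxIterations: Int) {
  // Toy "clustering": a single center at the mean of the data.
  def run(data: Seq[Double]): ToyModel =
    ToyModel(if (data.isEmpty) 0.0 else data.sum / data.size)
}

// The companion object shares the class name and holds the members you call
// without an instance. Scaladoc documents these on a separate "object" page.
object ToyKMeans {
  def train(data: Seq[Double], maxIterations: Int): ToyModel =
    new ToyKMeans(maxIterations).run(data)
}

object Demo extends App {
  // Called on the object, exactly like KMeans.train(...) in the docs example.
  val model = ToyKMeans.train(Seq(1.0, 2.0, 3.0), maxIterations = 10)
  println(model.center) // prints 2.0
}
```

So when a method is called on the type name rather than on an instance, the
object page of the Scaladoc is the place to look.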

On Wed, Dec 13, 2017 at 9:16 AM Michael Segel <ms...@hotmail.com>
wrote:

> Hi,
>
> Just came across this while looking at the docs on how to use Spark’s
> Kmeans clustering.
>
> Note: This appears to be true in both 2.1 and 2.2 documentation.
>
> The overview page:
> https://spark.apache.org/docs/2.1.0/mllib-clustering.html#k-means
>
> There, the example contains the following line:
>
> val clusters = KMeans.train(parsedData, numClusters, numIterations)
>
> I was trying to get more information on the train() method.
> So I checked out the KMeans Scala API:
>
>
> https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.clustering.KMeans
>
> The issue is that I couldn’t find the train method…
>
> So I thought I was slowly losing my mind.
>
> I checked out the entire API page… could not find any API docs which
> describe the method train().
>
> I ended up looking at the Scala source code and found the method there.
> (You can see the code here:
> https://github.com/apache/spark/blob/v2.1.0/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
>  )
>
> So the method(s) exist, but are not covered in the Scala API doc.
>
> How do you raise this as a ‘bug’ ?
>
> Thx
>
> -Mike
>
> --

Scott Reynolds
Principal Engineer


EMAIL sreynolds@twilio.com