Posted to user@spark.apache.org by Dan Bikle <bi...@gmail.com> on 2016/09/23 01:40:47 UTC
In Spark-scala, how to fill Vectors.dense in DataFrame from CSV?
hello spark-world,
I am new to spark.
I noticed this online example:
http://spark.apache.org/docs/latest/ml-pipeline.html
I am curious about this syntax:
// Prepare training data from a list of (label, features) tuples.
val training = spark.createDataFrame(Seq(
(1.0, Vectors.dense(0.0, 1.1, 0.1)),
(0.0, Vectors.dense(2.0, 1.0, -1.0)),
(0.0, Vectors.dense(2.0, 1.3, 1.0)),
(1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")
Is it possible to replace the above call with syntax that reads the values
from a CSV file?
I want something comparable to the Python-Pandas read_csv() method.
Re: In Spark-scala, how to fill Vectors.dense in DataFrame from CSV?
Posted by Kevin Mellott <ke...@gmail.com>.
You'll want the spark-csv functionality. It ships as a separate package for
Spark 1.x, and is built into Spark 2.0 as spark.read.csv.
The repository documentation has some great usage examples.
https://github.com/databricks/spark-csv
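As a rough sketch of the idea (the file layout, header row, and column names
f1/f2/f3 below are my assumptions, not something from your data): with Spark
2.0 you can read the CSV natively and then use VectorAssembler to collapse the
numeric columns into a single features vector, giving the same (label,
features) shape as the createDataFrame example you quoted.

```scala
import java.nio.file.Files

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-to-features")
  .master("local[*]")
  .getOrCreate()

// Write a tiny CSV so the sketch is self-contained; in practice you would
// point spark.read.csv at your own file. Header and column names are made up.
val csv = "label,f1,f2,f3\n1.0,0.0,1.1,0.1\n0.0,2.0,1.0,-1.0\n"
val path = Files.createTempFile("training", ".csv")
Files.write(path, csv.getBytes("UTF-8"))

// inferSchema turns the string columns into doubles.
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path.toString)

// Assemble the numeric feature columns into one Vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")

val training = assembler.transform(raw).select("label", "features")
training.show()
```

The resulting DataFrame has a "label" column and a "features" Vector column,
so it can be fed to the same Pipeline code as the hand-built Seq in the docs.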
Thanks,
Kevin
On Thu, Sep 22, 2016 at 8:40 PM, Dan Bikle <bi...@gmail.com> wrote: