
Posted to user@spark.apache.org by Dan Bikle <bi...@gmail.com> on 2016/09/23 01:40:47 UTC

In Spark-scala, how to fill Vectors.dense in DataFrame from CSV?

hello spark-world,

I am new to Spark.

I noticed this online example:

http://spark.apache.org/docs/latest/ml-pipeline.html

I am curious about this syntax:

    // Vectors is the org.apache.spark.ml (not mllib) linear-algebra factory.
    import org.apache.spark.ml.linalg.Vectors

    // Prepare training data from a list of (label, features) tuples.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

Is it possible to replace the above call with syntax that reads the values
from a CSV file?

I want something comparable to the pandas read_csv() method in Python.

Re: In Spark-scala, how to fill Vectors.dense in DataFrame from CSV?

Posted by Kevin Mellott <ke...@gmail.com>.

You'll want to use the spark-csv package; as of Spark 2.0 its functionality
is built in as the csv data source (spark.read.csv). The repository
documentation has some great usage examples.

https://github.com/databricks/spark-csv
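A minimal sketch of that route, assuming Spark 2.x (where spark.read.csv is built in) and a CSV file with a header row whose label column is named "label" and whose other columns are all numeric. Instead of calling Vectors.dense by hand, it uses the ML VectorAssembler transformer to pack the numeric columns into the "features" column. The object name CsvToFeatures, the helper loadTraining, and the column names x1..x3 are hypothetical, chosen for illustration.

```scala
import java.nio.file.{Files, Path}

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvToFeatures {
  // Hypothetical helper: reads a headered CSV into a (label, features) DataFrame.
  // Assumption: the label column is named "label"; every other column is a numeric feature.
  def loadTraining(spark: SparkSession, path: String): DataFrame = {
    val raw = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // parse numeric columns as numbers instead of strings
      .csv(path)

    // Every column except the label becomes part of the feature vector, in file order.
    val featureCols = raw.columns.filter(_ != "label")

    new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
      .transform(raw)
      .select("label", "features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-features")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample file holding the same four rows as the ml-pipeline example.
    val csv: Path = Files.createTempFile("training", ".csv")
    Files.write(csv, java.util.Arrays.asList(
      "label,x1,x2,x3",
      "1.0,0.0,1.1,0.1",
      "0.0,2.0,1.0,-1.0",
      "0.0,2.0,1.3,1.0",
      "1.0,0.0,1.2,-0.5"))

    val training = loadTraining(spark, csv.toString)
    training.show(false) // print rows without truncating the vector column
    spark.stop()
  }
}
```

Note that inferSchema makes an extra pass over the file; for large inputs, passing an explicit schema to spark.read.schema(...) avoids that. VectorAssembler may also store a row as a sparse vector when that is more compact, which ML estimators accept just like a dense one.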

Thanks,
Kevin
