Posted to user@spark.apache.org by Wush Wu <wu...@bridgewell.com> on 2015/03/06 05:43:47 UTC

Construct model matrix from SchemaRDD automatically

Dear all,

I am a new Spark user coming from R.

After exploring SchemaRDD, I noticed that it is similar to R's data.frame.
Is there a feature like R's `model.matrix` that converts a SchemaRDD into a
model matrix automatically, based on the column types, without my having to
convert the columns explicitly one by one?

Thanks,
Wush

Re: Construct model matrix from SchemaRDD automatically

Posted by "Evan R. Sparks" <ev...@gmail.com>.
Hi Wush,

I'm CC'ing user@spark.apache.org (which is the new list) and BCC'ing
user@spark.incubator.apache.org.

In Spark 1.3, SchemaRDD is in fact being renamed to DataFrame; see:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html

As for a "model.matrix" equivalent, you might have a look at the new
pipelines API introduced in Spark 1.2 (to be further improved in 1.3), which
provides facilities for repeatable data transformation as input to ML
algorithms. That said, I am not aware of anything that automatically one-hot
encodes all the categorical variables in a DataFrame; something that handles
that case might be a welcome addition.
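
To make the requested behaviour concrete, here is a minimal sketch in plain
Python of what a `model.matrix`-style helper does. It is not Spark API: the
function name `model_matrix`, the `drop_first` flag, and the sample rows are
all hypothetical, and the rows are ordinary dicts rather than a DataFrame.
Numeric columns pass through unchanged and string columns become 0/1
indicator columns, with the first level dropped by default, which roughly
mirrors R's default treatment contrasts.

```python
def model_matrix(rows, drop_first=True):
    """Build a numeric design matrix from a list of dict rows.

    Hypothetical helper for illustration only (not part of Spark).
    Returns (column_names, matrix), where matrix is a list of float lists.
    """
    if not rows:
        return [], []

    names = []     # output column names, in order
    encoders = []  # one function per output column: row -> float

    for col in rows[0].keys():
        values = [row[col] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Numeric column: copy the value through as a float.
            names.append(col)
            encoders.append(lambda row, c=col: float(row[c]))
        else:
            # Categorical column: one indicator column per sorted level.
            levels = sorted({str(v) for v in values})
            if drop_first:
                levels = levels[1:]  # first level becomes the baseline
            for level in levels:
                names.append(f"{col}={level}")
                encoders.append(
                    lambda row, c=col, l=level: 1.0 if str(row[c]) == l else 0.0
                )

    matrix = [[encode(row) for encode in encoders] for row in rows]
    return names, matrix


# Made-up sample data, for illustration only.
rows = [
    {"age": 30, "city": "Taipei"},
    {"age": 25, "city": "Berkeley"},
    {"age": 41, "city": "Taipei"},
]
names, matrix = model_matrix(rows)
```

With the sample rows, "city" has two levels, so it contributes a single
indicator column once the baseline level is dropped. A Spark version would
need to do the same two steps per column (detect the type from the schema,
then index and encode the categorical ones) as pipeline stages.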

- Evan
