Posted to dev@mahout.apache.org by Jim Jagielski <ji...@jaguNET.com> on 2017/04/13 18:26:23 UTC

Re: Contributing an algorithm for samsara

Apologies for letting this slide... way too much life got in the way :)

> On Mar 3, 2017, at 3:36 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
> And by formula, yes, I mean R syntax.
> 
> A possible use case would be to take a Spark DataFrame and a formula (say,
> `age ~ . -1`) and produce DrmLike[Int] outputs (a distributed matrix type)
> holding the predictors and the target.
> 
> In this particular case, this formula means that the predictor matrix (X)
> would have all original variables except `age` (for categorical variables
> factor extraction is applied), with no bias column.
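As a rough illustration of how that formula could be read, here is a minimal sketch in plain Python rather than Samsara code. The helper names `parse_formula` and `design_matrix` are hypothetical, the sample rows are made up, and only the tiny `y ~ .` / `y ~ a + b` subset with an optional `-1` is handled. Categorical columns are expanded into one indicator column per level.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def parse_formula(formula: str, columns: List[str]) -> Tuple[str, List[str], bool]:
    """Parse a tiny subset of R formula syntax: `y ~ .` or `y ~ a + b`, optionally ending in `-1`."""
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    compact = rhs.replace(" ", "")  # treat "-1" and "- 1" alike
    intercept = True
    if compact.endswith("-1"):
        intercept = False           # "-1" removes the bias (intercept) column
        compact = compact[:-2]
    if compact == ".":
        # "." means every column except the target
        predictors = [c for c in columns if c != lhs]
    else:
        predictors = [term for term in compact.split("+") if term]
    return lhs, predictors, intercept


def design_matrix(rows: List[Row], formula: str) -> Tuple[List[str], List[List[float]], List[float]]:
    """Return (column names, predictor matrix X, target vector y) for the given rows."""
    target, predictors, intercept = parse_formula(formula, list(rows[0].keys()))
    encoders = []  # (output column name, function mapping a row to a float)
    if intercept:
        encoders.append(("(Intercept)", lambda row: 1.0))
    for col in predictors:
        values = [row[col] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # numeric column: copied through unchanged
            encoders.append((col, lambda row, c=col: float(row[c])))
        else:
            # categorical column: one indicator column per distinct level
            for level in sorted(set(map(str, values))):
                encoders.append(
                    (f"{col}{level}",
                     lambda row, c=col, lv=level: 1.0 if str(row[c]) == lv else 0.0)
                )
    names = [name for name, _ in encoders]
    X = [[encode(row) for _, encode in encoders] for row in rows]
    y = [float(row[target]) for row in rows]
    return names, X, y


# Made-up sample data, purely for illustration.
rows = [
    {"age": 31, "income": 52.0, "city": "NYC"},
    {"age": 45, "income": 61.5, "city": "SF"},
    {"age": 28, "income": 48.0, "city": "NYC"},
]
names, X, y = design_matrix(rows, "age ~ . -1")
print(names)
print(X)
print(y)
```

R's own model.matrix applies contrast-coding rules to factors that this sketch does not attempt to reproduce, which is exactly the kind of compatibility detail a real implementation would have to match.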
> 
> Some knowledge of R and SAS is required to pin down the compatibility
> nuances there.
> 
> Maybe we could have reasonable simplifications or omissions compared to R,
> if we can be reasonably convinced that is actually better than the vanilla
> R contract, but IMO it would be really useful to retain 100% compatibility
> there, since that is one of the ideas there: retaining R-like-ness with
> these things.
> 
> 
> On Fri, Mar 3, 2017 at 12:31 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
>> 
>> 
>> On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski <ji...@jagunet.com> wrote:
>>> 
>>>> 
>>>> 
>>> 
>>>>> 
>>>>> 3) On the feature extraction per R-like formula, can you elaborate more
>>>> here? Are you talking about feature extraction using R-like dataframes and
>>>> operators?
>>>> 
>>> 
>>> 
>> Yes. I would start by doing a generic formula parser and then a specific
>> part that works with backend-specific data frames. For Spark, I don't see
>> any reason to write our own; we'd just have an adapter for the Spark native
>> data frames.
>>
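The split described above, a generic parser plus a thin backend-specific adapter, might look roughly like this. It is a sketch in plain Python, not Mahout code: `Formula`, `FrameAdapter`, `DictFrameAdapter` and `resolve_terms` are all hypothetical names, and the in-memory adapter merely stands in for what a Spark DataFrame adapter would do.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence


@dataclass(frozen=True)
class Formula:
    """Backend-neutral result of parsing; knows nothing about data frames."""
    target: str
    terms: Sequence[str]  # ["."] stands for "all remaining columns"
    intercept: bool


class FrameAdapter(Protocol):
    """Hypothetical per-backend interface (not an existing Mahout or Spark API)."""

    def column_names(self) -> List[str]: ...

    def column(self, name: str) -> List[float]: ...


class DictFrameAdapter:
    """Toy in-memory backend; a Spark adapter would wrap a native DataFrame instead."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def column_names(self) -> List[str]:
        return list(self._data)

    def column(self, name: str) -> List[float]:
        return list(self._data[name])


def resolve_terms(formula: Formula, frame: FrameAdapter) -> List[str]:
    """Expand "." against whatever columns the backend reports."""
    if list(formula.terms) == ["."]:
        return [c for c in frame.column_names() if c != formula.target]
    return list(formula.terms)


# Made-up data, purely for illustration.
frame = DictFrameAdapter({"age": [31.0, 45.0], "income": [52.0, 61.5]})
predictors = resolve_terms(Formula("age", ["."], intercept=False), frame)
print(predictors)
```

Only the adapter touches backend types, so the parser and the term resolution can be shared across backends.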