Posted to user@spark.apache.org by Surya Rajaraman Iyer <si...@medallia.com> on 2022/02/14 17:36:31 UTC

[MLlib]: GLM with multinomial family

Hi Team,

I am fitting a multinomial regression in Spark (Scala) and want to generate
the coefficients and p-values for every category.

For example, given two variables, salary group (dependent variable) and age
group (independent variable):

salary-group: 10,000-, 10,000-100,000, 100,000+
age-group: 30-, 30-40, 40+

I am looking for output like the following: with 10,000- as the baseline,
the coefficients and p-values for each remaining category of the salary group.

Salary group 10,000-100,000:

             Coefficient  P-value
Intercept    ..           ..
Age group
  30-40      ..           ..
  40+        ..           ..
  30-        0            0

Salary group 100,000+:

             Coefficient  P-value
Intercept    ..           ..
Age group
  30-40      ..           ..
  40+        ..           ..
  30-        0            0

To do this, I am forced to fit a GLM with the binomial family twice. To
parallelize the two fits, I am using thread pools, which doesn't seem ideal.

Is there a way to do multinomial logit in Spark Scala? I do see it in
SparkR: https://rdrr.io/cran/SparkR/man/spark.logit.html

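Is there a Spark way to run the GLMs in parallel? Something like:

// SparkLogisticRegressionResult is a placeholder for whatever the fit returns
def glm(df: DataFrame): SparkLogisticRegressionResult = ???

val dfs: Seq[DataFrame] = ???
dfs.map(glm)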


Thanks a lot for the help!

Regards,
Surya,


Re: [MLlib]: GLM with multinomial family

Posted by Sean Owen <sr...@gmail.com>.
SparkR is just a wrapper around the Scala implementations. Are you just
looking to set family = "multinomial" on LogisticRegression? Sure, it's there
in the Scala API.
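
A minimal sketch of what that could look like, assuming Spark 3.x and the
spark.ml API. The toy data, the column names (age_group, salary_group,
age_idx) and the object name MultinomialLogitSketch are made up for
illustration. The label order is fixed by hand so that index 0 is the
10,000- baseline, and 30- is placed last so that the encoder's default
dropLast makes it the reference level. The model keeps one coefficient row
per class, so the log-odds of class k against the baseline are obtained by
subtracting row 0 from row k.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexerModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultinomialLogitSketch {

  // Hypothetical toy data; every age group contains every salary group,
  // so the classes are not perfectly separable.
  def toyData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("30-", "10,000-"), ("30-", "10,000-"), ("30-", "10,000-100,000"), ("30-", "100,000+"),
      ("30-40", "10,000-"), ("30-40", "10,000-100,000"), ("30-40", "10,000-100,000"), ("30-40", "100,000+"),
      ("40+", "10,000-"), ("40+", "10,000-100,000"), ("40+", "100,000+"), ("40+", "100,000+")
    ).toDF("age_group", "salary_group")
  }

  def fit(df: DataFrame): LogisticRegressionModel = {
    // Explicit label order: index 0 = "10,000-" (the baseline class).
    val labelIndexer = new StringIndexerModel(Array("10,000-", "10,000-100,000", "100,000+"))
      .setInputCol("salary_group").setOutputCol("label")
    // "30-" goes last so that OneHotEncoder (dropLast = true by default) drops it.
    val ageIndexer = new StringIndexerModel(Array("30-40", "40+", "30-"))
      .setInputCol("age_group").setOutputCol("age_idx")
    val indexed = ageIndexer.transform(labelIndexer.transform(df))
    val encoded = new OneHotEncoder()
      .setInputCols(Array("age_idx")).setOutputCols(Array("features"))
      .fit(indexed).transform(indexed)
    // A single multinomial fit covers all three salary groups.
    new LogisticRegression()
      .setFamily("multinomial")
      .setLabelCol("label")
      .setFeaturesCol("features")
      .fit(encoded)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("multinomial-logit-sketch").getOrCreate()
    val model = fit(toyData(spark))
    val coef = model.coefficientMatrix // numClasses x numFeatures
    val icpt = model.interceptVector   // one intercept per class
    val featureNames = Seq("30-40", "40+")
    // Contrast each non-baseline class with class 0 ("10,000-").
    for (k <- 1 until model.numClasses) {
      println(s"class $k vs baseline: intercept = ${icpt(k) - icpt(0)}")
      for ((name, j) <- featureNames.zipWithIndex)
        println(s"  age group $name: ${coef(k, j) - coef(0, j)}")
    }
    spark.stop()
  }
}
```

Because one multinomial fit replaces the two binomial fits, no thread pool
is needed for this case. One caveat: as far as I know, the summary of
LogisticRegression does not expose p-values, whereas the summary of
GeneralizedLinearRegression does (but that estimator has no multinomial
family), so p-values would still have to be computed separately.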
