Posted to user@madlib.apache.org by Frank McQuillan <fm...@vmware.com> on 2021/04/02 16:51:05 UTC

Re: GLM with svec column in independent variables

Hi Nantia,

I replied to this but somehow I don't think my response got to the mailing list.

The GLM method
http://madlib.apache.org/docs/latest/group__grp__glm.html
does not support SVEC inputs for the parameter `independent_varname`. That parameter can be any expression that resolves to an array, as in the example from the user docs:

SELECT glm('warpbreaks_dummy',
           'glm_model',
           'breaks',
           'ARRAY[1.0,"wool_B","tension_M", "tension_H"]',
           'family=poisson, link=log');

Frank

________________________________
From: Nantia Makrynioti <na...@gmail.com>
Sent: Saturday, March 13, 2021 10:46 AM
To: user@madlib.apache.org <us...@madlib.apache.org>
Subject: GLM with svec column in independent variables

Hello,

Is there a way to run the glm training function using a svec (sparse vector) column in the independent variables? I'm using the encode_categorical_variables function to transform a set of categorical features to a sparse vector for every tuple, but glm does not seem to accept this column as an independent variable.

Thank you very much in advance,
Nantia

Re: GLM with svec column in independent variables

Posted by Nantia Makrynioti <na...@gmail.com>.
Alright, Frank. I see. Thank you very much for your reply!

On Sat, May 1, 2021 at 1:49 AM Frank McQuillan <fm...@vmware.com>
wrote:

> Hi Nantia,
>
> You need to provide a regular Postgres array type (a dense vector) for
> the parameter `independent_varname`. Unfortunately, it does not support
> the run-length-encoded svec type.
>
> In such cases I often use the cols2vec utility
> http://madlib.apache.org/docs/latest/group__grp__cols2vec.html
> with '*' for the list_of_features parameter so I don't need to list
> all the columns. The list_of_features_to_exclude parameter may come in
> handy too.
>
> Frank

Re: GLM with svec column in independent variables

Posted by Frank McQuillan <fm...@vmware.com>.
Hi Nantia,

You need to provide a regular Postgres array type (a dense vector) for the parameter `independent_varname`. Unfortunately, it does not support the run-length-encoded svec type.

In such cases I often use the cols2vec utility
http://madlib.apache.org/docs/latest/group__grp__cols2vec.html
with '*' for the list_of_features parameter so I don't need to list all the columns. The list_of_features_to_exclude parameter may come in handy too.

Frank
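
A rough sketch of the cols2vec approach described above, assuming MADlib is installed in a schema named madlib and that the one-hot encoded columns live in a hypothetical table encoded_features with an id key and a label response column (all three names are made up for illustration). Note that the result is still a dense array per row, so this avoids listing columns by hand but does not make the representation sparse.

```sql
-- Hypothetical input: encoded_features(id, label, <one numeric column per category>).

-- Pack every column except id and label into a single array column.
SELECT madlib.cols2vec(
    'encoded_features',   -- source_table (hypothetical name)
    'features_vec',       -- output_table (hypothetical name)
    '*',                  -- list_of_features: take all columns
    'id, label',          -- list_of_features_to_exclude
    'id, label'           -- cols_to_output: carried through unchanged
);

-- Train on the packed array; cols2vec names the array column feature_vector.
-- No intercept term is added here. The user-docs example adds one explicitly
-- by putting a constant 1.0 in the array expression.
SELECT madlib.glm(
    'features_vec',       -- source table produced above
    'glm_model',          -- output model table
    'label',              -- dependent variable (hypothetical column)
    'feature_vector',     -- independent_varname: an array-valued expression
    'family=poisson, link=log'
);
```

The family and link settings are copied from the user-docs example quoted in this thread and should be changed to suit the actual response variable.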

________________________________
From: Nantia Makrynioti <na...@gmail.com>
Sent: Friday, April 30, 2021 9:33 AM
To: user@madlib.apache.org <us...@madlib.apache.org>
Subject: Re: GLM with svec column in independent variables

Hello Frank,

Thanks a lot for your message and I'm sorry for my late response to this.

So, if I have categorical features ending up in large vectors after one-hot encoding, is there a way to run glm without generating a huge denormalized representation of the features?

Nantia

Re: GLM with svec column in independent variables

Posted by Nantia Makrynioti <na...@gmail.com>.
Hello Frank,

Thanks a lot for your message and I'm sorry for my late response to this.

So, if I have categorical features ending up in large vectors after one-hot
encoding, is there a way to run glm without generating a huge denormalized
representation of the features?

Nantia
