Posted to user@madlib.apache.org by Pietro Pugni <pi...@gmail.com> on 2016/10/05 18:10:06 UTC

Time dependent variables in Cox regression model

Hi there,
I just found this amazing library and was wondering whether it’s possible to estimate a Cox model with time-dependent variables. I’m used to the survival and rms packages available in R. Those libraries ingest datasets built using the counting process method.
From the docs (http://madlib.incubator.apache.org/docs/latest/group__grp__cox__prop__hazards.html) this doesn’t seem possible. Do you plan to add this feature in the future?

Thank you
 Pietro Pugni

Re: Time dependent variables in Cox regression model

Posted by Rahul Iyer <ra...@gmail.com>.
Hi Pietro,

Thank you for using MADlib and the kind words.

Could you help us understand your use-case a little better? The Cox Prop
Hazards module does provide some functionality similar to the 'survival'
package. In fact, in our test suites we compare our results with those from
the survival package. We don't yet have complete parity and would
appreciate your help in getting there.

If you can't help with development, then maybe you could help us with
fleshed-out requirements. You can create a JIRA
<https://issues.apache.org/jira/browse/MADLIB> with the necessary details.

Thanks,
Rahul


Re: Time dependent variables in Cox regression model

Posted by Frank McQuillan <fm...@pivotal.io>.
Thanks for opening the JIRA, Pietro.

It would be great if a developer in the MADlib community could work on this
since it is a valuable feature.

Frank

On Tue, Nov 15, 2016 at 3:46 AM, Pietro Pugni <pi...@gmail.com>
wrote:

> Hi all,
> I opened a JIRA on this topic,
> https://issues.apache.org/jira/browse/MADLIB-1040, as suggested by Rahul
> Iyer, but I can’t help with development. I will be happy to do some testing
> if needed. I usually work with very big cohorts (potentially 4 to 10
> million subjects followed for 1 to 9 years). This means potentially
> billions of person-days in total.
>
> Thank you
>  Pietro
>

Re: Time dependent variables in Cox regression model

Posted by Pietro Pugni <pi...@gmail.com>.
Hi all,
I opened a JIRA on this topic, https://issues.apache.org/jira/browse/MADLIB-1040, as suggested by Rahul Iyer, but I can’t help with development. I will be happy to do some testing if needed. I usually work with very big cohorts (potentially 4 to 10 million subjects followed for 1 to 9 years). This means potentially billions of person-days in total.
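
To put rough numbers on that, here is a back-of-the-envelope calculation in Python. It assumes 365 days per year and that every subject is followed for the whole period; both are simplifying assumptions for illustration, not figures from a real cohort.

```python
# Rough bounds on total person-days for the cohort sizes mentioned above.
# Assumptions (illustrative only): 365 days per year, full follow-up for everyone.
DAYS_PER_YEAR = 365

low = 4_000_000 * 1 * DAYS_PER_YEAR    # 4 million subjects, 1 year each
high = 10_000_000 * 9 * DAYS_PER_YEAR  # 10 million subjects, 9 years each

print(f"{low:,} to {high:,} person-days")
```

This is also the row count a one-row-per-day layout would need, which is why interval rows are preferable.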

Thank you
 Pietro

> On 1 Nov 2016, at 11:56, Pietro Pugni <pi...@gmail.com> wrote:
> 
> I’m sorry, the dataset from the R vignette isn’t 1 row per subject. I mistook the data frame row id for the subject id.
> In that data frame subject 1 has three rows while subject 2 has two rows. It’s quite intuitive, but note:
>  - subject 1 got infected on day 219 and day 373; his follow-up ends at day 414 with no infection
>  - for subject 2 the R vignette prints only 2 rows, but according to the initial description he got 7 infections
> So, the counting process format is a way to represent changes over time for each subject.


Re: Time dependent variables in Cox regression model

Posted by Pietro Pugni <pi...@gmail.com>.
I’m sorry, the dataset from the R vignette isn’t 1 row per subject. I mistook the data frame row id for the subject id.
In that data frame subject 1 has three rows while subject 2 has two rows. It’s quite intuitive, but note:
 - subject 1 got infected on day 219 and day 373; his follow-up ends at day 414 with no infection
 - for subject 2 the R vignette prints only 2 rows, but according to the initial description he got 7 infections
So, the counting process format is a way to represent changes over time for each subject.
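
As a small illustration of the rows described for subject 1, the following Python sketch splits a follow-up into intervals at each infection day. The function name and tuple layout (id, start, stop, infect) are made up for this example; they are not taken from the vignette or from MADlib.

```python
def counting_process_rows(subject_id, event_days, end_of_followup):
    """Split one subject's follow-up into consecutive intervals, one per row.

    Each row is (id, start, stop, infect); infect is 1 if the interval ends
    with an infection and 0 if it ends at the end of follow-up.
    """
    rows = []
    start = 0
    for day in sorted(event_days):
        rows.append((subject_id, start, day, 1))  # interval closed by an event
        start = day
    if end_of_followup > start:
        # Last interval: follow-up ends without a further event.
        rows.append((subject_id, start, end_of_followup, 0))
    return rows

# Subject 1: infections on days 219 and 373, follow-up ends on day 414.
rows = counting_process_rows(1, [219, 373], 414)
for row in rows:
    print(row)
```

This gives three rows for subject 1: two that end with an infection and a final one that ends at day 414 without one.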


> On 1 Nov 2016, at 11:47, Pietro Pugni <pi...@gmail.com> wrote:
> 
> Hi there,
> I’m sorry for being so late but was very busy.
> Thank you for the responses and for your interest in the development process of Time-dependent Cox.
> I’m not able to help with the coding part, but I can give you some advice.
> 
> First, take a look at this document, which relates to SAS (the enterprise counterpart of R) and discusses the counting process format needed for time-dependent analysis (page 7 of 10): http://support.sas.com/resources/papers/proceedings12/168-2012.pdf
> 
> The R vignette linked by Woo is another good place to look.
> 
> I suggest reading “Survival Analysis Using SAS: A Practical Guide, Second Edition” by Paul D. Allison (SAS Publishing, ISBN 978-1-59994-640-5), in particular Chapter 5, starting on page 153. It gives the formulas and related material and discusses the counting process method.
> 
> Generally, the non-counting-process layout stores longitudinal data in wide form, with a separate column for each time at which a variable changes. The counting process format verticalizes this kind of data: each row represents a period of time during which a subject’s covariates are constant. If a subject has more than one row, it means that one or more covariates change between two adjacent rows. The interval length can vary from row to row. So the basic fields are: subject id, interval start time, interval stop time, outcome (dichotomous), and a set of covariates.
> 
> I took two screenshots from the R vignette showing a counting process dataset with 1 row per subject (page 7 of https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf ):
> 
> <coxph_counting process dataframe.png>
> 
> and here’s the coxph() invocation:
> 
> <coxph_time dependent covariates estimates.png>
> 
> 
> Here, cluster(id) specifies the subject clustering variable, Surv() builds the survival response over the interval (Start, Stop] for the outcome infect, while treat, inherit and steroids are the covariates.
> 
> The above example has only 1 row per subject, but as I said the counting process format can involve more than 1 row per subject. You can also build a dataset in which each row represents one day for each subject (this is very inefficient, but it is possible and works too). You can have rows where nothing happens (all values are time-independent), etc. The counting process format is very flexible.
> 
> From a performance point of view, I’ve seen poor results from the survival package. With big cohort datasets (more than 1 million subjects and more than 1 year of follow-up) memory usage is massive and the time needed to estimate the model increases a lot. The advantage of running the Cox model inside the database is probably memory management, which the DB balances automatically. In many cases, R runs out of memory.
> 
> Hope this helps and sorry again for the late response.
> 
> I appreciate your work and your interest
> Thank you
>  Pietro Pugni


Re: Time dependent variables in Cox regression model

Posted by Pietro Pugni <pi...@gmail.com>.
Hi there,
I’m sorry for being so late but was very busy.
Thank you for the responses and for your interest in the development process of Time-dependent Cox.
I’m not able to help with the coding part, but I can give you some advice.

First, take a look at this document, which relates to SAS (the enterprise counterpart of R) and discusses the counting process format needed for time-dependent analysis (page 7 of 10): http://support.sas.com/resources/papers/proceedings12/168-2012.pdf

The R vignette linked by Woo is another good place to look.

I suggest reading “Survival Analysis Using SAS: A Practical Guide, Second Edition” by Paul D. Allison (SAS Publishing, ISBN 978-1-59994-640-5), in particular Chapter 5, starting on page 153. It gives the formulas and related material and discusses the counting process method.

Generally, the non-counting-process layout stores longitudinal data in wide form, with a separate column for each time at which a variable changes. The counting process format verticalizes this kind of data: each row represents a period of time during which a subject’s covariates are constant. If a subject has more than one row, it means that one or more covariates change between two adjacent rows. The interval length can vary from row to row. So the basic fields are: subject id, interval start time, interval stop time, outcome (dichotomous), and a set of covariates.
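
A minimal Python sketch of that verticalization follows, assuming a single binary covariate that switches from 0 to 1 at most once. The record fields (id, end, outcome, drug_start) and the function name are hypothetical, chosen only to illustrate the layout.

```python
def wide_to_counting(record):
    """Convert one wide record into counting-process rows.

    Each output row is (id, start, stop, outcome, drug).
    All field names are hypothetical, for illustration only.
    """
    sid, end, outcome = record["id"], record["end"], record["outcome"]
    change = record["drug_start"]  # day the covariate switches 0 -> 1, or None
    if change is None or change >= end:
        return [(sid, 0, end, outcome, 0)]  # covariate stays 0 during follow-up
    if change <= 0:
        return [(sid, 0, end, outcome, 1)]  # covariate is 1 from the start
    return [
        (sid, 0, change, 0, 0),          # before the change: no event on this row
        (sid, change, end, outcome, 1),  # after the change: carries the outcome
    ]

wide = [
    {"id": 1, "end": 100, "outcome": 1, "drug_start": 40},
    {"id": 2, "end": 80, "outcome": 0, "drug_start": None},
]
long_rows = [row for rec in wide for row in wide_to_counting(rec)]
for row in long_rows:
    print(row)
```

Subject 1 gets two rows because the covariate changes on day 40; subject 2 keeps a single row because nothing changes.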

I took two screenshots from the R vignette showing a counting process dataset with 1 row per subject (page 7 of https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf ):



and here’s the coxph() invocation:




Here, cluster(id) specifies the subject clustering variable, Surv() builds the survival response over the interval (Start, Stop] for the outcome infect, while treat, inherit and steroids are the covariates.
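
To show how the start and stop columns enter the estimation, here is a rough Python sketch of the Cox log partial likelihood on counting-process rows. It is not the code of the survival package or of MADlib: it assumes a single covariate, Breslow-style handling of ties, and intervals that are open on the left and closed on the right, which is the convention the survival package documents. The function name and toy rows are invented for illustration.

```python
import math

def log_partial_likelihood(rows, beta):
    """Cox log partial likelihood for rows of (start, stop, event, x)."""
    total = 0.0
    for _, stop_i, event_i, x_i in rows:
        if event_i != 1:
            continue  # only rows that end with an event contribute a term
        t = stop_i
        # A row is in the risk set at time t if start < t <= stop.
        risk = sum(
            math.exp(beta * x)
            for start, stop, _, x in rows
            if start < t <= stop
        )
        total += beta * x_i - math.log(risk)
    return total

toy_rows = [
    (0, 5, 0, 0.0),   # subject A, covariate 0 on (0, 5]
    (5, 8, 1, 1.0),   # subject A, covariate 1 on (5, 8], event at day 8
    (0, 6, 1, 0.0),   # subject B, event at day 6
    (0, 10, 0, 1.0),  # subject C, censored at day 10
]
print(log_partial_likelihood(toy_rows, 0.0))
```

Because each row only counts while its interval covers the event time, a subject contributes the covariate value that was current at that moment; this is what makes the format suitable for time-dependent covariates.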

The above example has only 1 row per subject, but as I said the counting process format can involve more than 1 row per subject. You can also build a dataset in which each row represents one day for each subject (this is very inefficient, but it is possible and works too). You can have rows where nothing happens (all values are time-independent), etc. The counting process format is very flexible.

From a performance point of view, I’ve seen poor results from the survival package. With big cohort datasets (more than 1 million subjects and more than 1 year of follow-up) memory usage is massive and the time needed to estimate the model increases a lot. The advantage of running the Cox model inside the database is probably memory management, which the DB balances automatically. In many cases, R runs out of memory.

Hope this helps and sorry again for the late response.

I appreciate your work and your interest 
Thank you
 Pietro Pugni



> On 7 Oct 2016, at 00:29, Frank McQuillan <fm...@pivotal.io> wrote:
> 
> Re-posting Woo's comments to the list since it bounced for him...
> 
> "Hi Pietro,
> 
> Many thanks for your comments and questions!  I agree that it would be great to see support for time-dependent effects in the MADlib coxph module.  I think it would be good to have items in the roadmap for 'time-dependent covariates' and also 'time-dependent coefficients', and I believe Frank has already started the process of creating stories for these features.  You've mentioned R's implementation, and I think R's survival package vignette <https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf> has some nice info on usage of these two flavors of time-dependent effects, which I believe will be good starting points for the team.  
> 
> Hope this helps, and please do keep the feedback coming!
> 
> Thanks,
> Woo"


Re: Time dependent variables in Cox regression model

Posted by Frank McQuillan <fm...@pivotal.io>.
Re-posting Woo's comments to the list since it bounced for him...

"Hi Pietro,

Many thanks for your comments and questions!  I agree that it would be
great to see support for time-dependent effects in the MADlib coxph
module.  I think it would be good to have items in the roadmap for
'time-dependent covariates' and also 'time-dependent coefficients', and I
believe Frank has already started the process of creating stories for these
features.  You've mentioned R's implementation, and I think R's survival
package vignette
<https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf> has
some nice info on usage of these two flavors of time-dependent effects,
which I believe will be good starting points for the team.

Hope this helps, and please do keep the feedback coming!

Thanks,
Woo"

On Thu, Oct 6, 2016 at 2:40 PM, Woo Jae Jung <wj...@pivotal.io> wrote:

> Hi Pietro,
>
> Many thanks for your comments and questions!  I agree that it would be
> great to see support for time-dependent effects in the MADlib coxph
> module.  I think it would be good to have items in the roadmap for
> 'time-dependent covariates' and also 'time-dependent coefficients', and I
> believe Frank has already started the process of creating stories for these
> features.  You've mentioned R's implementation, and I think R's survival
> package vignette
> <https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf> has
> some nice info on usage of these two flavors of time-dependent effects,
> which I believe will be good starting points for the team.
>
> Hope this helps, and please do keep the feedback coming!
>
> Thanks,
> Woo