Posted to user@madlib.apache.org by "Mevada, Vatsal" <Me...@sky.optymyze.com> on 2017/08/10 10:40:05 UTC

Postgre-MADlib predictions is taking longer than training

I am training my model using the following code:

    start_time := clock_timestamp();
      PERFORM madlib.create_nb_prepared_data_tables( 'nb_training',
                                                     'class',
                                                     'attributes',
                                                     'ARRAY[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57]',
                                                     57,
                                                     'categ_feature_probs',
                                                     'numeric_attr_params',
                                                     'class_priors'
                                                   );
      training_time := 1000* (extract(epoch FROM clock_timestamp()) - extract(epoch FROM start_time));
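The fourth argument above lists all 57 attribute indices as numeric. Every attribute is therefore presumably modeled as a Gaussian, and training reduces to two steps: counting the classes, and computing a per-class mean and variance for each attribute. The plain-Python sketch below illustrates that step under those assumptions. It is not MADlib's implementation, and `train_gaussian_nb` and its `{class: (prior, means, variances)}` return layout are hypothetical. N is the number of training rows and d the number of attributes.

```python
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Fit a Gaussian naive Bayes model in one pass over the data.

    Returns {class: (prior, means, variances)}. The loop touches each of
    the N x d attribute values exactly once, so training costs O(N * d).
    """
    n_attrs = len(rows[0])
    counts = defaultdict(int)
    sums = defaultdict(lambda: [0.0] * n_attrs)
    sq_sums = defaultdict(lambda: [0.0] * n_attrs)
    for x, c in zip(rows, labels):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
            sq_sums[c][j] += v * v
    total = len(labels)
    model = {}
    for c, n in counts.items():
        means = [s / n for s in sums[c]]
        # Population variance via E[x^2] - E[x]^2 (simple, fine for a sketch),
        # floored so a constant attribute cannot cause a division by zero later.
        variances = [max(sq / n - m * m, 1e-9) for sq, m in zip(sq_sums[c], means)]
        model[c] = (n / total, means, variances)
    return model
```

Each training value is read once and added to two running sums, so the work grows linearly with N × d.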

And my prediction code goes as follows:

    start_time := clock_timestamp();
      PERFORM madlib.create_nb_probs_view( 'categ_feature_probs',
                                           'class_priors',
                                           'nb_testing',
                                           'id',
                                           'attributes',
                                           57,
                                           'numeric_attr_params',
                                           'probs_view' );

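    -- create_nb_probs_view only defines the view; the probabilities are
    -- computed when it is queried. In PL/pgSQL a bare SELECT has no
    -- destination for its rows, so PERFORM runs the query and discards them.
    PERFORM * FROM probs_view;
    prediction_time := 1000 * (extract(epoch FROM clock_timestamp()) - extract(epoch FROM start_time));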
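Prediction reverses the cost picture. For every test row, a Gaussian naive Bayes classifier has to evaluate one density per attribute for each of the K classes. The sketch below shows that inner loop in plain Python. It is not MADlib's code: `log_posteriors` and `predict` are hypothetical names, and `model` is assumed to be a hypothetical `{class: (prior, means, variances)}` dict. The sketch works in log space so that multiplying 57 small densities does not underflow.

```python
import math


def log_posteriors(model, x):
    """Unnormalized log posterior of each class for one row x.

    model maps class -> (prior, means, variances). Every class needs one
    Gaussian log-density per attribute, so a row costs O(d * K) and a
    test set of M rows costs O(M * d * K).
    """
    scores = {}
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        scores[c] = score
    return scores


def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)
```

For example, with two classes centred at 0 and 5, `predict(model, [4.8, 5.3])` picks the class centred at 5.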

The training dataset contains 450,000 records, whereas the testing dataset contains 50,000 records.

Still, my average training_time is around 17,173 ms, whereas prediction_time is around 26,481 ms. As I understand naive Bayes, prediction_time should be less than training_time. What am I doing wrong here?
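A rough operation count helps explain the comparison. Training touches each of the 450,000 × 57 training values once. Prediction touches each of the 50,000 × 57 test values once per class. The post does not state the number of classes K, so it is a parameter here, and both helper names are hypothetical.

```python
def op_counts(n_train, n_test, n_attrs, n_classes):
    """Rough per-value operation counts for Gaussian naive Bayes.

    Training: one running-sum update per training value.
    Prediction: one density evaluation per test value per class.
    """
    train_ops = n_train * n_attrs
    predict_ops = n_test * n_attrs * n_classes
    return train_ops, predict_ops


def break_even_classes(n_train, n_test):
    """Class count at which prediction does as many per-value operations as training."""
    return n_train / n_test


if __name__ == "__main__":
    for k in (2, 9, 20):
        train_ops, predict_ops = op_counts(450_000, 50_000, 57, k)
        print(f"K={k}: train={train_ops:,} predict={predict_ops:,}")
    print("break-even K:", break_even_classes(450_000, 50_000))
```

With the sizes from the post, the two counts tie at K = 9 (25,650,000 each). A Gaussian density typically needs more arithmetic than adding to a running sum, so prediction can plausibly overtake training with fewer classes than that. The count ignores database-side costs such as scans, joins and result transfer, so it explains the direction of the gap rather than its exact size.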

Re: Postgre-MADlib predictions is taking longer than training

Posted by Nandish Jayaram <nj...@pivotal.io>.
Hi Vatsal,

The naive Bayes implementation has been in early-stage development for a long time now.
Can you please open a JIRA for this issue? It might be time to look under
the hood and make changes to bring it out of early-stage development.

NJ
