Posted to user@madlib.apache.org by Esther Vasiete <ev...@pivotal.io> on 2016/04/05 01:27:27 UTC

Fwd: pca_train error

Hi,

I am trying to use pca_train, but I am running into this error:

ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError: Function
"madlib.__matrix_densify_sfunc(double precision[],integer,integer,double
precision)": invalid argument - col should be in the range of [0, col_dim)
 (seg35 awsaiuirl1178:40003 pid=104068) (plpython.c:4648)
SQL state: XX000
Context: Traceback (most recent call last):
  PL/Python function "pca_train", line 23, in <module>
    return pca.pca(**globals())
  PL/Python function "pca_train", line 404, in pca
PL/Python function "pca_train"

My input table has 15472 rows and two columns: a row_id and an array with
853 features. I am calling pca_train like this:

DROP TABLE IF EXISTS ev.hci_subset_pca_output;
SELECT madlib.pca_train('ev.hci_subset_pca_input',
                        'ev.hci_subset_pca_output',
                        'row_id',
                        3);

Unfortunately I cannot share the data, but this is how it looks in pgAdmin3.
Note that pgAdmin3 won't display a feature_vector that is too large, which
is why the column appears to be empty; it isn't, as you can see in the
second screenshot.

[image: Inline image 1]

[image: Inline image 3]

I am not sure why I am running into this error. Please advise.

Update: I have renamed feature_vector to "row_vec", and "row_id" starts at
1. I am still getting the same error.
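
A sanity check along the following lines can help rule out problems with the
input data itself. It is only a sketch under stated assumptions: the table
and column names are the ones mentioned in this message
(ev.hci_subset_pca_input, row_id, row_vec), and the expectation that dense
input has row ids running contiguously from 1 to N with equal-length arrays
is taken from the MADlib PCA documentation as generally described, not
verified against 1.7.1. array_upper is used instead of array_length because
the latter may not exist on older PostgreSQL 8.2-based platforms.

```sql
-- Sketch only: names are taken from the message above.
-- array_upper(row_vec, 1) equals the array length when the lower bound is
-- the default 1, and is NULL for an empty array.
SELECT count(*)                      AS n_rows,        -- the message reports 15472
       min(row_id)                   AS min_row_id,    -- the message reports 1
       max(row_id)                   AS max_row_id,    -- equals n_rows if ids are contiguous
       count(DISTINCT row_id)        AS distinct_ids,  -- equals n_rows if ids are unique
       min(array_upper(row_vec, 1))  AS min_vec_len,   -- the message reports 853 features
       max(array_upper(row_vec, 1))  AS max_vec_len    -- should match min_vec_len
FROM ev.hci_subset_pca_input;
```

If min_vec_len and max_vec_len differ, or max_row_id is larger than n_rows,
the input is worth fixing before retrying pca_train.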

Thanks,

-- 
*Esther Vasiete *
*Data Scientist | Pivotal*
evasiete@pivotal.io

Re: pca_train error

Posted by Frank McQuillan <fm...@pivotal.io>.

Thanks for the update, Esther.

Frank

On Wed, Apr 6, 2016 at 3:53 PM, Esther Vasiete <ev...@pivotal.io> wrote:

> Upgrading to MADlib 1.8 solved the problem!
>
> Thanks,
> Esther

Re: pca_train error

Posted by Esther Vasiete <ev...@pivotal.io>.

Upgrading to MADlib 1.8 solved the problem!

Thanks,
Esther

On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <ev...@pivotal.io> wrote:

> Oh sorry, it is HAWQ 1.3.1.
>
> And the data engineer will upgrade to MADlib 1.8 tonight.
>
> Thanks,
> Esther

Re: pca_train error

Posted by Esther Vasiete <ev...@pivotal.io>.

Oh sorry, it is HAWQ 1.3.1.

And the data engineer will upgrade to MADlib 1.8 tonight.

Thanks,
Esther

On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <fm...@pivotal.io>
wrote:

> Please clarify the platform - do you mean GPDB 4.2.0?
>
> Would you be able to upgrade to MADlib 1.8?  Then you are using the latest
> software and we can see if you still have a problem.
>
> Frank

Re: pca_train error

Posted by Frank McQuillan <fm...@pivotal.io>.

Please clarify the platform - do you mean GPDB 4.2.0?

Would you be able to upgrade to MADlib 1.8?  Then you would be using the
latest software, and we can see if you still have a problem.

Frank

On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <ev...@pivotal.io> wrote:

> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>
> Thanks.

Re: pca_train error

Posted by Esther Vasiete <ev...@pivotal.io>.

I am using MADlib 1.7.1 on HAWQ 4.2.0.

Thanks.

On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fm...@pivotal.io>
wrote:

> Thanks for the question, Esther.  What version of MADlib are you using and
> what database platform and version are you running on?
>
> It seems to be a MADlib version lower than 1.8 since the error message you
> report is different in the 1.8 release.  (There was a bug fix in 1.8 to allow
> user-specified column names in PCA.)
>
> Frank

Re: pca_train error

Posted by Frank McQuillan <fm...@pivotal.io>.

Thanks for the question, Esther.  What version of MADlib are you using and
what database platform and version are you running on?

It seems to be a MADlib version lower than 1.8 since the error message you
report is different in the 1.8 release.  (There was a bug fix in 1.8 to allow
user-specified column names in PCA.)
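
Both questions can be answered from a SQL session. A minimal sketch, assuming
MADlib is installed in a schema named madlib (as the pca_train call in the
quoted message suggests); the exact output strings depend on the installation:

```sql
-- MADlib version string (includes the release number, e.g. 1.7.x or 1.8)
SELECT madlib.version();

-- Database platform and version (PostgreSQL, Greenplum, HAWQ, ...)
SELECT version();
```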

Frank





On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <ev...@pivotal.io> wrote:

> Hi,
>
> I am trying to use pca_train but I am running into this error:
>
> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
> Function "madlib.__matrix_densify_sfunc(double
> precision[],integer,integer,double precision)": invalid argument - col
> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
> pid=104068) (plpython.c:4648)
> SQL state: XX000
> Context: Traceback (most recent call last):
>   PL/Python function "pca_train", line 23, in <module>
>     return pca.pca(**globals())
>   PL/Python function "pca_train", line 404, in pca
> PL/Python function "pca_train"
>
> My input table has 15472 rows and two columns; a row_id and an array with
> 853 features. I am calling pca_train like this:
>
> DROP TABLE if exists ev.hci_subset_pca_output;
> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>                                            'ev.hci_subset_pca_output',
>                                            'row_id',
>                                             3);
>
> I unfortunately cannot share the data but this is how it looks in
> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
> large, which is why it appears to be empty, but it isn't, as you can see in
> the second screenshot.
>
> [image: Inline image 1]
>
> [image: Inline image 3]
>
> I am not sure why I am running into this error. Please advise.
>
> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
> with 1. Still getting the same error.
>
> Thanks,
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> evasiete@pivotal.io
>
>
>
