Posted to user@madlib.apache.org by Esther Vasiete <ev...@pivotal.io> on 2016/04/05 01:27:27 UTC
Fwd: pca_train error
Hi,
I am trying to use pca_train but I am running into this error:
ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError: Function
"madlib.__matrix_densify_sfunc(double precision[],integer,integer,double
precision)": invalid argument - col should be in the range of [0, col_dim)
(seg35 awsaiuirl1178:40003 pid=104068) (plpython.c:4648)
SQL state: XX000
Context: Traceback (most recent call last):
PL/Python function "pca_train", line 23, in <module>
return pca.pca(**globals())
PL/Python function "pca_train", line 404, in pca
PL/Python function "pca_train"
My input table has 15472 rows and two columns: a row_id and an array with
853 features. I am calling pca_train like this:
DROP TABLE IF EXISTS ev.hci_subset_pca_output;
SELECT madlib.pca_train('ev.hci_subset_pca_input',   -- source table
                        'ev.hci_subset_pca_output',  -- output table
                        'row_id',                    -- row ID column
                        3);                          -- number of principal components
Unfortunately I cannot share the data, but this is how it looks in pgAdmin3.
Note that pgAdmin3 won't display a feature_vector that is too large, which is
why the column appears to be empty. It isn't, as you can see in the
second screenshot.
[image: Inline image 1]
[image: Inline image 3]
I am not sure why I am running into this error. Please advise.
Update: I have renamed feature_vector to "row_vec", and "row_id" starts at
1. I am still getting the same error.
Thanks,
--
*Esther Vasiete *
*Data Scientist | Pivotal*
evasiete@pivotal.io
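A few quick checks on the input table can help narrow down an out-of-range column error like the one above. The sketch below assumes the table and column names given in the post (ev.hci_subset_pca_input, row_id, row_vec) and uses only standard PostgreSQL functions. It is a diagnostic aid under those assumptions, not a statement about what MADlib does internally.

```sql
-- Check that every row has the same number of features
-- (the post states 853 features per row).
SELECT array_upper(row_vec, 1) AS n_features,
       count(*)                AS n_rows
FROM ev.hci_subset_pca_input
GROUP BY 1
ORDER BY 1;

-- Check the row_id range: the post states it starts at 1.
-- Gaps or duplicates show up as max_id or n_distinct_ids
-- differing from n_rows.
SELECT min(row_id)            AS min_id,
       max(row_id)            AS max_id,
       count(*)               AS n_rows,
       count(DISTINCT row_id) AS n_distinct_ids
FROM ev.hci_subset_pca_input;
```

If the first query returns more than one row, the arrays are ragged. If the second query shows mismatched counts, row_id is not a clean 1..N sequence.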
Re: pca_train error
Posted by Frank McQuillan <fm...@pivotal.io>.
Thanks for the update, Esther.
Frank
On Wed, Apr 6, 2016 at 3:53 PM, Esther Vasiete <ev...@pivotal.io> wrote:
> Upgrading to MADlib 1.8 solved the problem!
>
> Thanks,
> Esther
Re: pca_train error
Posted by Esther Vasiete <ev...@pivotal.io>.
Upgrading to MADlib 1.8 solved the problem!
Thanks,
Esther
On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <ev...@pivotal.io> wrote:
> Oh sorry, it is HAWQ 1.3.1.
>
> And the data engineer will upgrade to MADlib 1.8 tonight.
>
> Thanks,
> Esther
--
*Esther Vasiete *
*Data Scientist | Pivotal*
evasiete@pivotal.io
Re: pca_train error
Posted by Esther Vasiete <ev...@pivotal.io>.
Oh sorry, it is HAWQ 1.3.1.
And the data engineer will upgrade to MADlib 1.8 tonight.
Thanks,
Esther
On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <fm...@pivotal.io>
wrote:
> Please clarify the platform - do you mean GPDB 4.2.0?
>
> Would you be able to upgrade to MADlib 1.8? Then you are using the latest
> software and we can see if you still have a problem.
>
> Frank
--
*Esther Vasiete *
*Data Scientist | Pivotal*
evasiete@pivotal.io
Re: pca_train error
Posted by Frank McQuillan <fm...@pivotal.io>.
Please clarify the platform - do you mean GPDB 4.2.0?
Would you be able to upgrade to MADlib 1.8? Then you would be using the latest
software, and we can see whether the problem persists.
Frank
On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <ev...@pivotal.io> wrote:
> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>
> Thanks.
Re: pca_train error
Posted by Esther Vasiete <ev...@pivotal.io>.
I am using MADlib 1.7.1 on HAWQ 4.2.0.
Thanks.
On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fm...@pivotal.io>
wrote:
> Thanks for the question, Esther. What version of MADlib are you using and
> what database platform and version are you running on?
>
> It seems to be a MADlib version lower than 1.8 since the error message you
> report is different in the 1.8 release. (There was a bug fix in 1.8 to allow
> user-specified column names in PCA.)
>
> Frank
--
*Esther Vasiete *
*Data Scientist | Pivotal*
evasiete@pivotal.io
Re: pca_train error
Posted by Frank McQuillan <fm...@pivotal.io>.
Thanks for the question, Esther. What version of MADlib are you using and
what database platform and version are you running on?
You appear to be on a MADlib version earlier than 1.8, since the error message
you report is different in the 1.8 release. (1.8 included a bug fix to allow
user-specified column names in PCA.)
Frank
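The versions asked about above can be read directly from the database. A minimal sketch, assuming MADlib is installed in the default madlib schema (as the function names in the error message suggest):

```sql
-- MADlib version and build information.
SELECT madlib.version();

-- Database server version string. On Greenplum or HAWQ this
-- typically also names the platform build, which helps avoid
-- mixing up platform version numbers.
SELECT version();
```

Posting both outputs alongside the error makes it easier to tell whether a fix in a later release applies.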