Posted to user@hive.apache.org by java8964 java8964 <ja...@hotmail.com> on 2012/09/25 21:17:14 UTC

How can I get the constant value from the ObjectInspector in the UDF

Hi, I am using Cloudera release CDH3u3, which includes Hive 0.7.1.
I am trying to write a Hive UDF to calculate a moving sum. Right now, I am having trouble getting the constant value passed in during the initialization stage.
For example, assume the function has the following form:
msum(salary, 10) --------- salary is an int column
which means the end user wants to calculate the sum of salary over the last 10 rows.
I mostly know how to implement this UDF, but I have one problem right now.
1) This is not a UDAF, as each row returns one value: the moving sum.
2) I created a UDF class that extends GenericUDF.
3) I can get the column types from the ObjectInspector[] passed to me in the initialize() method, to verify that both 'salary' and 10 are numeric types (the latter must be an integer).
4) But I also want to get the actual value, 10 in this case, during the initialize() stage, so I can create the corresponding data structure based on the value the end user specified.
5) I looked through the javadoc of the ObjectInspector class. I know that at run time the actual class of the 2nd parameter is WritableIntObjectInspector. I can get the type, but how can I get the actual value?
6) This is a kind of ConstantsObjectInspector; it should be able to give me the value, as it already knows the type is int. But how?
7) I don't want to get the value in the evaluate stage. Can I get this value in the initialize stage?
Thanks
Yong
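The data structure Yong wants to size from the second argument could be as simple as a fixed-size ring buffer. Below is a minimal plain-Java sketch with no Hive dependencies; MovingSum and add() are hypothetical names. It assumes a single UDF instance sees the rows in the intended order, which a moving sum depends on and which should be verified for the query at hand.

```java
/**
 * Hypothetical helper: sum of the last N values seen, using a fixed-size
 * ring buffer allocated once the window size N is known.
 */
final class MovingSum {
  private final long[] window; // ring buffer holding the last N values
  private int next = 0;        // index where the next value will be written
  private int count = 0;       // number of values currently in the buffer
  private long sum = 0;        // running sum of the values in the buffer

  MovingSum(int size) {
    if (size <= 0) {
      throw new IllegalArgumentException("window size must be positive: " + size);
    }
    this.window = new long[size];
  }

  /** Adds one value and returns the sum of the most recent N values (fewer at the start). */
  long add(long value) {
    if (count == window.length) {
      sum -= window[next]; // buffer is full: evict the oldest value, stored at 'next'
    } else {
      count++;
    }
    window[next] = value;
    sum += value;
    next = (next + 1) % window.length;
    return sum;
  }
}
```

Each call does constant work: the oldest value is subtracted when it leaves the window instead of re-summing N values for every row.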

Re: How can I get the constant value from the ObjectInspector in the UDF

Posted by Chen Song <ch...@gmail.com>.
With my limited knowledge of Hive, I don't think it is possible to get the
actual value of the argument, and I don't think initialize is, or should be,
designed to provide that information either. *initialize* is intended only
for decoding the metadata structure (type and its associated evaluation
mechanism) of the arguments. Storing specific argument values at runtime is
an anti-pattern in my opinion. Can you elaborate on why you really need the
constant value in your case?

On your 2nd question, you can get the type information from the object
inspector. For example, if you expect the 1st argument to be a string, you
can use the following code snippet.


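> // Imports used by this snippet (Hive 0.7.x package names):
> import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
> import org.apache.hadoop.hive.serde.Constants;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
>
> // Inside initialize(ObjectInspector[] arguments); stringObjectInspector
> // is a field of the UDF class.
> Category category = arguments[0].getCategory();
> String typeName = arguments[0].getTypeName();
> // Compare type names with equals(); == only compares object references.
> if (category == Category.PRIMITIVE
>     && (Constants.STRING_TYPE_NAME.equals(typeName)
>         || Constants.VOID_TYPE_NAME.equals(typeName))) {
>   if (Constants.STRING_TYPE_NAME.equals(typeName)) {
>     stringObjectInspector = (StringObjectInspector) arguments[0];
>   }
> } else {
>   throw new UDFArgumentTypeException(0, "The "
>       + GenericUDFUtils.getOrdinal(1) + " argument is expected to be \""
>       + Constants.STRING_TYPE_NAME + "|" + Constants.VOID_TYPE_NAME
>       + "\" but \"" + typeName + "\" is found");
> }
>
>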
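For the numeric check Yong asks about, the same pattern can be sketched in plain Java without Hive on the classpath. NumericArgCheck, check() and ordinal() are hypothetical names; ordinal() only resembles the kind of text GenericUDFUtils.getOrdinal produces and is not meant to reproduce it exactly. The list assumes the lowercase primitive type names Hive reports ("tinyint" through "double"); Hive 0.7.x has no decimal type. Because the ObjectInspector carries only type information, the message can name the argument's position but not the column name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helpers for validating that an argument has a numeric type. */
final class NumericArgCheck {
  // Assumed lowercase primitive type names for Hive's numeric types (0.7.x).
  private static final List<String> NUMERIC_TYPE_NAMES =
      Arrays.asList("tinyint", "smallint", "int", "bigint", "float", "double");

  /** English ordinal for a 1-based position, e.g. 1 -> "1st", 12 -> "12th". */
  static String ordinal(int position) {
    int mod100 = position % 100;
    if (mod100 >= 11 && mod100 <= 13) {
      return position + "th";
    }
    switch (position % 10) {
      case 1: return position + "st";
      case 2: return position + "nd";
      case 3: return position + "rd";
      default: return position + "th";
    }
  }

  /** Empty if the type is numeric; otherwise an error message naming the argument position. */
  static Optional<String> check(int zeroBasedIndex, String typeName) {
    if (NUMERIC_TYPE_NAMES.contains(typeName)) {
      return Optional.empty();
    }
    return Optional.of("The " + ordinal(zeroBasedIndex + 1)
        + " argument is expected to be numeric but \"" + typeName + "\" is found");
  }
}
```

In a real UDF the message would be passed to UDFArgumentTypeException together with the zero-based index, as in the snippet above.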
Chen

On Thu, Sep 27, 2012 at 5:04 AM, java8964 java8964 <ja...@hotmail.com>wrote:

>  I understand your message. But in this situation, I want to do the
> following:
>
> 1) I want to get the value 10 in the initialization stage. I understand
> your point that the value is only available in the evaluate stage, but
> keep in mind that the 10 in my example is a constant value. It won't
> change from one evaluation to the next. It is the kind of value I should
> be able to get in the initialization stage, right? The Hive query analyzer
> should understand that this function parameter is in fact a constant
> value, and should be able to provide it to me during the initialization
> stage.
> 2) A further question: can I get more information from the object
> inspector? For example, when I write the UDF, I want to make sure the first
> parameter is a numeric type. I can get the type, and I am able to validate
> it based on the type. But if I want to raise an error in some case, I want
> to show the end user the NAME of the parameter in my error message,
> instead of just its position.
>
> For example, in the UDF call msum(column_name, 10), if I find out that the
> type of column_name is NOT numeric, I want the error message I give to the
> end user to say that 'column_name' should be a numeric type. But right
> now, I cannot get this information from the API. The only thing I can get
> is the category and type information, but I want more.
>
> Is it possible to do that in Hive 0.7.1?
>
> Thanks for your help.
>
> Yong


-- 
Chen Song

RE: How can I get the constant value from the ObjectInspector in the UDF

Posted by java8964 java8964 <ja...@hotmail.com>.
I understand your message. But in this situation, I want to do the following:
1) I want to get the value 10 in the initialization stage. I understand your point that the value is only available in the evaluate stage, but keep in mind that the 10 in my example is a constant value. It won't change from one evaluation to the next. It is the kind of value I should be able to get in the initialization stage, right? The Hive query analyzer should understand that this function parameter is in fact a constant value, and should be able to provide it to me during the initialization stage.
2) A further question: can I get more information from the object inspector? For example, when I write the UDF, I want to make sure the first parameter is a numeric type. I can get the type, and I am able to validate it based on the type. But if I want to raise an error in some case, I want to show the end user the NAME of the parameter in my error message, instead of just its position.
For example, in the UDF call msum(column_name, 10), if I find out that the type of column_name is NOT numeric, I want the error message I give to the end user to say that 'column_name' should be a numeric type. But right now, I cannot get this information from the API. The only thing I can get is the category and type information, but I want more.
Is it possible to do that in Hive 0.7.1?
Thanks for your help.
Yong

Date: Thu, 27 Sep 2012 02:32:19 +0900
Subject: Re: How can I get the constant value from the ObjectInspector in the UDF
From: chen.song.82@gmail.com
To: user@hive.apache.org

Hi Yong
The way GenericUDF works is as follows.
ObjectInspector initialize(ObjectInspector[] arguments) is called only once for each GenericUDF instance used in your Hive query. This phase is for the UDF's preparation steps, such as syntax checking and type inference.

Object evaluate(DeferredObject[] arguments) is called to evaluate against the actual arguments. This is where the actual calculation should happen and where you can get the real values you talked about.

Thanks,
Chen

-- 
Chen Song


Re: How can I get the constant value from the ObjectInspector in the UDF

Posted by Chen Song <ch...@gmail.com>.
Hi Yong

The way GenericUDF works is as follows.

*ObjectInspector initialize(ObjectInspector[] arguments)* is called only
once for each GenericUDF instance used in your Hive query. This phase is for
the UDF's preparation steps, such as syntax checking and type inference.

*Object evaluate(DeferredObject[] arguments)* is called to evaluate against
the actual arguments. This is where the actual calculation should happen and
where you can get the real values you talked about.
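The split described above can be modeled in plain Java to show one possible workaround: check types in initialize() and read the constant on the first evaluate() call, caching it because it is identical for every row. ArgType and MsumModel are hypothetical stand-ins for ObjectInspector type information and a GenericUDF subclass; this is a sketch under those assumptions, not Hive's API.

```java
import java.util.ArrayDeque;

/** Hypothetical stand-in for the type information an ObjectInspector carries. */
enum ArgType { INT, BIGINT, DOUBLE, STRING }

/**
 * Hypothetical model of a GenericUDF-style msum(value, windowSize).
 * initialize() sees only argument types; evaluate() sees one row's values.
 */
final class MsumModel {
  private int windowSize = -1;                                // unknown until the first row
  private final ArrayDeque<Long> recent = new ArrayDeque<>(); // last windowSize values
  private long sum = 0;                                       // sum of the values in 'recent'

  /** Phase 1, called once: type checks only, since no row values exist yet. */
  void initialize(ArgType[] types) {
    if (types.length != 2) {
      throw new IllegalArgumentException("msum expects 2 arguments, got " + types.length);
    }
    if (types[0] == ArgType.STRING) {
      throw new IllegalArgumentException("1st argument must be numeric, found " + types[0]);
    }
    if (types[1] != ArgType.INT) {
      throw new IllegalArgumentException("2nd argument must be int, found " + types[1]);
    }
  }

  /** Phase 2, called per row: the constant arrives with every row, so cache it once. */
  long evaluate(Object[] row) {
    if (windowSize < 0) {
      int size = ((Number) row[1]).intValue();
      if (size <= 0) {
        throw new IllegalArgumentException("window size must be positive: " + size);
      }
      windowSize = size;
    }
    long value = ((Number) row[0]).longValue();
    recent.addLast(value);
    sum += value;
    if (recent.size() > windowSize) {
      sum -= recent.removeFirst(); // drop the value that fell out of the window
    }
    return sum;
  }
}
```

The cost of this design is that the window size is validated one row later than it would be in initialize(), so a bad constant surfaces at execution time rather than at query compilation.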

Thanks,
Chen


-- 
Chen Song