Posted to user@hive.apache.org by John Omernik <jo...@omernik.com> on 2013/02/14 05:38:57 UTC

Using Reflect: A thread for ideas

I stumbled across the little-documented reflect function today. I've always
known about it, but Java scares me if it's not in a cup, so I didn't dig.
Well, today I dug, found an awesome use case for reflect (for me), and
wanted to share. I also thought it would be nice to validate some thoughts
I had on reflect, and to discuss how we could share ideas on reflect so that
folks could get more use out of this great feature of Hive.

Here's my example, a simple URL decode function:

select url, reflect('java.net.URLDecoder', 'decode', url, 'utf-8') as
decoded_url from logs

Basically, I am using the decode method of the java.net.URLDecoder class.
Pretty awesome, works great, and there are no files to distribute either. It
even works through JDBC!
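Roughly speaking, reflect() looks up a Java class and method by name and invokes the method with the column values as arguments. The standalone sketch below does the same lookup by hand for java.net.URLDecoder.decode(String, String) using plain Java reflection. It illustrates the mechanism only; it is not Hive's implementation, and the class name ReflectDecodeSketch is hypothetical.

```java
import java.lang.reflect.Method;

public class ReflectDecodeSketch {
    // Find a public static method by class name, method name and the runtime
    // classes of the arguments, then invoke it with a null receiver.
    // Exact-type lookup is enough for String arguments; a general version
    // would need looser matching (e.g. boxed vs. primitive parameter types).
    static Object callStatic(String className, String methodName, Object... args) throws Exception {
        Class<?> cls = Class.forName(className);
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        Method m = cls.getMethod(methodName, types);
        return m.invoke(null, args); // null receiver: the method is static
    }

    public static void main(String[] args) throws Exception {
        Object decoded = callStatic("java.net.URLDecoder", "decode",
                "http%3A%2F%2Fexample.com%2Fa+b", "utf-8");
        System.out.println(decoded); // prints http://example.com/a b
    }
}
```

Because the lookup happens at call time, the class only has to be on the classpath, which is why JDK classes such as java.net.URLDecoder need no extra files.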

OK, that being said, I now realize that the function I am trying to call
has to return data in a simple data type. For example, I struggle to come
up with a simple reflect() call for making a hex MD5 out of a string, because
the built-in Java function returns an object whose methods return what I am
looking for. That's great, but then I have to compile Java code, distribute
a jar, and then run the code. I am looking for something as simple as the
URL decoding function.
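To make the MD5 gap concrete: the JDK's java.security.MessageDigest comes from a factory call and produces a raw byte[], so no single JDK call turns a string into a hex MD5. One workaround is a small static wrapper that takes and returns a String, which is the shape reflect() can call. This still means compiling and shipping a jar, which is exactly the overhead described above. The names HexMd5 and md5Hex are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HexMd5 {
    // Static, String in, String out: a method shape reflect() can call directly.
    public static String md5Hex(String input) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");                // factory returns an object
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8)); // 16 raw bytes
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff)); // two lowercase hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("hello")); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

If Apache Commons Codec happens to be on the cluster's classpath (an assumption to check for your installation), its static org.apache.commons.codec.digest.DigestUtils.md5Hex(String) already has this shape and might be callable from reflect() without writing a jar.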

I love this reflect feature, but I think it's probably underutilized due to
perceived usability issues for beginners. That leads me to my next
thought: what if we brainstorm here handy Java functions that are not
included in standard Hive but translate well to Hive through the reflect
function, and show an example of each one's use? I went first with my
URLDecode, and I will obviously be looking for more, but have you seen some
examples that were neat and worked well for you? Can you share?


Perhaps if we get enough examples, we could roll some of these into a page
on the Hive wiki that folks can use to get over the "perceived" complexity
of using Java reflection?

Thanks to those who have worked hard to implement features like this; it is
truly awesome.

Re: Using Reflect: A thread for ideas

Posted by Edward Capriolo <ed...@gmail.com>.
We could very easily write Hive so that a UDF is a piece of Groovy
loaded dynamically. This is my go-to system for making things pluggable.

On Tue, Feb 19, 2013 at 10:03 AM, John Meagher <jo...@gmail.com> wrote:
> Another option for this functionality would be to use the Java scripting
> API. [...]

Re: Using Reflect: A thread for ideas

Posted by John Meagher <jo...@gmail.com>.
Another option for this functionality would be to use the Java scripting
API. The basic structure of the call would be:

select script(scriptLanguage, scriptToRun, args...)

I haven't seen that in Hive, but something similar is available for Pig.
Documentation for it is available at
http://pig.apache.org/docs/r0.9.2/udf.html#js-udfs. There's also a
variation in JIRA: https://issues.apache.org/jira/browse/PIG-1777.
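The proposed script() call can be sketched with the JDK's standard scripting API (JSR 223, package javax.script). The function name script, the class ScriptCallSketch, and the convention of exposing the extra arguments as arg0, arg1 and so on are all hypothetical; nothing here is an existing Hive or Pig API.

```java
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptCallSketch {
    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    // Hypothetical analogue of `select script(scriptLanguage, scriptToRun, args...)`:
    // look up a JSR 223 engine by name, bind the arguments as arg0, arg1, ...
    // and return whatever value the script evaluates to.
    public static Object script(String language, String source, Object... args) throws ScriptException {
        ScriptEngine engine = MANAGER.getEngineByName(language);
        if (engine == null) {
            // Available engines depend on the JVM and classpath; for example,
            // the JavaScript engine bundled with older JDKs was removed in JDK 15.
            throw new IllegalArgumentException("No script engine for: " + language);
        }
        Bindings bindings = engine.createBindings();
        for (int i = 0; i < args.length; i++) {
            bindings.put("arg" + i, args[i]);
        }
        return engine.eval(source, bindings);
    }

    public static void main(String[] args) {
        // List the engines this JVM actually provides before relying on one.
        MANAGER.getEngineFactories().forEach(f ->
                System.out.println(f.getEngineName() + " " + f.getNames()));
    }
}
```

Which languages work depends entirely on the engines registered in the JVM. The Groovy suggestion earlier in the thread fits the same shape, provided a Groovy JSR 223 engine is on the classpath.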



On Wed, Feb 13, 2013 at 11:38 PM, John Omernik <jo...@omernik.com> wrote:

> I stumbled across the little-documented reflect function today. [...]

Re: CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

Posted by John Omernik <jo...@omernik.com>.
I think the idea has potential. It would be cool if there were a sort of
(excuse the analogy) social-media-like system that would allow the community
to identify which UD[AT]Fs or reflect-type functions are useful to them. It
would almost act as a staging platform to help the maintainers of Hive
identify which functionality should be included in Hive.
For example, the JIRA below was added in 2010 and last updated in August
2012, and it has lots of UD[AT]Fs that we'd find useful, but there has been
no movement on it. If we had a place where I could go to find out things like
how to implement MD5 or how to do URL escaping, use one of the options, and at
the same time "vote" on a function, that would be great.


https://issues.apache.org/jira/browse/HIVE-1545



On Thu, Feb 14, 2013 at 12:59 AM, Robin Morris <rd...@baynote.com> wrote:

> I think we need to think a little bigger than this. [...]

Re: CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

Posted by Patrick D'Souza <pa...@gmail.com>.
I was having a similar discussion with some of our Hadoop admins at
About.com last week about a central repository for UD[A,T]Fs. Most often
we're trying to accomplish the same things with UD[A,T]Fs, so a central
repository could make life easier for everyone.


On Thu, Feb 14, 2013 at 8:58 AM, Connell, Chuck <Ch...@nuance.com> wrote:

> +1, great idea!
>
> Chuck Connell, Nuance
> [...]

RE: CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

Posted by "Connell, Chuck" <Ch...@nuance.com>.
+1, great idea!

Chuck Connell, Nuance

From: Robin Morris [mailto:rdm@baynote.com]
Sent: Thursday, February 14, 2013 1:59 AM
To: user@hive.apache.org
Subject: CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

I think we need to think a little bigger than this. [...]

CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

Posted by Robin Morris <rd...@baynote.com>.
I think we need to think a little bigger than this.

Recently I've been thinking that what would be most useful to the Hive user community would be a CHAN (Comprehensive Hive Archive Network), analogous to CPAN, CRAN, CTAN, etc.: a central place where user-contributed UD[A,T]Fs could be uploaded and made available to the wider community.

Is there interest in the user community for something like this?

Robin

From: John Omernik <jo...@omernik.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Wednesday, February 13, 2013 8:38 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Using Reflect: A thread for ideas

I stumbled across the little-documented reflect function today. [...]