Posted to user@hive.apache.org by Matthew Rathbone <ma...@foursquare.com> on 2011/03/24 21:36:23 UTC

Adding a temporary function for thrift queries

Hey all,

We use Amazon's Elastic MapReduce and Hive 0.7 to run analytics queries, and
I'm having problems dynamically adding functions for use in the Thrift
server.

I want to add a jar, add a function, then execute a query.

Using Ruby as the example, I've tried:

      Hive.connect(@url, @port) do |connection|
        connection.execute(<ADD JAR and FUNCTION>)
        results = connection.fetch(query)
      end

but the function is not available between calls.

So I tried prepending the function creation statements to the query, but then
I don't get any data back from Hive (just an empty array).

Could someone direct me to the best way to add functions for Thrift queries?
Honestly, I'd rather add them permanently on startup, but I can't find a way
to do that either.

Re: Adding a temporary function for thrift queries

Posted by Matthew Rathbone <ma...@foursquare.com>.
Hey, thanks for the response.

I have the jar on the Thrift server's local file system (it's the same
machine that is running Hive), and that is the path I pass to the ADD JAR
command.
If I tail the logs, I can see that the ADD JAR command succeeds (when
loading from the local fs), but the subsequent CREATE FUNCTION statement
still doesn't see the class:

Added /mnt/var/lib/hive_07/downloaded_resources/udf.jar to class path
11/03/28 15:14:10 INFO exec.FunctionTask: create function: java.lang.ClassNotFoundException: com.example.udf.Function1

Do you know if the state gets reset between executes?
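
As one client-side check for the ClassNotFoundException above (a sketch only, and not a confirmed cause), it can help to verify that the jar actually contains the class under the expected package path. The helper name jar_entry_for below is hypothetical; it maps the class name from the log to the entry that should appear in the jar listing:

```ruby
# Hypothetical helper: convert a fully qualified Java class name into the
# entry path that class should have inside a jar file.
def jar_entry_for(class_name)
  class_name.tr(".", "/") + ".class"
end

# Class name taken from the log output in this message.
puts jar_entry_for("com.example.udf.Function1")
```

The printed path can be compared against the output of `jar tf udf.jar`. If the entry is present, the jar contents are not the problem, and the question of whether session state is reset remains open.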

On Mon, Mar 28, 2011 at 10:57 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Traditionally, 'add jar' would look for the jar file on the
> Thrift server's local file system, not the client's. I believe there is a
> 0.7.0 patch to load UDF jars from HDFS, so this might help.
>



-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma <http://twitter.com/rathboma> |
4sq<http://foursquare.com/rathboma>

Re: Adding a temporary function for thrift queries

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Mar 28, 2011 at 10:53 AM, Matthew Rathbone
<ma...@foursquare.com> wrote:
> Hey guys,
> I could really use some expert Hive help with my issue; my Hive expertise
> is not all that great.
> I'm using Hive 0.7 with Hadoop 0.20.
> A simple way to describe my problem is this:
> Using Thrift, if you execute the following sequence:
> thrift.execute("ADD JAR /udf.jar");
> thrift.execute("create temporary function function1 as 'org.apache.test.Function' ")
> then the second execute doesn't see the jar.
> But if I try to string them together:
> thrift.execute("ADD JAR /udf.jar ; create temporary function function1 as 'org.apache.test.Function1' ")
> then Hive throws errors:
> 11/03/28 14:51:07 INFO SessionState: Added resource: /mnt/var/lib/hive_07/downloaded_resources/udf.jar
> ; does not exist
> 11/03/28 14:51:07 ERROR SessionState: ; does not exist
> create does not exist
> 11/03/28 14:51:07 ERROR SessionState: create does not exist
> temporary does not exist
> 11/03/28 14:51:07 ERROR SessionState: temporary does not exist
> function does not exist
> 11/03/28 14:51:07 ERROR SessionState: function does not exist
>
>
> Does anyone have a suggestion on how to string these together (along with a
> select statement afterwards)?
> Thanks for any help,
> Matthew

Traditionally, 'add jar' would look for the jar file on the
Thrift server's local file system, not the client's. I believe there is a
0.7.0 patch to load UDF jars from HDFS, so this might help.

Re: Adding a temporary function for thrift queries

Posted by Matthew Rathbone <ma...@foursquare.com>.
Hey guys,

I could really use some expert Hive help with my issue; my Hive expertise
is not all that great.

I'm using Hive 0.7 with Hadoop 0.20.

A simple way to describe my problem is this:

Using Thrift, if you execute the following sequence:
thrift.execute("ADD JAR /udf.jar");
thrift.execute("create temporary function function1 as 'org.apache.test.Function' ")

then the second execute doesn't see the jar.

But if I try to string them together:
thrift.execute("ADD JAR /udf.jar ; create temporary function function1 as 'org.apache.test.Function1' ")

then Hive throws errors:
11/03/28 14:51:07 INFO SessionState: Added resource: /mnt/var/lib/hive_07/downloaded_resources/udf.jar
; does not exist
11/03/28 14:51:07 ERROR SessionState: ; does not exist
create does not exist
11/03/28 14:51:07 ERROR SessionState: create does not exist
temporary does not exist
11/03/28 14:51:07 ERROR SessionState: temporary does not exist
function does not exist
11/03/28 14:51:07 ERROR SessionState: function does not exist



Does anyone have a suggestion on how to string these together (along with a
select statement afterwards)?
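
A note on the errors above: they read as if the whole string was handed to ADD JAR and each whitespace-separated token after the jar path was treated as another resource to add, which suggests the server does not split on ';' the way the Hive CLI does. A minimal sketch, assuming that reading is right, is to split the script on the client and send one statement per execute call. The names split_statements and RecordingConnection are hypothetical; RecordingConnection only stands in for a real Thrift connection so the example runs on its own:

```ruby
# Hypothetical helper: split a script into individual statements.
# Naive on purpose: it splits on every ';', so it would break a statement
# that contains a semicolon inside a string literal.
def split_statements(script)
  script.split(";").map(&:strip).reject(&:empty?)
end

# Hypothetical stand-in for a Thrift connection. It only records the
# statements it receives instead of sending them anywhere.
class RecordingConnection
  attr_reader :sent

  def initialize
    @sent = []
  end

  def execute(statement)
    @sent << statement
  end
end

script = "ADD JAR /udf.jar ; create temporary function function1 as 'org.apache.test.Function1'"

connection = RecordingConnection.new
# One execute call per statement, in the original order.
split_statements(script).each { |statement| connection.execute(statement) }

connection.sent.each { |statement| puts statement }
```

This only avoids the parse errors from the combined string. It does not by itself address the earlier symptom, where the second of two separate execute calls did not see the jar.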

Thanks for any help,

Matthew





