Posted to user@hive.apache.org by Royce Rollins <rr...@attinteractive.com> on 2009/09/08 22:06:09 UTC

Hive and thrift session help

I'm currently working on an application that connects to Hive via the Thrift
Ruby libraries.

Does Hive support creating sessions through those libraries? If so, how?


Royce

Re: Hive and thrift session help

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Sep 8, 2009 at 7:02 PM, Edward Capriolo<ed...@gmail.com> wrote:
> On Tue, Sep 8, 2009 at 6:37 PM, Vijay<te...@gmail.com> wrote:
>> I get that HWI does manage sessions but it does that leveraging the internal
>> functionality of the "server." One usage pattern I'd like is some kind of a
>> "job" API. What I mean by that is an API that lets us simply submit a query,
>> get some kind of "job id," and leave. After that we use other APIs to query
>> the job status, kill it, get the output once it is done, etc. If we have a
>> simple API like this and the semantics to support this within hive, then the
>> UI can be completely decoupled and be as stateless as it can (using vanilla
>> apache+php as an example, we can't really do threads or stay resident after
>> submitting a job). Does something like this exist either within hive or at
>> the hadoop level? It seems to me may be this is something that needs to be
>> built first.
>>
>> Thanks,
>> Vijay
>>
>> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <ed...@gmail.com>
>> wrote:
>>>
>>> On Tue, Sep 8, 2009 at 5:15 PM, Royce
>>> Rollins<rr...@attinteractive.com> wrote:
>>> > OK I see. I just looked at the code in HWISessionManager.java.  So it
>>> > looks
>>> > like either I will have to write my own ruby HWISessionManager that
>>> > manages
>>> > sessions through thrift or expose the existing HWISessionManager via some
>>> > web
>>> > service interface.  Has anyone done this?
>>> >
>>> > Royce
>>> >
>>> >
>>> > On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
>>> >
>>> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>>> >>> Sorry to inject into this thread but I have the same problem (only I'm
>>> >>> trying to use the thrift PHP libraries from apache-php scripts). The
>>> >>> problem
>>> >>> with this approach is that the http request cannot run indefinitely as
>>> >>> the
>>> >>> server is executing a query. Are there any solutions for this?
>>> >>>
>>> >>> Thanks,
>>> >>> Vijay
>>> >>>
>>> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
>>> >>> <rr...@attinteractive.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Raghu,
>>> >>>> Thanks for the quick response.
>>> >>>> Yes.  My application is web based so instead of having to build some
>>> >>>> kind
>>> >>>> of
>>> >>>> session model myself for queries that might take a while,  I'd like
>>> >>>> to use
>>> >>>> a session model in the hive service.
>>> >>>>
>>> >>>> Royce
>>> >>>>
>>> >>>>
>>> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>> >>>>
>>> >>>>> Our model so far has been to create a new connection to the hive
>>> >>>>> thrift
>>> >>>>> server per session. Is there anything specific you are looking for
>>> >>>>> in
>>> >>>>> sessions?
>>> >>>>>
>>> >>>>>
>>> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> I'm currently working on an application that connects to hive via
>>> >>>>>> the
>>> >>>>>> thrift
>>> >>>>>> ruby libraries.
>>> >>>>>>
>>> >>>>>> Does hive support creation of sessions using those libraries.  If
>>> >>>>>> so,
>>> >>>>>> how?
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Royce
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >> Royce,
>>> >>
>>> >> The Hive Web Interface deals with this by having a threaded object
>>> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>> >> has any equivalent to threading and Application Scope.
>>> >>
>>> >> Edward
>>> >
>>> >
>>>
>>> Someone correct me if I am wrong.
>>>
>>> Royce,
>>>
>>> You may be able to get at this another way. From my understanding, the
>>> internal hive web interface used at facebook would spawn ` bin/hive -e
>>> 'INSERT INTO X select * FROM`. All results were written to a hive
>>> table.
>>>
>>> Doing it this way gives you no way to interact with the query and
>>> 'stream' the result, so you can't really use 'fetchOne()' or
>>> 'fetchAll()' but you could start a query and set flags on completion.
>>>
>>> As for web interface, we just had some talks, and one of the things I
>>> was looking to do was create some type of web service style bindings.
>>> (We would also like to have HWI talk to Thrift and have thrift be the
>>> code path for everything). However, if we do make some web server
>>> style bindings they would really be independent of the back end. Do
>>> you want to work on this ? I would like to open a Jira and tackle the
>>> issue.
>>>
>>>
>>> The big picture here is that we need a 'state holder'. That is really
>>> what HWI is. You create a session, detach from it, and optionally
>>> check on it later. If an application needs that pattern how to handle
>>> it?
>>>
>>> One way to tackle this is
>>>
>>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>>>
>>> then have your client 'tail' the hdfs://path/to/file or record the
>>> last position it saw. I guess the big question is dealing with
>>> streaming results. HWI manages the session for you and writes the
>>> results to a local file, (and the new SessionBucket
>>>
>>> What is the usage pattern you need?
>>
>>
>
> Vijay,
>
>> What I mean by that is an API that lets us simply submit a query,
>> get some kind of "job id," and leave.
>
> No. (Again, someone correct me if I am wrong.) As I understand it, if you
> disconnect from the Thrift HiveServer you can not reconnect.
>
> Assuming we punt on intermediate data (large queries with 10 TB of
> results waiting for client pickup). There are a few ways we (you)
> could handle this.
>
> You could use HWI as a web service. With some URL hacking like
> http://hwi:9999/hwi/create_session.jsp?name=bob
>
> This is not a true XML web service, but you could use it to accomplish
> your goals.
>
>> After that we use other APIs to query
>> the job status, kill it, get the output once it is done, etc
>
> We could write some other XMLRPC style JSP pages that would be a more
> formal web service.
>
> Hive Thrift Server could support this directly maybe with alternate
> constructors or objects for detached sessions.
>
> In summary
> option 1) URL hacking (you have that today, not very clean)
> option 2) web service bindings ( you could have that pretty fast, more
> clean does not have to touch anything upstream)
> option 3) detached sessions HiveServer ( patched HiveServer patched
> Hive Bindings, clean,)
>

It is ironic that you can have multiple 'hive -e' processes running on the
same server, but within one JVM the thread locals and static variables have
had subtle issues.

Both stateful applications (HWI, HiveServer) struggle a bit because the API
was designed around the CLI. It would be interesting if the CLI could
connect to a HiveServer or run a local HiveServer.

I opened up this issue: Create a Hive CLI that connects to hive ThriftServer
https://issues.apache.org/jira/browse/HIVE-818

Re: Hive and thrift session help

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Sep 8, 2009 at 6:37 PM, Vijay<te...@gmail.com> wrote:
> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at
> the hadoop level? It seems to me may be this is something that needs to be
> built first.
>
> Thanks,
> Vijay
>
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <ed...@gmail.com>
> wrote:
>>
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce
>> Rollins<rr...@attinteractive.com> wrote:
>> > OK I see. I just looked at the code in HWISessionManager.java.  So it
>> > looks
>> > like either I will have to write my own ruby HWISessionManager that
>> > manages
>> > sessions through thrift or expose the existing HWISessionManager via some
>> > web
>> > service interface.  Has anyone done this?
>> >
>> > Royce
>> >
>> >
>> > On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
>> >
>> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>> >>> Sorry to inject into this thread but I have the same problem (only I'm
>> >>> trying to use the thrift PHP libraries from apache-php scripts). The
>> >>> problem
>> >>> with this approach is that the http request cannot run indefinitely as
>> >>> the
>> >>> server is executing a query. Are there any solutions for this?
>> >>>
>> >>> Thanks,
>> >>> Vijay
>> >>>
>> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
>> >>> <rr...@attinteractive.com>
>> >>> wrote:
>> >>>>
>> >>>> Raghu,
>> >>>> Thanks for the quick response.
>> >>>> Yes.  My application is web based so instead of having to build some
>> >>>> kind
>> >>>> of
>> >>>> session model myself for queries that might take a while,  I'd like
>> >>>> to use
>> >>>> a session model in the hive service.
>> >>>>
>> >>>> Royce
>> >>>>
>> >>>>
>> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>> >>>>
>> >>>>> Our model so far has been to create a new connection to the hive
>> >>>>> thrift
>> >>>>> server per session. Is there anything specific you are looking for
>> >>>>> in
>> >>>>> sessions?
>> >>>>>
>> >>>>>
>> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I'm currently working on an application that connects to hive via
>> >>>>>> the
>> >>>>>> thrift
>> >>>>>> ruby libraries.
>> >>>>>>
>> >>>>>> Does hive support creation of sessions using those libraries.  If
>> >>>>>> so,
>> >>>>>> how?
>> >>>>>>
>> >>>>>>
>> >>>>>> Royce
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >> Royce,
>> >>
>> >> The Hive Web Interface deals with this by having a threaded object
>> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
>> >> has any equivalent to threading and Application Scope.
>> >>
>> >> Edward
>> >
>> >
>>
>> Someone correct me if I am wrong.
>>
>> Royce,
>>
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>>
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()' but you could start a query and set flags on completion.
>>
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web server
>> style bindings they would really be independent of the back end. Do
>> you want to work on this ? I would like to open a Jira and tackle the
>> issue.
>>
>>
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern how to handle
>> it?
>>
>> One way to tackle this is
>>
>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>>
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>>
>> What is the usage pattern you need?
>
>

Vijay,

> What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave.

No. (Again, someone correct me if I am wrong.) As I understand it, if you
disconnect from the Thrift HiveServer you cannot reconnect.

Assuming we punt on intermediate data (large queries with 10 TB of
results waiting for client pickup), there are a few ways we (you)
could handle this.

You could use HWI as a web service. With some URL hacking like
http://hwi:9999/hwi/create_session.jsp?name=bob

This is not a true XML web service, but you could use it to accomplish
your goals.
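
As a rough illustration of the URL hacking idea, here is a minimal Python
sketch that only builds the create_session.jsp URL quoted above. The function
name is hypothetical, and nothing here is an official HWI client. What the
page returns, and which other pages exist, is not covered in this thread, so
the sketch stops at constructing the request URL.

```python
from urllib.parse import urlencode


def hwi_create_session_url(name, base="http://hwi:9999/hwi"):
    """Build the create_session.jsp URL from the message above.

    `base` defaults to the host and port used in the example URL; the
    session name is URL-encoded so spaces and symbols are safe to pass.
    """
    return f"{base}/create_session.jsp?{urlencode({'name': name})}"
```

A client could then fetch that URL with any HTTP library; parsing the
response is left open because the JSP output is HTML meant for a browser.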

> After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc

We could write some other XMLRPC style JSP pages that would be a more
formal web service.

Hive Thrift Server could support this directly maybe with alternate
constructors or objects for detached sessions.

In summary:
option 1) URL hacking (you have that today; not very clean)
option 2) web service bindings (you could have that pretty fast; cleaner,
and does not have to touch anything upstream)
option 3) detached sessions in HiveServer (patched HiveServer, patched
Hive bindings; clean)

Re: Hive and thrift session help

Posted by David Lerman <dl...@videoegg.com>.
I believe what you're looking for is being worked on and tracked here:

http://issues.apache.org/jira/browse/HIVE-80


On 9/8/09 6:37 PM, "Vijay" <te...@gmail.com> wrote:

> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at the
> hadoop level? It seems to me may be this is something that needs to be built
> first.
> 
> Thanks,
> Vijay
> 
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <ed...@gmail.com> wrote:
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce
>> Rollins<rr...@attinteractive.com> wrote:
>>>> OK I see. I just looked at the code in HWISessionManager.java.  So it looks
>>>> like either I will have to write my own ruby HWISessionManager that manages
>>>> sessions through thrift or expose the existing HWISessionManager via some
>>>> web
>>>> service interface.  Has anyone done this?
>>>> 
>>>> Royce
>>>> 
>>>> 
>>>> On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
>>>> 
>>>>>> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>>>>>>>> Sorry to inject into this thread but I have the same problem (only I'm
>>>>>>>> trying to use the thrift PHP libraries from apache-php scripts). The
>>>>> problem
>>>>>>>> with this approach is that the http request cannot run indefinitely as
>>>>>>>> the
>>>>>>>> server is executing a query. Are there any solutions for this?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Vijay
>>>>>>>> 
>>>>>>>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
>>>>> <rr...@attinteractive.com>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Raghu,
>>>>>>>>>> Thanks for the quick response.
>>>>>>>>>> Yes.  My application is web based so instead of having to build some
>>>>>>>>>> kind
>>>>>>>>>> of
>>>>>>>>>> session model myself for queries that might take a while,  I'd like
>>>>>>>>>> to use
>>>>>>>>>> a session model in the hive service.
>>>>>>>>>> 
>>>>>>>>>> Royce
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> Our model so far has been to create a new connection to the hive
>>>>>>>>>>>> thrift
>>>>>>>>>>>> server per session. Is there anything specific you are looking for
>>>>>>>>>>>> in
>>>>>>>>>>>> sessions?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm currently working on an application that connects to hive via
>>>>>>>>>>>> the
>>>>>>>>>>>> thrift
>>>>>>>>>>>> ruby libraries.
>>>>>>>>>>>> 
>>>>>>>>>>>> Does hive support creation of sessions using those libraries.  If
>>>>>>>>>>>> so,
>>>>>>>>>>>> how?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Royce
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> Royce,
>>>>>> 
>>>>>> The Hive Web Interface deals with this by having a threaded object
>>>>>> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>>>>> has any equivalent to threading and Application Scope.
>>>>>> 
>>>>>> Edward
>>>> 
>>>> 
>> 
>> Someone correct me if I am wrong.
>> 
>> Royce,
>> 
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>> 
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()' but you could start a query and set flags on completion.
>> 
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web server
>> style bindings they would really be independent of the back end. Do
>> you want to work on this ? I would like to open a Jira and tackle the
>> issue.
>> 
>> 
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern how to handle
>> it?
>> 
>> One way to tackle this is
>> 
>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>> 
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>> 
>> What is the usage pattern you need?
> 


Re: Hive and thrift session help

Posted by Royce Rollins <rr...@attinteractive.com>.
I agree with Vijay.
I need some sort of session management service that lives on top of Hive.
That way I can submit a job using an API such as the Thrift API. It would
also be useful to be able to get the actual job id in Hadoop.

Royce


On 9/8/09 3:37 PM, "Vijay" <te...@gmail.com> wrote:

> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at the
> hadoop level? It seems to me may be this is something that needs to be built
> first.
> 
> Thanks,
> Vijay
> 
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <ed...@gmail.com> wrote:
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce
>> Rollins<rr...@attinteractive.com> wrote:
>>> > OK I see. I just looked at the code in HWISessionManager.java.  So it
>>> looks
>>> > like either I will have to write my own ruby HWISessionManager that
>>> manages
>>> > sessions through thrift or expose the existing HWISessionManager via some
>>> web
>>> > service interface.  Has anyone done this?
>>> >
>>> > Royce
>>> >
>>> >
>>> > On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
>>> >
>>>> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>>>>> >>> Sorry to inject into this thread but I have the same problem (only I'm
>>>>> >>> trying to use the thrift PHP libraries from apache-php scripts). The
>>>>> problem
>>>>> >>> with this approach is that the http request cannot run indefinitely as
>>>>> >>> the
>>>>> >>> server is executing a query. Are there any solutions for this?
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Vijay
>>>>> >>>
>>>>> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
>>>>> <rr...@attinteractive.com>
>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Raghu,
>>>>>> >>>> Thanks for the quick response.
>>>>>> >>>> Yes.  My application is web based so instead of having to build some
>>>>>> >>>> kind
>>>>>> >>>> of
>>>>>> >>>> session model myself for queries that might take a while,  I'd like
>>>>>> to use
>>>>>> >>>> a session model in the hive service.
>>>>>> >>>>
>>>>>> >>>> Royce
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>>>>> >>>>
>>>>>>> >>>>> Our model so far has been to create a new connection to the hive
>>>>>>> thrift
>>>>>>> >>>>> server per session. Is there anything specific you are looking for
>>>>>>> >>>>> in
>>>>>>> >>>>> sessions?
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com>
>>>>>>> wrote:
>>>>>>> >>>>>
>>>>>>>> >>>>>> I'm currently working on an application that connects to hive via
>>>>>>>> >>>>>> the
>>>>>>>> >>>>>> thrift
>>>>>>>> >>>>>> ruby libraries.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Does hive support creation of sessions using those libraries.
>>>>>>>>  If so,
>>>>>>>> >>>>>> how?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Royce
>>>>>>> >>>>>
>>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>> >>
>>>> >> Royce,
>>>> >>
>>>> >> The Hive Web Interface deals with this by having a threaded object
>>>> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>>> >> has any equivalent to threading and Application Scope.
>>>> >>
>>>> >> Edward
>>> >
>>> >
>> 
>> Someone correct me if I am wrong.
>> 
>> Royce,
>> 
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>> 
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()' but you could start a query and set flags on completion.
>> 
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web server
>> style bindings they would really be independent of the back end. Do
>> you want to work on this ? I would like to open a Jira and tackle the
>> issue.
>> 
>> 
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern how to handle
>> it?
>> 
>> One way to tackle this is
>> 
>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>> 
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>> 
>> What is the usage pattern you need?
> 
> 


Re: Hive and thrift session help

Posted by Vijay <te...@gmail.com>.
I get that HWI does manage sessions but it does that leveraging the internal
functionality of the "server." One usage pattern I'd like is some kind of a
"job" API. What I mean by that is an API that lets us simply submit a query,
get some kind of "job id," and leave. After that we use other APIs to query
the job status, kill it, get the output once it is done, etc. If we have a
simple API like this and the semantics to support this within hive, then the
UI can be completely decoupled and be as stateless as it can (using vanilla
apache+php as an example, we can't really do threads or stay resident after
submitting a job). Does something like this exist either within hive or at
the hadoop level? It seems to me that this may be something that needs to be
built first.

Thanks,
Vijay
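
To make the "job" API idea above concrete, here is a minimal Python sketch of
a state holder that hands out job ids and answers status, kill, and output
requests. Everything in it is hypothetical: the class name, the method names,
and the run_query callable (a stand-in for whatever actually executes the
Hive query) are assumptions for illustration, not an existing Hive or Hadoop
API. It also keeps all state in one process, so it would have to live in a
long-running service rather than in a per-request PHP script.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobStore:
    """Hypothetical state holder mapping job ids to submitted queries."""

    def __init__(self, run_query, max_workers=4):
        # run_query is a caller-supplied callable(query) -> result; it is a
        # stand-in for real query execution and is not part of Hive.
        self._run_query = run_query
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, query):
        """Queue the query and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._run_query, query)
        return job_id

    def status(self, job_id):
        """Report KILLED, RUNNING (queued or executing), FAILED, or DONE."""
        future = self._jobs[job_id]
        if future.cancelled():
            return "KILLED"
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() else "DONE"

    def kill(self, job_id):
        # Future.cancel() only succeeds while the job is still queued; a
        # query that is already executing is not interrupted by this sketch.
        return self._jobs[job_id].cancel()

    def output(self, job_id, timeout=None):
        """Block until the job finishes (or timeout) and return its result."""
        return self._jobs[job_id].result(timeout=timeout)
```

A stateless front end would call submit() in one request, store the returned
id, and call status() or output() in later requests.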

On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <ed...@gmail.com>wrote:

> On Tue, Sep 8, 2009 at 5:15 PM, Royce
> Rollins<rr...@attinteractive.com> wrote:
> > OK I see. I just looked at the code in HWISessionManager.java.  So it
> looks
> > like either I will have to write my own ruby HWISessionManager that
> manages
> > sessions through thrift or expose the existing HWISessionManager via some
> web
> > service interface.  Has anyone done this?
> >
> > Royce
> >
> >
> > On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
> >
> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
> >>> Sorry to inject into this thread but I have the same problem (only I'm
> >>> trying to use the thrift PHP libraries from apache-php scripts). The
> problem
> >>> with this approach is that the http request cannot run indefinitely as
> the
> >>> server is executing a query. Are there any solutions for this?
> >>>
> >>> Thanks,
> >>> Vijay
> >>>
> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <
> rrollins@attinteractive.com>
> >>> wrote:
> >>>>
> >>>> Raghu,
> >>>> Thanks for the quick response.
> >>>> Yes.  My application is web based so instead of having to build some
> kind
> >>>> of
> >>>> session model myself for queries that might take a while,  I'd like to
> use
> >>>> a session model in the hive service.
> >>>>
> >>>> Royce
> >>>>
> >>>>
> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
> >>>>
> >>>>> Our model so far has been to create a new connection to the hive
> thrift
> >>>>> server per session. Is there anything specific you are looking for in
> >>>>> sessions?
> >>>>>
> >>>>>
> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com>
> wrote:
> >>>>>
> >>>>>> I'm currently working on an application that connects to hive via the
> >>>>>> thrift
> >>>>>> ruby libraries.
> >>>>>>
> >>>>>> Does hive support creation of sessions using those libraries.  If
> so,
> >>>>>> how?
> >>>>>>
> >>>>>>
> >>>>>> Royce
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >> Royce,
> >>
> >> The Hive Web Interface deals with this by having a threaded object
> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
> >> has any equivalent to threading and Application Scope.
> >>
> >> Edward
> >
> >
>
> Someone correct me if I am wrong.
>
> Royce,
>
> You may be able to get at this another way. From my understanding, the
> internal hive web interface used at facebook would spawn ` bin/hive -e
> 'INSERT INTO X select * FROM`. All results were written to a hive
> table.
>
> Doing it this way gives you no way to interact with the query and
> 'stream' the result, so you can't really use 'fetchOne()' or
> 'fetchAll()' but you could start a query and set flags on completion.
>
> As for web interface, we just had some talks, and one of the things I
> was looking to do was create some type of web service style bindings.
> (We would also like to have HWI talk to Thrift and have thrift be the
> code path for everything). However, if we do make some web server
> style bindings they would really be independent of the back end. Do
> you want to work on this ? I would like to open a Jira and tackle the
> issue.
>
>
> The big picture here is that we need a 'state holder'. That is really
> what HWI is. You create a session, detach from it, and optionally
> check on it later. If an application needs that pattern how to handle
> it?
>
> One way to tackle this is
>
> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>
> then have your client 'tail' the hdfs://path/to/file or record the
> last position it saw. I guess the big question is dealing with
> streaming results. HWI manages the session for you and writes the
> results to a local file, (and the new SessionBucket
>
> What is the usage pattern you need?
>

Re: Hive and thrift session help

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Sep 8, 2009 at 5:15 PM, Royce
Rollins<rr...@attinteractive.com> wrote:
> OK I see. I just looked at the code in HWISessionManager.java.  So it looks
> like either I will have to write my own ruby HWISessionManager that manages
> sessions through thrift or expose the existing HWISessionManager via some web
> service interface.  Has anyone done this?
>
> Royce
>
>
> On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:
>
>> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>>> Sorry to inject into this thread but I have the same problem (only I'm
>>> trying to use the thrift PHP libraries from apache-php scripts). The problem
>>> with this approach is that the http request cannot run indefinitely as the
>>> server is executing a query. Are there any solutions for this?
>>>
>>> Thanks,
>>> Vijay
>>>
>>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <rr...@attinteractive.com>
>>> wrote:
>>>>
>>>> Raghu,
>>>> Thanks for the quick response.
>>>> Yes.  My application is web based so instead of having to build some kind
>>>> of
>>>> session model myself for queries that might take a while,  I'd like to use
>>>> a session model in the hive service.
>>>>
>>>> Royce
>>>>
>>>>
>>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>>>
>>>>> Our model so far has been to create a new connection to the hive thrift
>>>>> server per session. Is there anything specific you are looking for in
>>>>> sessions?
>>>>>
>>>>>
>>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:
>>>>>
>>>>>> I'm currently working on an application that connects to hive via the
>>>>>> thrift
>>>>>> ruby libraries.
>>>>>>
>>>>>> Does hive support creation of sessions using those libraries.  If so,
>>>>>> how?
>>>>>>
>>>>>>
>>>>>> Royce
>>>>>
>>>>
>>>
>>>
>>
>> Royce,
>>
>> The Hive Web Interface deals with this by having a threaded object
>> (HWISessionManager) in the Web application scope. I am not sure if PHP
>> has any equivalent to threading and Application Scope.
>>
>> Edward
>
>

Someone correct me if I am wrong.

Royce,

You may be able to get at this another way. From my understanding, the
internal hive web interface used at facebook would spawn ` bin/hive -e
'INSERT INTO X select * FROM`. All results were written to a hive
table.

Doing it this way gives you no way to interact with the query and
'stream' the result, so you can't really use 'fetchOne()' or
'fetchAll()', but you could start a query and set flags on completion.
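
A minimal Python sketch of "start a query and set a flag on completion"
could look like the following. The function names and the flag-file
convention are assumptions made for illustration; only the 'bin/hive -e'
invocation comes from this thread. The watcher thread lives in the calling
process, so this fits a long-running service better than a short-lived web
request.

```python
import subprocess
import threading
from pathlib import Path


def hive_command(query, hive_bin="bin/hive"):
    """Build the argument list for a one-shot CLI query (hive -e <query>)."""
    return [hive_bin, "-e", query]


def run_detached(command, flag_path):
    """Start `command` in the background and flag its completion.

    When the process exits, its exit code is written to `flag_path`, so a
    later request can check for that file instead of holding a connection.
    """
    process = subprocess.Popen(command)

    def wait_and_flag():
        exit_code = process.wait()
        Path(flag_path).write_text(str(exit_code))

    watcher = threading.Thread(target=wait_and_flag, daemon=True)
    watcher.start()
    return process, watcher
```

For example, run_detached(hive_command(query), "/tmp/job-1.done") would
return immediately, and the presence of the flag file (with "0" inside on
success) signals that the query finished. The flag path shown is only a
placeholder.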

As for the web interface, we just had some talks, and one of the things I
was looking to do was create some type of web service style bindings.
(We would also like to have HWI talk to Thrift and have Thrift be the
code path for everything.) However, if we do make some web service
style bindings they would really be independent of the back end. Do
you want to work on this? I would like to open a Jira and tackle the
issue.


The big picture here is that we need a 'state holder'. That is really
what HWI is. You create a session, detach from it, and optionally
check on it later. If an application needs that pattern, how should it
handle it?

One way to tackle this is

bin/hive -e "INSERT OVERWRITE DIRECTORY 'hdfs://path/to/file' SELECT * FROM XXX" &

then have your client 'tail' the hdfs://path/to/file or record the
last position it saw. I guess the big question is dealing with
streaming results. HWI manages the session for you and writes the
results to a local file, (and the new SessionBucket

What is the usage pattern you need?
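
The "record the last position it saw" idea could look roughly like the
sketch below. It assumes the output is readable as a local file through
open(); reading from HDFS would need an HDFS client in its place, which
is not shown. The function name read_new_lines is hypothetical.

```python
def read_new_lines(path, last_offset):
    """Return (lines, new_offset) for complete lines appended since last_offset."""
    with open(path, "rb") as handle:
        handle.seek(last_offset)
        chunk = handle.read()
    # Only consume up to the last newline so a partially written row is
    # left for the next call. rfind returns -1 when there is no newline,
    # which makes `end` 0 and leaves the offset unchanged.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, last_offset + end
```

The client stores new_offset between calls (in a cookie, a database
row, or similar) and passes it back in, so each poll only returns rows
it has not seen yet.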

Re: Hive and thrift session help

Posted by Royce Rollins <rr...@attinteractive.com>.
OK I see. I just looked at the code in HWISessionManager.java.  So it looks
like either I will have to write my own ruby HWISessionManager that manages
sessions through thrift or expose the existing HWISessionManager via some web
service interface.  Has anyone done this?

Royce


On 9/8/09 1:47 PM, "Edward Capriolo" <ed...@gmail.com> wrote:

> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
>> Sorry to inject into this thread but I have the same problem (only I'm
>> trying to use the thrift PHP libraries from apache-php scripts). The problem
>> with this approach is that the http request cannot run indefinitely as the
>> server is executing a query. Are there any solutions for this?
>> 
>> Thanks,
>> Vijay
>> 
>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <rr...@attinteractive.com>
>> wrote:
>>> 
>>> Raghu,
>>> Thanks for the quick response.
>>> Yes.  My application is web based so instead of having to build some kind
>>> of
>>> session model myself for queries that might take a while,  I'd like to use
>>> a session model in the hive service.
>>> 
>>> Royce
>>> 
>>> 
>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>> 
>>>> Our model so far has been to create a new connection to the hive thrift
>>>> server per session. Is there anything specific you are looking for in
>>>> sessions?
>>>> 
>>>> 
>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:
>>>> 
>>>>> I'm currently working on an application that connects to hive via the
>>>>> thrift
>>>>> ruby libraries.
>>>>>
>>>>> Does hive support creation of sessions using those libraries?  If so,
>>>>> how?
>>>>> 
>>>>> 
>>>>> Royce
>>>> 
>>> 
>> 
>> 
> 
> Royce,
> 
> The Hive Web Interface deals with this by having a threaded object
> (HWISessionManager) in the Web application scope. I am not sure if PHP
> has any equivalent to threading and Application Scope.
> 
> Edward


Re: Hive and thrift session help

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Sep 8, 2009 at 4:38 PM, Vijay<te...@gmail.com> wrote:
> Sorry to inject into this thread but I have the same problem (only I'm
> trying to use the thrift PHP libraries from apache-php scripts). The problem
> with this approach is that the http request cannot run indefinitely as the
> server is executing a query. Are there any solutions for this?
>
> Thanks,
> Vijay
>
> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <rr...@attinteractive.com>
> wrote:
>>
>> Raghu,
>> Thanks for the quick response.
>> Yes.  My application is web based so instead of having to build some kind
>> of
>> session model myself for queries that might take a while,  I'd like to use
>> a session model in the hive service.
>>
>> Royce
>>
>>
>> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>>
>> > Our model so far has been to create a new connection to the hive thrift
>> > server per session. Is there anything specific you are looking for in
>> > sessions?
>> >
>> >
>> > On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:
>> >
>> >> I'm currently working on an application that connects to hive via the
>> >> thrift
>> >> ruby libraries.
>> >>
>> >> Does hive support creation of sessions using those libraries?  If so,
>> >> how?
>> >>
>> >>
>> >> Royce
>> >
>>
>
>

Royce,

The Hive Web Interface deals with this by having a threaded object
(HWISessionManager) in the Web application scope. I am not sure if PHP
has any equivalent to threading and Application Scope.

Edward

Re: Hive and thrift session help

Posted by Vijay <te...@gmail.com>.
Sorry to inject into this thread but I have the same problem (only I'm
trying to use the thrift PHP libraries from apache-php scripts). The problem
with this approach is that the http request cannot run indefinitely as the
server is executing a query. Are there any solutions for this?

Thanks,
Vijay

On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
<rr...@attinteractive.com> wrote:

> Raghu,
> Thanks for the quick response.
> Yes.  My application is web based so instead of having to build some kind
> of
> session model myself for queries that might take a while,  I'd like to use
> a session model in the hive service.
>
> Royce
>
>
> On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
>
> > Our model so far has been to create a new connection to the hive thrift
> > server per session. Is there anything specific you are looking for in
> > sessions?
> >
> >
> > On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:
> >
> >> I'm currently working on an application that connects to hive via the
> >> thrift ruby libraries.
> >>
> >> Does hive support creation of sessions using those libraries?  If so,
> >> how?
> >>
> >>
> >> Royce
> >
>
>

Re: Hive and thrift session help

Posted by Royce Rollins <rr...@attinteractive.com>.
Raghu,
Thanks for the quick response.
Yes.  My application is web based so instead of having to build some kind of
session model myself for queries that might take a while,  I'd like to use
a session model in the hive service.

Royce


On 9/8/09 1:32 PM, "Raghu Murthy" <rm...@facebook.com> wrote:

> Our model so far has been to create a new connection to the hive thrift
> server per session. Is there anything specific you are looking for in
> sessions?
> 
> 
> On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:
> 
>> I'm currently working on an application that connects to hive via the thrift
>> ruby libraries.
>>
>> Does hive support creation of sessions using those libraries?  If so, how?
>> 
>> 
>> Royce
> 


Re: Hive and thrift session help

Posted by Raghu Murthy <rm...@facebook.com>.
Our model so far has been to create a new connection to the hive thrift
server per session. Is there anything specific you are looking for in
sessions?


On 9/8/09 1:06 PM, "Royce Rollins" <rr...@attinteractive.com> wrote:

> I'm currently working on an application that connects to hive via the thrift
> ruby libraries.
>
> Does hive support creation of sessions using those libraries?  If so, how?
> 
> 
> Royce