Posted to user@pig.apache.org by Stan Rosenberg <sr...@proclivitysystems.com> on 2011/11/14 16:44:28 UTC

hive queries from pig

Hi,

We are trying to brainstorm on how best to integrate hive queries into
pig.  All suggestions are greatly appreciated!

Note: we are trying to use hcatalog, but there are a couple of problems
with that approach.  We also considered using jython to communicate with
a thrift server, but jython seems very dated and lacks support for thrift
as well as other useful modules.

Thanks,

stan

Re: hive queries from pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Oh so "sh" does this then right?
Make sure you 'exec" before running the "sh" command (that way you
ensure the store finished before the sh is executed)
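
Roughly like this, in a pig script (the paths, table and file names
below are made up):

-- write to a temp hdfs location first (example path)
events = LOAD '/data/in/events' USING PigStorage('\t');
STORE events INTO '/tmp/pig_out/dt=2011-11-14' USING PigStorage('\t');

-- force the pending store to run before the shell command
exec

-- load_events.hql would hold the hive LOAD DATA statement for that path
sh hive -f load_events.hql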

D

On Mon, Nov 14, 2011 at 2:47 PM, Stan Rosenberg
<sr...@proclivitysystems.com> wrote:
> On Mon, Nov 14, 2011 at 5:30 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>> If you manually create the hive table + partitions to match the format
>> Pig writes things in, it should just work.
>
> Hive table already exists.  However, we don't want to write directly
> into its warehouse location because it may result in a data race.
> Instead we output to a temp (partitioned) location.  Then, we
> (atomically) move the data into hive's warehouse.  Hive's load command
> already implements the last step.
>
>> For your second question:
>> grunt> sh echo foo
>> foo
>>
>
> Feel like an idiot; the answer should have been RTFM. :)
>

Re: hive queries from pig

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
On Mon, Nov 14, 2011 at 5:30 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> If you manually create the hive table + partitions to match the format
> Pig writes things in, it should just work.

Hive table already exists.  However, we don't want to write directly
into its warehouse location because it may result in a data race.
Instead we output to a temp (partitioned) location.  Then, we
(atomically) move the data into hive's warehouse.  Hive's load command
already implements the last step.
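
I.e., that last step boils down to something like this (table and
partition names are made up):

-- hive moves the files from the temp location into the partition's
-- warehouse directory
LOAD DATA INPATH '/tmp/pig_out/dt=2011-11-14'
  INTO TABLE events PARTITION (dt='2011-11-14');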

> For your second question:
> grunt> sh echo foo
> foo
>

Feel like an idiot; the answer should have been RTFM. :)

Re: hive queries from pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
If you manually create the hive table + partitions to match the format
Pig writes things in, it should just work.  HCatalog is about doing
the deep integration; if you want deeper integration than just
matching up formats and metadata, you will pretty much wind up
rewriting HCat...
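
E.g., something along these lines (the schema, delimiter and locations
are just placeholders):

-- layout must match what the pig storer writes (delimiter, column
-- order, directory structure)
CREATE EXTERNAL TABLE events (user_id STRING, amount DOUBLE)
  PARTITIONED BY (dt STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/events';

-- register each partition directory that pig wrote
ALTER TABLE events ADD PARTITION (dt='2011-11-14')
  LOCATION '/data/events/dt=2011-11-14';
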
For your second question:
grunt> sh echo foo
foo

D

On Mon, Nov 14, 2011 at 1:03 PM, Stan Rosenberg
<sr...@proclivitysystems.com> wrote:
> On Mon, Nov 14, 2011 at 3:08 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>> My lack of imagination is showing -- can you explain what you mean by
>> integrating hive queries with pig,
>
> For example, we implemented a storage function which creates path
> partitioning based on a given sequence of columns; the output is
> stored in a temporary hdfs location.  Subsequent to the pig 'store'
> command we'd like to execute the hive 'load' command.
>
>> and what the problems with hcatalog are?
>
> One of them is the version requirement.
>
>>
>> For thrift, you might want to check jruby integration
>> (https://issues.apache.org/jira/browse/PIG-2317)
>>
>
> Thanks, but we'd like to limit the language choices to either java or
> python.  Btw, is there some plan to have a shell-execute command in
> pig? E.g., fs -exec "java Foo"
>

Re: hive queries from pig

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
On Mon, Nov 14, 2011 at 3:08 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> My lack of imagination is showing -- can you explain what you mean by
> integrating hive queries with pig,

For example, we implemented a storage function which creates path
partitioning based on a given sequence of columns; the output is
stored in a temporary hdfs location.  Subsequent to the pig 'store'
command we'd like to execute the hive 'load' command.
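
Roughly (the storer below is ours and hypothetical; paths, columns and
table names are made up):

-- hypothetical storage function that writes one directory per
-- (dt, country) value under the output path
events = LOAD '/data/in/events' USING PigStorage('\t')
         AS (user_id:chararray, country:chararray, dt:chararray, amount:double);
STORE events INTO '/tmp/pig_out/events'
      USING com.example.PartitionedStorage('dt', 'country');

-- afterwards we'd like to run, for each partition pig produced, something like:
--   hive -e "LOAD DATA INPATH '/tmp/pig_out/events/dt=2011-11-14/country=US'
--            INTO TABLE events PARTITION (dt='2011-11-14', country='US')"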

> and what the problems with hcatalog are?

One of them is the version requirement.

>
> For thrift, you might want to check jruby integration
> (https://issues.apache.org/jira/browse/PIG-2317)
>

Thanks, but we'd like to limit the language choices to either java or
python.  Btw, is there some plan to have a shell-execute command in
pig? E.g., fs -exec "java Foo"

Re: hive queries from pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
My lack of imagination is showing -- can you explain what you mean by
integrating hive queries with pig, and what the problems with hcatalog
are?

For thrift, you might want to check jruby integration
(https://issues.apache.org/jira/browse/PIG-2317)

-Dmitriy

On Mon, Nov 14, 2011 at 7:44 AM, Stan Rosenberg
<sr...@proclivitysystems.com> wrote:
> Hi,
>
> We are trying to brainstorm on how best to integrate hive queries into
> pig.  All suggestions are greatly appreciated!
>
> Note: we are trying to use hcatalog, but there are a couple of problems
> with that approach.  We also considered using jython to communicate with
> a thrift server, but jython seems very dated and lacks support for thrift
> as well as other useful modules.
>
> Thanks,
>
> stan
>