Posted to user@hive.apache.org by Vijay <te...@gmail.com> on 2011/02/16 23:07:14 UTC
Question about Transform and M/R scripts
Hi,
I'm trying this use case: do a simple select from an existing table
and pass the results through a reduce script to do some analysis. The
table has web logs so the select uses a pseudo user ID as the key and
the rest of the data as values. My expectation is that a single reduce
script should receive all logs for a given user so that I can do some
path based analysis. Are there any issues with this idea so far?
When I try it though, hive is not doing what I'd expect. The
particular query is not generating any reduce tasks at all. Here's a
sample query:
FROM(
SELECT userid, time, url
FROM weblogs
) weblogs
reduce weblogs.userid, weblogs.time, weblogs.url
using 'counter.pl'
as user, count;
Thanks,
Vijay
Re: Question about Transform and M/R scripts
Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Feb 16, 2011 at 5:07 PM, Vijay <te...@gmail.com> wrote:
> Hi,
>
> I'm trying this use case: do a simple select from an existing table
> and pass the results through a reduce script to do some analysis. The
> table has web logs so the select uses a pseudo user ID as the key and
> the rest of the data as values. My expectation is that a single reduce
> script should receive all logs for a given user so that I can do some
> path based analysis. Are there any issues with this idea so far?
>
> When I try it though, hive is not doing what I'd expect. The
> particular query is not generating any reduce tasks at all. Here's a
> sample query:
>
> FROM(
> SELECT userid, time, url
> FROM weblogs
> ) weblogs
> reduce weblogs.userid, weblogs.time, weblogs.url
> using 'counter.pl'
> as user, count;
>
> Thanks,
> Vijay
>
It is hard to tell without the script. Does your Perl script read its input from a pipe (STDIN)?
i.e.
while (<STDIN>) {
    print $_;
}
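
Independent of the script, one likely reason no reduce tasks are generated is that in Hive the REDUCE keyword is only an alias for TRANSFORM and does not by itself force a reduce phase. The rows have to be distributed by the key explicitly. Below is a minimal sketch of the same query with DISTRIBUTE BY and SORT BY added. It reuses the table, columns, and script name from the original post; the ADD FILE path and the subquery alias w are assumptions made for illustration.

```sql
-- Sketch only: table, columns, and script name come from the original query.
-- ADD FILE ships the script to the task nodes; the path here is hypothetical.
ADD FILE counter.pl;

FROM (
  SELECT userid, time, url
  FROM weblogs
  -- Send every row for a given userid to the same reducer ...
  DISTRIBUTE BY userid
  -- ... and order each reducer's input by user, then by time.
  SORT BY userid, time
) w
REDUCE w.userid, w.time, w.url
USING 'counter.pl'
AS user, count;
```

Note that a single reducer still receives the rows of many users, one user after another, so the script has to detect the change in userid itself in order to emit one result per user.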