Posted to user@pig.apache.org by Rajesh Srinivasan <ra...@outlook.com> on 2012/11/15 04:31:22 UTC

Logarithm functions

Hi,

Can you help me with the syntax for the natural logarithm (base e) of an expression in Pig? According to the Help, the syntax is LOG(expression).

I am basically trying to reproduce the following SQL query:

select server, processor, area, log(server_time)/log(2) as LogGroup, count(*) as users, sum(server_time) as group_time, sum(server_cnt) as group_cnt 
from Table_reqd 
group by 1, 2, 3, 4

My script looks like this:
--Load
L = LOAD '/user/RS/serverdata1030' AS (
        server:chararray,
        processor:chararray,
        area:chararray,
        server_time:int,
        server_cnt:int,
        
);


--Group After loading data

A = group a by (server,processor,area,(double)LOG((server_time)+1) as LogGroup);

-- Generate Counts and Sums
B= foreach A generate group,(long) COUNT(reqd)as Users,(long) SUM(reqd.server_time)as time,(long) SUM(reqd.server_cnt)as count;

Store B into 'data';


The job fails and I get 'ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message.'

I tried changing the data types, among other things, but to no avail. Any thoughts on the correct syntax? Thanks.

Re: Logarithm functions

Posted by Prashant Kommireddi <pr...@gmail.com>.
You can do a "log n base 2" operation similar to your SQL query by computing the value in a FOREACH first and then grouping on it:

A = foreach L generate *, LOG(server_time)/LOG(2) as LogGroup;
B = group A by (server,processor,area, LogGroup);
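
Putting the pieces together, a complete version of the script might look like the sketch below. It is untested and makes a few assumptions not stated in the thread: the trailing comma in the LOAD schema is dropped, the arguments to LOG are cast to double explicitly, the relation names A, B and C are arbitrary, and the output aliases (users, group_time, group_cnt) are borrowed from the SQL query.

```pig
-- Load (schema taken from the original post, trailing comma removed)
L = LOAD '/user/RS/serverdata1030' AS (
        server:chararray,
        processor:chararray,
        area:chararray,
        server_time:int,
        server_cnt:int
);

-- Compute log base 2 up front so the GROUP can refer to it by name
A = FOREACH L GENERATE *, LOG((double)server_time) / LOG(2.0) AS LogGroup;

-- Group on the three columns plus the computed value
B = GROUP A BY (server, processor, area, LogGroup);

-- Aggregate; the bag inside each group is named after the grouped relation (A)
C = FOREACH B GENERATE
        FLATTEN(group) AS (server, processor, area, LogGroup),
        COUNT(A) AS users,
        SUM(A.server_time) AS group_time,
        SUM(A.server_cnt) AS group_cnt;

STORE C INTO 'data';
```

Note that LOG returns negative infinity for 0, so if server_time can be 0 you may want to keep the +1 from your original script. Also, inside the final FOREACH the bag is referenced by the name of the grouped relation (A here), not by a name such as reqd that is not defined anywhere else in the script.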

