Posted to user@pig.apache.org by venkata ramanaiah anneboina <av...@gmail.com> on 2009/07/14 07:49:13 UTC

how to find the key list in Map datatype in pig

Hi,
 I want to know how to get the list of keys in a map, and the list of keys
across a set of records which contain a map type;
 how do I find the count of each key by grouping on the key?

for example
 I have the data as follows in data.txt

[open#apache,name#raman]
[apache#hadoop]
[product#tv,price#5000]
[product#computer,price#5000]
[name#raman,desg#manager]
[name#krishana,desg#developer]
[name#prakash]
[product#vcp]
[price#2300]

and I have loaded it like this

A = load 'data.txt' as (in: map[]);

the output result should be like this:

apache     1
open       1
name       4
product    3
price      3
desg       2


that is, I want to find the count of each key in the map

Can anyone please help with this?

thanks
ramnaiah

Re: Issue implementing PIG-573

Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Chad, good catch -- go ahead and attach a regenerated patch to the Jira issue.

-D

On Tue, Jul 14, 2009 at 11:44 AM, Naber, Chad<CN...@edmunds.com> wrote:
> The email was incorrectly formatted.  Here are the lines that need to change:
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
>
> needs to be set to:
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
>
> -----Original Message-----
> From: Naber, Chad [mailto:CNaber@edmunds.com]
> Sent: Tuesday, July 14, 2009 11:39 AM
> To: pig-user@hadoop.apache.org
> Subject: RE: Issue implementing PIG-573
>
> Hi folks,
>
> I found the problem, and I think it should be added to the PIG-573 patch.
>
> In the /bin/pig script, the line
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
>
> needs to be set to:
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
>
> for grunt to be successfully called from the script.
>
> Let me know what you think.
>
> Chad
>
> From: Naber, Chad
> Sent: Tuesday, July 14, 2009 10:02 AM
> To: Naber, Chad; 'pig-user@hadoop.apache.org'
> Subject: Issue implementing PIG-573
>
> Hi folks,
>
> I recently upgraded to hadoop 19, and of course ran into the PIG-573<https://issues.apache.org/jira/browse/PIG-573> issue.  I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error.  It looks like the patch is in the jar, because the jar has hdfs instead of dfs, yet the error is unchanged.
>
> Are there any files I need to move other than the new pig.jar, or settings I need to change?
>
> Here is the error:
>
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
>        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:715)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
>        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
>
> Here is a listing of the new jar.  The old jar is out of the PIG_CLASSPATH:
>
> []$ jar tvf pig.jar|grep ClientProtocol
> 2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class
>
> Thanks a ton,
>
> Chad

RE: Issue implementing PIG-573

Posted by "Naber, Chad" <CN...@edmunds.com>.
The email was incorrectly formatted.  Here are the lines that need to change:

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"

needs to be set to:

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"

-----Original Message-----
From: Naber, Chad [mailto:CNaber@edmunds.com]
Sent: Tuesday, July 14, 2009 11:39 AM
To: pig-user@hadoop.apache.org
Subject: RE: Issue implementing PIG-573

Hi folks,

I found the problem, and I think it should be added to the PIG-573 patch.

In the /bin/pig script, the line

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"

needs to be set to:

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"

for grunt to be successfully called from the script.

Let me know what you think.

Chad

From: Naber, Chad
Sent: Tuesday, July 14, 2009 10:02 AM
To: Naber, Chad; 'pig-user@hadoop.apache.org'
Subject: Issue implementing PIG-573

Hi folks,

I recently upgraded to hadoop 19, and of course ran into the PIG-573<https://issues.apache.org/jira/browse/PIG-573> issue.  I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error.  It looks like the patch is in the jar, because the jar has hdfs instead of dfs, yet the error is unchanged.

Are there any files I need to move other than the new pig.jar, or settings I need to change?

Here is the error:

Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)

Here is a listing of the new jar.  The old jar is out of the PIG_CLASSPATH:

[]$ jar tvf pig.jar|grep ClientProtocol
2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class

Thanks a ton,

Chad

RE: Issue implementing PIG-573

Posted by "Naber, Chad" <CN...@edmunds.com>.
Hi folks,

I found the problem, and I think it should be added to the PIG-573 patch.

In the /bin/pig script, the line

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"

needs to be set to:

# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"

for grunt to be successfully called from the script.

Let me know what you think.

Chad

From: Naber, Chad
Sent: Tuesday, July 14, 2009 10:02 AM
To: Naber, Chad; 'pig-user@hadoop.apache.org'
Subject: Issue implementing PIG-573

Hi folks,

I recently upgraded to hadoop 19, and of course ran into the PIG-573<https://issues.apache.org/jira/browse/PIG-573> issue.  I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error.  It looks like the patch is in the jar, because the jar has hdfs instead of dfs, yet the error is unchanged.

Are there any files I need to move other than the new pig.jar, or settings I need to change?

Here is the error:

Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)

Here is a listing of the new jar.  The old jar is out of the PIG_CLASSPATH:

[]$ jar tvf pig.jar|grep ClientProtocol
2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class

Thanks a ton,

Chad

Issue implementing PIG-573

Posted by "Naber, Chad" <CN...@edmunds.com>.
Hi folks,

I recently upgraded to hadoop 19, and of course ran into the PIG-573<https://issues.apache.org/jira/browse/PIG-573> issue.  I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error.  It looks like the patch is in the jar, because the jar has hdfs instead of dfs, yet the error is unchanged.

Are there any files I need to move other than the new pig.jar, or settings I need to change?

Here is the error:

Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)

Here is a listing of the new jar.  The old jar is out of the PIG_CLASSPATH:

[]$ jar tvf pig.jar|grep ClientProtocol
2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class

Thanks a ton,

Chad

RE: how to find the key list in Map datatype in pig

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
There is no way to do this natively in Pig. You will need to write a UDF
that converts each map to a bag of tuples with the key and the value as
the fields. Then the following code would work:

A = load 'data.txt' as (in: map[]);
B = foreach A generate flatten(MapToBag(in)) as (key, val);
C = group B by key;
D = foreach C generate group, COUNT(B);
....

Olga
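
MapToBag above is not a Pig built-in, so the UDF itself has to be written and
registered by the user. Below is a minimal sketch of what such a UDF could look
like, assuming Pig's Java UDF API (EvalFunc, DataBag, TupleFactory, BagFactory);
the class name simply reuses the MapToBag name from the script above, and the
missing package declaration and output schema are simplifications, not part of
Olga's answer.

import java.io.IOException;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Sketch of a UDF that converts a Pig map into a bag of (key, value) tuples.
public class MapToBag extends EvalFunc<DataBag> {

    private static final TupleFactory TUPLE_FACTORY = TupleFactory.getInstance();
    private static final BagFactory BAG_FACTORY = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Expect exactly one argument: the map field.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }

        @SuppressWarnings("unchecked")
        Map<String, Object> map = (Map<String, Object>) input.get(0);

        DataBag bag = BAG_FACTORY.newDefaultBag();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            // One (key, value) tuple per map entry.
            Tuple t = TUPLE_FACTORY.newTuple(2);
            t.set(0, entry.getKey());
            t.set(1, entry.getValue());
            bag.add(t);
        }
        return bag;
    }
}

Once compiled and packed into a jar, it would be registered with a REGISTER
statement at the top of the script (e.g. REGISTER maptobag.jar; where the jar
name is just a placeholder) so that MapToBag resolves when the foreach runs.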


-----Original Message-----
From: venkata ramanaiah anneboina [mailto:avryadav@gmail.com] 
Sent: Monday, July 13, 2009 10:49 PM
To: pig-user@hadoop.apache.org
Subject: how to find the key list in Map datatype in pig

Hi,
 I want to know how to get the list of keys in a map, and the list of keys
across a set of records which contain a map type;
 how do I find the count of each key by grouping on the key?

for example
 I have the data as follows in data.txt

[open#apache,name#raman]
[apache#hadoop]
[product#tv,price#5000]
[product#computer,price#5000]
[name#raman,desg#manager]
[name#krishana,desg#developer]
[name#prakash]
[product#vcp]
[price#2300]

and I have loaded it like this

A = load 'data.txt' as (in: map[]);

the output result should be like this:

apache     1
open       1
name       4
product    3
price      3
desg       2


that is, I want to find the count of each key in the map

Can anyone please help with this?

thanks
ramnaiah