Posted to user@pig.apache.org by venkata ramanaiah anneboina <av...@gmail.com> on 2009/07/14 07:49:13 UTC
how to find the key list in Map datatype in pig
Hi,
I want to know how to get the list of keys in a map, and the list of keys
in a set of records that contain a map type.
How do I find the count of each key by grouping on the key?
For example,
I have the following data in data.txt:
[open#apache,name#raman]
[apache#hadoop]
[product#tv,price#5000]
[product#computer,price#5000]
[name#raman,desg#manager]
[name#krishana,desg#developer]
[name#prakash]
[product#vcp]
[price#2300]
and I have loaded it like this:
A = load 'data.txt' as (in: map[]);
The output should look like this:
apache 1
open 1
name 4
product 3
price 3
desg 2
That is, I want to find the count of each key in the map.
Can anyone please help with this?
thanks
ramnaiah
Re: Issue implementing PIG-573
Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Chad, good catch -- go ahead and attach a regenerated patch to the Jira issue.
-D
On Tue, Jul 14, 2009 at 11:44 AM, Naber, Chad<CN...@edmunds.com> wrote:
> The email was incorrectly formatted. Here are the lines that need to change:
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
>
> needs to be set to:
>
> # Set the version for Hadoop, default to 17
> PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
RE: Issue implementing PIG-573
Posted by "Naber, Chad" <CN...@edmunds.com>.
The email was incorrectly formatted. Here are the lines that need to change:
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
needs to be set to:
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
-----Original Message-----
From: Naber, Chad [mailto:CNaber@edmunds.com]
Sent: Tuesday, July 14, 2009 11:39 AM
To: pig-user@hadoop.apache.org
Subject: RE: Issue implementing PIG-573
Hi folks,
I found the problem, and I think it should be added to the PIG-573 patch.
In the /bin/pig script, the line
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
needs to be set to:
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
for grunt to be successfully called from the script.
Let me know what you think.
Chad
RE: Issue implementing PIG-573
Posted by "Naber, Chad" <CN...@edmunds.com>.
Hi folks,
I found the problem, and I think it should be added to the PIG-573 patch.
In the /bin/pig script, the line
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-18}"
needs to be set to:
# Set the version for Hadoop, default to 17
PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
for grunt to be successfully called from the script.
Let me know what you think.
Chad
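For readers unfamiliar with the syntax: the assignment quoted above uses the shell's default-value parameter expansion, so `${PIG_HADOOP_VERSION:-19}` yields the existing value of PIG_HADOOP_VERSION when it is set and non-empty, and 19 otherwise. Below is a minimal sketch of that behaviour, independent of the real bin/pig script; the function name resolve_version is made up purely for illustration.

```shell
#!/bin/sh
# Illustrative only: mimics the default-value assignment quoted from bin/pig.
# resolve_version is a hypothetical helper, not part of Pig.
resolve_version() {
    # ${VAR:-default} expands to $VAR if it is set and non-empty, else to "default".
    PIG_HADOOP_VERSION="${PIG_HADOOP_VERSION:-19}"
    echo "$PIG_HADOOP_VERSION"
}

# Case 1: nothing set in the environment, so the script's default applies.
unset PIG_HADOOP_VERSION
default_version=$(resolve_version)

# Case 2: a value is already set, so it takes precedence over the default.
PIG_HADOOP_VERSION=18
override_version=$(resolve_version)

echo "default:  $default_version"
echo "override: $override_version"
```

If the script reads the variable exactly as quoted, this also suggests that setting PIG_HADOOP_VERSION=19 in the environment before invoking it should have the same effect as editing the default, though that has not been verified against an actual installation here.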
From: Naber, Chad
Sent: Tuesday, July 14, 2009 10:02 AM
To: Naber, Chad; 'pig-user@hadoop.apache.org'
Subject: Issue implementing PIG-573
Hi folks,
I recently upgraded to Hadoop 19 and, of course, ran into the PIG-573 issue (https://issues.apache.org/jira/browse/PIG-573). I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error, even though the patch appears to be in the jar: the jar has hdfs instead of dfs.
Are there any files I need to move other than the new pig.jar, or settings I need to change?
Here is the error:
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
Here is a listing of the new jar. The old jar is out of the PIG_CLASSPATH:
[]$ jar tvf pig.jar|grep ClientProtocol
2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class
Thanks a ton,
Chad
Issue implementing PIG-573
Posted by "Naber, Chad" <CN...@edmunds.com>.
Hi folks,
I recently upgraded to Hadoop 19 and, of course, ran into the PIG-573 issue (https://issues.apache.org/jira/browse/PIG-573). I patched and rebuilt pig.jar and placed it in PIG_CLASSPATH, but I am still getting the exact same error, even though the patch appears to be in the jar: the jar has hdfs instead of dfs.
Are there any files I need to move other than the new pig.jar, or settings I need to change?
Here is the error:
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
    at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

    at org.apache.hadoop.ipc.Client.call(Client.java:715)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
Here is a listing of the new jar. The old jar is out of the PIG_CLASSPATH:
[]$ jar tvf pig.jar|grep ClientProtocol
2852 Fri Nov 14 03:04:34 PST 2008 org/apache/hadoop/hdfs/protocol/ClientProtocol.class
Thanks a ton,
Chad
RE: how to find the key list in Map datatype in pig
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
There is no way to do this natively in Pig. You will need to write a UDF
that converts each map to a bag of tuples with the key and the value as
the fields. Then the following code would work:
A = load 'data.txt' as (in: map[]);
B = foreach A generate flatten(MapToBag(in)) as (key, val);
C = group B by key;
D = foreach C generate group, COUNT(B);
....
Olga
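The UDF itself has to be written against Pig's API, but the expected counts for the small sample in the question can be cross-checked locally with standard Unix tools. The sketch below is only a sanity check outside Pig, not a replacement for the MapToBag approach. It assumes the file has exactly the [key#value,key#value] layout shown in the question and that keys and values never contain ',', '#', '[' or ']'. The helper name count_keys and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Sanity check outside Pig: count how often each map key occurs in the sample data.
# Assumes one bracketed map per line, pairs separated by ',', key and value by '#'.

sample=$(mktemp)                      # temporary stand-in for data.txt
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
[open#apache,name#raman]
[apache#hadoop]
[product#tv,price#5000]
[product#computer,price#5000]
[name#raman,desg#manager]
[name#krishana,desg#developer]
[name#prakash]
[product#vcp]
[price#2300]
EOF

# count_keys is a hypothetical helper: prints "key count" lines, sorted by key.
count_keys() {
    awk '{
        gsub(/^\[|\]$/, "")             # strip the surrounding brackets
        n = split($0, pairs, ",")       # split the line into key#value pairs
        for (i = 1; i <= n; i++) {
            split(pairs[i], kv, "#")    # kv[1] is the key, kv[2] the value
            count[kv[1]]++
        }
    }
    END { for (k in count) print k, count[k] }' "$1" | sort
}

counts=$(count_keys "$sample")
echo "$counts"
```

For the sample data this should give the same six keys and counts as the output requested in the question, which makes it a convenient reference when testing the Pig script on a small input.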