Posted to user@hive.apache.org by michal shmueli <mi...@gmail.com> on 2009/09/07 08:56:01 UTC
Python output into Hive Table MAP ?
Hi,
We have two tables that need to be merged into a third table (TableMerged).
TableOld:
CREATE TABLE TableOld (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;
TableNew:
CREATE TABLE TableNew (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;
TableMerged:
CREATE TABLE TableMerged (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;
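For reference, here is a minimal sketch of how a Python dict could be written in the text layout these tables declare: ',' between fields, ':' between map entries, and '!' between a key and its value. The helper names format_terms and format_row are hypothetical and not part of merge.py, and the sketch assumes that keys contain none of the delimiter characters.

```python
# Delimiters copied from the CREATE TABLE statements.
FIELD_SEP = ','   # FIELDS TERMINATED BY
ITEM_SEP = ':'    # COLLECTION ITEMS TERMINATED BY
KEY_SEP = '!'     # MAP KEYS TERMINATED BY


def format_terms(terms):
    """Serialize a dict as key!value:key!value (hypothetical helper)."""
    return ITEM_SEP.join(
        '%s%s%s' % (key, KEY_SEP, float(value))
        for key, value in sorted(terms.items())
    )


def format_row(userid, terms):
    """Build one text line of the table: userid,key!value:key!value."""
    return userid + FIELD_SEP + format_terms(terms)


if __name__ == '__main__':
    print(format_row('user1', {'hive': 0.5, 'hadoop': 2}))
```

This only illustrates the on-disk text layout of a row. Whether Hive will accept a string in this layout as a map column when it comes from a script is a separate question.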
The merge is done by join on the userid as below:
INSERT OVERWRITE TABLE TableMerged
SELECT u.userid, u.terms
FROM ( FROM TableOld old JOIN TableNew new ON (old.userid = new.userid)
MAP old.userid, old.terms, new.terms
USING 'python merge.py'
AS userid, terms) u;
The part that generates the output of the script merge.py:
for line in sys.stdin:
    user, oldTerms, newTerms = line.strip().split('\t')
    oldTerms, newTerms = eval(oldTerms), eval(newTerms)
    sys.stderr.write('Old terms = ' + str(oldTerms) + '\n')
    sys.stderr.write('New data = ' + str(newTerms) + '\n')
    sys.stderr.write('Profile = ' +
                     str(merge_profile(oldTerms, newTerms)) + '\n')
    sys.stdout.write(user + '\t' +
                     str(merge_profile(oldTerms, newTerms)) + '\n')
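A self-contained variant of this loop might look like the sketch below. It assumes the two map columns reach the script as text that is a valid Python literal, which the use of eval implies but the post does not show. The merge_profile here is a hypothetical stand-in, since the real function is not included above. The sketch swaps eval for ast.literal_eval, which accepts only literals, and calls merge_profile once per row.

```python
import ast
import sys


def merge_profile(old_terms, new_terms):
    # Hypothetical stand-in: the real merge_profile is not shown in the post.
    # Here values from new_terms simply override those in old_terms.
    merged = dict(old_terms)
    merged.update(new_terms)
    return merged


def process(lines, out, log):
    """Read tab-separated rows, merge the two term maps, write one row each."""
    for line in lines:
        user, old_raw, new_raw = line.rstrip('\n').split('\t')
        # literal_eval parses Python literals only, unlike eval.
        old_terms = ast.literal_eval(old_raw)
        new_terms = ast.literal_eval(new_raw)
        profile = merge_profile(old_terms, new_terms)
        log.write('Profile = ' + str(profile) + '\n')
        out.write(user + '\t' + str(profile) + '\n')


if __name__ == '__main__':
    process(sys.stdin, sys.stdout, sys.stderr)
```

The output format is unchanged from the original loop (the str() of a dict), so this variant does not by itself change what Hive receives from the script.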
However, I'm getting this error:
09/09/06 07:58:47 ERROR ql.Driver: FAILED: Error in semantic analysis:
line 1:23 Cannot insert into target table because column number/types
are different TableMerged: Cannot convert column 1 from string to
map<string,double>.
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:23 Cannot
insert into target table because column number/types are different
TableMerged: Cannot convert column 1 from string to
map<string,double>.
So it looks like the output of the script is treated as a string, while the
table expects map<string,double>.
Something seems to be missing in converting the script's output to the
table's format.
Any suggestions?
Thanks,
Michal