Posted to user@hive.apache.org by michal shmueli <mi...@gmail.com> on 2009/09/07 08:56:01 UTC

Python output into Hive Table MAP ?

Hi,

We have two tables that need to be merged into a third table (TableMerged).

TableOld:
CREATE TABLE TableOld (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':'
MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;

TableNew:
CREATE TABLE TableNew (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':'
MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;

TableMerged:
CREATE TABLE TableMerged (userid STRING, terms MAP<STRING, DOUBLE>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':'
MAP KEYS TERMINATED BY '!'
STORED AS TEXTFILE;


The merge is done by joining the two tables on userid, as below:

INSERT OVERWRITE TABLE TableMerged
SELECT u.userid, u.terms
FROM ( FROM TableOld old JOIN TableNew new ON (old.userid = new.userid)
MAP old.userid, old.terms, new.terms
USING 'python merge.py'
AS userid, terms) u;

The part that generates the output of the script merge.py:

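import ast
import sys

# merge_profile(oldTerms, newTerms) is defined elsewhere in merge.py.
for line in sys.stdin:
    user, oldTerms, newTerms = line.strip().split('\t')
    # Parse the two term maps from their text form (literals only, safer than eval).
    oldTerms, newTerms = ast.literal_eval(oldTerms), ast.literal_eval(newTerms)
    profile = merge_profile(oldTerms, newTerms)
    # Debug output goes to stderr so it does not mix with the rows on stdout.
    sys.stderr.write('Old terms =   ' + str(oldTerms) + '\n')
    sys.stderr.write('New data =    ' + str(newTerms) + '\n')
    sys.stderr.write('Profile =     ' + str(profile) + '\n')
    # One output row per user: userid <TAB> merged terms.
    sys.stdout.write(user + '\t' + str(profile) + '\n')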


However, I'm getting this error:

09/09/06 07:58:47 ERROR ql.Driver: FAILED: Error in semantic analysis:
line 1:23 Cannot insert into target table because column number/types
are different TableMerged: Cannot convert column 1 from string to
map<string,double>.
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:23 Cannot
insert into target table because column number/types are different
TableMerged: Cannot convert column 1 from string to
map<string,double>.

So it looks like the output of the script is treated as a STRING, while the
table expects map<string,double>.
Something seems to be missing in converting the script's output to the
table's format.
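
One thing I considered is writing the merged terms in the same delimited
layout the tables use (':' between entries, '!' between key and value)
instead of Python's str() of a dict. Below is a minimal sketch of that idea.
The helper name format_terms is hypothetical and not part of merge.py, and
this is an assumption on my side: I have not confirmed that Hive will accept
such text from a script as a MAP column rather than a STRING.

```python
def format_terms(terms, item_sep=':', key_sep='!'):
    """Render a dict as key!value:key!value, matching the table delimiters.

    Hypothetical helper: it only changes the text layout of the output and
    does not by itself change the column type Hive assigns to it.
    """
    return item_sep.join(
        '%s%s%s' % (key, key_sep, float(value))
        for key, value in sorted(terms.items())  # sorted for stable output
    )


print(format_terms({'hive': 0.5, 'hadoop': 1.25}))
# prints: hadoop!1.25:hive!0.5
```

In merge.py this would replace str(...) in the sys.stdout.write call, but it
would not help if the script output is typed as STRING regardless of its
content.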

Any suggestions?

Thanks,
Michal