Posted to dev@hive.apache.org by "Eric Wang (JIRA)" <ji...@apache.org> on 2015/03/12 10:55:38 UTC

[jira] [Created] (HIVE-9940) The standard output of Python reduce script can not be interpreted correctly by Hive

Eric Wang created HIVE-9940:
-------------------------------

             Summary: The standard output of Python reduce script can not be interpreted correctly by Hive
                 Key: HIVE-9940
                 URL: https://issues.apache.org/jira/browse/HIVE-9940
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Eric Wang


Use an HQL statement like:
FROM (
  select_statement
  ) map_output
INSERT OVERWRITE TABLE table
  REDUCE map_output.a, map_output.b
  USING 'py_script'
  AS col1, col2;
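
For context, a minimal sketch of what a reduce script such as 'py_script' might look like. The report does not include the actual script, so the function name reduce_rows and the pass-through logic are assumptions for illustration only. It relies on Hive's default streaming format (tab-separated fields, one record per line) and writes each record to stderr as well, which is how the stderr log can be compared against what Hive stores.

```python
import sys


def reduce_rows(lines, out=sys.stdout, err=sys.stderr):
    """Hypothetical pass-through reducer (not the reporter's script).

    Reads tab-separated records, writes each one unchanged to `out`,
    and mirrors the same line to `err` for later comparison.
    """
    for line in lines:
        # Strip only the trailing newline so empty fields are preserved.
        fields = line.rstrip("\n").split("\t")
        record = "\t".join(fields)
        out.write(record + "\n")
        err.write(record + "\n")
```

To use it as the reduce script, call reduce_rows(sys.stdin) at the bottom of the file.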

(1) Original case
The Python script's stdout has records where the 2nd column = 'Meerjungfrau':
527500	Meerjungfrau	25	AO DE	20140704
...

Hive interprets these as:
527500	Meer	<null>	AO DE	20140704
...

The stderr log shows these as:
527500	Meerjungfrau	25	AO DE	20140704

(2) Change all 'Meerjungfrau' to 'bug' in the Python script
The Python script's stdout has records where the 2nd column = 'bug':
527500	bug	25	AO DE	20140704
...

Hive interprets these as:
527500	b	<null>	AO DE	20140704
...

The stderr log shows these as:
527500	bug	25	AO DE	20140704

(3) Move the 2nd column to the last position
The Python script's stdout has records where the last column = 'Meerjungfrau':
527500	25	AO DE	20140704	Meerjungfrau
...

Hive interprets these as:
527500	25	<null>	20140704	Meerjungfrau
...

The stderr log shows these as:
527500	25	AO DE	20140704	Meerjungfrau
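
When comparing the stderr log with what Hive stores, it may help to make the field boundaries explicit. The helper below is a hypothetical debugging aid (describe_record is not part of Hive or of the reported script). It assumes tab-separated records and returns the repr of each field, so stray whitespace or control characters become visible.

```python
def describe_record(line):
    """Split one tab-separated record and return the repr of each field.

    Hypothetical debugging helper: repr() exposes hidden characters
    that would look identical in a plain log line.
    """
    fields = line.rstrip("\n").split("\t")
    return [repr(field) for field in fields]
```

Its output could be written to stderr from inside the reduce script, one line per record.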


