Posted to common-dev@hadoop.apache.org by "Venky Iyer (JIRA)" <ji...@apache.org> on 2008/11/05 10:41:44 UTC
[jira] Created: (HADOOP-4592) Generate and accept JSON as the
input-output format from mappers and reducers
Generate and accept JSON as the input-output format from mappers and reducers
-----------------------------------------------------------------------------
Key: HADOOP-4592
URL: https://issues.apache.org/jira/browse/HADOOP-4592
Project: Hadoop Core
Issue Type: Wish
Components: contrib/hive
Reporter: Venky Iyer
set mapred.data.format=JSON;
....
MAP USING 'python filter.py'
....;
would mean that filter.py receives a JSON-formatted dictionary of the columns instead of a tab-delimited string:
{ "column1": "value1", "column2": [1, 2, 3] }, etc.
It would in turn produce JSON.
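As a rough sketch of what filter.py could look like under this proposal: the issue does not say how records would be framed, so the code below assumes one JSON object per line. The column name column2 and the keep() predicate are hypothetical, chosen only to match the example above.

```python
import json


def keep(record):
    # Hypothetical predicate: keep rows whose "column2" list is non-empty.
    return bool(record.get("column2"))


def run(lines):
    """Yield one JSON line for each input record that passes keep().

    Assumes one JSON object per input line; the issue does not
    specify the framing, so this is only an illustration.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if keep(record):
            yield json.dumps(record)


# In the actual script, the loop would be wired to the standard streams:
#   import sys
#   for out in run(sys.stdin):
#       print(out)
```

The script never splits on tabs or names its input columns; it reads typed values straight out of the dictionary and writes a dictionary back.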
The JSON should not be transmitted back and forth over the network; it would be generated on the fly on the mapper node and then converted back to the standard format (tab-delimited, I assume).
This seems like the simplest way to encode type information in the input to mappers; it would also remove the need for silly boilerplate code that takes a list of expected input column names, reads the input stream, splits each line, and builds a dictionary of {column name: value} for every record.
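For comparison, the boilerplate in question tends to look something like the following. This is an illustrative sketch, not code from Hive; the function name records and the column names in the usage note are made up.

```python
def records(stream, columns):
    """Turn tab-delimited lines into {column name: value} dictionaries.

    Every value comes out as a string, because a tab-delimited line
    carries no type information.
    """
    for line in stream:
        values = line.rstrip("\n").split("\t")
        yield dict(zip(columns, values))
```

Called as records(sys.stdin, ["name", "count"]), a line such as "a<TAB>1" becomes {"name": "a", "count": "1"}: the count arrives as the string "1", so each script has to convert types itself and keep its column list in sync with the query.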
Output schemas (USING '' AS ...) might also be redundant with this in place, but I'm not sure if that is doable.