Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/01/13 09:38:54 UTC

[jira] Created: (HIVE-1050) Reduce the memory foot-print of HiveInputSplit

Reduce the memory foot-print of HiveInputSplit
----------------------------------------------

                 Key: HIVE-1050
                 URL: https://issues.apache.org/jira/browse/HIVE-1050
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


{{HiveInputSplit}} currently inherits from {{FileSplit}} only because we want {{MapTask}} to forward the file name of the mapper's input. This makes {{HiveInputSplit}} big (see MAPREDUCE-1374). The relevant {{MapTask}} code:

{code}
  private void updateJobWithSplit(final JobConf job, InputSplit inputSplit) {
    if (inputSplit instanceof FileSplit) {
      FileSplit fileSplit = (FileSplit) inputSplit;
      job.set("map.input.file", fileSplit.getPath().toString());
      job.setLong("map.input.start", fileSplit.getStart());
      job.setLong("map.input.length", fileSplit.getLength());
      LOG.info("split: " + job.get("map.input.file")+", range: "
               + job.getLong("map.input.start", 0) + "-"
               + job.getLong("map.input.length", 0));
    }
  }

{code}
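
To illustrate why the inheritance matters, here is a minimal, self-contained sketch of the {{instanceof}} dispatch above. It is not the real Hadoop or Hive code: {{InputSplit}} and {{FileSplit}} are simplified stand-ins, a plain {{Map}} stands in for {{JobConf}}, and {{InheritingSplit}}, {{DelegatingSplit}}, {{SplitForwardingSketch}} and the example path are hypothetical names introduced only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types for illustration only; these are NOT the real Hadoop classes.
interface InputSplit {}

class FileSplit implements InputSplit {
    private final String path;
    private final long start;
    private final long length;

    FileSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }
}

// Hypothetical wrapper that extends FileSplit, so the instanceof check matches.
class InheritingSplit extends FileSplit {
    InheritingSplit(FileSplit inner) {
        super(inner.getPath(), inner.getStart(), inner.getLength());
    }
}

// Hypothetical wrapper that only delegates; the instanceof check does not match.
class DelegatingSplit implements InputSplit {
    final FileSplit inner;

    DelegatingSplit(FileSplit inner) {
        this.inner = inner;
    }
}

class SplitForwardingSketch {
    // Mirrors the instanceof check quoted above, with a Map standing in for JobConf.
    static Map<String, String> updateJobWithSplit(InputSplit inputSplit) {
        Map<String, String> job = new HashMap<>();
        if (inputSplit instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) inputSplit;
            job.put("map.input.file", fileSplit.getPath());
            job.put("map.input.start", Long.toString(fileSplit.getStart()));
            job.put("map.input.length", Long.toString(fileSplit.getLength()));
        }
        return job;
    }

    public static void main(String[] args) {
        // Hypothetical path, used only as sample data.
        FileSplit inner = new FileSplit("/tmp/example/part-00000", 0L, 1024L);

        // The inheriting wrapper gets its file name forwarded.
        System.out.println(updateJobWithSplit(new InheritingSplit(inner)).get("map.input.file"));

        // The delegating wrapper is skipped, so no map.input.* keys are set.
        System.out.println(updateJobWithSplit(new DelegatingSplit(inner)).containsKey("map.input.file"));
    }
}
```

In this sketch only the wrapper that extends the stand-in {{FileSplit}} gets {{map.input.file}} set; the delegating wrapper is skipped by the {{instanceof}} check, which is the behavior that ties {{HiveInputSplit}} to {{FileSplit}} today.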

Once we move to the new MapReduce framework, we should be able to make {{HiveInputSplit}} smaller, which will reduce the amount of memory needed on the {{JobClient}}.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.