Posted to common-user@hadoop.apache.org by Snehal Nagmote <na...@gmail.com> on 2009/08/19 03:44:06 UTC
Re: Hadoop-Archive Error for size of input data >2GB
Hi,
How do I unarchive the logical files from a har file? Is there any way to
unarchive the logical files?
Pratyush Banerjee-2 wrote:
>
> Hi All,
>
> I have been using hadoop archives programmatically to generate har
> archives from some logfiles which are being dumped into the hdfs.
>
> When the input directory to the Hadoop archiving program has files of size
> more than 2GB, the archiving strangely fails with an error message saying
>
> INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId= Illegal Capacity: -1
>
> Going into the code, I found that this was due to numMaps having the
> value of -1.
>
> As per the code in org.apache.hadoop.util.HadoopArchives:
> archive(List<Path> srcPaths, String archiveName, Path dest)
>
> numMaps is initialized as
> int numMaps = (int)(totalSize/partSize);
> //run at least one map.
> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
>
> partSize has been statically assigned the value of 2GB at the beginning
> of the class as:
>
> static final long partSize = 2 * 1024 * 1024 * 1024;
>
> Strangely enough, the value I find assigned to partSize is
> -2147483648.
>
> Hence, for input directories of greater size, numMaps is assigned -1,
> which leads to the code throwing an error.
>
> I am using hadoop-0.17.1, and I got the archiving facility after applying
> the hadoop-3307_4 patch.
>
> This looks like a bug to me, so please let me know how to go about it.
>
> Pratyush Banerjee
>
>
>
--
View this message in context: http://www.nabble.com/Hadoop-Archive-Error-for-size-of-input-data-%3E2GB-tp18568129p25036326.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
> How do I unarchive the logical files from a har file? Is there any way to
> unarchive the logical files?
>
Opened https://issues.apache.org/jira/browse/MAPREDUCE-883 for documenting,
but the idea is you just need to copy :)
+
+ <section>
+ <title> How to unarchive an archive?</title>
+ <p> Since all the fs shell commands in the archives work transparently,
+ unarchiving is just a matter of copying </p>
+ <p> To unarchive sequentially:</p>
+ <p><code> hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ <p> To unarchive in parallel, use distcp: </p>
+ <p><code> hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ </section>
As for the original email: somehow I was able to archive files larger than
2G in 0.18.3. Maybe there's an additional condition I'm missing?
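For what it's worth, the -2147483648 reported for partSize is classic 32-bit
integer overflow: in Java, `2 * 1024 * 1024 * 1024` is evaluated entirely in
int arithmetic (2^31 wraps to Integer.MIN_VALUE) before being widened to long,
so writing the first factor as `2L` would avoid it. A minimal sketch (the class
name is mine, not from the Hadoop source) illustrating the arithmetic:

```java
public class PartSizeOverflow {
    // int arithmetic: 2 * 2^30 = 2^31 wraps around to Integer.MIN_VALUE,
    // and only then is the result widened to long.
    static final long brokenPartSize = 2 * 1024 * 1024 * 1024;

    // The 2L literal forces 64-bit arithmetic, giving the intended 2 GB.
    static final long fixedPartSize = 2L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println(brokenPartSize); // -2147483648
        System.out.println(fixedPartSize);  // 2147483648

        // A hypothetical 5 GB input, mirroring numMaps = totalSize/partSize:
        long totalSize = 5L * 1024 * 1024 * 1024;
        System.out.println(totalSize / brokenPartSize); // -2, so numMaps < 0
        System.out.println(totalSize / fixedPartSize);  // 2 maps, as intended
    }
}
```

This would explain why archiving fails only once totalSize exceeds 2GB: below
that, totalSize/partSize truncates to 0 and the `numMaps == 0 ? 1 : numMaps`
guard rescues it.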
Koji