Posted to common-user@hadoop.apache.org by Snehal Nagmote <na...@gmail.com> on 2009/08/19 03:44:06 UTC
Re: Hadoop-Archive Error for size of input data >2GB
Hi,
How do I unarchive the logical files from a har file? Is there any way to
unarchive the logical files?
Pratyush Banerjee-2 wrote:
>
> Hi All,
>
> I have been using hadoop archives programmatically to generate har
> archives from some logfiles which are being dumped into the hdfs.
>
> When the input directory to the Hadoop archiving program has files of size
> more than 2GB, the archiving strangely fails with an error message saying
>
> INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId= Illegal Capacity: -1
>
> Going into the code, I found that this was due to numMaps having the
> value of -1.
>
> As per the code in org.apache.hadoop.util.HadoopArchives:
> archive(List<Path> srcPaths, String archiveName, Path dest)
>
> numMaps is initialized as
> int numMaps = (int)(totalSize/partSize);
> //run at least one map.
> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
>
> partSize has been statically assigned the value of 2GB at the beginning
> of the class as:
>
> static final long partSize = 2 * 1024 * 1024 * 1024;
>
> Strangely enough, the value I find assigned to partSize is
> -2147483648.
>
> Hence, for input directories of greater size, numMaps is assigned -1,
> which leads to the code throwing an error.
>
> I am using hadoop-0.17.1, and I got the archiving facility after applying
> the hadoop-3307_4 patch.
>
> This looks like a bug to me, so please let me know how to go about it.
>
> Pratyush Banerjee
>
>
>
--
View this message in context: http://www.nabble.com/Hadoop-Archive-Error-for-size-of-input-data-%3E2GB-tp18568129p25036326.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
> How do I unarchive the logical files from a har file? Is there any way to
> unarchive the logical files?
>
Opened https://issues.apache.org/jira/browse/MAPREDUCE-883 for documenting,
but the idea is you just need to copy :)
+
+ <section>
+ <title> How to unarchive an archive?</title>
+ <p> Since all the fs shell commands in the archives work transparently,
+ unarchiving is just a matter of copying </p>
+ <p> To unarchive sequentially:</p>
+ <p><code> hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ <p> To unarchive in parallel, use distcp: </p>
+ <p><code> hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ </section>
As for the original email: somehow I was able to archive files larger than
2G in 0.18.3. Maybe there's an additional condition I'm missing?
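For what it's worth, the -2147483648 reported for partSize is classic 32-bit
integer overflow: in Java, `2 * 1024 * 1024 * 1024` is evaluated entirely in
int arithmetic (2^31 wraps to Integer.MIN_VALUE) before being widened to long,
so writing the first factor as `2L` would avoid it. A minimal sketch (the class
name is mine, not from the Hadoop source) illustrating the arithmetic:

```java
public class PartSizeOverflow {
    // int arithmetic: 2 * 2^30 = 2^31 wraps around to Integer.MIN_VALUE,
    // and only then is the result widened to long.
    static final long brokenPartSize = 2 * 1024 * 1024 * 1024;

    // The 2L literal forces 64-bit arithmetic, giving the intended 2 GB.
    static final long fixedPartSize = 2L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println(brokenPartSize); // -2147483648
        System.out.println(fixedPartSize);  // 2147483648

        // A hypothetical 5 GB input, mirroring numMaps = totalSize/partSize:
        long totalSize = 5L * 1024 * 1024 * 1024;
        System.out.println(totalSize / brokenPartSize); // -2, so numMaps < 0
        System.out.println(totalSize / fixedPartSize);  // 2 maps, as intended
    }
}
```

This would explain why archiving fails only once totalSize exceeds 2GB: below
that, totalSize/partSize truncates to 0 and the `numMaps == 0 ? 1 : numMaps`
guard rescues it.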
Koji