Posted to common-user@hadoop.apache.org by Pratyush Banerjee <pr...@aol.com> on 2008/07/21 14:56:37 UTC

Hadoop-Archive Error for size of input data >2GB

Hi All,

I have been using Hadoop archives programmatically to generate har 
archives from some logfiles which are being dumped into HDFS.

When the input directory to the Hadoop archiving program has files of 
more than 2GB in total size, the archiving strangely fails with an error message saying

INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=   Illegal Capacity: -1

Digging into the code, I found that this was due to numMaps having the 
value -1.

As per the code in org.apache.hadoop.util.HadoopArchives: 
archive(List<Path> srcPaths, String archiveName, Path dest)

numMaps is initialized as 
int numMaps = (int)(totalSize/partSize);
//run atleast one map.
conf.setNumMapTasks(numMaps == 0? 1:numMaps);

partSize has been statically assigned the value of 2GB at the beginning 
of the class as

static final long partSize = 2 * 1024 * 1024 * 1024;

Strangely enough, the value actually assigned to partSize is -2147483648.

Hence, for input directories of greater size, numMaps is assigned -1, 
which causes the code to throw the error above.
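
To illustrate, here is a minimal standalone sketch (not the actual 
HadoopArchives source): the right-hand side of the assignment is evaluated 
in 32-bit int arithmetic before being widened to long, so it overflows; 
writing the constant with a long literal avoids it.

public class PartSizeOverflow {
    // Evaluated entirely in int arithmetic, then widened to long: wraps to -2147483648.
    static final long brokenPartSize = 2 * 1024 * 1024 * 1024;
    // The 2L forces long arithmetic for the whole expression: 2147483648 as intended.
    static final long fixedPartSize = 2L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        long totalSize = 3L * 1024 * 1024 * 1024;  // e.g. a 3GB input directory
        System.out.println((int) (totalSize / brokenPartSize));  // -1, the bad numMaps
        System.out.println((int) (totalSize / fixedPartSize));   //  1, a sensible numMaps
    }
}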

I am using hadoop-0.17.1 and obtained the archiving facility by applying 
the hadoop-3307_4 patch.

This looks like a bug to me, so please let me know how to go about it.

Pratyush Banerjee


Re: Hadoop-Archive Error for size of input data >2GB

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
>  How to unarchive the logical files from the har file? Is there any way to
> unarchive the logical files?
> 
Opened https://issues.apache.org/jira/browse/MAPREDUCE-883 to document this,
but the idea is that you just need to copy :)


+
+        <section>
+        <title> How to unarchive an archive?</title>
+        <p> Since all the fs shell commands in the archives work transparently,
+            unarchiving is just a matter of copying </p>
+        <p> To unarchive sequentially:</p>
+        <p><code> hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+        <p> To unarchive in parallel, use distcp: </p>
+        <p><code> hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+        </section>
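
For example (same placeholder paths as in the doc snippet above, so adjust
them to your own archive), the fs shell should work directly against the
har: URI, e.g. to inspect and then copy out the contents:

hadoop dfs -ls har:///user/zoo/foo.har/dir1
hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir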


As for the original email,
somehow I was able to archive files larger than 2G in 0.18.3.
Maybe there's an additional condition I'm missing?

Koji     


On 8/18/09 6:44 PM, "Snehal Nagmote" <na...@gmail.com> wrote:

> 
> Hi,
> 
>  How to unarchive the logical files from the har file? Is there any way to
> unarchive the logical files?
> 
> 
> 
> Pratyush Banerjee-2 wrote:
>> 
>> Hi All,
>> 
>> I have been using Hadoop archives programmatically to generate har
>> archives from some logfiles which are being dumped into HDFS.
>> 
>> When the input directory to the Hadoop archiving program has files of
>> more than 2GB in total size, the archiving strangely fails with an error message saying
>> 
>> INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=   Illegal Capacity: -1
>> 
>> Digging into the code, I found that this was due to numMaps having the
>> value -1.
>> 
>> As per the code in org.apache.hadoop.util.HadoopArchives:
>> archive(List<Path> srcPaths, String archiveName, Path dest)
>> 
>> numMaps is initialized as
>> int numMaps = (int)(totalSize/partSize);
>> //run atleast one map.
>> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
>> 
>> partSize has been statically assigned the value of 2GB at the beginning
>> of the class as
>> 
>> static final long partSize = 2 * 1024 * 1024 * 1024;
>> 
>> Strangely enough, the value actually assigned to partSize is -2147483648.
>> 
>> Hence, for input directories of greater size, numMaps is assigned -1,
>> which causes the code to throw the error above.
>> 
>> I am using hadoop-0.17.1 and obtained the archiving facility by applying
>> the hadoop-3307_4 patch.
>> 
>> This looks like a bug to me, so please let me know how to go about it.
>> 
>> Pratyush Banerjee
>> 
>> 
>> 


Re: Hadoop-Archive Error for size of input data >2GB

Posted by Snehal Nagmote <na...@gmail.com>.
Hi,

How to unarchive the logical files from the har file? Is there any way to
unarchive the logical files?



Pratyush Banerjee-2 wrote:
> 
> Hi All,
> 
> I have been using Hadoop archives programmatically to generate har
> archives from some logfiles which are being dumped into HDFS.
> 
> When the input directory to the Hadoop archiving program has files of
> more than 2GB in total size, the archiving strangely fails with an error message saying
> 
> INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=   Illegal Capacity: -1
> 
> Digging into the code, I found that this was due to numMaps having the
> value -1.
> 
> As per the code in org.apache.hadoop.util.HadoopArchives:
> archive(List<Path> srcPaths, String archiveName, Path dest)
> 
> numMaps is initialized as
> int numMaps = (int)(totalSize/partSize);
> //run atleast one map.
> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
> 
> partSize has been statically assigned the value of 2GB at the beginning
> of the class as
> 
> static final long partSize = 2 * 1024 * 1024 * 1024;
> 
> Strangely enough, the value actually assigned to partSize is -2147483648.
> 
> Hence, for input directories of greater size, numMaps is assigned -1,
> which causes the code to throw the error above.
> 
> I am using hadoop-0.17.1 and obtained the archiving facility by applying
> the hadoop-3307_4 patch.
> 
> This looks like a bug to me, so please let me know how to go about it.
> 
> Pratyush Banerjee
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Hadoop-Archive-Error-for-size-of-input-data-%3E2GB-tp18568129p25036326.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Hadoop-Archive Error for size of input data >2GB

Posted by Pratyush Banerjee <pr...@aol.com>.
Thanks Mahadev, for letting me know about the patch. I have already applied 
it, and the archiving seems to run fine for an input directory of about 5GB.

I am currently testing the same programmatically, but since it works 
from the command line, it should ideally also work that way.
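
For reference, this is roughly how I am driving it programmatically (a 
simplified sketch only; the archive() signature is the one quoted earlier 
from the patched org.apache.hadoop.util.HadoopArchives, and the paths are 
just placeholders for my log directories):

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.HadoopArchives;

public class ArchiveLogs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Placeholder paths: the HDFS directory the logfiles are dumped into,
        // and the directory where the .har file should be created.
        List<Path> srcPaths = Arrays.asList(new Path("/logs/2008/07/21"));
        Path dest = new Path("/archives");

        // Assumes the patched HadoopArchives exposes a Configuration constructor
        // and the public archive(List<Path> srcPaths, String archiveName, Path dest)
        // method quoted earlier in this thread.
        HadoopArchives har = new HadoopArchives(conf);
        har.archive(srcPaths, "logs-2008-07-21.har", dest);
    }
}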

thanks and regards~

Pratyush

mahadev@yahoo-inc.com wrote:
> Hi Pratyush,
>
>   I think this bug was fixed in
> https://issues.apache.org/jira/browse/HADOOP-3545.
>
> Can you apply the patch and see if it works?
>
> Mahadev
>
>
> On 7/21/08 5:56 AM, "Pratyush Banerjee" <pr...@aol.com> wrote:
>
>   
>> Hi All,
>>
>> I have been using Hadoop archives programmatically to generate har
>> archives from some logfiles which are being dumped into HDFS.
>>
>> When the input directory to the Hadoop archiving program has files of
>> more than 2GB in total size, the archiving strangely fails with an error message saying
>>
>> INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=   Illegal Capacity: -1
>>
>> Digging into the code, I found that this was due to numMaps having the
>> value -1.
>>
>> As per the code in org.apache.hadoop.util.HadoopArchives:
>> archive(List<Path> srcPaths, String archiveName, Path dest)
>>
>> numMaps is initialized as
>> int numMaps = (int)(totalSize/partSize);
>> //run atleast one map.
>> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
>>
>> partSize has been statically assigned the value of 2GB at the beginning
>> of the class as
>>
>> static final long partSize = 2 * 1024 * 1024 * 1024;
>>
>> Strangely enough, the value actually assigned to partSize is -2147483648.
>>
>> Hence, for input directories of greater size, numMaps is assigned -1,
>> which causes the code to throw the error above.
>>
>> I am using hadoop-0.17.1 and obtained the archiving facility by applying
>> the hadoop-3307_4 patch.
>>
>> This looks like a bug to me, so please let me know how to go about it.
>>
>> Pratyush Banerjee
>>
>>     
>
>   


Re: Hadoop-Archive Error for size of input data >2GB

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Pratyush,

  I think this bug was fixed in
https://issues.apache.org/jira/browse/HADOOP-3545.

Can you apply the patch and see if it works?

Mahadev


On 7/21/08 5:56 AM, "Pratyush Banerjee" <pr...@aol.com> wrote:

> Hi All,
> 
> I have been using Hadoop archives programmatically to generate har
> archives from some logfiles which are being dumped into HDFS.
> 
> When the input directory to the Hadoop archiving program has files of
> more than 2GB in total size, the archiving strangely fails with an error message saying
> 
> INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=   Illegal Capacity: -1
> 
> Digging into the code, I found that this was due to numMaps having the
> value -1.
> 
> As per the code in org.apache.hadoop.util.HadoopArchives:
> archive(List<Path> srcPaths, String archiveName, Path dest)
> 
> numMaps is initialized as
> int numMaps = (int)(totalSize/partSize);
> //run atleast one map.
> conf.setNumMapTasks(numMaps == 0? 1:numMaps);
> 
> partSize has been statically assigned the value of 2GB at the beginning
> of the class as
> 
> static final long partSize = 2 * 1024 * 1024 * 1024;
> 
> Strangely enough, the value actually assigned to partSize is -2147483648.
> 
> Hence, for input directories of greater size, numMaps is assigned -1,
> which causes the code to throw the error above.
> 
> I am using hadoop-0.17.1 and obtained the archiving facility by applying
> the hadoop-3307_4 patch.
> 
> This looks like a bug to me, so please let me know how to go about it.
> 
> Pratyush Banerjee
>