Posted to common-user@hadoop.apache.org by Pratyush Banerjee <pr...@aol.com> on 2008/07/21 14:56:37 UTC
Hadoop-Archive Error for size of input data >2GB
Hi All,
I have been using hadoop archives programmatically to generate har
archives from some logfiles which are being dumped into the hdfs.
When the input directory to the Hadoop archiving program has files of size
more than 2GB, strangely the archiving fails with an error message saying
INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId= Illegal Capacity: -1
Digging into the code, I found that this was due to numMaps having the
value -1.
As per the code in org.apache.hadoop.util.HadoopArchives#
archive(List<Path> srcPaths, String archiveName, Path dest),
numMaps is initialized as:
int numMaps = (int) (totalSize / partSize);
// run at least one map
conf.setNumMapTasks(numMaps == 0 ? 1 : numMaps);
partSize has been statically assigned the value of 2GB at the beginning
of the class:
static final long partSize = 2 * 1024 * 1024 * 1024;
Strangely enough, the value actually assigned to partSize is -2147483648:
all four operands are ints, so the product is computed in 32-bit
arithmetic and overflows before it is widened to long.
Hence, for input directories of greater size, numMaps is assigned a
negative value (-1 here), which leads to the code throwing the error above.
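The overflow is easy to reproduce in isolation; a minimal sketch (class
and variable names are mine, not Hadoop's):

```java
public class PartSizeOverflow {
    public static void main(String[] args) {
        // All four operands are ints, so the product is computed in
        // 32-bit arithmetic and wraps to Integer.MIN_VALUE before it
        // is widened to long for the assignment.
        long broken = 2 * 1024 * 1024 * 1024;
        // Making the first operand a long forces 64-bit arithmetic,
        // which is what the HADOOP-3545 fix amounts to.
        long fixed = 2L * 1024 * 1024 * 1024;
        System.out.println(broken); // prints -2147483648
        System.out.println(fixed);  // prints 2147483648
    }
}
```

With partSize negative, totalSize / partSize is negative for any input
larger than 2GB, so the (int) cast yields the bad numMaps reported above.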
I am using hadoop-0.17.1, and I got the archiving facility by applying
the hadoop-3307_4 patch.
This looks like a bug to me, so please let me know how to go about it.
Pratyush Banerjee
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
> how to unarchive the logical files from the har file? Is there any way to
> unarchive the logical files?
>
Opened https://issues.apache.org/jira/browse/MAPREDUCE-883 for documenting,
but the idea is you just need to copy :)
+
+ <section>
+ <title> How to unarchive an archive?</title>
+ <p> Since all the fs shell commands in the archives work transparently,
+ unarchiving is just a matter of copying </p>
+ <p> To unarchive sequentially:</p>
+ <p><code> hadoop dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ <p> To unarchive in parallel, use distcp: </p>
+ <p><code> hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir </code></p>
+ </section>
As for the original email,
somehow I was able to archive files larger than 2G in 0.18.3.
Maybe there's an additional condition I'm missing?
Koji
On 8/18/09 6:44 PM, "Snehal Nagmote" <na...@gmail.com> wrote:
>
> Hi,
>
> how to unarchive the logical files from the har file? Is there any way to
> unarchive the logical files?
>
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Snehal Nagmote <na...@gmail.com>.
Hi,
how to unarchive the logical files from the har file? Is there any way to
unarchive the logical files?
--
View this message in context: http://www.nabble.com/Hadoop-Archive-Error-for-size-of-input-data-%3E2GB-tp18568129p25036326.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Pratyush Banerjee <pr...@aol.com>.
Thanks Mahadev,
Thanks for letting me know about the patch. I have applied it, and the
archiving now runs fine for an input directory of about 5GB.
I am currently testing the same programmatically, but since it works
from the command line, it should ideally work this way as well.
thanks and regards~
Pratyush
mahadev@yahoo-inc.com wrote:
> Hi Pratyush,
>
> I think this bug was fixed in
> https://issues.apache.org/jira/browse/HADOOP-3545.
>
> Can you apply the patch and see if it works?
>
> Mahadev
Re: Hadoop-Archive Error for size of input data >2GB
Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Pratyush,
I think this bug was fixed in
https://issues.apache.org/jira/browse/HADOOP-3545.
Can you apply the patch and see if it works?
Mahadev