Posted to mapreduce-user@hadoop.apache.org by Molnár Bálint <mo...@gmail.com> on 2015/03/05 13:53:43 UTC

HDFS Append Problem

Hi Everyone!


I'm experiencing an annoying problem.


My scenario is:


I want to store lots of small files (1-2 MB max) in MapFiles. These files
arrive periodically throughout the day, so I cannot use the "factory"
writer: each batch would produce its own MapFile, leaving me with a lot of
small MapFiles. (I want to store these files in HDFS immediately.)
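
For reference, this is roughly what I mean by the "factory" writer; the
target path and key/value types below are illustrative, not my actual code:

    // Minimal sketch of the stock ("factory") MapFile writer; the path and
    // key/value types are illustrative. Each run creates a brand-new MapFile
    // directory, which is why periodic batches pile up as many small MapFiles.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class StockMapFileWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path dir = new Path("/data/batch-0001.map"); // hypothetical target
        MapFile.Writer writer = new MapFile.Writer(conf, dir,
            MapFile.Writer.keyClass(Text.class),
            MapFile.Writer.valueClass(BytesWritable.class));
        try {
          // MapFile keys must be appended in sorted order.
          byte[] content = "small file contents".getBytes("UTF-8");
          writer.append(new Text("file-0001"), new BytesWritable(content));
        } finally {
          writer.close();
        }
      }
    }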


I'm trying to write code that appends to MapFiles. I use the
org.apache.hadoop.fs.FileSystem append() method, which calls the
org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job.
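
In outline, my append path looks like the sketch below; the data-file path
and the payload are placeholders, and the bookkeeping needed to keep the
MapFile's index file consistent is omitted:

    // Sketch of the append call; path and payload are placeholders, and
    // updating the MapFile index to match the appended data is left out.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // DistributedFileSystem on hdfs:// URIs
        Path data = new Path("/data/files.map/data"); // hypothetical data file
        byte[] payload = "one incoming small file".getBytes("UTF-8");
        FSDataOutputStream out = fs.append(data);
        try {
          out.write(payload);
          out.hflush(); // make the appended bytes visible to readers
        } finally {
          out.close(); // closing releases the file lease
        }
      }
    }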


My code works in the sense that the stock MapFile Reader can retrieve the
files. The problem appears during the upload phase. When I upload a 1 GB set
of small files, the free space of HDFS decreases fast: the program has only
uploaded 400 MB, but according to Cloudera Manager more than 5 GB is used.

The interesting part is that when I terminate the upload and wait 1-2
minutes, HDFS usage goes back to the expected size (~500 MB) and none of my
files are lost. If I don't terminate the upload, HDFS runs out of free space
and the program fails with errors.
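
To watch the effect I describe, one can poll the filesystem status from the
client while the upload runs; a minimal sketch (these figures come from the
NameNode and should roughly track what Cloudera Manager displays):

    // Sketch: poll filesystem-level usage while the upload is running.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class WatchUsage {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (int i = 0; i < 12; i++) {
          FsStatus st = fs.getStatus();
          System.out.printf("used=%d MB, remaining=%d MB%n",
              st.getUsed() >> 20, st.getRemaining() >> 20);
          Thread.sleep(10000L); // sample every 10 seconds
        }
      }
    }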

I'm using the Cloudera QuickStart VM 5.3 for testing, and the HDFS
replication factor is 1.



Any ideas on how to solve this issue?



Thanks

Re: HDFS Append Problem

Posted by Suresh Srinivas <su...@hortonworks.com>.
Please take this up on the CDH mailing list.


