Posted to common-user@hadoop.apache.org by snehal nagmote <na...@gmail.com> on 2009/03/25 06:10:15 UTC
Need help with HDFS: how to minimize access time
Hello Sir,
I am doing an M.Tech at IIIT Hyderabad, and I am working on a research project
whose aim is to develop a scalable storage system for eSagu.
eSagu collects crop images from the fields and stores them in a filesystem;
agricultural scientists then access those images to diagnose problems. Many
fields in A.P. (Andhra Pradesh) currently use this system, and it may expand
beyond A.P., so we need a scalable storage system.
1) My problem is that we are using Hadoop for storage, but Hadoop reads and
writes in 64 MB chunks, while the stored images are very small, at most 2 to
3 MB each. So access time would be high when retrieving images. Can you
suggest how this access time can be reduced? Is there anything else we could
do to improve performance, such as building our own cache? To what extent
would that be feasible or helpful in this kind of application?
2) Second, would Hadoop be useful for small data like this? If not, what
tricks could we use to make it work for this kind of application?
Please help, and thanks in advance.
Regards,
Snehal Nagmote
IIIT Hyderabad
Re: Need help with HDFS: how to minimize access time
Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Snehal (removing the core-dev list; please only post to one at a
time),
The access time should be fine, but it depends on what you define as
an acceptable access time. If this is not acceptable, I'd suggest
putting it behind a web cache like Squid. The best way to find out is
to use the system as a prototype and to evaluate it based on your
requirements.
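The caching idea is easy to prototype before committing to Squid or a custom
layer. Below is a minimal read-through cache sketch in Python; the HDFS fetch
is stubbed out with a short sleep, and all names and paths here are
hypothetical illustrations, not eSagu code:

```python
import time

def make_cached_reader(slow_read):
    """Wrap a slow path->bytes read function with an in-memory read-through cache."""
    cache = {}

    def read(path):
        if path not in cache:            # miss: fall through to the backing store
            cache[path] = slow_read(path)
        return cache[path]               # hit: served from memory

    return read

# Stub standing in for fetching a 2-3 MB image from HDFS.
def hdfs_read(path):
    time.sleep(0.01)                     # simulate network + disk latency
    return b"image-bytes-for:" + path.encode()

read = make_cached_reader(hdfs_read)

t0 = time.time(); read("/esagu/field42/leaf.jpg"); cold = time.time() - t0
t0 = time.time(); read("/esagu/field42/leaf.jpg"); warm = time.time() - t0
print(warm < cold)  # the second read skips the simulated latency
```

A production cache (Squid, or memcached in front of HDFS) adds eviction, size
limits, and sharing across clients, but it exploits the same access pattern:
repeated reads of the same hot images.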
Hadoop is useful for small data, but it was originally designed and optimized
for big data. The primary downside of small files is that each file may cost
more memory per byte stored, since the NameNode keeps all file metadata in
RAM. Hadoop may be overkill as a solution, however, if your total storage
size is never going to grow very large.
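To make the per-file memory cost concrete: NameNode memory scales with the
number of files and blocks, not with their size. A back-of-the-envelope
Python sketch, assuming the commonly cited rough figure of ~150 bytes of heap
per file or block object (a rule of thumb, not an exact number) and assuming
small images could be packed into container files such as SequenceFiles or
Hadoop Archives:

```python
BYTES_PER_OBJECT = 150   # rough rule of thumb for NameNode metadata per object

def namenode_memory_mb(num_files, blocks_per_file=1):
    """Approximate NameNode heap consumed by file + block objects, in MB."""
    objects = num_files * (1 + blocks_per_file)   # one inode plus its blocks
    return objects * BYTES_PER_OBJECT / (1024 * 1024)

# Ten million 2-3 MB images, each fitting in a single block:
loose = namenode_memory_mb(10_000_000)
# The same images packed ~25 per container file (e.g. a SequenceFile):
packed = namenode_memory_mb(10_000_000 // 25)
print(round(loose), round(packed))   # packing cuts metadata roughly 25x
```

The absolute numbers are only estimates, but the ratio is the point: packing
many small images into fewer large files trades per-image addressing for a
much smaller metadata footprint.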
We currently use HDFS for mostly random access.
Brian