Posted to common-user@hadoop.apache.org by snehal nagmote <na...@gmail.com> on 2009/03/25 06:10:15 UTC
Need help with HDFS: how to minimize access time
Hello Sir,
I am doing an M.Tech at IIIT Hyderabad, and I am working on a research project
whose aim is to develop a scalable storage system for eSagu.
eSagu collects crop images from the fields and stores them in a filesystem;
agricultural scientists then access those images to diagnose problems. Many
fields in A.P. (Andhra Pradesh) currently use this system, and it may expand
beyond A.P., so we need a scalable storage system.
1) My problem is that we are using Hadoop for storage, but Hadoop reads and
writes in 64 MB chunks, while the stored images are very small, at most 2 to
3 MB each. So access time would be high when retrieving images. Can you
suggest how this access time can be reduced? Is there anything else we could
do to improve performance, such as building our own cache? To what extent
would that be feasible or helpful in this kind of application?
2) Second, would Hadoop be useful for small data like this? If not, what
tricks could we use to make it work for this kind of application?
Please help, and thanks in advance.
Regards,
Snehal Nagmote
IIIT Hyderabad
Re: Need help with HDFS: how to minimize access time
Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Snehal (removing the core-dev list; please only post to one at a
time),
The access time should be fine, but it depends on what you define as
an acceptable access time. If this is not acceptable, I'd suggest
putting it behind a web cache like Squid. The best way to find out is
to use the system as a prototype and to evaluate it based on your
requirements.
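The caching idea is easy to prototype before committing to Squid or a custom
layer. Below is a minimal read-through cache sketch in Python; the HDFS fetch
is stubbed out with a short sleep, and all names and paths here are
hypothetical illustrations, not eSagu code:

```python
import time

def make_cached_reader(slow_read):
    """Wrap a slow path->bytes read function with an in-memory read-through cache."""
    cache = {}

    def read(path):
        if path not in cache:            # miss: fall through to the backing store
            cache[path] = slow_read(path)
        return cache[path]               # hit: served from memory

    return read

# Stub standing in for fetching a 2-3 MB image from HDFS.
def hdfs_read(path):
    time.sleep(0.01)                     # simulate network + disk latency
    return b"image-bytes-for:" + path.encode()

read = make_cached_reader(hdfs_read)

t0 = time.time(); read("/esagu/field42/leaf.jpg"); cold = time.time() - t0
t0 = time.time(); read("/esagu/field42/leaf.jpg"); warm = time.time() - t0
print(warm < cold)  # the second read skips the simulated latency
```

A production cache (Squid, or memcached in front of HDFS) adds eviction, size
limits, and sharing across clients, but it exploits the same access pattern:
repeated reads of the same hot images.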
Hadoop is useful for small data, but it was originally designed and optimized
for big data. The primary downside of small files is that each file may cost
more memory per byte stored, since the NameNode keeps all file metadata in
RAM. Hadoop may be overkill as a solution, however, if your total storage
size is never going to grow very large.
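To make the per-file memory cost concrete: NameNode memory scales with the
number of files and blocks, not with their size. A back-of-the-envelope
Python sketch, assuming the commonly cited rough figure of ~150 bytes of heap
per file or block object (a rule of thumb, not an exact number) and assuming
small images could be packed into container files such as SequenceFiles or
Hadoop Archives:

```python
BYTES_PER_OBJECT = 150   # rough rule of thumb for NameNode metadata per object

def namenode_memory_mb(num_files, blocks_per_file=1):
    """Approximate NameNode heap consumed by file + block objects, in MB."""
    objects = num_files * (1 + blocks_per_file)   # one inode plus its blocks
    return objects * BYTES_PER_OBJECT / (1024 * 1024)

# Ten million 2-3 MB images, each fitting in a single block:
loose = namenode_memory_mb(10_000_000)
# The same images packed ~25 per container file (e.g. a SequenceFile):
packed = namenode_memory_mb(10_000_000 // 25)
print(round(loose), round(packed))   # packing cuts metadata roughly 25x
```

The absolute numbers are only estimates, but the ratio is the point: packing
many small images into fewer large files trades per-image addressing for a
much smaller metadata footprint.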
We currently use HDFS for mostly random access.
Brian