Posted to users@apex.apache.org by chiranjeevi vasupilli <ch...@gmail.com> on 2017/01/25 08:47:59 UTC

DB vs HDFS for DataTorrent

Hi Team,

Can you please provide some pointers on using a database vs. HDFS as the
data source for the DataTorrent tool?

Currently we use HDFS as the source when reading data, and we would like to
know the pros and cons of switching to a database as the source system.

Please suggest.
-- 
Yours,
chiru

Re: DB vs HDFS for DataTorrent

Posted by Yogi Devendra <yo...@apache.org>.
Chiranjeevi,


   - HDFS is a distributed file system, so reads can be served from
   different nodes at the source.
   - Not all databases are distributed. If your database server is not
   distributed, you may run into problems with parallel reads beyond a
   certain number of partitions (say, 4-5 partitions).
   - Ready-to-use applications for these use cases are available at
   https://www.datatorrent.com/apphub/
   - The source code for these apps is available under the Apache license at:
   https://github.com/datatorrent/app-templates
   - I would suggest running some sample tests on the workloads you have in
   mind and then making the decision. Kindly share your results for the
   benefit of the community.
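
To make the point about partitioned database reads concrete, here is a minimal sketch of how a numeric key range might be divided so that each partition reads its own slice. It is not the Apex or DataTorrent API. The function names, the table name `events`, and the key column `id` are all hypothetical, and it assumes the table has a reasonably dense integer key.

```python
from typing import List, Tuple


def split_key_range(min_id: int, max_id: int, partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into contiguous,
    non-overlapping sub-ranges, one per reader partition."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1
    # Never create more partitions than there are keys.
    partitions = min(partitions, total)
    base, extra = divmod(total, partitions)

    ranges = []
    start = min_id
    for i in range(partitions):
        # The first `extra` partitions take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table: str, key_column: str,
                      ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one range query per partition (hypothetical table and column)."""
    return [
        f"SELECT * FROM {table} WHERE {key_column} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    for query in partition_queries("events", "id", split_key_range(1, 1000, 4)):
        print(query)
```

Every query produced this way still hits the same database server, which is why a non-distributed database tends to stop scaling after a handful of partitions, whereas HDFS blocks can be read from different nodes.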

~ Yogi
