Posted to users@apex.apache.org by chiranjeevi vasupilli <ch...@gmail.com> on 2017/01/25 08:47:59 UTC
DB vs HDFS for DataTorrent
Hi Team,
Can you please provide some pointers on using a database vs. HDFS as the
source of data for the DataTorrent tool?
Currently we are using HDFS as the data source, and we would like to
know the pros and cons if we switch to a database as the source system.
Please suggest.
--
ur's
chiru
Re: DB vs HDFS for DataTorrent
Posted by Yogi Devendra <yo...@apache.org>.
Chiranjeevi,
- HDFS is a distributed file system, so reads can be served in parallel
from different nodes at the source.
- Not all databases are distributed. If your database server is not
distributed, you might run into problems with parallel reads beyond a
certain number of partitions (say, 4-5 partitions).
- Ready-to-use applications for these use cases are available at
https://www.datatorrent.com/apphub/
- The source code for these apps is Apache-licensed and available at:
https://github.com/datatorrent/app-templates
- I would suggest running some sample tests on the workloads you are
interested in before making a decision. Kindly share your results for the
benefit of the community.
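As a rough illustration of the point about non-distributed databases: one common way to read a single table in parallel is to split a numeric key range into N disjoint sub-ranges and issue one query per partition. The sketch below is plain Java (Apex itself is Java-based) and assumes a table with a reasonably dense numeric key; the class name, the table `events`, and the column `id` are hypothetical, and this is not the API of any Apex/Malhar operator. Note that every partition still queries the same database server, which is why adding partitions stops helping beyond a certain point, whereas HDFS reads are spread across many DataNodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an inclusive key range into one query per partition. */
public class KeyRangePartitioner {

    /** Builds one SELECT per partition; table and column names are placeholders. */
    static List<String> partitionQueries(String table, String keyColumn,
                                         long minKey, long maxKey, int partitions) {
        if (partitions < 1 || maxKey < minKey) {
            throw new IllegalArgumentException("need partitions >= 1 and maxKey >= minKey");
        }
        long total = maxKey - minKey + 1;   // number of keys in the inclusive range
        long base = total / partitions;     // keys every partition gets
        long extra = total % partitions;    // the first 'extra' partitions get one more key
        List<String> queries = new ArrayList<>();
        long start = minKey;
        for (int p = 0; p < partitions; p++) {
            long size = base + (p < extra ? 1 : 0);
            if (size == 0) {
                break;                      // more partitions than keys: stop early
            }
            long end = start + size - 1;    // inclusive upper bound of this partition
            queries.add(String.format("SELECT * FROM %s WHERE %s BETWEEN %d AND %d",
                    table, keyColumn, start, end));
            start = end + 1;
        }
        return queries;
    }

    public static void main(String[] args) {
        // 4 partitions over ids 1..10 yield the ranges 1-3, 4-6, 7-8 and 9-10
        for (String q : partitionQueries("events", "id", 1, 10, 4)) {
            System.out.println(q);
        }
    }
}
```

Range splitting only balances load when keys are spread evenly; with large gaps in the key column, some partitions will do much more work than others, which is another thing worth checking in the sample tests.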
~ Yogi