Posted to user@hadoop.apache.org by Saravanan Nagarajan <sa...@gmail.com> on 2015/01/29 13:53:45 UTC

Need help on Hadoop cluster capacity and hardware specification

HI,



Need help on Hadoop cluster capacity and hardware specification:

=================================================

We plan to migrate our existing "Enterprise Data Warehouse" / "Business
Intelligence" platform to a Hadoop-based solution.



The current system uses Teradata for storage, Ab Initio for ETL and
MicroStrategy for reporting. We would like to replace it with a Hadoop-based
solution that stores all raw CDRs in HDFS and runs the ETL processing of
those CDRs with Hive or Spark (or any other SQL-on-Hadoop tool).
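
To make the intended flow concrete, below is a minimal PySpark sketch of the
kind of ETL step we have in mind: read raw CDR files from HDFS, roll them up,
and write the result back in a columnar format. The paths, column names and
schema are only illustrative assumptions, not our actual data model.

# Minimal PySpark ETL sketch (paths and columns are hypothetical)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cdr-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Raw CDRs landed in HDFS as delimited text; schema inference used only for brevity
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///data/raw/cdr/"))

# Example transformation: per-subscriber daily usage rollup
daily_usage = (raw
               .withColumn("call_date", F.to_date("call_start_ts"))
               .groupBy("subscriber_id", "call_date")
               .agg(F.sum("duration_sec").alias("total_duration_sec"),
                    F.count("*").alias("call_count")))

# Write the curated output back to HDFS (could also be a Hive table)
(daily_usage.write
 .mode("overwrite")
 .partitionBy("call_date")
 .parquet("hdfs:///data/curated/cdr_daily_usage/"))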





In the current system, Teradata provides 128 TB of storage, and there are
more than 100 TB of CDR files.


Question:

1. How many nodes are needed to store and process 228 TB (128 TB + 100 TB) of
data? (A rough sizing sketch is included after the questions below.)


2. What hardware configuration is required for each slave node and master
node?


3. Which is the best SQL-on-Hadoop tool for writing ETL jobs?

We have considered Hive, Spark, Cassandra and Cascading for evaluation.
Please suggest any other tools we should look at.
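
As background for question 1, a rough back-of-the-envelope sizing sketch is
below; the replication factor, temporary-space overhead and per-node disk
capacity are only assumed figures for illustration, not recommendations.

# Rough HDFS capacity sizing (all figures are assumptions, not recommendations)
import math

raw_data_tb = 228             # 128 TB Teradata + 100 TB of CDR files
replication_factor = 3        # HDFS default replication
temp_overhead = 0.25          # scratch space for shuffles / intermediate ETL output
usable_disk_per_node_tb = 24  # e.g. 12 x 2 TB data disks per slave node

required_tb = raw_data_tb * replication_factor * (1 + temp_overhead)
data_nodes = math.ceil(required_tb / usable_disk_per_node_tb)

print("Raw HDFS capacity required: %.0f TB" % required_tb)  # 855 TB
print("Data nodes needed: %d" % data_nodes)                  # 36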


Please provide your valuable input; thanks for your support.



Thanks,

Saravanan

https://www.linkedin.com/in/saravanan303