Posted to common-user@hadoop.apache.org by jagaran das <ja...@yahoo.co.in> on 2011/07/16 08:18:54 UTC

Hadoop Production Issue

Hi,

Due to requirements in our current production CDH3 cluster, we need to copy around 11,520 small files (12 GB total) to the cluster for one application.
We have 20 such applications that would run in parallel.

So one set would have 11,520 files totalling 12 GB,
and we would have 15 such sets in parallel.

The total SLA for the whole pipeline, from the initial copy through Pig aggregation, copy back to local, and SQL load, is 15 minutes.

What we do:

1. Merge files, so that we get rid of the small files. - This is a huge time hit. Do we have any other option?
2. Copy to the cluster
3. Execute the Pig job
4. Copy to local
5. SQL loader
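One way to take the cost out of step 1 is to do the merge on the client machine as a plain stream concatenation before anything touches the cluster, rather than as a cluster job. A minimal sketch in plain Java NIO (class name, file names, and paths are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LocalMerge {

    // Concatenate many small files into one large file before upload.
    // Paths and names here are illustrative, not from the original post.
    public static void merge(List<Path> inputs, Path output) throws IOException {
        // newOutputStream with defaults creates/truncates the target file
        try (var out = Files.newOutputStream(output)) {
            for (Path in : inputs) {
                Files.copy(in, out); // streams each small file into the big one
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge-demo");
        Path a = Files.writeString(dir.resolve("part-a"), "alpha\n");
        Path b = Files.writeString(dir.resolve("part-b"), "beta\n");
        Path merged = dir.resolve("merged");
        merge(List.of(a, b), merged);
        System.out.println(Files.readString(merged)); // prints the concatenated contents
    }
}
```

Because this is sequential local I/O, 12 GB should take on the order of a minute or two on ordinary disks, and several merges can run in parallel on the client, one per set.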

Can we perform the merge and the copy to the cluster from a different host, other than the NameNode?
We want an out-of-cluster machine running a Java process that would:
1. Run periodically
2. Merge the files
3. Copy them to the cluster
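On the question of which host can do the copy: an HDFS client does not need to run on the NameNode at all. Any machine with the Hadoop client libraries and a client configuration pointing at the NameNode can write to HDFS; the NameNode is contacted only for metadata, and the data blocks stream directly to the DataNodes. A minimal client-side core-site.xml sketch (the hostname and port are placeholders):

```xml
<!-- core-site.xml on the out-of-cluster client machine;
     "namenode-host" and the port are placeholders for your cluster -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```

With that in place the client process can use the ordinary FileSystem API (or `hadoop fs -copyFromLocal`) from any host that has network access to the NameNode and DataNodes.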

Secondly,
can we append to an existing file in the cluster?

Please provide your thoughts, as maintaining the SLA is becoming tough.

Regards,
Jagaran 

Re: Hadoop Production Issue

Posted by Віталій Тимчишин <ti...@gmail.com>.
2011/7/16 jagaran das <ja...@yahoo.co.in>

> Hi,
>
> Due to requirements in our current production CDH3 cluster, we need to copy
> around 11,520 small files (12 GB total) to the cluster for one application.
> We have 20 such applications that would run in parallel.
>
> So one set would have 11,520 files totalling 12 GB,
> and we would have 15 such sets in parallel.
>
> The total SLA for the whole pipeline, from the initial copy through Pig
> aggregation, copy back to local, and SQL load, is 15 minutes.
>
>
Have you tried using HARs (Hadoop Archives)?
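A Hadoop Archive packs a directory full of small files into a single HDFS file set, which sidesteps the small-files problem without a separate merge job. A sketch of the CLI invocation (the archive name and paths below are illustrative, not from the original post):

```shell
# Pack everything under /user/app/input into one archive in /user/app/archives
hadoop archive -archiveName set01.har -p /user/app/input /user/app/archives

# The files stay individually addressable to Pig/MapReduce via the har:// scheme,
# e.g. har:///user/app/archives/set01.har
```

Note that creating the archive runs as a MapReduce job, so it trades client-side merge time for cluster time; whether that helps depends on how loaded the cluster is during the SLA window.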
-- 
Best regards,
 Vitalii Tymchyshyn