Posted to hdfs-user@hadoop.apache.org by unmesha sreeveni <un...@gmail.com> on 2013/11/12 08:24:24 UTC

Parallel SVM Implementation | Taking Long time for JobCompletion

I am trying to implement the training phase of an SVM in Hadoop.
When I process large files (tested with 5000 records), the job takes
about 30 minutes to complete.

How can I increase the speed?

Hadoop: The Definitive Guide says:

The logical records that FileInputFormats define do not usually fit neatly
into HDFS blocks. For example, a TextInputFormat’s logical records are
lines, which will cross HDFS boundaries more often than not. This has no
bearing on the functioning of your program—lines are not missed or broken,
for example—but it’s worth knowing about, as it does mean that data-local
maps (that is, maps that are running on the same host as their input data)
will perform some remote reads. The slight overhead this causes is not
normally significant.

I am using
               job.setInputFormatClass(TextInputFormat.class);
               job.setOutputFormatClass(TextOutputFormat.class);
in the driver class, so in the mapper I receive one line of input at a
time. Is that a reason my job is slow?

How can I increase the speed?
Any suggestions?

-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Parallel SVM Implementation | Taking Long time for JobCompletion

Posted by Jens Scheidtmann <je...@gmail.com>.
Dear unmesha,

Please profile your code or provide a minimal working example.

Thanks, Jens

On Tuesday, 12 November 2013, unmesha sreeveni wrote:

> I am trying to implement SVM in hadoop ,the training phase..
> when i am processing large files(checked with 5000 records) it is taking
> about 30 min to complete the job.
>
>
> how to increase the speed..
> Any suggestion?
>
> --
>
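
As a starting point for the profiling suggested above, here is a minimal sketch in plain Java (no Hadoop dependency). It does two things: it works out the average time per record from the figures in the post (about 30 minutes for 5000 records), and it times a piece of per-line work in isolation. The class name RecordTimer, both helper methods, and the sample lines are hypothetical and only for illustration; the comma-split stands in for whatever the real mapper does with each line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class RecordTimer {

    // Average seconds spent per record, given a total job time.
    static double secondsPerRecord(double totalSeconds, int recordCount) {
        if (recordCount <= 0) {
            throw new IllegalArgumentException("recordCount must be positive");
        }
        return totalSeconds / recordCount;
    }

    // Runs `work` once per record and returns the elapsed wall-clock
    // time in nanoseconds.
    static long timeAll(List<String> records, Consumer<String> work) {
        long start = System.nanoTime();
        for (String record : records) {
            work.accept(record);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Figures from the post: about 30 minutes for 5000 records.
        System.out.println(secondsPerRecord(30 * 60, 5000) + " s per record");

        // Hypothetical stand-in for the per-line work done in the mapper.
        List<String> sample = Arrays.asList("1,0.5,0.2", "-1,0.1,0.9");
        long nanos = timeAll(sample, line -> line.split(","));
        System.out.println("processed " + sample.size() + " lines in " + nanos + " ns");
    }
}
```

The first figure comes out to roughly 0.36 seconds per record on average. Timing the real per-record computation the same way, outside the cluster, is one way to check whether the time goes into the mapper logic itself or into job setup and I/O.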
