Posted to user@mahout.apache.org by akshay shetye <ak...@gmail.com> on 2013/01/10 06:16:24 UTC

machine learning algorithm giving wrong results

I have a machine learning problem, which I am illustrating with a
similar, less complex example.

John goes from home to the office daily. His travel time depends on how
he travels:

Bus -> 3 hours
Cab -> 2 hours
Bike -> 1 hour

Problem: How much time will John take to reach his office, given the time
he starts?

He mostly takes the bus, sometimes a cab, and rarely the bike, depending on
how much time he has to reach his office.

He must reach the office by 9am.

Now if he starts at 6 he takes the bus,
     if he starts at 7 he takes a cab,
     if he starts at 8 he takes the bike.

The model which I built using M5P and libSVM predicts fine when he
starts at or before 8. The problem occurs when John leaves home after 8
(e.g. 8:30 or 9, assuming he got up late). Ideally, in this case the model
should predict around 1 hour, as he should take his bike.

Instead, my model gives negative predictions, and this is what is causing
the problem.

As John wakes up late very rarely, we have very few data points to train
on for such cases.

My feature list is as follows:

timeLeftForDuty, DAY_OF_WEEK, TRAVEL_TIME

TRAVEL_TIME is what we are trying to predict.
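
One plausible way to see where negative predictions come from, as a minimal
sketch only: it assumes timeLeftForDuty means hours remaining before 9am and
that the model behaves roughly like a straight-line fit on the three typical
cases. This is not the M5P or libSVM model from the post, and the helper names
are hypothetical. A line fitted to these points keeps falling when it is
extrapolated to a late start.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy data from the post: starting at 6, 7, 8 leaves 3, 2, 1 hours before 9am,
# and the trip (bus, cab, bike) takes 3, 2, 1 hours respectively.
time_left = [3.0, 2.0, 1.0]
travel_time = [3.0, 2.0, 1.0]

slope, intercept = fit_line(time_left, travel_time)

def predict_linear(hours_left):
    return slope * hours_left + intercept

# Assumed late start at 9:30, i.e. half an hour past the deadline.
print(predict_linear(-0.5))
```

Nothing in the training data tells a linear model that travel time has a floor
of about one hour, so inputs outside the observed range can push the
prediction below zero.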

How can I solve this problem? That is, how can I avoid getting negative
values of travel time? Which algorithm from Mahout should I use?

-- 
Regards,
Damodar Shetyo

Re: machine learning algorithm giving wrong results

Posted by Ted Dunning <te...@gmail.com>.
This is a regression problem.  The regression algorithm available in Mahout
is logistic regression.  You can force it to solve this problem in two
ways.  First, you can scale and offset the output by a large enough factor
so that the normal 0 to 1 output range is much larger than necessary and
the mean is centered at the rough mean of your data.  The only input
feature would be wake-up time.
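
The scale-and-offset idea can be sketched in plain Python, without Mahout.
Everything here is an assumption made for illustration: the output range of
0 to 4 hours (centred on a rough mean of 2 hours), the centring of the start
hour on 7, the learning rate, and the function names. It is a hand-written
one-feature logistic regression, not Mahout's implementation.

```python
import math

# Assumed output range in hours: wider than the data (1 to 3 hours) and
# centred on the rough mean of 2 hours.
LO, HI = 0.0, 4.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_scaled(start_hours, travel_hours, lr=0.1, epochs=5000):
    """Fit w, b so that LO + (HI - LO) * sigmoid(w * x + b) tracks travel time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for s, y in zip(start_hours, travel_hours):
            x = s - 7.0                       # centre the single input feature
            target = (y - LO) / (HI - LO)     # rescale the target into (0, 1)
            err = sigmoid(w * x + b) - target # cross-entropy gradient term
            w -= lr * err * x
            b -= lr * err
    return w, b

def predict_scaled(model, start_hour):
    w, b = model
    # Undo the scaling: the result always lies strictly between LO and HI.
    return LO + (HI - LO) * sigmoid(w * (start_hour - 7.0) + b)

model = fit_scaled([6.0, 7.0, 8.0], [3.0, 2.0, 1.0])
for hour in (6.0, 7.0, 8.0, 8.5, 9.5):
    print(hour, round(predict_scaled(model, hour), 2))
```

Because the sigmoid is bounded, a lower limit of 0 means the prediction cannot
go negative for late starts. It will still drift below one hour, though, so
this bounds the error rather than reproducing the "take the bike" behaviour.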

Another approach would be to use multinomial output with three outputs.
This is a more natural fit to the Mahout algorithm.
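
A minimal sketch of the multinomial idea, again in plain Python rather than
Mahout. It assumes the three outputs are the travel modes (bus, cab, bike),
that each mode maps to the fixed duration given in the original post, and that
the start hour is the only input. The function names and training settings are
hypothetical.

```python
import math

MODES = ["bus", "cab", "bike"]
HOURS = {"bus": 3.0, "cab": 2.0, "bike": 1.0}  # durations from the question

def softmax(scores):
    m = max(scores)                            # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax(start_hours, modes, lr=0.5, epochs=2000):
    """Multinomial logistic regression on one feature, trained by SGD."""
    k = len(MODES)
    w = [0.0] * k
    b = [0.0] * k
    for _ in range(epochs):
        for s, mode in zip(start_hours, modes):
            x = s - 7.0                        # centre the start hour
            probs = softmax([w[i] * x + b[i] for i in range(k)])
            for i in range(k):
                err = probs[i] - (1.0 if MODES[i] == mode else 0.0)
                w[i] -= lr * err * x
                b[i] -= lr * err
    return w, b

def predict_mode(model, start_hour):
    w, b = model
    x = start_hour - 7.0
    scores = [w[i] * x + b[i] for i in range(len(MODES))]
    return MODES[scores.index(max(scores))]

def predict_travel_time(model, start_hour):
    # Map the predicted class back to its known duration.
    return HOURS[predict_mode(model, start_hour)]

mode_model = fit_softmax([6.0, 7.0, 8.0], ["bus", "cab", "bike"])
for hour in (6.0, 7.0, 8.0, 8.5, 9.5):
    print(hour, predict_mode(mode_model, hour), predict_travel_time(mode_model, hour))
```

Since the prediction is always one of the three known durations, it cannot be
negative, and a late start falls on the bike side of the learned boundary
without needing any late-start training examples.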

Is this a homework problem?
