Posted to user@hadoop.apache.org by unmesha sreeveni <un...@gmail.com> on 2013/11/23 11:01:57 UTC

Decision Tree Implementation in Hadoop MapReduce

I want to go through the decision tree implementation in Mahout. I referred to
the Apache Mahout site <http://mahout.apache.org/>:

6 Feb 2012 - Apache Mahout 0.6 released
Apache Mahout has reached version 0.6. All developers are encouraged
to begin using version 0.6. Highlights include:
Improved Decision Tree performance and added support for regression problems

Where can I find its source code and documentation?

Should I download Mahout?

-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Decision Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thank you, Yexi. Thanks for spending your valuable time.


On Mon, Dec 2, 2013 at 8:22 PM, Yexi Jiang <ye...@gmail.com> wrote:

> Yes, the user is responsible for using the correct model for a given piece
> of testing (or unlabeled) data.
>
>
> 2013/12/2 unmesha sreeveni <un...@gmail.com>
>
>> To make it more general, it's better to separate them, since there might
>> be multiple batches of testing (or to-be-labeled) data, and you only need
>> to train the model once (if your data is stable).
>>
>> OK, I will go for the second one.
>> So if we keep them separate, the two jobs will have no connection with
>> each other, and we have to state which test data belongs to which
>> training data.
>> Then we load the corresponding playtennnis_tree.txt (the result file
>> should be named so that the training data it came from can be recognized
>> from its file name) and predict the test data.
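
A minimal sketch of the naming idea above, in plain Java. The "_train" / "_test" / "_tree.txt" suffixes and the class and method names are assumptions made for illustration; nothing in the thread fixes them. It derives the model file name from the dataset name and fails loudly when no model was trained for a given test set.

```java
import java.util.Set;

/** Hypothetical naming convention that ties a test set to the model trained for it. */
class ModelNaming {
    /** "M1_train" or "M1_test" -> "M1"; throws if the name has neither suffix. */
    static String datasetId(String name) {
        for (String suffix : new String[] {"_train", "_test"}) {
            if (name.endsWith(suffix)) {
                return name.substring(0, name.length() - suffix.length());
            }
        }
        throw new IllegalArgumentException("Expected a *_train or *_test name: " + name);
    }

    /** File the training job would write its tree to, e.g. "M1_train" -> "M1_tree.txt". */
    static String modelFileFor(String datasetName) {
        return datasetId(datasetName) + "_tree.txt";
    }

    /** Resolve the model for a test set, failing if no matching model exists. */
    static String resolveModel(String testName, Set<String> availableModels) {
        String expected = modelFileFor(testName);
        if (!availableModels.contains(expected)) {
            throw new IllegalStateException("No model " + expected + " for " + testName);
        }
        return expected;
    }

    public static void main(String[] args) {
        Set<String> models = Set.of(modelFileFor("M1_train"), modelFileFor("M2_train"));
        System.out.println(resolveModel("M1_test", models));
    }
}
```

A name check like this only catches mix-ups in file names. It cannot tell whether the test records really have the same attributes as the training data, so the user still has to pair them correctly.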
>>
>>
>> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> Actually, training and testing (or prediction) do not have to be done
>>> in one shot. If you need to do them consecutively in your particular
>>> scenario, you can do it the way you described.
>>>
>>> To make it more general, it's better to separate them, since there might
>>> be multiple batches of testing (or to-be-labeled) data, and you only need
>>> to train the model once (if your data is stable).
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> 1. I just thought of building the model in one project, say DT, and
>>>> when a huge input arrives, running another MR job (test.java) within DT.
>>>> If we do not chain the jobs, do we need to create separate projects,
>>>> DT_build and DT_test?
>>>> Or is there no need for a separate project?
>>>>
>>>> 2. M1_train - dataset for training.
>>>>
>>>> M1_test - test data (for prediction).
>>>> 1. Will the input for prediction be a single record, or a set of records
>>>> given at once?
>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>> only. We should check that as well, right? If M1_test is used with
>>>> M2_train, it should show an error, shouldn't it?
>>>>
>>>> Is anything wrong in my reasoning?
>>>> Are you able to guess what I am trying to accomplish?
>>>> I am confused about whether I need to create only one project that
>>>> includes both train and test, or two projects.
>>>>
>>>>
>>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>
>>>>> What is your motivation for chaining jobs?
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Thanks, Yexi. A very nice explanation, thanks a lot.
>>>>>> It is explained in a very simple way that beginners can really
>>>>>> understand.
>>>>>> I can go for chaining jobs, right?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>>
>>>>>>> In my opinion:
>>>>>>>
>>>>>>> 1. Build the decision tree model with the training data.
>>>>>>> 2. Store it somewhere.
>>>>>>> 3. When the unlabeled data is available:
>>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>>> it: load the model at the setup stage and use the model to label the
>>>>>>> records one by one in the map stage. There is no need for a reducer.
>>>>>>>    3.2 If the unlabeled data is small, it is trivial.
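
The steps above can be sketched roughly as follows. This is plain Java with no Hadoop dependency: the setup() and map() methods only mirror the lifecycle of a Hadoop Mapper, and in a real job they would live in a Mapper subclass with the number of reduce tasks set to zero. The model format (one root-to-leaf path per line, "attr=value,... -> label"), the column order, and all class names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a map-only labeling task; no Hadoop classes are used. */
class LabelingMapperSketch {
    // Hypothetical column order of the unlabeled records.
    static final String[] COLUMNS = {"outlook", "temperature", "humidity", "wind"};

    /** One root-to-leaf path of the tree: attribute tests plus the leaf label. */
    static final class Rule {
        final Map<String, String> tests = new LinkedHashMap<>();
        String label;
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Plays the role of Mapper.setup(): parse the stored model once per task. */
    void setup(List<String> modelLines) {
        for (String line : modelLines) {
            String[] parts = line.split("->");
            Rule rule = new Rule();
            rule.label = parts[1].trim();
            for (String test : parts[0].trim().split(",")) {
                String[] kv = test.split("=");
                rule.tests.put(kv[0].trim(), kv[1].trim());
            }
            rules.add(rule);
        }
    }

    /** Plays the role of Mapper.map(): label one record and emit it directly. */
    String map(String record) {
        String[] values = record.split(",");
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], values[i].trim());
        }
        for (Rule rule : rules) {
            // A rule fires when every attribute test on its path holds for this row.
            if (row.entrySet().containsAll(rule.tests.entrySet())) {
                return record + "," + rule.label;
            }
        }
        return record + ",unknown"; // no path in the tree matched
    }

    public static void main(String[] args) {
        LabelingMapperSketch mapper = new LabelingMapperSketch();
        mapper.setup(List.of(
            "outlook=sunny,humidity=high -> no",
            "outlook=sunny,humidity=normal -> yes",
            "outlook=overcast -> yes",
            "outlook=rain,wind=strong -> no",
            "outlook=rain,wind=weak -> yes"));
        for (String record : List.of("sunny,hot,high,weak", "overcast,cool,normal,strong")) {
            System.out.println(mapper.map(record));
        }
    }
}
```

Because each record is labeled independently of every other record, nothing needs to be grouped or aggregated, which is why the map output can be written out as the final result.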
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>>
>>>>>>>> Thanks, Yexi.
>>>>>>>>
>>>>>>>> But how can it be accomplished?
>>>>>>>> The input to the decision tree MR job will be a set of data. But when
>>>>>>>> predicting, the input will be a single line of data without a class
>>>>>>>> label, right?
>>>>>>>> So what changes will be needed in the MR job? Should we design it like
>>>>>>>> this?
>>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>>> 2. Otherwise, if a single line of data comes in, check the output of the
>>>>>>>> decision tree (the tree generated by the MR job) and predict the class
>>>>>>>> label.
>>>>>>>>
>>>>>>>> -------
>>>>>>>>
>>>>>>>> M1_train - dataset for training.
>>>>>>>> M1_test - test data (for prediction).
>>>>>>>> 1. Will the input for prediction be a single record, or a set of records
>>>>>>>> given at once?
>>>>>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>>>>>> only. We should check that as well, right? If M1_test is used with
>>>>>>>> M2_train, it should show an error, shouldn't it?
>>>>>>>>
>>>>>>>> Please tell me if my thinking is wrong.
>>>>>>>>
>>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>>> > I watched the video there, but I cannot access its source code due to a
>>>>>>>> > permission issue.
>>>>>>>> > In my opinion, once the decision tree model is built, it is small
>>>>>>>> > enough to be loaded into memory and used directly, without another
>>>>>>>> > MR job for prediction. The prediction can be done in a streaming way.
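
A rough sketch of what "in memory, in a streaming way" could look like, in plain Java. The Node class, the column indices, and the hard-coded play-tennis tree are assumptions made for this illustration; a real program would build the tree from whatever the training job stored. Records are read and labeled one line at a time, so only the tree itself has to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class StreamingPredictSketch {
    /** Inner nodes test one column; leaves carry a class label. */
    static final class Node {
        final int column;    // index of the tested column, -1 for a leaf
        final String label;  // class label, null for an inner node
        final Map<String, Node> children = new HashMap<>();

        private Node(int column, String label) {
            this.column = column;
            this.label = label;
        }
        static Node leaf(String label) { return new Node(-1, label); }
        static Node test(int column) { return new Node(column, null); }
        Node on(String value, Node child) { children.put(value, child); return this; }
    }

    /** Walk from the root to a leaf; "unknown" if a value has no branch in the tree. */
    static String classify(Node root, String[] values) {
        Node node = root;
        while (node != null && node.label == null) {
            node = node.children.get(values[node.column]);
        }
        return node == null ? "unknown" : node.label;
    }

    /** Label records line by line and write each result out immediately. */
    static void predictAll(Node root, BufferedReader in, PrintWriter out) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                String label = classify(root, line.trim().split("\\s*,\\s*"));
                out.println(line.trim() + "," + label);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical tree over columns outlook(0), temperature(1), humidity(2), wind(3). */
    static Node playTennisTree() {
        return Node.test(0)
            .on("sunny", Node.test(2).on("high", Node.leaf("no")).on("normal", Node.leaf("yes")))
            .on("overcast", Node.leaf("yes"))
            .on("rain", Node.test(3).on("strong", Node.leaf("no")).on("weak", Node.leaf("yes")));
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
            new StringReader("sunny,hot,high,weak\nrain,mild,high,weak\n"));
        predictAll(playTennisTree(), in, new PrintWriter(System.out));
    }
}
```

The StringReader in main only keeps the sketch self-contained. Wrapping System.in or a file in the BufferedReader instead would give the same loop over a real stream.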
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >
>>>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>>>> >>
>>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>>> >>
>>>>>>>> >> There a decision tree is built. So my question is:
>>>>>>>> >> can we also include prediction along with that?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >>> You are welcome :)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>
>>>>>>>> >>>> ok . Thx Yexi
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <
>>>>>>>> yexijiang@gmail.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>>>>> >>>>> Mahout, but you can use the decision forest instead:
>>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>
>>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>>> >>>>>> Does it include prediction as well?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>>>> >>>>>> <ye...@gmail.com>wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>>>>> >>>>>>> you can check it out from svn by following
>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> I want to go through the decision tree implementation in
>>>>>>>> >>>>>>>> Mahout. I referred to the Apache Mahout site
>>>>>>>> >>>>>>>> <http://mahout.apache.org/>:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>>> >>>>>>>> regression problems
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Should I download Mahout?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> --
>>>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> *Junior Developer*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> --
>>>>>>>> >>>>>>> ------
>>>>>>>> >>>>>>> Yexi Jiang,
>>>>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>>>> >>>>>>> Florida International University
>>>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thank you Yexi...Thanks for spending your valuable time.


On Mon, Dec 2, 2013 at 8:22 PM, Yexi Jiang <ye...@gmail.com> wrote:

> Yes, the user is responsible for using the correct model for a given piece
> of testing (or unlabeled) data.
>
>
> 2013/12/2 unmesha sreeveni <un...@gmail.com>
>
>> To make it more general, it's better to separate them. Since there might
>> be multiple batches of training (or to-be-label), and you only need to
>> train the model once (if your data is stable).
>>
>> Ok , I will go for the second one.
>> So if we are going for separate.They will not have any connection with
>> both. So we should tell what test data belongs to which train data.
>> And load the corresponding playtennnis_tree.txt (so the result file
>> should be named in a manner that the training result name can be noticed by
>> its file name) for the train data and predict the test data.
>>
>>
>> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> Actually the training and testing (or prediction) are not necessary to
>>> be done in one shot. If you need to do them consecutively in your
>>> particular scenario, you can do it as what you said.
>>>
>>> To make it more general, it's better to separate them. Since there might
>>> be multiple batches of training (or to-be-label), and you only need to
>>> train the model once (if your data is stable).
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> 1. I jst thought of building a model using a project named say DT and
>>>> wen a huge input comes do another mr job test.java with in DT.
>>>> If not chaining jobs we need to create seperate project right DT_build
>>>> and DT_test projects
>>>> NO need for seperate project file?
>>>>
>>>> 2. M1_train - dataset for training.
>>>>
>>>> M1_test - test data or prediction.
>>>> 1. Will it be one data as input for prediction or  set of data given
>>>> as input at-once.
>>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>>> only. we shld check that also ...right? if M1_test is given into
>>>> M2_train it should show error. is nt 'it?.
>>>>
>>>> Any thing wrong in my inference...
>>>> Are u able to guess wt i am trying to accomplish.
>>>> I am confused if i need to create only 1 project that includes train
>>>> and test.or 2 projects
>>>>
>>>>
>>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>
>>>>> What is your motivation of using chaining jobs?
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Thanks Yexi...A very nice explanation...Thanks a lot..
>>>>>> Explained in a very simple way which is really understandable for
>>>>>> beginners..Thanks a lot.
>>>>>> I can go for chaining jobs right?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>>
>>>>>>> In my opinion.
>>>>>>>
>>>>>>> 1. Build the decision tree model with the training data.
>>>>>>> 2. Store it somewhere.
>>>>>>> 3. When the unlabeled data is available:
>>>>>>>    3.1 if the unlabeled data is huge, write another mrjob to process
>>>>>>> them, load the model at the setup stage, use the model to label the data
>>>>>>> one by one in map stage. There is no necessary to have a reducer.
>>>>>>>   3.2 if the unlabeled data is small, it is trivial.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Yexi ,
>>>>>>>>
>>>>>>>> But how  it can be accomplished.
>>>>>>>> The input to Desicion Tree MR will be a set of data. But while
>>>>>>>> predicting a data it will be a one line data without classlabel
>>>>>>>> right?
>>>>>>>> So what changes will be there in mrjob.Should we design like this.
>>>>>>>> 1. When a set of data is coming draw Desicion tree
>>>>>>>> 2. else if a one line data is coming.check the output of decision
>>>>>>>> tree(Decision tree generated from mr) and predict the class label.
>>>>>>>>
>>>>>>>> -------
>>>>>>>>
>>>>>>>> M1_train - dataset for training.
>>>>>>>> M1_test - test data or prediction.
>>>>>>>> 1. Will it be one data as input for prediction or  set of data given
>>>>>>>> as input at-once.
>>>>>>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>>>>>>> only. we shld check that also ...right? if M1_test is given into
>>>>>>>> M2_train it should show error. is nt 'it?.
>>>>>>>>
>>>>>>>> Pls suggest if my thoughts are wrong.
>>>>>>>>
>>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>>> > I watched the video in it but I cannot access its source code due
>>>>>>>> to
>>>>>>>> > permission issue.
>>>>>>>> > In my opinion, once the decision tree model is built, the model
>>>>>>>> is small
>>>>>>>> > enough to be loaded into memory and can be used directly without
>>>>>>>> another
>>>>>>>> > mrjob for prediction. The prediction can be conducted in a
>>>>>>>> streaming way.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >
>>>>>>>> >> I have gone through a Map Reduce implementation of c4.5 in
>>>>>>>> >>
>>>>>>>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>>> >>
>>>>>>>> >> Here a decision tree is build. So my doubt is
>>>>>>>> >> Can we also include the prediction along with  that?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >>> You are welcome :)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>
>>>>>>>> >>>> ok . Thx Yexi
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <
>>>>>>>> yexijiang@gmail.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> As far as I know, there is no ID3 implementation in mahout
>>>>>>>> currently,
>>>>>>>> >>>>> but you can use the decision forest instead.
>>>>>>>> >>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>
>>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>>> >>>>>> It includes prediction also?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>>>> >>>>>> <ye...@gmail.com>wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> You can directly find it at
>>>>>>>> https://github.com/apache/mahout, or you
>>>>>>>> >>>>>>> can check out from svn by following
>>>>>>>> >>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>>  I want to go through Decision tree implementation in
>>>>>>>> mahout.
>>>>>>>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>>> encouraged
>>>>>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>>> regression
>>>>>>>> >>>>>>>> problems
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Where can I find its source code and documentation.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Should I download mahout
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> --
>>>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> *Junior Developer*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> --
>>>>>>>> >>>>>>> ------
>>>>>>>> >>>>>>> Yexi Jiang,
>>>>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>>>> >>>>>>> Florida International University
>>>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> --
>>>>>>>> >>>>>> *Thanks & Regards*
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Unmesha Sreeveni U.B
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> *Junior Developer*
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> --
>>>>>>>> >>>>> ------
>>>>>>>> >>>>> Yexi Jiang,
>>>>>>>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> >>>>> School of Computer and Information Science,
>>>>>>>> >>>>> Florida International University
>>>>>>>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> --
>>>>>>>> >>>> *Thanks & Regards*
>>>>>>>> >>>>
>>>>>>>> >>>> Unmesha Sreeveni U.B
>>>>>>>> >>>>
>>>>>>>> >>>> *Junior Developer*
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> --
>>>>>>>> >>> ------
>>>>>>>> >>> Yexi Jiang,
>>>>>>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> >>> School of Computer and Information Science,
>>>>>>>> >>> Florida International University
>>>>>>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> *Thanks & Regards*
>>>>>>>> >>
>>>>>>>> >> Unmesha Sreeveni U.B
>>>>>>>> >>
>>>>>>>> >> *Junior Developer*
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > ------
>>>>>>>> > Yexi Jiang,
>>>>>>>> > ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> > School of Computer and Information Science,
>>>>>>>> > Florida International University
>>>>>>>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Thanks & Regards*
>>>>>>>>
>>>>>>>> Unmesha Sreeveni U.B
>>>>>>>>
>>>>>>>> *Junior Developer*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ------
>>>>>>> Yexi Jiang,
>>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>> School of Computer and Information Science,
>>>>>>> Florida International University
>>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Thanks & Regards*
>>>>>>
>>>>>> Unmesha Sreeveni U.B
>>>>>>
>>>>>> *Junior Developer*
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------
>>>>> Yexi Jiang,
>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> School of Computer and Information Science,
>>>>> Florida International University
>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thank you Yexi...Thanks for spending your valuable time.


On Mon, Dec 2, 2013 at 8:22 PM, Yexi Jiang <ye...@gmail.com> wrote:

> Yes, the user is responsible for using the correct model for a given piece
> of testing (or unlabeled) data.
>
>
> 2013/12/2 unmesha sreeveni <un...@gmail.com>
>
>> To make it more general, it's better to separate them. Since there might
>> be multiple batches of training (or to-be-label), and you only need to
>> train the model once (if your data is stable).
>>
>> Ok , I will go for the second one.
>> So if we are going for separate.They will not have any connection with
>> both. So we should tell what test data belongs to which train data.
>> And load the corresponding playtennnis_tree.txt (so the result file
>> should be named in a manner that the training result name can be noticed by
>> its file name) for the train data and predict the test data.
>>
>>
>> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> Actually the training and testing (or prediction) are not necessary to
>>> be done in one shot. If you need to do them consecutively in your
>>> particular scenario, you can do it as what you said.
>>>
>>> To make it more general, it's better to separate them. Since there might
>>> be multiple batches of training (or to-be-label), and you only need to
>>> train the model once (if your data is stable).
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> 1. I jst thought of building a model using a project named say DT and
>>>> wen a huge input comes do another mr job test.java with in DT.
>>>> If not chaining jobs we need to create seperate project right DT_build
>>>> and DT_test projects
>>>> NO need for seperate project file?
>>>>
>>>> 2. M1_train - dataset for training.
>>>>
>>>> M1_test - test data or prediction.
>>>> 1. Will it be one data as input for prediction or  set of data given
>>>> as input at-once.
>>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>>> only. we shld check that also ...right? if M1_test is given into
>>>> M2_train it should show error. is nt 'it?.
>>>>
>>>> Any thing wrong in my inference...
>>>> Are u able to guess wt i am trying to accomplish.
>>>> I am confused if i need to create only 1 project that includes train
>>>> and test.or 2 projects
>>>>
>>>>
>>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>
>>>>> What is your motivation of using chaining jobs?
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Thanks Yexi...A very nice explanation...Thanks a lot..
>>>>>> Explained in a very simple way which is really understandable for
>>>>>> beginners..Thanks a lot.
>>>>>> I can go for chaining jobs right?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>>
>>>>>>> In my opinion.
>>>>>>>
>>>>>>> 1. Build the decision tree model with the training data.
>>>>>>> 2. Store it somewhere.
>>>>>>> 3. When the unlabeled data is available:
>>>>>>>    3.1 if the unlabeled data is huge, write another mrjob to process
>>>>>>> them, load the model at the setup stage, use the model to label the data
>>>>>>> one by one in map stage. There is no necessary to have a reducer.
>>>>>>>   3.2 if the unlabeled data is small, it is trivial.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>>
>>>>>>>> Thanks Yexi ,
>>>>>>>>
>>>>>>>> But how  it can be accomplished.
>>>>>>>> The input to Desicion Tree MR will be a set of data. But while
>>>>>>>> predicting a data it will be a one line data without classlabel
>>>>>>>> right?
>>>>>>>> So what changes will be there in mrjob.Should we design like this.
>>>>>>>> 1. When a set of data is coming draw Desicion tree
>>>>>>>> 2. else if a one line data is coming.check the output of decision
>>>>>>>> tree(Decision tree generated from mr) and predict the class label.
>>>>>>>>
>>>>>>>> -------
>>>>>>>>
>>>>>>>> M1_train - dataset for training.
>>>>>>>> M1_test - test data or prediction.
>>>>>>>> 1. Will it be one data as input for prediction or  set of data given
>>>>>>>> as input at-once.
>>>>>>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>>>>>>> only. we shld check that also ...right? if M1_test is given into
>>>>>>>> M2_train it should show error. is nt 'it?.
>>>>>>>>
>>>>>>>> Pls suggest if my thoughts are wrong.
>>>>>>>>
>>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>>> > I watched the video in it but I cannot access its source code due
>>>>>>>> to
>>>>>>>> > permission issue.
>>>>>>>> > In my opinion, once the decision tree model is built, the model
>>>>>>>> is small
>>>>>>>> > enough to be loaded into memory and can be used directly without
>>>>>>>> another
>>>>>>>> > mrjob for prediction. The prediction can be conducted in a
>>>>>>>> streaming way.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >
>>>>>>>> >> I have gone through a Map Reduce implementation of c4.5 in
>>>>>>>> >>
>>>>>>>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>>> >>
>>>>>>>> >> Here a decision tree is build. So my doubt is
>>>>>>>> >> Can we also include the prediction along with  that?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >>> You are welcome :)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>
>>>>>>>> >>>> ok . Thx Yexi
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <
>>>>>>>> yexijiang@gmail.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> As far as I know, there is no ID3 implementation in mahout
>>>>>>>> currently,
>>>>>>>> >>>>> but you can use the decision forest instead.
>>>>>>>> >>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>
>>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>>> >>>>>> It includes prediction also?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>>>> >>>>>> <ye...@gmail.com>wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> You can directly find it at
>>>>>>>> https://github.com/apache/mahout, or you
>>>>>>>> >>>>>>> can check out from svn by following
>>>>>>>> >>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>>  I want to go through Decision tree implementation in
>>>>>>>> mahout.
>>>>>>>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>>> encouraged
>>>>>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>>> regression
>>>>>>>> >>>>>>>> problems
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Where can I find its source code and documentation.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Should I download mahout
>>>>>>>> >>>>>>>>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
Yes, the user is responsible for using the correct model for a given piece
of testing (or unlabeled) data.
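
A minimal sketch of what that could look like, assuming the trained tree is stored as JSON together with the attribute names it was trained on. The file naming helper, the JSON layout, and all function names below are hypothetical and chosen only for illustration; they are not Mahout's API or the format used in the linked blog post. Python is used for brevity; in a Java MapReduce job the loading would go in Mapper.setup() and the per-record labeling in map(), with no reducer.

```python
import json

# Hypothetical stored model for a "playtennis" training set (assumed layout):
# the attribute names seen at training time plus a nested tree.
PLAYTENNIS_TREE = """
{
  "attributes": ["outlook", "humidity", "wind"],
  "tree": {
    "attribute": "outlook",
    "children": {
      "sunny": {"attribute": "humidity",
                "children": {"high": {"label": "no"},
                             "normal": {"label": "yes"}}},
      "overcast": {"label": "yes"},
      "rain": {"attribute": "wind",
               "children": {"strong": {"label": "no"},
                            "weak": {"label": "yes"}}}
    }
  }
}
"""


def model_path_for(training_set):
    """Name the model file after its training set (assumed convention)."""
    return f"{training_set}_tree.txt"


def load_model(text):
    """Parse a stored model; in a MapReduce job this runs once per task."""
    return json.loads(text)


def predict(node, record):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:
        value = record[node["attribute"]]
        children = node["children"]
        if value not in children:
            raise ValueError(
                f"unseen value {value!r} for attribute {node['attribute']!r}")
        node = children[value]
    return node["label"]


def label_records(model, records):
    """Label records one by one, as the map stage of a map-only job would."""
    for record in records:
        # Cheap guard: reject records that lack the model's attributes.
        missing = [a for a in model["attributes"] if a not in record]
        if missing:
            raise ValueError(
                f"record does not match this model, missing: {missing}")
        yield record, predict(model["tree"], record)
```

The attribute check only catches data whose columns do not fit the model. It cannot prove that a test set belongs to a particular training set, so picking the right model file (here via the assumed `<training set>_tree.txt` naming) stays with the user.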


2013/12/2 unmesha sreeveni <un...@gmail.com>

> To make it more general, it's better to separate them, since there might
> be multiple batches of training (or to-be-labeled) data, and you only need to
> train the model once (if your data is stable).
>
> OK, I will go for the second one.
> So if we keep them separate, the two jobs will have no connection with each
> other, and we should say which test data belongs to which training data.
> Then we load the corresponding playtennnis_tree.txt (so the result file should
> be named in a way that identifies the training data it came from) and use it
> to predict the test data.
>
>
> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> Actually, training and testing (or prediction) do not need to be done in
>> one shot. If you need to do them consecutively in your particular
>> scenario, you can do it as you said.
>>
>> To make it more general, it's better to separate them, since there might
>> be multiple batches of training (or to-be-labeled) data, and you only need to
>> train the model once (if your data is stable).
>>
>>
>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>
>>> 1. I just thought of building a model using a project named, say, DT, and
>>> when a huge input comes, running another MR job, test.java, within DT.
>>> If we are not chaining jobs, do we need to create separate projects,
>>> DT_build and DT_test? Or is there no need for a separate project?
>>>
>>> 2. M1_train - dataset for training.
>>>
>>> M1_test - test data (or data for prediction).
>>> 1. Will the input for prediction be a single record or a set of records
>>> given at once?
>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>> only. We should check that too, right? If M1_test is given to
>>> M2_train, it should show an error, shouldn't it?
>>>
>>> Is anything wrong in my inference?
>>> Are you able to guess what I am trying to accomplish?
>>> I am confused whether I need to create only one project that includes
>>> train and test, or two projects.
>>>
>>>
>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> What is your motivation for chaining jobs?
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Thanks Yexi...A very nice explanation...Thanks a lot..
>>>>> Explained in a very simple way which is really understandable for
>>>>> beginners..Thanks a lot.
>>>>> I can go for chaining jobs right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>
>>>>>> In my opinion:
>>>>>>
>>>>>> 1. Build the decision tree model with the training data.
>>>>>> 2. Store it somewhere.
>>>>>> 3. When the unlabeled data is available:
>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>>        it: load the model at the setup stage and use the model to label
>>>>>>        the data one by one in the map stage. There is no need for a
>>>>>>        reducer.
>>>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>
>>>>>>> Thanks Yexi,
>>>>>>>
>>>>>>> But how can it be accomplished?
>>>>>>> The input to the Decision Tree MR job will be a set of data. But when
>>>>>>> predicting, the input will be a single line of data without a class
>>>>>>> label, right?
>>>>>>> So what changes will be needed in the MR job? Should we design it like this?
>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>> 2. Otherwise, if a single line of data comes in, check the output of the
>>>>>>> decision tree (the tree generated by the MR job) and predict the class label.
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> M1_train - dataset for training.
>>>>>>> M1_test - test data (or data for prediction).
>>>>>>> 1. Will the input for prediction be a single record or a set of records
>>>>>>> given at once?
>>>>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>>>>> only. We should check that too, right? If M1_test is given to
>>>>>>> M2_train, it should show an error, shouldn't it?
>>>>>>>
>>>>>>> Please suggest if my thoughts are wrong.
>>>>>>>
>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> > I watched the video in it but I cannot access its source code due to
>>>>>>> > a permission issue.
>>>>>>> > In my opinion, once the decision tree model is built, the model is small
>>>>>>> > enough to be loaded into memory and can be used directly without another
>>>>>>> > MR job for prediction. The prediction can be conducted in a streaming way.
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>> >
>>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>> >>
>>>>>>> >> Here a decision tree is built. So my question is:
>>>>>>> >> can we also include the prediction along with that?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >>> You are welcome :)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>
>>>>>>> >>>> ok . Thx Yexi
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> As far as I know, there is no ID3 implementation in Mahout currently,
>>>>>>> >>>>> but you can use the decision forest instead.
>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>
>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>> >>>>>> Does it include prediction also?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>>>>> >>>>>>> can check out from svn by following
>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>  I want to go through Decision tree implementation in mahout.
>>>>>>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for regression
>>>>>>> >>>>>>>> problems
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Where can I find its source code and documentation.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Should I download mahout
>>>>>>> >>>>>>>>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
Yes, the user is responsible for using the correct model for a given piece
of testing (or unlabeled) data.


2013/12/2 unmesha sreeveni <un...@gmail.com>

> To make it more general, it's better to separate them. Since there might
> be multiple batches of training (or to-be-label), and you only need to
> train the model once (if your data is stable).
>
> Ok , I will go for the second one.
> So if we are going for separate.They will not have any connection with
> both. So we should tell what test data belongs to which train data.
> And load the corresponding playtennnis_tree.txt (so the result file should
> be named in a manner that the training result name can be noticed by its
> file name) for the train data and predict the test data.
>
>
> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> Actually the training and testing (or prediction) are not necessary to be
>> done in one shot. If you need to do them consecutively in your particular
>> scenario, you can do it as what you said.
>>
>> To make it more general, it's better to separate them. Since there might
>> be multiple batches of training (or to-be-label), and you only need to
>> train the model once (if your data is stable).
>>
>>
>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>
>>> 1. I jst thought of building a model using a project named say DT and
>>> wen a huge input comes do another mr job test.java with in DT.
>>> If not chaining jobs we need to create seperate project right DT_build
>>> and DT_test projects
>>> NO need for seperate project file?
>>>
>>> 2. M1_train - dataset for training.
>>>
>>> M1_test - test data or prediction.
>>> 1. Will it be one data as input for prediction or  set of data given
>>> as input at-once.
>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>> only. we shld check that also ...right? if M1_test is given into
>>> M2_train it should show error. is nt 'it?.
>>>
>>> Any thing wrong in my inference...
>>> Are u able to guess wt i am trying to accomplish.
>>> I am confused if i need to create only 1 project that includes train and
>>> test.or 2 projects
>>>
>>>
>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> What is your motivation of using chaining jobs?
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Thanks Yexi...A very nice explanation...Thanks a lot..
>>>>> Explained in a very simple way which is really understandable for
>>>>> beginners..Thanks a lot.
>>>>> I can go for chaining jobs right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>
>>>>>> In my opinion.
>>>>>>
>>>>>> 1. Build the decision tree model with the training data.
>>>>>> 2. Store it somewhere.
>>>>>> 3. When the unlabeled data is available:
>>>>>>    3.1 if the unlabeled data is huge, write another mrjob to process
>>>>>> them, load the model at the setup stage, use the model to label the data
>>>>>> one by one in map stage. There is no necessary to have a reducer.
>>>>>>   3.2 if the unlabeled data is small, it is trivial.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>
>>>>>>> Thanks Yexi ,
>>>>>>>
>>>>>>> But how  it can be accomplished.
>>>>>>> The input to Desicion Tree MR will be a set of data. But while
>>>>>>> predicting a data it will be a one line data without classlabel
>>>>>>> right?
>>>>>>> So what changes will be there in mrjob.Should we design like this.
>>>>>>> 1. When a set of data is coming draw Desicion tree
>>>>>>> 2. else if a one line data is coming.check the output of decision
>>>>>>> tree(Decision tree generated from mr) and predict the class label.
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> M1_train - dataset for training.
>>>>>>> M1_test - test data or prediction.
>>>>>>> 1. Will it be one data as input for prediction or  set of data given
>>>>>>> as input at-once.
>>>>>>> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
>>>>>>> only. we shld check that also ...right? if M1_test is given into
>>>>>>> M2_train it should show error. is nt 'it?.
>>>>>>>
>>>>>>> Pls suggest if my thoughts are wrong.
>>>>>>>
>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> > I watched the video in it but I cannot access its source code due
>>>>>>> to
>>>>>>> > permission issue.
>>>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>>>> small
>>>>>>> > enough to be loaded into memory and can be used directly without
>>>>>>> another
>>>>>>> > mrjob for prediction. The prediction can be conducted in a
>>>>>>> streaming way.
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>> >
>>>>>>> >> I have gone through a Map Reduce implementation of c4.5 in
>>>>>>> >>
>>>>>>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>> >>
>>>>>>> >> Here a decision tree is build. So my doubt is
>>>>>>> >> Can we also include the prediction along with  that?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >>> You are welcome :)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>
>>>>>>> >>>> ok . Thx Yexi
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <
>>>>>>> yexijiang@gmail.com>
>>>>>>> >>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> As far as I know, there is no ID3 implementation in mahout
>>>>>>> currently,
>>>>>>> >>>>> but you can use the decision forest instead.
>>>>>>> >>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>
>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>> >>>>>> It includes prediction also?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>>> >>>>>> <ye...@gmail.com>wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>> You can directly find it at https://github.com/apache/mahout,
>>>>>>> or you
>>>>>>> >>>>>>> can check out from svn by following
>>>>>>> >>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>>  I want to go through Decision tree implementation in
>>>>>>> mahout.
>>>>>>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>> encouraged
>>>>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>> regression
>>>>>>> >>>>>>>> problems
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Where can I find its source code and documentation.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Should I download mahout
>>>>>>> >>>>>>>>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
Yes, the user is responsible for using the correct model for a given piece
of testing (or unlabeled) data.
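
As a rough illustration of what that responsibility can look like, here is a minimal sketch in plain Python (not Hadoop code). It assumes a hypothetical model layout: the tree is kept as nested dicts and the attribute names it was trained on are stored next to it, so a record with different attributes is rejected instead of being labeled silently. The names MODEL, check_schema and predict, and the play-tennis tree itself, are made up for this example.

```python
# Hypothetical in-memory model: the tree plus the attribute names it was
# trained on. An internal node is {"attr": name, "children": {value: node}};
# a leaf is a plain class-label string.
MODEL = {
    "attributes": ["outlook", "humidity", "wind"],
    "tree": {
        "attr": "outlook",
        "children": {
            "sunny": {"attr": "humidity",
                      "children": {"high": "no", "normal": "yes"}},
            "overcast": "yes",
            "rain": {"attr": "wind",
                     "children": {"strong": "no", "weak": "yes"}},
        },
    },
}


def check_schema(model, record):
    """Raise ValueError if the record's attributes differ from the model's."""
    if sorted(record) != sorted(model["attributes"]):
        raise ValueError("record does not match the model's attributes")


def predict(model, record):
    """Walk the tree from the root until a leaf (class label) is reached."""
    check_schema(model, record)
    node = model["tree"]
    while isinstance(node, dict):
        # An attribute value never seen in training raises KeyError here.
        node = node["children"][record[node["attr"]]]
    return node


if __name__ == "__main__":
    row = {"outlook": "sunny", "humidity": "normal", "wind": "weak"}
    print(predict(MODEL, row))
```

In a map-only job the same idea would apply: load the model once in the setup stage and call something like predict for each input record in the map stage.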


2013/12/2 unmesha sreeveni <un...@gmail.com>

> To make it more general, it's better to separate them, since there might
> be multiple batches of training (or to-be-labeled) data, and you only
> need to train the model once (if your data is stable).
>
> OK, I will go for the second one.
> If we keep them separate, the two jobs will have no connection with each
> other, so we should specify which test data belongs to which training
> data, then load the corresponding playtennnis_tree.txt and predict the
> test data. (The result file should be named so that the training data it
> came from can be recognized from its file name.)
>
>
> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> Actually, training and testing (or prediction) do not need to be done in
>> one shot. If you need to do them consecutively in your particular
>> scenario, you can do it as you said.
>>
>> To make it more general, it's better to separate them, since there might
>> be multiple batches of training (or to-be-labeled) data, and you only
>> need to train the model once (if your data is stable).
>>
>>
>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>
>>> 1. I just thought of building the model in a project named, say, DT, and
>>> when a huge input comes, running another MR job, test.java, within DT.
>>> If we do not chain the jobs, do we need to create separate projects,
>>> DT_build and DT_test? Or is there no need for a separate project?
>>>
>>> 2. M1_train - dataset for training.
>>>
>>> M1_test - test data (or data for prediction).
>>> 1. Will the input for prediction be a single record or a set of records
>>> given at once?
>>> 2. We also need to ensure in our program that M1_test belongs to
>>> M1_train only. We should check that too, right? If M1_test is given to
>>> M2_train, it should show an error, shouldn't it?
>>>
>>> Is anything wrong in my inference?
>>> Are you able to guess what I am trying to accomplish?
>>> I am confused about whether I need to create only one project that
>>> includes both train and test, or two projects.
>>>
>>>
>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> What is your motivation for chaining jobs?
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Thanks Yexi. A very nice explanation, given in a very simple way that
>>>>> is really understandable for beginners. Thanks a lot.
>>>>> I can go for chaining jobs, right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>
>>>>>> In my opinion:
>>>>>>
>>>>>> 1. Build the decision tree model with the training data.
>>>>>> 2. Store it somewhere.
>>>>>> 3. When the unlabeled data is available:
>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>>        it: load the model at the setup stage and use the model to
>>>>>>        label the records one by one in the map stage. There is no need
>>>>>>        for a reducer.
>>>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>>
>>>>>>> Thanks Yexi,
>>>>>>>
>>>>>>> But how can it be accomplished?
>>>>>>> The input to the decision tree MR job will be a set of data, but
>>>>>>> when predicting, the input will be a single line of data without a
>>>>>>> class label, right?
>>>>>>> So what changes will be needed in the MR job? Should we design it
>>>>>>> like this?
>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>> 2. Otherwise, if a single line of data comes in, check the output of
>>>>>>> the decision tree (the tree generated by the MR job) and predict the
>>>>>>> class label.
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> M1_train - dataset for training.
>>>>>>> M1_test - test data (or data for prediction).
>>>>>>> 1. Will the input for prediction be a single record or a set of
>>>>>>> records given at once?
>>>>>>> 2. We also need to ensure in our program that M1_test belongs to
>>>>>>> M1_train only. We should check that too, right? If M1_test is given
>>>>>>> to M2_train, it should show an error, shouldn't it?
>>>>>>>
>>>>>>> Please tell me if my thoughts are wrong.
>>>>>>>
>>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> > I watched the video in it, but I cannot access its source code due
>>>>>>> > to a permission issue.
>>>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>>>> > small enough to be loaded into memory and can be used directly
>>>>>>> > without another MR job for prediction. The prediction can be
>>>>>>> > conducted in a streaming way.
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>>> >
>>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>>> >>
>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>> >>
>>>>>>> >> Here a decision tree is built. So my question is:
>>>>>>> >> can we also include the prediction along with that?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >>> You are welcome :)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>
>>>>>>> >>>> ok . Thx Yexi
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>>>> >>>>> Mahout, but you can use the decision forest instead.
>>>>>>> >>>>>
>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>
>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>> >>>>>> Does it include prediction also?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>>>> >>>>>>> you can check it out from svn by following
>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>> I want to go through the decision tree implementation in
>>>>>>> >>>>>>>> Mahout. I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>> >>>>>>>> regression problems
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Should I download Mahout?
>>>>>>> >>>>>>>>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
> To make it more general, it's better to separate them, since there might
> be multiple batches of training (or to-be-labeled) data, and you only
> need to train the model once (if your data is stable).

OK, I will go for the second one.
If we keep them separate, the two jobs will have no connection with each
other, so we should specify which test data belongs to which training
data, then load the corresponding playtennnis_tree.txt and predict the
test data. (The result file should be named so that the training data it
came from can be recognized from its file name.)
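
One possible naming scheme, as a sketch only: derive the model file name from the training set name, and let the prediction step work out from the test set name which model file it expects. The _train / _test / _tree.txt suffixes below follow the names used in this thread but are an assumption, not an established convention, and the helper names are hypothetical. Plain Python, not Hadoop code.

```python
TRAIN_SUFFIX = "_train"
TEST_SUFFIX = "_test"
MODEL_SUFFIX = "_tree.txt"


def _strip_suffix(name, suffix):
    """Return name without suffix, or raise ValueError if it is missing."""
    if not name.endswith(suffix):
        raise ValueError("expected a name ending in %s: %s" % (suffix, name))
    return name[: -len(suffix)]


def model_file_for_train(train_name):
    """M1_train -> M1_tree.txt: the file the training job would write."""
    return _strip_suffix(train_name, TRAIN_SUFFIX) + MODEL_SUFFIX


def model_file_for_test(test_name):
    """M1_test -> M1_tree.txt: the file the prediction job should load."""
    return _strip_suffix(test_name, TEST_SUFFIX) + MODEL_SUFFIX


def belongs_together(test_name, model_file):
    """True only if the test set and the model file share the same prefix."""
    return model_file_for_test(test_name) == model_file


if __name__ == "__main__":
    model = model_file_for_train("M1_train")
    print(model, belongs_together("M1_test", model))
```

With this scheme, giving M1_test to the model built from M2_train can be detected and reported as an error before any prediction is run.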


On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:

> Actually, training and testing (or prediction) do not need to be done in
> one shot. If you need to do them consecutively in your particular
> scenario, you can do it as you said.
>
> To make it more general, it's better to separate them, since there might
> be multiple batches of training (or to-be-labeled) data, and you only
> need to train the model once (if your data is stable).
>
>
> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>
>> 1. I just thought of building the model in a project named, say, DT, and
>> when a huge input comes, running another MR job, test.java, within DT.
>> If we do not chain the jobs, do we need to create separate projects,
>> DT_build and DT_test? Or is there no need for a separate project?
>>
>> 2. M1_train - dataset for training.
>>
>> M1_test - test data (or data for prediction).
>> 1. Will the input for prediction be a single record or a set of records
>> given at once?
>> 2. We also need to ensure in our program that M1_test belongs to
>> M1_train only. We should check that too, right? If M1_test is given to
>> M2_train, it should show an error, shouldn't it?
>>
>> Is anything wrong in my inference?
>> Are you able to guess what I am trying to accomplish?
>> I am confused about whether I need to create only one project that
>> includes both train and test, or two projects.
>>
>>
>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> What is your motivation for chaining jobs?
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> Thanks Yexi. A very nice explanation, given in a very simple way that
>>>> is really understandable for beginners. Thanks a lot.
>>>> I can go for chaining jobs, right?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>
>>>>> In my opinion:
>>>>>
>>>>> 1. Build the decision tree model with the training data.
>>>>> 2. Store it somewhere.
>>>>> 3. When the unlabeled data is available:
>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>        it: load the model at the setup stage and use the model to
>>>>>        label the records one by one in the map stage. There is no need
>>>>>        for a reducer.
>>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Thanks Yexi,
>>>>>>
>>>>>> But how can it be accomplished?
>>>>>> The input to the decision tree MR job will be a set of data, but
>>>>>> when predicting, the input will be a single line of data without a
>>>>>> class label, right?
>>>>>> So what changes will be needed in the MR job? Should we design it
>>>>>> like this?
>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>> 2. Otherwise, if a single line of data comes in, check the output of
>>>>>> the decision tree (the tree generated by the MR job) and predict the
>>>>>> class label.
>>>>>>
>>>>>> -------
>>>>>>
>>>>>> M1_train - dataset for training.
>>>>>> M1_test - test data (or data for prediction).
>>>>>> 1. Will the input for prediction be a single record or a set of
>>>>>> records given at once?
>>>>>> 2. We also need to ensure in our program that M1_test belongs to
>>>>>> M1_train only. We should check that too, right? If M1_test is given
>>>>>> to M2_train, it should show an error, shouldn't it?
>>>>>>
>>>>>> Please tell me if my thoughts are wrong.
>>>>>>
>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>> > I watched the video in it, but I cannot access its source code due
>>>>>> > to a permission issue.
>>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>>> > small enough to be loaded into memory and can be used directly
>>>>>> > without another MR job for prediction. The prediction can be
>>>>>> > conducted in a streaming way.
>>>>>> >
>>>>>> >
>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>> >
>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>> >>
>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>> >>
>>>>>> >> Here a decision tree is built. So my question is:
>>>>>> >> can we also include the prediction along with that?
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>> >>
>>>>>> >>> You are welcome :)
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>
>>>>>> >>>> ok . Thx Yexi
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>> >>>>
>>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>>> >>>>> Mahout, but you can use the decision forest instead.
>>>>>> >>>>>
>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>>>
>>>>>> >>>>>> Is that ID3 classification?
>>>>>> >>>>>> Does it include prediction also?
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>>> >>>>>>> you can check it out from svn by following
>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>>>>>
>>>>>> >>>>>>>> I want to go through the decision tree implementation in
>>>>>> >>>>>>>> Mahout. I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>> >>>>>>>> regression problems
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Should I download Mahout?
>>>>>> >>>>>>>>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
> To make it more general, it's better to separate them, since there might be
> multiple batches of testing (or to-be-labeled) data, and you only need to
> train the model once (if your data is stable).

OK, I will go for the second option.
If training and prediction are separate jobs, there is no built-in connection
between them, so we have to specify which test data belongs to which training
data. The training output should be named so that the training set can be
recognized from the file name (for example playtennnis_tree.txt), and the
prediction job should load the corresponding tree file to predict the test data.
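
The naming idea above could look roughly like the sketch below. It is plain
Python rather than Hadoop code, and every name in it (model_path, save_model,
load_model, the "_tree.txt" suffix, the JSON layout) is an assumption made for
illustration, not part of any existing implementation. The model file is named
after the training dataset, and the attribute list is stored with the tree so
that test data from a different dataset is rejected.

```python
import json
import os


def model_path(model_dir, dataset):
    """Derive the model file name from the training dataset name (assumed convention)."""
    return os.path.join(model_dir, dataset + "_tree.txt")


def save_model(model_dir, dataset, attributes, tree):
    """Store the tree together with the attribute names it was trained on."""
    with open(model_path(model_dir, dataset), "w") as f:
        json.dump({"dataset": dataset, "attributes": list(attributes), "tree": tree}, f)


def load_model(model_dir, dataset, test_attributes):
    """Load the model for `dataset` and check that the test data has the same columns."""
    path = model_path(model_dir, dataset)
    if not os.path.exists(path):
        raise FileNotFoundError("no trained model for dataset %r" % dataset)
    with open(path) as f:
        model = json.load(f)
    if model["attributes"] != list(test_attributes):
        raise ValueError("test data columns do not match the training data of %r" % dataset)
    return model
```

With this layout, asking for the model of M1 with the columns of M2's test data
raises an error instead of silently producing wrong labels.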


On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <ye...@gmail.com> wrote:

> Actually, training and testing (or prediction) do not have to be done in
> one shot. If you need to run them consecutively in your particular
> scenario, you can do it as you described.
>
> To make it more general, it's better to separate them, since there might
> be multiple batches of testing (or to-be-labeled) data, and you only need
> to train the model once (if your data is stable).
>
>
> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>
>> 1. I just thought of building the model in a project named, say, DT, and
>> when a huge input arrives, running another MR job (test.java) within DT.
>> If we do not chain the jobs, do we need to create separate projects,
>> DT_build and DT_test? Or is there no need for a separate project?
>>
>> 2. M1_train - dataset for training.
>>    M1_test - test data for prediction.
>>    a. Will the prediction input be a single record, or a set of records
>>       given at once?
>>    b. We also need to ensure in our program that M1_test is used only with
>>       M1_train. We should check that too, right? If M1_test is given to
>>       M2_train, it should show an error, shouldn't it?
>>
>> Is anything wrong in my reasoning?
>> Can you see what I am trying to accomplish?
>> I am confused about whether I need one project that includes both training
>> and testing, or two projects.
>>
>>
>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> What is your motivation for chaining jobs?
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> Thanks, Yexi. A very nice explanation, thanks a lot.
>>>> It is explained in a very simple way that is easy for beginners to
>>>> understand.
>>>> I can go for chaining jobs, right?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>
>>>>> In my opinion:
>>>>>
>>>>> 1. Build the decision tree model with the training data.
>>>>> 2. Store it somewhere.
>>>>> 3. When the unlabeled data is available:
>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>        it: load the model at the setup stage and use the model to label
>>>>>        the data one record at a time in the map stage. There is no need
>>>>>        for a reducer.
>>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Thanks, Yexi.
>>>>>>
>>>>>> But how can it be accomplished?
>>>>>> The input to the decision tree MR job will be a set of data. But when
>>>>>> predicting, the input will be a single line without a class label, right?
>>>>>> So what changes are needed in the MR job? Should we design it like this:
>>>>>> 1. When a set of data arrives, build the decision tree.
>>>>>> 2. Otherwise, when a single line arrives, check it against the decision
>>>>>>    tree (generated by the MR job) and predict the class label.
>>>>>>
>>>>>> -------
>>>>>>
>>>>>> M1_train - dataset for training.
>>>>>> M1_test - test data for prediction.
>>>>>> 1. Will the prediction input be a single record, or a set of records
>>>>>>    given at once?
>>>>>> 2. We also need to ensure in our program that M1_test is used only with
>>>>>>    M1_train. We should check that too, right? If M1_test is given to
>>>>>>    M2_train, it should show an error, shouldn't it?
>>>>>>
>>>>>> Please tell me if my thinking is wrong.
>>>>>>
>>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>> > I watched the video there, but I cannot access the source code due
>>>>>> > to a permission issue.
>>>>>> > In my opinion, once the decision tree model is built, it is small
>>>>>> > enough to be loaded into memory and can be used directly, without
>>>>>> > another MR job for prediction. The prediction can be conducted in a
>>>>>> > streaming way.
>>>>>> >
>>>>>> >
>>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>>> >
>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>> >>
>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>> >>
>>>>>> >> There, a decision tree is built. So my question is:
>>>>>> >> can we also include the prediction along with that?
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>>>> >>
>>>>>> >>> You are welcome :)
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>
>>>>>> >>>> OK. Thanks, Yexi.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com>
>>>>>> >>>> wrote:
>>>>>> >>>>
>>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>>> >>>>> Mahout, but you can use the decision forest instead:
>>>>>> >>>>>
>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>>>
>>>>>> >>>>>> Is that ID3 classification?
>>>>>> >>>>>> Does it include prediction as well?
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>> >>>>>> <ye...@gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>>> >>>>>>> you can check it out from svn by following
>>>>>> >>>>>>>
>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>> >>>>>>>
>>>>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>> >>>>>>>> regression problems
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Should I download Mahout?
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> --
>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> *Junior Developer*
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --
>>>>>> >>>>>>> ------
>>>>>> >>>>>>> Yexi Jiang,
>>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>> >>>>>>> Florida International University
>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> --
>>>>>> >>>>>> *Thanks & Regards*
>>>>>> >>>>>>
>>>>>> >>>>>> Unmesha Sreeveni U.B
>>>>>> >>>>>>
>>>>>> >>>>>> *Junior Developer*
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>> ------
>>>>>> >>>>> Yexi Jiang,
>>>>>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>> >>>>> School of Computer and Information Science,
>>>>>> >>>>> Florida International University
>>>>>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> *Thanks & Regards*
>>>>>> >>>>
>>>>>> >>>> Unmesha Sreeveni U.B
>>>>>> >>>>
>>>>>> >>>> *Junior Developer*
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> ------
>>>>>> >>> Yexi Jiang,
>>>>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>>>>> >>> School of Computer and Information Science,
>>>>>> >>> Florida International University
>>>>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> *Thanks & Regards*
>>>>>> >>
>>>>>> >> Unmesha Sreeveni U.B
>>>>>> >>
>>>>>> >> *Junior Developer*
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > ------
>>>>>> > Yexi Jiang,
>>>>>> > ECS 251,  yjian004@cs.fiu.edu
>>>>>> > School of Computer and Information Science,
>>>>>> > Florida International University
>>>>>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>> >
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Thanks & Regards*
>>>>>>
>>>>>> Unmesha Sreeveni U.B
>>>>>>
>>>>>> *Junior Developer*
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------
>>>>> Yexi Jiang,
>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> School of Computer and Information Science,
>>>>> Florida International University
>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
Actually, training and testing (or prediction) do not have to be done in
one shot. If you need to run them consecutively in your particular scenario,
you can do it as you described.

To make it more general, it's better to separate them, since there might be
multiple batches of testing (or to-be-labeled) data, and you only need to
train the model once (if your data is stable).
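
As a rough illustration of why the separation works, here is a minimal sketch
in plain Python (not Hadoop API code). It assumes the trained tree is
available as a nested dictionary; that layout, the names predict and
label_batch, and the sample tree and records are all hypothetical. The model is
built once, and the same model then labels each later batch record by record,
which corresponds to a map-only job with no reducer.

```python
def predict(tree, record):
    """Walk from the root until a leaf (a node carrying a 'label') is reached."""
    node = tree
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["children"][value]
    return node["label"]


def label_batch(tree, lines, attributes):
    """Map-only step: label each comma-separated line independently of the others."""
    for line in lines:
        values = line.strip().split(",")
        record = dict(zip(attributes, values))
        yield line.strip() + "," + predict(tree, record)


# Hypothetical model, as if produced once by a separate training job.
TREE = {
    "attribute": "outlook",
    "children": {
        "sunny": {
            "attribute": "windy",
            "children": {"true": {"label": "no"}, "false": {"label": "yes"}},
        },
        "overcast": {"label": "yes"},
        "rainy": {"label": "no"},
    },
}
ATTRIBUTES = ["outlook", "windy"]

# The same trained model can label any number of batches that arrive later.
batch1 = ["sunny,false", "rainy,true"]
batch2 = ["overcast,true"]

for labeled in label_batch(TREE, batch1, ATTRIBUTES):
    print(labeled)
```

In a real MapReduce job the tree would be loaded once per mapper in the setup
stage and predict would be called from the map stage for every input record.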


2013/12/1 unmesha sreeveni <un...@gmail.com>

> 1. I just thought of building the model in a project named, say, DT, and
> when a huge input arrives, running another MR job (test.java) within DT.
> If we do not chain the jobs, do we need to create separate projects,
> DT_build and DT_test? Or is there no need for a separate project?
>
> 2. M1_train - dataset for training.
>    M1_test - test data for prediction.
>    a. Will the prediction input be a single record, or a set of records
>       given at once?
>    b. We also need to ensure in our program that M1_test is used only with
>       M1_train. We should check that too, right? If M1_test is given to
>       M2_train, it should show an error, shouldn't it?
>
> Is anything wrong in my reasoning?
> Can you see what I am trying to accomplish?
> I am confused about whether I need one project that includes both training
> and testing, or two projects.
>
>
> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> What is your motivation for chaining jobs?
>>
>>
>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>
>>> Thanks Yexi...A very nice explanation...Thanks a lot..
>>> Explained in a very simple way which is really understandable for
>>> beginners..Thanks a lot.
>>> I can go for chaining jobs right?
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> In my opinion:
>>>>
>>>> 1. Build the decision tree model with the training data.
>>>> 2. Store it somewhere.
>>>> 3. When the unlabeled data is available:
>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>> it: load the model in the setup stage and use the model to label the
>>>> records one by one in the map stage. There is no need for a reducer.
>>>>    3.2 If the unlabeled data is small, it is trivial.
>>>>
>>>>
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Thanks Yexi,
>>>>>
>>>>> But how can it be accomplished?
>>>>> The input to the Decision Tree MR job will be a set of data. But when
>>>>> predicting, it will be a single line of data without a class label, right?
>>>>> So what changes will there be in the MR job? Should we design it like this?
>>>>> 1. When a set of data comes in, build the decision tree.
>>>>> 2. Otherwise, if a single line of data comes in, check the output of the
>>>>> decision tree (the tree generated by the MR job) and predict the class label.
>>>>>
>>>>> -------
>>>>>
>>>>> M1_train - dataset for training.
>>>>> M1_test - test data (for prediction).
>>>>> 1. Will the input for prediction be a single record, or a set of records
>>>>> given at once?
>>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>>> only. We should check that too, right? If M1_test is given to the model
>>>>> built from M2_train, it should show an error, shouldn't it?
>>>>>
>>>>> Please tell me if my thoughts are wrong.
>>>>>
>>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>>> > I watched the video in it, but I cannot access its source code due to
>>>>> > a permission issue.
>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>> > small enough to be loaded into memory and can be used directly without
>>>>> > another MR job for prediction. The prediction can be conducted in a
>>>>> > streaming way.
>>>>> >
>>>>> >
>>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>>> >
>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>> >>
>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>> >>
>>>>> >> Here a decision tree is built. So my question is:
>>>>> >> can we also include the prediction along with that?
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >>> You are welcome :)
>>>>> >>>
>>>>> >>>
>>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>> >>>
>>>>> >>>> ok . Thx Yexi
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>>>>> >>>> wrote:
>>>>> >>>>
>>>>> >>>>> As far as I know, there is no ID3 implementation in Mahout
>>>>> >>>>> currently, but you can use the decision forest instead.
>>>>> >>>>>
>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>> >>>>>
>>>>> >>>>>> Is that ID3 classification?
>>>>> >>>>>> It includes prediction also?
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>> >>>>>> <ye...@gmail.com>wrote:
>>>>> >>>>>>
>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>> >>>>>>> you can check it out from svn by following
>>>>> >>>>>>>
>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>> >>>>>>>
>>>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>> >>>>>>>>
>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>> >>>>>>>> regression problems
>>>>> >>>>>>>>
>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>> >>>>>>>>
>>>>> >>>>>>>> Should I download Mahout?
>>>>> >>>>>>>>
>>>>> >>>>>>>> --
>>>>> >>>>>>>> *Thanks & Regards*
>>>>> >>>>>>>>
>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>> >>>>>>>>
>>>>> >>>>>>>> *Junior Developer*
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --
>>>>> >>>>>>> ------
>>>>> >>>>>>> Yexi Jiang,
>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> >>>>>>> School of Computer and Information Science,
>>>>> >>>>>>> Florida International University
>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> *Thanks & Regards*
>>>>> >>>>>>
>>>>> >>>>>> Unmesha Sreeveni U.B
>>>>> >>>>>>
>>>>> >>>>>> *Junior Developer*
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> ------
>>>>> >>>>> Yexi Jiang,
>>>>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> >>>>> School of Computer and Information Science,
>>>>> >>>>> Florida International University
>>>>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> *Thanks & Regards*
>>>>> >>>>
>>>>> >>>> Unmesha Sreeveni U.B
>>>>> >>>>
>>>>> >>>> *Junior Developer*
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> ------
>>>>> >>> Yexi Jiang,
>>>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>>>> >>> School of Computer and Information Science,
>>>>> >>> Florida International University
>>>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> *Thanks & Regards*
>>>>> >>
>>>>> >> Unmesha Sreeveni U.B
>>>>> >>
>>>>> >> *Junior Developer*
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> > --
>>>>> > ------
>>>>> > Yexi Jiang,
>>>>> > ECS 251,  yjian004@cs.fiu.edu
>>>>> > School of Computer and Information Science,
>>>>> > Florida International University
>>>>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> *Thanks & Regards*
>>>>>
>>>>> Unmesha Sreeveni U.B
>>>>>
>>>>> *Junior Developer*
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ------
>>>> Yexi Jiang,
>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> School of Computer and Information Science,
>>>> Florida International University
>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>
>>>>
>>>
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>>
>>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
1. I just thought of building the model in a project named, say, DT, and when a
huge input comes in, running another MR job (test.java) within DT.
If we are not chaining jobs, do we need to create separate projects, DT_build
and DT_test?
Or is there no need for a separate project?

2. M1_train - dataset for training.
M1_test - test data for prediction.
1. Will the input for prediction be a single record, or a set of records
given at once?
2. We also need to ensure in our program that M1_test belongs to M1_train
only. We should check that as well, right? If M1_test is given to
M2_train, it should show an error, shouldn't it?
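
One way to approximate the M1_test / M1_train check is to store a small header with the model and compare each unlabeled record against it. The sketch below is only an illustration: the header format ("trainingSetName:attr1,attr2,..."), the class name ModelSchema, and the play-tennis attribute names are all assumptions, not something defined in this thread or in Mahout. A field-count check like this can only catch obvious mismatches; it cannot prove that the test data really belongs to a given training set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the header format and all names here are assumptions.
final class ModelSchema {
    final String trainingSet;       // name of the training set, e.g. "M1_train"
    final List<String> attributes;  // attribute names, class label excluded

    ModelSchema(String trainingSet, List<String> attributes) {
        this.trainingSet = trainingSet;
        this.attributes = attributes;
    }

    // Parses an assumed first line of the model file, such as
    // "M1_train:outlook,temperature,humidity,wind".
    static ModelSchema parseHeader(String headerLine) {
        String[] parts = headerLine.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed model header: " + headerLine);
        }
        return new ModelSchema(parts[0].trim(),
                Arrays.asList(parts[1].trim().split(",")));
    }

    // Rejects an unlabeled record whose field count differs from the
    // number of attributes the model was trained on.
    void validate(String record) {
        int fields = record.split(",", -1).length;
        if (fields != attributes.size()) {
            throw new IllegalArgumentException("Record has " + fields
                    + " fields but the model built from " + trainingSet
                    + " expects " + attributes.size());
        }
    }
}
```

The prediction code would call validate() on each record before labeling it, so that data with a different layout fails early instead of silently producing wrong labels.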

Is anything wrong in my inference?
Are you able to guess what I am trying to accomplish?
I am confused about whether I need to create only one project that includes
both training and testing, or two projects.
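
For what it is worth, prediction works record by record, so a single line and a large batch can go through the same code, and that code can sit in the same project as the training job as a second entry point. Below is a minimal plain-Java sketch of the prediction side, assuming categorical attributes and comma-separated records. The class TreeNode and its methods are hypothetical names for illustration, not the format produced by any implementation mentioned in this thread. In a Hadoop job, the tree would presumably be loaded once in the mapper's setup() method and predict() called from map(), with the number of reduce tasks set to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory decision tree for categorical attributes.
final class TreeNode {
    private final int attributeIndex;   // column tested at this node; -1 for a leaf
    private final String label;         // class label for a leaf; null otherwise
    private final Map<String, TreeNode> children = new HashMap<>();

    private TreeNode(int attributeIndex, String label) {
        this.attributeIndex = attributeIndex;
        this.label = label;
    }

    static TreeNode leaf(String label) {
        return new TreeNode(-1, label);
    }

    static TreeNode split(int attributeIndex) {
        return new TreeNode(attributeIndex, null);
    }

    // Attaches the subtree to follow when the tested attribute equals value.
    TreeNode on(String value, TreeNode child) {
        children.put(value, child);
        return this;
    }

    // Walks from this node down to a leaf and returns its label, or
    // "unknown" when the record holds a value not seen during training.
    String predict(String[] record) {
        TreeNode node = this;
        while (node.label == null) {
            TreeNode next = node.children.get(record[node.attributeIndex]);
            if (next == null) {
                return "unknown";
            }
            node = next;
        }
        return node.label;
    }
}
```

A tree could then be built with, for example, TreeNode.split(0).on("overcast", TreeNode.leaf("yes")) and queried with predict(line.split(",")) for each input line. Returning "unknown" for unseen values is just one possible choice; falling back to the majority class is another.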


On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <ye...@gmail.com> wrote:

> What is your motivation of using chaining jobs?
>
>
> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>
>> Thanks Yexi...A very nice explanation...Thanks a lot..
>> Explained in a very simple way which is really understandable for
>> beginners..Thanks a lot.
>> I can go for chaining jobs right?
>>
>>
>>
>>
>>
>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> In my opinion.
>>>
>>> 1. Build the decision tree model with the training data.
>>> 2. Store it somewhere.
>>> 3. When the unlabeled data is available:
>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>> it: load the model at the setup stage and use the model to label the
>>> records one by one in the map stage. There is no need for a reducer.
>>>    3.2 If the unlabeled data is small, it is trivial.
>>>
>>>
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>>
>>>> Thanks Yexi ,
>>>>
>>>> But how can it be accomplished?
>>>> The input to the decision tree MR job will be a set of data. But when
>>>> predicting, it will be a single line of data without a class label, right?
>>>> So what changes will be needed in the MR job? Should we design it like this?
>>>> 1. When a set of data comes in, build the decision tree.
>>>> 2. Otherwise, if a single line of data comes in, check the output of the
>>>> decision tree (generated by the MR job) and predict the class label.
>>>>
>>>> -------
>>>>
>>>> M1_train - dataset for training.
>>>> M1_test - test data for prediction.
>>>> 1. Will the input for prediction be a single record, or a set of records
>>>> given at once?
>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>> only. We should check that as well, right? If M1_test is given to
>>>> M2_train, it should show an error, shouldn't it?
>>>>
>>>> Please tell me if my thoughts are wrong.
>>>>
>>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>>> > I watched the video in it, but I cannot access its source code due to
>>>> > a permission issue.
>>>> > In my opinion, once the decision tree model is built, the model is small
>>>> > enough to be loaded into memory and can be used directly without another
>>>> > MR job for prediction. The prediction can be conducted in a streaming way.
>>>> >
>>>> >
>>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>>> >
>>>> >> I have gone through a Map Reduce implementation of c4.5 in
>>>> >>
>>>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>> >>
>>>> >> Here a decision tree is built. So my question is:
>>>> >> can we also include the prediction along with that?
>>>> >>
>>>> >>
>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >>> You are welcome :)
>>>> >>>
>>>> >>>
>>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>> >>>
>>>> >>>> ok . Thx Yexi
>>>> >>>>
>>>> >>>>
>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> As far as I know, there is no ID3 implementation in mahout
>>>> currently,
>>>> >>>>> but you can use the decision forest instead.
>>>> >>>>>
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>> >>>>>
>>>> >>>>>> Is that ID3 classification?
>>>> >>>>>> It includes prediction also?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>> >>>>>> <ye...@gmail.com>wrote:
>>>> >>>>>>
>>>> >>>>>>> You can directly find it at https://github.com/apache/mahout,
>>>> or you
>>>> >>>>>>> can check out from svn by following
>>>> >>>>>>>
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>> >>>>>>>
>>>> >>>>>>>>  I want to go through Decision tree implementation in mahout.
>>>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>> >>>>>>>>
>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>> encouraged
>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>> regression
>>>> >>>>>>>> problems
>>>> >>>>>>>>
>>>> >>>>>>>> Where can I find its source code and documentation.
>>>> >>>>>>>>
>>>> >>>>>>>> Should I download mahout
>>>> >>>>>>>>
>>>> >>>>>>>> --
>>>> >>>>>>>> *Thanks & Regards*
>>>> >>>>>>>>
>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>> >>>>>>>>
>>>> >>>>>>>> *Junior Developer*
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> ------
>>>> >>>>>>> Yexi Jiang,
>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> >>>>>>> School of Computer and Information Science,
>>>> >>>>>>> Florida International University
>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>> *Thanks & Regards*
>>>> >>>>>>
>>>> >>>>>> Unmesha Sreeveni U.B
>>>> >>>>>>
>>>> >>>>>> *Junior Developer*
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> ------
>>>> >>>>> Yexi Jiang,
>>>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> >>>>> School of Computer and Information Science,
>>>> >>>>> Florida International University
>>>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>> >>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> --
>>>> >>>> *Thanks & Regards*
>>>> >>>>
>>>> >>>> Unmesha Sreeveni U.B
>>>> >>>>
>>>> >>>> *Junior Developer*
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> ------
>>>> >>> Yexi Jiang,
>>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>>> >>> School of Computer and Information Science,
>>>> >>> Florida International University
>>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> *Thanks & Regards*
>>>> >>
>>>> >> Unmesha Sreeveni U.B
>>>> >>
>>>> >> *Junior Developer*
>>>> >>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> > --
>>>> > ------
>>>> > Yexi Jiang,
>>>> > ECS 251,  yjian004@cs.fiu.edu
>>>> > School of Computer and Information Science,
>>>> > Florida International University
>>>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>>> >
>>>>
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
What is your motivation for chaining the jobs?
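
For reference, the prediction step in the quoted steps below (load the stored
model at the setup stage, label the records one by one in the map stage, no
reducer) might look roughly like the following. This is only a sketch in
plain Java with no Hadoop classes, so that it runs on its own. The class name
TreePredictionSketch, the one-rule-per-line model format
("attr=value,...:label", one line per root-to-leaf path) and the key=value
record format are all assumptions made for illustration, not something taken
from Mahout or from the blog post. In a real job the two methods would
correspond to setup() and map() of org.apache.hadoop.mapreduce.Mapper, and
the job would be made map-only with job.setNumReduceTasks(0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free stand-in for a map-only prediction job.
class TreePredictionSketch {

    /** One root-to-leaf path of the tree: attribute tests plus the leaf's class label. */
    static final class Rule {
        final Map<String, String> tests;
        final String label;

        Rule(Map<String, String> tests, String label) {
            this.tests = tests;
            this.label = label;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses "a=x,b=y" into a map; assumes every pair is well formed. */
    static Map<String, String> parsePairs(String text) {
        Map<String, String> pairs = new HashMap<>();
        if (text.trim().isEmpty()) {
            return pairs;
        }
        for (String pair : text.split(",")) {
            String[] kv = pair.split("=", 2);
            pairs.put(kv[0].trim(), kv[1].trim());
        }
        return pairs;
    }

    /** Plays the role of Mapper.setup(): load the stored model once per task. */
    void setup(List<String> modelLines) {
        for (String line : modelLines) {
            int sep = line.lastIndexOf(':');
            Map<String, String> tests = parsePairs(line.substring(0, sep));
            rules.add(new Rule(tests, line.substring(sep + 1).trim()));
        }
    }

    /** Plays the role of Mapper.map(): label one unlabeled record. */
    String map(String record) {
        Map<String, String> values = parsePairs(record);
        for (Rule rule : rules) {
            // A rule applies when every one of its attribute tests holds for the record.
            if (values.entrySet().containsAll(rule.tests.entrySet())) {
                return record + "\t" + rule.label;
            }
        }
        return record + "\tunknown"; // no path of the tree covers this record
    }

    public static void main(String[] args) {
        TreePredictionSketch mapper = new TreePredictionSketch();
        // Illustrative play-tennis style model; in a real job these lines would be read from HDFS.
        mapper.setup(Arrays.asList(
                "outlook=sunny,humidity=high:no",
                "outlook=sunny,humidity=normal:yes",
                "outlook=overcast:yes",
                "outlook=rain,wind=strong:no",
                "outlook=rain,wind=weak:yes"));
        System.out.println(mapper.map("outlook=sunny,humidity=high,wind=weak"));
        System.out.println(mapper.map("outlook=overcast,humidity=normal,wind=strong"));
    }
}
```

Since every record is labeled independently, nothing has to be aggregated
afterwards, which is why a reducer is not needed for this step.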


2013/12/1 unmesha sreeveni <un...@gmail.com>

> Thanks, Yexi. That was a very nice explanation, simple enough for
> beginners to follow. Thanks a lot.
> Can I go for chaining jobs, then?
>
>
>
>
>
> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> In my opinion:
>>
>> 1. Build the decision tree model with the training data.
>> 2. Store it somewhere.
>> 3. When the unlabeled data is available:
>>    3.1 If the unlabeled data is huge, write another MapReduce job to
>> process it: load the model at the setup stage and use it to label the
>> records one by one in the map stage. A reducer is not necessary.
>>    3.2 If the unlabeled data is small, it is trivial (the model can be
>> loaded into memory and applied directly).
>>
>>
>>
>>
>> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>>
>>> Thanks Yexi,
>>>
>>> But how can it be accomplished?
>>> The input to the decision tree MR job will be a set of data. But when
>>> predicting, the input will be a single line of data without a class
>>> label, right? So what changes will be needed in the MR job? Should we
>>> design it like this?
>>> 1. When a set of data comes in, build the decision tree.
>>> 2. Otherwise, if a single line of data comes in, check the output of the
>>> decision tree (the tree generated by the MR job) and predict the class
>>> label.
>>>
>>> -------
>>>
>>> M1_train - dataset for training.
>>> M1_test - test data, or data for prediction.
>>> 1. Will the prediction input be a single record, or a set of records
>>> given at once?
>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>> only. We should check that as well, right? If M1_test is given to
>>> M2_train, it should show an error, shouldn't it?
>>>
>>> Please tell me if my thinking is wrong.
>>>
>>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>>> > I watched the video there, but I cannot access the source code due to
>>> > a permission issue.
>>> > In my opinion, once the decision tree model is built, it is small
>>> > enough to be loaded into memory and used directly, without another
>>> > MR job for prediction. The prediction can be done in a streaming
>>> > fashion.
>>> >
>>> >
>>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>>> >
>>> >> I have gone through a MapReduce implementation of C4.5 at
>>> >>
>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>> >>
>>> >> Here a decision tree is built. So my question is:
>>> >> can we also include the prediction along with that?
>>> >>
>>> >>
>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>>> wrote:
>>> >>
>>> >>> You are welcome :)
>>> >>>
>>> >>>
>>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>> >>>
>>> >>>> ok . Thx Yexi
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> As far as I know, there is currently no ID3 implementation in Mahout,
>>> >>>>> but you can use the decision forest instead:
>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>> >>>>>
>>> >>>>>
>>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>> >>>>>
>>> >>>>>> Is that ID3 classification?
>>> >>>>>> Does it include prediction as well?
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>> >>>>>> <ye...@gmail.com>wrote:
>>> >>>>>>
>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>>> >>>>>>> can check it out from svn by following
>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>> >>>>>>>
>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>> >>>>>>>>
>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>> >>>>>>>> Improved Decision Tree performance and added support for regression
>>> >>>>>>>> problems
>>> >>>>>>>>
>>> >>>>>>>> Where can I find its source code and documentation?
>>> >>>>>>>>
>>> >>>>>>>> Should I download Mahout?
>>> >>>>>>>>
>>> >>>>>>>> --
>>> >>>>>>>> *Thanks & Regards*
>>> >>>>>>>>
>>> >>>>>>>> Unmesha Sreeveni U.B
>>> >>>>>>>>
>>> >>>>>>>> *Junior Developer*
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> ------
>>> >>>>>>> Yexi Jiang,
>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>> >>>>>>> School of Computer and Information Science,
>>> >>>>>>> Florida International University
>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> *Thanks & Regards*
>>> >>>>>>
>>> >>>>>> Unmesha Sreeveni U.B
>>> >>>>>>
>>> >>>>>> *Junior Developer*
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> ------
>>> >>>>> Yexi Jiang,
>>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>>> >>>>> School of Computer and Information Science,
>>> >>>>> Florida International University
>>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> *Thanks & Regards*
>>> >>>>
>>> >>>> Unmesha Sreeveni U.B
>>> >>>>
>>> >>>> *Junior Developer*
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> ------
>>> >>> Yexi Jiang,
>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>> >>> School of Computer and Information Science,
>>> >>> Florida International University
>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> *Thanks & Regards*
>>> >>
>>> >> Unmesha Sreeveni U.B
>>> >>
>>> >> *Junior Developer*
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> > --
>>> > ------
>>> > Yexi Jiang,
>>> > ECS 251,  yjian004@cs.fiu.edu
>>> > School of Computer and Information Science,
>>> > Florida International University
>>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >
>>>
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thanks, Yexi. That was a very nice explanation, simple enough for
beginners to follow. Thanks a lot.
Can I go for chaining jobs, then?





On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:

> In my opinion:
>
> 1. Build the decision tree model with the training data.
> 2. Store it somewhere.
> 3. When the unlabeled data is available:
>    3.1 If the unlabeled data is huge, write another MapReduce job to process
> it: load the model at the setup stage and use it to label the records one
> by one in the map stage. A reducer is not necessary.
>    3.2 If the unlabeled data is small, it is trivial (the model can be
> loaded into memory and applied directly).
>
>
>
>
> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>
>> Thanks Yexi,
>>
>> But how can it be accomplished?
>> The input to the decision tree MR job will be a set of data. But when
>> predicting, the input will be a single line of data without a class
>> label, right? So what changes will be needed in the MR job? Should we
>> design it like this?
>> 1. When a set of data comes in, build the decision tree.
>> 2. Otherwise, if a single line of data comes in, check the output of the
>> decision tree (the tree generated by the MR job) and predict the class
>> label.
>>
>> -------
>>
>> M1_train - dataset for training.
>> M1_test - test data, or data for prediction.
>> 1. Will the prediction input be a single record, or a set of records
>> given at once?
>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>> only. We should check that as well, right? If M1_test is given to
>> M2_train, it should show an error, shouldn't it?
>>
>> Please tell me if my thinking is wrong.
>>
>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>> > I watched the video there, but I cannot access the source code due to
>> > a permission issue.
>> > In my opinion, once the decision tree model is built, it is small
>> > enough to be loaded into memory and used directly, without another
>> > MR job for prediction. The prediction can be done in a streaming
>> > fashion.
>> >
>> >
>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>> >
>> >> I have gone through a MapReduce implementation of C4.5 at
>> >>
>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>> >>
>> >> Here a decision tree is built. So my question is:
>> >> can we also include the prediction along with that?
>> >>
>> >>
>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>> wrote:
>> >>
>> >>> You are welcome :)
>> >>>
>> >>>
>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>> >>>
>> >>>> ok . Thx Yexi
>> >>>>
>> >>>>
>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> As far as I know, there is currently no ID3 implementation in Mahout,
>> >>>>> but you can use the decision forest instead:
>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>> >>>>>
>> >>>>>
>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>> >>>>>
>> >>>>>> Is that ID3 classification?
>> >>>>>> Does it include prediction as well?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>> >>>>>> <ye...@gmail.com>wrote:
>> >>>>>>
>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>> >>>>>>> can check it out from svn by following
>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>> >>>>>>>
>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>> >>>>>>>>
>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>> >>>>>>>> to begin using version 0.6. Highlights include:
>> >>>>>>>> Improved Decision Tree performance and added support for regression
>> >>>>>>>> problems
>> >>>>>>>>
>> >>>>>>>> Where can I find its source code and documentation?
>> >>>>>>>>
>> >>>>>>>> Should I download Mahout?
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> *Thanks & Regards*
>> >>>>>>>>
>> >>>>>>>> Unmesha Sreeveni U.B
>> >>>>>>>>
>> >>>>>>>> *Junior Developer*
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> ------
>> >>>>>>> Yexi Jiang,
>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>> >>>>>>> School of Computer and Information Science,
>> >>>>>>> Florida International University
>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Thanks & Regards*
>> >>>>>>
>> >>>>>> Unmesha Sreeveni U.B
>> >>>>>>
>> >>>>>> *Junior Developer*
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> ------
>> >>>>> Yexi Jiang,
>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>> >>>>> School of Computer and Information Science,
>> >>>>> Florida International University
>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> *Thanks & Regards*
>> >>>>
>> >>>> Unmesha Sreeveni U.B
>> >>>>
>> >>>> *Junior Developer*
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> ------
>> >>> Yexi Jiang,
>> >>> ECS 251,  yjian004@cs.fiu.edu
>> >>> School of Computer and Information Science,
>> >>> Florida International University
>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> *Thanks & Regards*
>> >>
>> >> Unmesha Sreeveni U.B
>> >>
>> >> *Junior Developer*
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > ------
>> > Yexi Jiang,
>> > ECS 251,  yjian004@cs.fiu.edu
>> > School of Computer and Information Science,
>> > Florida International University
>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>> >
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thanks Yexi, that was a very nice explanation.
It was explained in a simple way that is easy for beginners to
understand. Thanks a lot.
I can go for chaining jobs, right?





On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <ye...@gmail.com> wrote:

> In my opinion:
>
> 1. Build the decision tree model with the training data.
> 2. Store it somewhere.
> 3. When the unlabeled data is available:
>    3.1 If the unlabeled data is huge, write another MR job to process it:
>        load the model at the setup stage, then use the model to label the
>        records one by one in the map stage. There is no need for a reducer.
>    3.2 If the unlabeled data is small, it is trivial.
>
>
>
>
> 2013/12/1 unmesha sreeveni <un...@gmail.com>
>
>> Thanks Yexi,
>>
>> But how can it be accomplished?
>> The input to the Decision Tree MR job will be a set of data. But when
>> predicting, the input will be a single line of data without a class
>> label, right?
>> So what changes will be needed in the MR job? Should we design it like this?
>> 1. When a set of data comes in, build the decision tree.
>> 2. Else, if a single line of data comes in, check the output of the
>> decision tree (the decision tree generated by the MR job) and predict the
>> class label.
>>
>> -------
>>
>> M1_train - dataset for training.
>> M1_test - test data for prediction.
>> 1. Will the input for prediction be a single record, or a set of records
>> given all at once?
>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>> only. We should check that as well, right? If M1_test is given to
>> M2_train, it should show an error, shouldn't it?
>>
>> Please correct me if my thoughts are wrong.
>>
>> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
>> > I watched the video in it, but I cannot access its source code due to a
>> > permission issue.
>> > In my opinion, once the decision tree model is built, the model is small
>> > enough to be loaded into memory and can be used directly without another
>> > MR job for prediction. The prediction can be conducted in a streaming way.
>> >
>> >
>> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
>> >
>> >> I have gone through a MapReduce implementation of C4.5 at
>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>> >>
>> >> There a decision tree is built. So my question is:
>> >> can we also include the prediction along with that?
>> >>
>> >>
>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
>> wrote:
>> >>
>> >>> You are welcome :)
>> >>>
>> >>>
>> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>> >>>
>> >>>> ok . Thx Yexi
>> >>>>
>> >>>>
>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> As far as I know, there is no ID3 implementation in mahout
>> currently,
>> >>>>> but you can use the decision forest instead.
>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>> >>>>>
>> >>>>>
>> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>> >>>>>
>> >>>>>> Is that ID3 classification?
>> >>>>>> It includes prediction also?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>> >>>>>> <ye...@gmail.com>wrote:
>> >>>>>>
>> >>>>>>> You can directly find it at https://github.com/apache/mahout, or
>> you
>> >>>>>>> can check out from svn by following
>> >>>>>>>
>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>> >>>>>>>
>> >>>>>>>>  I want to go through Decision tree implementation in mahout.
>> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>> >>>>>>>>
>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>> encouraged
>> >>>>>>>> to begin using version 0.6. Highlights include:
>> >>>>>>>> Improved Decision Tree performance and added support for
>> regression
>> >>>>>>>> problems
>> >>>>>>>>
>> >>>>>>>> Where can I find its source code and documentation.
>> >>>>>>>>
>> >>>>>>>> Should I download mahout
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> *Thanks & Regards*
>> >>>>>>>>
>> >>>>>>>> Unmesha Sreeveni U.B
>> >>>>>>>>
>> >>>>>>>> *Junior Developer*
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> ------
>> >>>>>>> Yexi Jiang,
>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>> >>>>>>> School of Computer and Information Science,
>> >>>>>>> Florida International University
>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Thanks & Regards*
>> >>>>>>
>> >>>>>> Unmesha Sreeveni U.B
>> >>>>>>
>> >>>>>> *Junior Developer*
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> ------
>> >>>>> Yexi Jiang,
>> >>>>> ECS 251,  yjian004@cs.fiu.edu
>> >>>>> School of Computer and Information Science,
>> >>>>> Florida International University
>> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> *Thanks & Regards*
>> >>>>
>> >>>> Unmesha Sreeveni U.B
>> >>>>
>> >>>> *Junior Developer*
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> ------
>> >>> Yexi Jiang,
>> >>> ECS 251,  yjian004@cs.fiu.edu
>> >>> School of Computer and Information Science,
>> >>> Florida International University
>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> *Thanks & Regards*
>> >>
>> >> Unmesha Sreeveni U.B
>> >>
>> >> *Junior Developer*
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > ------
>> > Yexi Jiang,
>> > ECS 251,  yjian004@cs.fiu.edu
>> > School of Computer and Information Science,
>> > Florida International University
>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>> >
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
In my opinion:

1. Build the decision tree model with the training data.
2. Store it somewhere.
3. When the unlabeled data is available:
   3.1 If the unlabeled data is huge, write another MR job to process it:
       load the model at the setup stage, then use the model to label the
       records one by one in the map stage. There is no need for a reducer.
   3.2 If the unlabeled data is small, it is trivial.
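
The steps above could be sketched roughly as follows. This is only a
minimal illustration in plain Python, not Hadoop or Mahout code: the toy
tree, the JSON file format, the file name playtennis_tree.json, the
column names, and the PredictMapper class with its setup()/map() methods
are all assumptions made for the example. It only mirrors the idea of
loading the stored model once and then labeling records one at a time,
with no reducer.

```python
import json
import os
import tempfile

# Hypothetical toy model: internal nodes test one attribute,
# leaves carry the predicted class label.
TREE = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "humidity",
                  "children": {"high": {"label": "no"},
                               "normal": {"label": "yes"}}},
        "overcast": {"label": "yes"},
        "rainy": {"attribute": "windy",
                  "children": {"true": {"label": "no"},
                               "false": {"label": "yes"}}},
    },
}


def save_model(tree, path):
    """Step 2: store the trained model somewhere (here, a JSON file)."""
    with open(path, "w") as f:
        json.dump(tree, f)


def load_model(path):
    """Read the stored model back into memory."""
    with open(path) as f:
        return json.load(f)


def predict(tree, record):
    """Walk from the root to a leaf; raises KeyError on unseen values."""
    node = tree
    while "label" not in node:
        node = node["children"][record[node["attribute"]]]
    return node["label"]


class PredictMapper:
    """Stand-in for a map-only job: setup() runs once, map() per record."""

    def __init__(self, model_path, columns):
        self.model_path = model_path
        self.columns = columns
        self.tree = None

    def setup(self):
        # Step 3.1: load the model once, before any record is processed.
        self.tree = load_model(self.model_path)

    def map(self, line):
        # Label a single unlabeled CSV line; no reducer is involved.
        line = line.strip()
        record = dict(zip(self.columns, line.split(",")))
        return line, predict(self.tree, record)


def demo():
    """Store the toy model, then label two unlabeled lines with it."""
    model_path = os.path.join(tempfile.mkdtemp(), "playtennis_tree.json")
    save_model(TREE, model_path)
    mapper = PredictMapper(model_path, ["outlook", "humidity", "windy"])
    mapper.setup()
    lines = ["sunny,high,false", "overcast,normal,true"]
    return [mapper.map(line) for line in lines]
```

For the small-data case in 3.2, calling predict() directly on each record
in a single process would be enough; the mapper wrapper only matters when
the unlabeled data is large enough to be split across many map tasks.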




2013/12/1 unmesha sreeveni <un...@gmail.com>

> Thanks Yexi,
>
> But how can it be accomplished?
> The input to the Decision Tree MR job will be a set of data. But when
> predicting, the input will be a single line of data without a class
> label, right?
> So what changes will be needed in the MR job? Should we design it like this?
> 1. When a set of data comes in, build the decision tree.
> 2. Else, if a single line of data comes in, check the output of the
> decision tree (the decision tree generated by the MR job) and predict the
> class label.
>
> -------
>
> M1_train - dataset for training.
> M1_test - test data for prediction.
> 1. Will the input for prediction be a single record, or a set of records
> given all at once?
> 2. We also need to ensure in our program that M1_test belongs to M1_train
> only. We should check that as well, right? If M1_test is given to
> M2_train, it should show an error, shouldn't it?
>
> Please correct me if my thoughts are wrong.
>
> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
> > I watched the video in it, but I cannot access its source code due to a
> > permission issue.
> > In my opinion, once the decision tree model is built, the model is small
> > enough to be loaded into memory and can be used directly without another
> > MR job for prediction. The prediction can be conducted in a streaming way.
> >
> >
> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
> >
> >> I have gone through a MapReduce implementation of C4.5 at
> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
> >>
> >> There a decision tree is built. So my question is:
> >> can we also include the prediction along with that?
> >>
> >>
> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com>
> wrote:
> >>
> >>> You are welcome :)
> >>>
> >>>
> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
> >>>
> >>>> ok . Thx Yexi
> >>>>
> >>>>
> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> As far as I know, there is no ID3 implementation in mahout currently,
> >>>>> but you can use the decision forest instead.
> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
> >>>>>
> >>>>>
> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
> >>>>>
> >>>>>> Is that ID3 classification?
> >>>>>> It includes prediction also?
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
> >>>>>> <ye...@gmail.com>wrote:
> >>>>>>
> >>>>>>> You can directly find it at https://github.com/apache/mahout, or
> you
> >>>>>>> can check out from svn by following
> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control
> .
> >>>>>>>
> >>>>>>>
> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
> >>>>>>>
> >>>>>>>>  I want to go through Decision tree implementation in mahout.
> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
> >>>>>>>>
> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
> encouraged
> >>>>>>>> to begin using version 0.6. Highlights include:
> >>>>>>>> Improved Decision Tree performance and added support for
> regression
> >>>>>>>> problems
> >>>>>>>>
> >>>>>>>> Where can I find its source code and documentation.
> >>>>>>>>
> >>>>>>>> Should I download mahout
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Thanks & Regards*
> >>>>>>>>
> >>>>>>>> Unmesha Sreeveni U.B
> >>>>>>>>
> >>>>>>>> *Junior Developer*
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> ------
> >>>>>>> Yexi Jiang,
> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
> >>>>>>> School of Computer and Information Science,
> >>>>>>> Florida International University
> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Thanks & Regards*
> >>>>>>
> >>>>>> Unmesha Sreeveni U.B
> >>>>>>
> >>>>>> *Junior Developer*
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ------
> >>>>> Yexi Jiang,
> >>>>> ECS 251,  yjian004@cs.fiu.edu
> >>>>> School of Computer and Information Science,
> >>>>> Florida International University
> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Thanks & Regards*
> >>>>
> >>>> Unmesha Sreeveni U.B
> >>>>
> >>>> *Junior Developer*
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> ------
> >>> Yexi Jiang,
> >>> ECS 251,  yjian004@cs.fiu.edu
> >>> School of Computer and Information Science,
> >>> Florida International University
> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>
> >>>
> >>
> >>
> >> --
> >> *Thanks & Regards*
> >>
> >> Unmesha Sreeveni U.B
> >>
> >> *Junior Developer*
> >>
> >>
> >>
> >
> >
> > --
> > ------
> > Yexi Jiang,
> > ECS 251,  yjian004@cs.fiu.edu
> > School of Computer and Information Science,
> > Florida International University
> > Homepage: http://users.cis.fiu.edu/~yjian004/
> >
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
In my opinion:

1. Build the decision tree model with the training data.
2. Store it somewhere.
3. When the unlabeled data is available:
   3.1 If the unlabeled data is huge, write another MR job to process it:
       load the model in the setup stage and use it to label the records one
       by one in the map stage. There is no need for a reducer.
   3.2 If the unlabeled data is small, it is trivial: load the model into
       memory and label the records directly.
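
The steps above can be sketched as follows. This is a plain-Python simulation of a map-only job, not Hadoop code: the class name PredictMapper, the nested-dict JSON model format, the column order, and the toy play-tennis tree are all assumptions made for illustration.

```python
import json

# Hypothetical stored model (step 2): a nested dict serialized as JSON.
# Internal nodes have "attr" and "children"; leaves have "label".
MODEL_JSON = json.dumps({
    "attr": "outlook",
    "children": {
        "sunny": {"attr": "humidity",
                  "children": {"high": {"label": "no"},
                               "normal": {"label": "yes"}}},
        "overcast": {"label": "yes"},
        "rain": {"label": "yes"},
    },
})

COLUMNS = ["outlook", "humidity"]  # assumed column order of the unlabeled data


class PredictMapper:
    """Mimics a mapper lifecycle: setup() once, then map() per record."""

    def setup(self, model_text):
        # Step 3.1: load the model once per task, not once per record.
        self.tree = json.loads(model_text)

    def map(self, line):
        record = dict(zip(COLUMNS, line.strip().split(",")))
        node = self.tree
        while "label" not in node:
            node = node["children"].get(record.get(node["attr"]))
            if node is None:
                # Attribute value never seen in training: no leaf to reach.
                return line.strip(), "unknown"
        return line.strip(), node["label"]


def run_map_only(lines, model_text):
    # No reducer: each record is labeled independently of the others.
    mapper = PredictMapper()
    mapper.setup(model_text)
    return [mapper.map(line) for line in lines if line.strip()]


labeled = run_map_only(["sunny,high", "overcast,normal", "snow,high"], MODEL_JSON)
```

In a real Hadoop job the same structure would typically correspond to the mapper's setup and map methods, with the number of reduce tasks set to zero so that the mapper output is written directly.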




2013/12/1 unmesha sreeveni <un...@gmail.com>

> Thanks Yexi ,
>
> But how  it can be accomplished.
> The input to Desicion Tree MR will be a set of data. But while
> predicting a data it will be a one line data without classlabel right?
> So what changes will be there in mrjob.Should we design like this.
> 1. When a set of data is coming draw Desicion tree
> 2. else if a one line data is coming.check the output of decision
> tree(Decision tree generated from mr) and predict the class label.
>
> -------
>
> M1_train - dataset for training.
> M1_test - test data or prediction.
> 1. Will it be one data as input for prediction or  set of data given
> as input at-once.
> 2.we also need to ensure in our pgm that M1_test belongs to M1_train
> only. we shld check that also ...right? if M1_test is given into
> M2_train it should show error. is nt 'it?.
>
> Pls suggest if my thoughts are wrong.
>
> On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
> > I watched the video in it but I cannot access its source code due to
> > permission issue.
> > In my opinion, once the decision tree model is built, the model is small
> > enough to be loaded into memory and can be used directly without another
> > mrjob for prediction. The prediction can be conducted in a streaming way.
> >
> >
> > 2013/11/30 unmesha sreeveni <un...@gmail.com>
> >
> >> I have gone through a Map Reduce implementation of c4.5 in
> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
> >>
> >> Here a decision tree is build. So my doubt is
> >> Can we also include the prediction along with  that?
> >>
> >>
> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
> >>
> >>> You are welcome :)
> >>>
> >>>
> >>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
> >>>
> >>>> ok . Thx Yexi
> >>>>
> >>>>
> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> As far as I know, there is no ID3 implementation in mahout currently,
> >>>>> but you can use the decision forest instead.
> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
> >>>>>
> >>>>>
> >>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
> >>>>>
> >>>>>> Is that ID3 classification?
> >>>>>> It includes prediction also?
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
> >>>>>> <ye...@gmail.com>wrote:
> >>>>>>
> >>>>>>> You can directly find it at https://github.com/apache/mahout, or you
> >>>>>>> can check out from svn by following
> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
> >>>>>>>
> >>>>>>>
> >>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
> >>>>>>>
> >>>>>>>>  I want to go through Decision tree implementation in mahout.
> >>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
> >>>>>>>>
> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
> >>>>>>>> to begin using version 0.6. Highlights include:
> >>>>>>>> Improved Decision Tree performance and added support for regression
> >>>>>>>> problems
> >>>>>>>>
> >>>>>>>> Where can I find its source code and documentation.
> >>>>>>>>
> >>>>>>>> Should I download mahout
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Thanks & Regards*
> >>>>>>>>
> >>>>>>>> Unmesha Sreeveni U.B
> >>>>>>>>
> >>>>>>>> *Junior Developer*
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> ------
> >>>>>>> Yexi Jiang,
> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
> >>>>>>> School of Computer and Information Science,
> >>>>>>> Florida International University
> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Thanks & Regards*
> >>>>>>
> >>>>>> Unmesha Sreeveni U.B
> >>>>>>
> >>>>>> *Junior Developer*
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ------
> >>>>> Yexi Jiang,
> >>>>> ECS 251,  yjian004@cs.fiu.edu
> >>>>> School of Computer and Information Science,
> >>>>> Florida International University
> >>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Thanks & Regards*
> >>>>
> >>>> Unmesha Sreeveni U.B
> >>>>
> >>>> *Junior Developer*
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> ------
> >>> Yexi Jiang,
> >>> ECS 251,  yjian004@cs.fiu.edu
> >>> School of Computer and Information Science,
> >>> Florida International University
> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>>
> >>>
> >>
> >>
> >> --
> >> *Thanks & Regards*
> >>
> >> Unmesha Sreeveni U.B
> >>
> >> *Junior Developer*
> >>
> >>
> >>
> >
> >
> > --
> > ------
> > Yexi Jiang,
> > ECS 251,  yjian004@cs.fiu.edu
> > School of Computer and Information Science,
> > Florida International University
> > Homepage: http://users.cis.fiu.edu/~yjian004/
> >
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Thanks, Yexi.

But how can it be accomplished?
The input to the decision tree MR job will be a set of data. But when
predicting, the input will be a single line of data without a class label, right?
So what changes will be needed in the MR job? Should we design it like this?
1. When a set of data comes in, build the decision tree.
2. Otherwise, if a single line of data comes in, check the output of the decision
tree (the decision tree generated by the MR job) and predict the class label.

-------

M1_train - dataset for training.
M1_test - test data (or data for prediction).
1. Will the input for prediction be a single record, or a set of records given
all at once?
2. We also need to ensure in our program that M1_test is used only with
M1_train. We should check that as well, right? If M1_test is given to the
model built from M2_train, it should show an error, shouldn't it?

Please tell me if my thinking is wrong.
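
One way to approach point 2 is to store a little metadata with the model and check incoming data against it before predicting. A minimal sketch, assuming a hypothetical model layout in which the training set name and attribute list are saved next to the tree; make_model and check_compatible are illustrative names, not part of any existing library.

```python
def make_model(train_name, attributes, tree):
    # Hypothetical container: the tree plus a description of its training data.
    return {"trained_on": train_name, "attributes": list(attributes), "tree": tree}


def check_compatible(model, test_header):
    """Raise ValueError if the test columns do not match the model's columns."""
    expected = model["attributes"]
    if list(test_header) != expected:
        raise ValueError(
            "test columns %s do not match model '%s' columns %s"
            % (list(test_header), model["trained_on"], expected)
        )


m1 = make_model("M1_train", ["outlook", "humidity"], {"label": "yes"})
m2 = make_model("M2_train", ["age", "income"], {"label": "low"})

check_compatible(m1, ["outlook", "humidity"])  # M1_test against M1: passes

try:
    check_compatible(m2, ["outlook", "humidity"])  # M1_test against M2
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A check like this only catches column mismatches. If two training sets share the same columns, picking the right model for a given test set is still up to whoever runs the prediction.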

On 11/30/13, Yexi Jiang <ye...@gmail.com> wrote:
> I watched the video in it but I cannot access its source code due to
> permission issue.
> In my opinion, once the decision tree model is built, the model is small
> enough to be loaded into memory and can be used directly without another
> mrjob for prediction. The prediction can be conducted in a streaming way.
>
>
> 2013/11/30 unmesha sreeveni <un...@gmail.com>
>
>> I have gone through a Map Reduce implementation of c4.5 in
>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>
>> Here a decision tree is build. So my doubt is
>> Can we also include the prediction along with  that?
>>
>>
>> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> You are welcome :)
>>>
>>>
>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>
>>>> ok . Thx Yexi
>>>>
>>>>
>>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com>
>>>> wrote:
>>>>
>>>>> As far as I know, there is no ID3 implementation in mahout currently,
>>>>> but you can use the decision forest instead.
>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>
>>>>>
>>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>> Is that ID3 classification?
>>>>>> It includes prediction also?
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
>>>>>> <ye...@gmail.com>wrote:
>>>>>>
>>>>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>>>>> can check out from svn by following
>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>>
>>>>>>>>  I want to go through Decision tree implementation in mahout.
>>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>>>
>>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>>> Improved Decision Tree performance and added support for regression
>>>>>>>> problems
>>>>>>>>
>>>>>>>> Where can I find its source code and documentation.
>>>>>>>>
>>>>>>>> Should I download mahout
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Thanks & Regards*
>>>>>>>>
>>>>>>>> Unmesha Sreeveni U.B
>>>>>>>>
>>>>>>>> *Junior Developer*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ------
>>>>>>> Yexi Jiang,
>>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>> School of Computer and Information Science,
>>>>>>> Florida International University
>>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Thanks & Regards*
>>>>>>
>>>>>> Unmesha Sreeveni U.B
>>>>>>
>>>>>> *Junior Developer*
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------
>>>>> Yexi Jiang,
>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> School of Computer and Information Science,
>>>>> Florida International University
>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
I watched the video on that page, but I cannot access the source code because
of a permission issue.
In my opinion, once the decision tree model is built, it is small enough to be
loaded into memory and used directly, without another MR job for prediction.
The prediction can be done in a streaming fashion, one record at a time.
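
A minimal sketch of that idea, assuming the model is a small nested dict and the records arrive as an iterable of comma-separated lines (for example a file object or standard input). The function names and the toy tree are hypothetical.

```python
def classify(tree, record):
    # Walk from the root to a leaf; return None for an unseen attribute value.
    node = tree
    while node is not None and "label" not in node:
        node = node["children"].get(record.get(node["attr"]))
    return None if node is None else node["label"]


def predict_stream(tree, lines, columns):
    # Generator: only one record is held in memory at a time.
    for line in lines:
        line = line.strip()
        if line:
            record = dict(zip(columns, line.split(",")))
            yield line, classify(tree, record)


# Toy in-memory model: a single split on a hypothetical "windy" attribute.
TREE = {"attr": "windy",
        "children": {"true": {"label": "no"}, "false": {"label": "yes"}}}

results = list(predict_stream(TREE, ["true\n", "\n", "false\n"], ["windy"]))
```

Passing sys.stdin as the lines argument would label records as they are piped in, without a second MR job.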


2013/11/30 unmesha sreeveni <un...@gmail.com>

> I have gone through a Map Reduce implementation of c4.5 in
> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>
> Here a decision tree is build. So my doubt is
> Can we also include the prediction along with  that?
>
>
> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> You are welcome :)
>>
>>
>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>
>>> ok . Thx Yexi
>>>
>>>
>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> As far as I know, there is no ID3 implementation in mahout currently,
>>>> but you can use the decision forest instead.
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>
>>>>
>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Is that ID3 classification?
>>>>> It includes prediction also?
>>>>>
>>>>>
>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>
>>>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>>>> can check out from svn by following
>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>
>>>>>>
>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>
>>>>>>>  I want to go through Decision tree implementation in mahout.
>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>>
>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>>>>>> Improved Decision Tree performance and added support for regression problems
>>>>>>>
>>>>>>> Where can I find its source code and documentation.
>>>>>>>
>>>>>>> Should I download mahout
>>>>>>>
>>>>>>> --
>>>>>>> *Thanks & Regards*
>>>>>>>
>>>>>>> Unmesha Sreeveni U.B
>>>>>>>
>>>>>>> *Junior Developer*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ------
>>>>>> Yexi Jiang,
>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>> School of Computer and Information Science,
>>>>>> Florida International University
>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Thanks & Regards*
>>>>>
>>>>> Unmesha Sreeveni U.B
>>>>>
>>>>> *Junior Developer*
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ------
>>>> Yexi Jiang,
>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> School of Computer and Information Science,
>>>> Florida International University
>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>
>>>>
>>>
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>>
>>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Decision Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
I watched the video on that page, but I cannot access the source code
because of a permission issue.
In my opinion, once the decision tree model is built, it is small enough to
be loaded into memory and used directly, without another MapReduce job for
prediction. Prediction can then be done in a streaming way, record by record.
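
To illustrate the idea, here is a minimal Python sketch. It is not the code
from the blog post. The tree layout, the attribute names (outlook, humidity,
wind) and the helpers predict and predict_stream are all hypothetical and
chosen only for illustration. It assumes the trained tree fits in memory and
that the test records arrive as CSV rows with a header line.

```python
import csv
import io

# Hypothetical in-memory model: an internal node is (attribute, {value: subtree}),
# a leaf is a plain label string.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("wind", {"strong": "no", "weak": "yes"}),
})


def predict(tree, record, default="unknown"):
    """Walk from the root to a leaf using the record's attribute values."""
    node = tree
    while not isinstance(node, str):
        attribute, branches = node
        # An attribute value never seen in training falls back to the default label.
        node = branches.get(record.get(attribute), default)
    return node


def predict_stream(tree, lines):
    """Yield one prediction per input row; only the tree is kept in memory."""
    for record in csv.DictReader(lines):
        yield predict(tree, record)


# Stand-in for a file or stdin; any iterable of text lines works.
sample = io.StringIO(
    "outlook,humidity,wind\n"
    "sunny,high,weak\n"
    "overcast,high,strong\n"
    "rain,normal,strong\n"
)
predictions = list(predict_stream(TREE, sample))
print(predictions)
```

Because predict_stream is a generator, each record is classified and then
discarded, so the size of the test data does not matter; only the tree has to
fit in memory.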


2013/11/30 unmesha sreeveni <un...@gmail.com>

> I have gone through a Map Reduce implementation of c4.5 in
> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>
> Here a decision tree is build. So my doubt is
> Can we also include the prediction along with  that?
>
>
> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> You are welcome :)
>>
>>
>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>
>>> ok . Thx Yexi
>>>
>>>
>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> As far as I know, there is no ID3 implementation in mahout currently,
>>>> but you can use the decision forest instead.
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>
>>>>
>>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>> Is that ID3 classification?
>>>>> It includes prediction also?
>>>>>
>>>>>
>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>>
>>>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>>>> can check out from svn by following
>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>
>>>>>>
>>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>>
>>>>>>>  I want to go through Decision tree implementation in mahout.
>>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>>
>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>>>>>> Improved Decision Tree performance and added support for regression problems
>>>>>>>
>>>>>>> Where can I find its source code and documentation.
>>>>>>>
>>>>>>> Should I download mahout
>>>>>>>
>>>>>>> --
>>>>>>> *Thanks & Regards*
>>>>>>>
>>>>>>> Unmesha Sreeveni U.B
>>>>>>>
>>>>>>> *Junior Developer*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ------
>>>>>> Yexi Jiang,
>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>> School of Computer and Information Science,
>>>>>> Florida International University
>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Thanks & Regards*
>>>>>
>>>>> Unmesha Sreeveni U.B
>>>>>
>>>>> *Junior Developer*
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ------
>>>> Yexi Jiang,
>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> School of Computer and Information Science,
>>>> Florida International University
>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>
>>>>
>>>
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>>
>>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Decision Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
I have gone through a MapReduce implementation of C4.5 at
http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html

There a decision tree is built. So my question is:
can we also include the prediction step along with that?


On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <ye...@gmail.com> wrote:

> You are welcome :)
>
>
> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>
>> ok . Thx Yexi
>>
>>
>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> As far as I know, there is no ID3 implementation in mahout currently,
>>> but you can use the decision forest instead.
>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>
>>>
>>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>>
>>>> Is that ID3 classification?
>>>> It includes prediction also?
>>>>
>>>>
>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com>wrote:
>>>>
>>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>>> can check out from svn by following
>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>
>>>>>
>>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>>
>>>>>>  I want to go through Decision tree implementation in mahout.
>>>>>> Refereed Apache Mahout <http://mahout.apache.org/>
>>>>>>
>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>>>>> Improved Decision Tree performance and added support for regression problems
>>>>>>
>>>>>> Where can I find its source code and documentation.
>>>>>>
>>>>>> Should I download mahout
>>>>>>
>>>>>> --
>>>>>> *Thanks & Regards*
>>>>>>
>>>>>> Unmesha Sreeveni U.B
>>>>>>
>>>>>> *Junior Developer*
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------
>>>>> Yexi Jiang,
>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> School of Computer and Information Science,
>>>>> Florida International University
>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Decision Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
You are welcome :)


2013/11/25 unmesha sreeveni <un...@gmail.com>

> ok . Thx Yexi
>
>
> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> As far as I know, there is no ID3 implementation in mahout currently, but
>> you can use the decision forest instead.
>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>
>>
>> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>>
>>> Is that ID3 classification?
>>> It includes prediction also?
>>>
>>>
>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>>
>>>> You can directly find it at https://github.com/apache/mahout, or you
>>>> can check out from svn by following
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>
>>>>
>>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>>
>>>>>  I want to go through Decision tree implementation in mahout. Refereed Apache
>>>>> Mahout <http://mahout.apache.org/>
>>>>>
>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>>>> Improved Decision Tree performance and added support for regression problems
>>>>>
>>>>> Where can I find its source code and documentation.
>>>>>
>>>>> Should I download mahout
>>>>>
>>>>> --
>>>>> *Thanks & Regards*
>>>>>
>>>>> Unmesha Sreeveni U.B
>>>>>
>>>>> *Junior Developer*
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ------
>>>> Yexi Jiang,
>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> School of Computer and Information Science,
>>>> Florida International University
>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>
>>>>
>>>
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>>
>>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Decision Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
OK. Thanks, Yexi.


On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <ye...@gmail.com> wrote:

> As far as I know, there is no ID3 implementation in mahout currently, but
> you can use the decision forest instead.
> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>
>
> 2013/11/25 unmesha sreeveni <un...@gmail.com>
>
>> Is that ID3 classification?
>> It includes prediction also?
>>
>>
>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>>
>>> You can directly find it at https://github.com/apache/mahout, or you
>>> can check out from svn by following
>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>
>>>
>>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>>
>>>>  I want to go through Decision tree implementation in mahout. Refereed Apache
>>>> Mahout <http://mahout.apache.org/>
>>>>
>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>>> Improved Decision Tree performance and added support for regression problems
>>>>
>>>> Where can I find its source code and documentation.
>>>>
>>>> Should I download mahout
>>>>
>>>> --
>>>> *Thanks & Regards*
>>>>
>>>> Unmesha Sreeveni U.B
>>>>
>>>> *Junior Developer*
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>
>>>
>>
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
As far as I know, there is currently no ID3 implementation in Mahout, but
you can use the decision forest instead:
https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example


2013/11/25 unmesha sreeveni <un...@gmail.com>

> Is that ID3 classification?
> It includes prediction also?
>
>
> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:
>
>> You can directly find it at https://github.com/apache/mahout, or you can
>> check out from svn by following
>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>
>>
>> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>>
>>>  I want to go through Decision tree implementation in mahout. Refereed Apache
>>> Mahout <http://mahout.apache.org/>
>>>
>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>>> Improved Decision Tree performance and added support for regression problems
>>>
>>> Where can I find its source code and documentation.
>>>
>>> Should I download mahout
>>>
>>> --
>>> *Thanks & Regards*
>>>
>>> Unmesha Sreeveni U.B
>>>
>>> *Junior Developer*
>>>
>>>
>>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  yjian004@cs.fiu.edu
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by unmesha sreeveni <un...@gmail.com>.
Is that an ID3 classification implementation?
Does it also include prediction?


On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <ye...@gmail.com> wrote:

> You can directly find it at https://github.com/apache/mahout, or you can
> check out from svn by following
> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>
>
> 2013/11/23 unmesha sreeveni <un...@gmail.com>
>
>>  I want to go through Decision tree implementation in mahout. Refereed Apache
>> Mahout <http://mahout.apache.org/>
>>
>> 6 Feb 2012 - Apache Mahout 0.6 released
>> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
>> Improved Decision Tree performance and added support for regression problems
>>
>> Where can I find its source code and documentation.
>>
>> Should I download mahout
>>
>> --
>> *Thanks & Regards*
>>
>> Unmesha Sreeveni U.B
>>
>> *Junior Developer*
>>
>>
>>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Start and Stop Namenode

Posted by Ascot Moss <as...@gmail.com>.
Hi,

Yes, I found the reason. It was caused by the following exception:
 "org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop/yarn/yarn_data/tmp/dfs/name is in an inconsistent state"

I formatted HDFS again, which fixed the issue.

jps
	3774 Jps
	3701 NameNode
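
A check like the jps listing above can be scripted. The following is only a sketch: daemon_listed is a hypothetical helper name, not part of Hadoop, and it assumes jps prints one "<pid> <name>" pair per line, as in the output shown in this thread.

```shell
#!/bin/sh
# Sketch: succeed if a daemon name appears in jps-style output read from stdin.
# daemon_listed is a hypothetical helper, not part of Hadoop.
daemon_listed() {
    # awk splits on runs of blanks, so tab-indented lines are handled.
    # The second field is compared exactly, so "NameNode" does not
    # match "SecondaryNameNode".
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Typical use on the NameNode host (needs the JDK's jps on PATH):
#   jps | daemon_listed NameNode && echo "NameNode is up" || echo "NameNode is not running"
```

With the first listing in this thread (only Jps and DataNode) the function would fail; with the listing above it would succeed.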


regards


On 28 Nov, 2013, at 1:25 pm, Harsh J <ha...@cloudera.com> wrote:

> Yes you should expect to see a NameNode separately available but
> apparently its dying out. Check the NN's log on that machine to see
> why.
> 
> On Thu, Nov 28, 2013 at 8:37 AM, Ascot Moss <as...@gmail.com> wrote:
>> Hi,
>> 
>> I am new to 2.2.0, after running the following command to start the first
>> namenode, I used jps to check the cluster:
>> 
>> 
>> ./sbin/hadoop-daemon.sh --script hdfs start namenode
>> starting namenode, logging to
>> /usr/local/hadoop/yarn/hadoop//logs/hadoop-hduser-namenode-hd01.emblocsoft.net.out
>> jps
>> 3405 Jps
>> 3132 DataNode
>> 
>> 
>> The name of ID 3132 is DataNode, is this correct as I expected something
>> like "3132 NameNode"? does it mean that the following two commands are doing
>> the same thing in 2.2.0?
>> 
>> ./sbin/hadoop-daemon.sh --script hdfs start namenode
>> ./sbin/hadoop-daemon.sh --script hdfs start datanode
>> 
>> 
>> regards
>> 
> 
> 
> 
> -- 
> Harsh J


Re: Start and Stop Namenode

Posted by Harsh J <ha...@cloudera.com>.
Yes, you should expect to see a separate NameNode process, but
apparently it is dying. Check the NameNode's log on that machine to see
why.
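
One way to do that check, as a rough sketch only: show_daemon_errors is a hypothetical helper name, and it assumes the daemon's log4j log sits next to the .out file named in the startup message, with the same base name and a .log suffix. If the logs are configured differently, the path has to be adjusted.

```shell
#!/bin/sh
# Sketch: given the .out path printed by hadoop-daemon.sh, show recent
# error lines from the matching .log file.
# show_daemon_errors is a hypothetical helper, not part of Hadoop.
show_daemon_errors() {
    out_file=$1
    # Assumption: the log file sits next to the .out file with a .log suffix.
    log_file="${out_file%.out}.log"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Print the last lines that mention an exception or a FATAL/ERROR level.
    grep -E 'Exception|FATAL|ERROR' "$log_file" | tail -n 20
}

# Example, using the path from the startup message quoted below:
#   show_daemon_errors /usr/local/hadoop/yarn/hadoop//logs/hadoop-hduser-namenode-hd01.emblocsoft.net.out
```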

On Thu, Nov 28, 2013 at 8:37 AM, Ascot Moss <as...@gmail.com> wrote:
> Hi,
>
> I am new to 2.2.0, after running the following command to start the first
> namenode, I used jps to check the cluster:
>
>
> ./sbin/hadoop-daemon.sh --script hdfs start namenode
> starting namenode, logging to
> /usr/local/hadoop/yarn/hadoop//logs/hadoop-hduser-namenode-hd01.emblocsoft.net.out
> jps
> 3405 Jps
> 3132 DataNode
>
>
> The name of ID 3132 is DataNode, is this correct as I expected something
> like "3132 NameNode"? does it mean that the following two commands are doing
> the same thing in 2.2.0?
>
> ./sbin/hadoop-daemon.sh --script hdfs start namenode
> ./sbin/hadoop-daemon.sh --script hdfs start datanode
>
>
> regards
>



-- 
Harsh J

Start and Stop Namenode

Posted by Ascot Moss <as...@gmail.com>.
Hi,

I am new to Hadoop 2.2.0. After running the following command to start the first NameNode, I used jps to check the cluster:


./sbin/hadoop-daemon.sh --script hdfs start namenode
	starting namenode, logging to /usr/local/hadoop/yarn/hadoop//logs/hadoop-hduser-namenode-hd01.emblocsoft.net.out
jps
	3405 Jps
	3132 DataNode


The process with ID 3132 is listed as DataNode. Is this correct? I expected something like "3132 NameNode". Does it mean that the following two commands do the same thing in 2.2.0?

	./sbin/hadoop-daemon.sh --script hdfs start namenode
	./sbin/hadoop-daemon.sh --script hdfs start datanode



regards


Re: Desicion Tree Implementation in Hadoop MapReduce

Posted by Yexi Jiang <ye...@gmail.com>.
You can find it directly at https://github.com/apache/mahout, or you can
check it out from SVN by following the instructions at
https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control
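
Once a checkout exists, the relevant sources can be located by file name. This is a minimal sketch: find_decision_sources is a hypothetical helper, not part of Mahout, and it assumes the classes of interest have "DecisionTree" or "DecisionForest" in their file names. The results may differ depending on the branch or release tag checked out.

```shell
#!/bin/sh
# Sketch: list Java sources in a local checkout whose path mentions
# a decision tree or decision forest.
# find_decision_sources is a hypothetical helper, not part of Mahout.
find_decision_sources() {
    find "$1" -type f -name '*.java' | grep -i -E 'decision(tree|forest)' | sort
}

# Typical use (needs network access and git):
#   git clone https://github.com/apache/mahout.git
#   find_decision_sources mahout
```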


2013/11/23 unmesha sreeveni <un...@gmail.com>

> I want to go through the decision tree implementation in Mahout. I referred to Apache
> Mahout <http://mahout.apache.org/>
>
> 6 Feb 2012 - Apache Mahout 0.6 released
> Apache Mahout has reached version 0.6. All developers are encouraged to begin using version 0.6. Highlights include:
> Improved Decision Tree performance and added support for regression problems
>
> Where can I find its source code and documentation?
>
> Should I download Mahout?
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
