
Posted to dev@mahout.apache.org by Marko Novakovic <at...@yahoo.com> on 2008/03/29 08:31:25 UTC

GSOC

I am applying to implement the SVM algorithm on the Hadoop platform.
I hope that I will be accepted by Google and Apache;
I am serious in my intention to do this job well.

Greetings



Re: GSOC

Posted by Isabel Drost <ap...@isabel-drost.de>.

On Saturday 29 March 2008, Ted Dunning wrote:
> SVM is not the only solution to these problems.  For many search engine
> applications, it isn't even likely to be the best.  Regularized logistic
> regression is a strong candidate as are random forests and boosted trees.

There were several interesting papers at NIPS 2007 on ranking search results
based on preferences. The algorithms presented there optimise exactly
the criterion used to evaluate search engine rankings. In some cases they
also compare against the SVM solution of Thorsten Joachims.

> The algorithm may well have some 
> virtues, but it is unlikely to be universal. 

There is even an "official theorem" for your statement: the no-free-lunch 
theorem :)

> It is more likely that the author who claims this simply has a limited view
> of the range of things that might need to be done.

Or that he just examined the algorithm from exactly one angle that might not 
be the one that is important for your problem.

Isabel

-- 
Flattery is like cologne -- to be smelled, but not swallowed.		-- Josh 
Billings
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

I am changing my proposal according to your advice.
I described why parallelization is useful.
I mentioned the large database of web pages; what
else could I add?

Greetings


Re: GSOC

Posted by Ted Dunning <td...@veoh.com>.

My suggestion is that you either make your proposal more about the algorithm
itself or more about the search engine application as seen by the user.

As it stands, your proposal sounds as if you don't know much about either
of these and are making serious decisions about implementation without
learning more about the factors that would drive those decisions.

It would also help if you have a friend who speaks English well help you
edit your proposal to make it easier to understand and to make sure it says
what you mean to say.  Many readers of your proposal will try to compensate
for the difficulty in communication, but if you can make it easier for them,
it would help very much.

As an example of what I mean by the algorithm-based proposal, you could
simply say that you would like to implement SVM as part of the Mahout
project, especially optimized for processing text data and relevance
feedback.

For the user-centered kind of proposal, you could say what the user problem
is that you are trying to help with and then describe how your
implementation will help with that.

For a proposal to be successful as an implementation using map-reduce, you
should say why it is important to use parallel processing.  Note that it is
NOT usually important to use a large parallel cluster for processing
relevance feedback because there are only a few training examples in these
cases.  It is also not usually feasible to use SVM in a large web index
because there is no training data and because SVM training cost goes up
dramatically with the size of the problem.
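
To put rough numbers on that last point, here is a minimal sketch. It assumes a kernel SVM whose solver holds the full n-by-n kernel (Gram) matrix in double precision; real solvers usually cache only part of that matrix and linear SVMs avoid it altogether, so this illustrates the quadratic growth and is not a cost model. The function name gram_matrix_bytes is hypothetical.

```python
def gram_matrix_bytes(n_examples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (assumes 8-byte doubles)."""
    return n_examples * n_examples * bytes_per_entry


if __name__ == "__main__":
    # Compare a small relevance-feedback set with web-scale example counts.
    for n in (1_000, 100_000, 10_000_000):
        gigabytes = gram_matrix_bytes(n) / 1e9
        print(f"{n:>10} examples -> {gigabytes:,.3f} GB")
```

Under this assumption, growing the training set by a factor of 100 grows the matrix by a factor of 10,000.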

On 3/29/08 11:14 AM, "Marko Novakovic" <at...@yahoo.com> wrote:

>>>>>>> I noted that the most usable solution for search
>>>>>>> engines is the Support Vector Machine.
>>>>>>> The best solution for determining relevant page
>>>>>>> ranking for user-based search results is SVM.

Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

OK, thanks.

Do you suggest that I change anything in my
application?


Re: GSOC

Posted by Ted Dunning <td...@veoh.com>.

If you produce nice code, then your contribution is almost 100% likely to be
accepted by the Mahout project.

I can't comment on the likelihood of getting funding from Google as part of
the summer of code.



Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

OK,

I decided to implement SVM because it would be useful
for a professor at my college who is working on his search engine (SE).
How likely is it that my application will be accepted?

Greetings


Re: GSOC

Posted by Ted Dunning <td...@veoh.com>.

SVM is fine but can be very expensive (and complex) to train, especially
for text-like applications.  Regularized logistic regression can be just
about as good for document classification and is much easier to implement.
I suspect that random forests would also work very well.

As a GSOC project, SVM would be a good thing to implement for Mahout.  So
would all of the other algorithms.
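
As a rough illustration of the "much easier to implement" point, the sketch below trains an L2-regularized logistic regression with plain stochastic gradient descent on sparse features. It is a single-machine toy under stated assumptions, not Mahout code: the names train_logreg and predict, the dict-based feature format, and the default hyperparameters are all hypothetical, and the L2 penalty is applied only to the weights an example touches, which is a common shortcut for sparse data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that sparse example x ({feature_index: value}) is positive."""
    return sigmoid(b + sum(w[i] * v for i, v in x.items()))


def train_logreg(examples, n_features, l2=0.01, lr=0.1, epochs=200, seed=0):
    """examples: list of (x, y) with x a {index: value} dict and y in {0, 1}."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(w, b, x) - y  # gradient of the log loss w.r.t. the score
            for i, v in x.items():
                # Log-loss gradient plus L2 shrinkage on the touched weight.
                w[i] -= lr * (err * v + l2 * w[i])
            b -= lr * err  # the bias term is left unregularized
    return w, b


if __name__ == "__main__":
    # Toy data: feature 0 marks relevant documents, feature 1 irrelevant ones,
    # and feature 2 appears in both classes.
    toy = [({0: 1.0}, 1), ({1: 1.0}, 0), ({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 2: 1.0}, 0)]
    w, b = train_logreg(toy, n_features=3)
    print(predict(w, b, {0: 1.0}), predict(w, b, {1: 1.0}))
```

The whole learner is one loop with one update rule, which is the sense in which it is simpler than an SVM solver that has to manage working sets and kernel evaluations.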


On 3/29/08 10:47 AM, "Marko Novakovic" <at...@yahoo.com> wrote:

> I collaborate with a professor from my faculty
> whose PhD thesis was about machine learning in SE-s.
> He uses a combination of Naive Bayes and SVM. I didn't
> understand his solution well enough.
> But I think that SVM is a very useful and deployable
> algorithm for SE-s.
> Do you think that I should change anything in my
> application?
>
> Greetings
>
> --- Ted Dunning <td...@veoh.com> wrote:
>
>> SVM is not the only solution to these problems.  For many search engine
>> applications, it isn't even likely to be the best.  Regularized logistic
>> regression is a strong candidate as are random forests and boosted trees.
>>
>> Beware of any author who claims that their machine learning algorithm
>> is better than all others.  The algorithm may well have some
>> virtues, but it is unlikely to be universal.  It is more likely that the
>> author who claims this simply has a limited view of the range of things that
>> might need to be done.
>>
>> On 3/29/08 10:23 AM, "Marko Novakovic" <at...@yahoo.com> wrote:
>>
>>> The implementation of the SVM algorithm on the Hadoop platform
>>>
>>> Abstract:
>>>
>>> I have been researching search engine
>>> functionality, such as ranking and presenting relevant
>>> pages to users.
>>> I noted that the most usable solution for search
>>> engines is the Support Vector Machine.
>>> The best solution for determining relevant page
>>> ranking for user-based search results is SVM.
>>> A reference for this problem is the article:
>>> T. Joachims, F. Radlinski: "Search Engines that
>>> Learn from Implicit Feedback," IEEE Computer,
>>> August 2007, pp 38
>>> Because SVM is a very complex algorithm with
>>> a lot of operations,
>>> I decided to implement the SVM algorithm on the Hadoop
>>> platform.
>>>
>>> Dear Apache,
>>>
>>> My Idea:
>>>
>>> I have an idea to implement a model and solution for
>>> retrieving relevantly ranked Web pages driven by the user's
>>> past behavior.
>>> Because SE-s have a lot of crawled Web pages,
>>> this operation must be distributed if we want
>>> to obtain results in real time and keep a freshly learned
>>> database.
>>> So we should parallelize all algorithms that are used
>>> for processing Web pages.
>>> So I decided to implement the most used and exploited
>>> algorithm in machine learning that is deployed for processing
>>> Web pages.
>>> I also chose the SVM algorithm because it is a very
>>> complex algorithm to implement,
>>> and I like challenges and am not afraid of hard
>>> tasks.
>>> I aim to achieve a high degree of performance
>>> through parallelization.
>>> I will use my work on this project to write a new
>>> article about the deployment of clustering in SE-s.
>>> To prepare for this project I have read these articles:
>>> [1] C. Burges, "A Tutorial on Support Vector Machines
>>> for Pattern Recognition," Kluwer Academic Publishers,
>>> Boston
>>> [2] R.E. Fan, P.H. Chen, C.J. Lin, "Working Set
>>> Selection Using Second Order Information for Training
>>> Support Vector Machines," Journal of Machine Learning
>>> Research 6 (2005), pp 1889-1918
>>> I have also read the Hadoop documentation and examined
>>> your implementation of the kMeans algorithm on this
>>> platform.
>>>
>>> Methodologies of Development:
>>>
>>> - Test Driven Development
>>> - Deployment of Ant and JUnit
>>> - SDK: Eclipse
>>> - SVN System for Versioning
>>> - Javadoc
>>>
>>> About Me:
>>>
>>> You can see my resume at
>>> http://atisha34.googlepages.com/.
>>> I also participate in some academic projects at my
>>> college:
>>> - Working on a topic-based Search Engine, called Grain,
>>> which is under construction at my faculty.
>>> - Tutorial about SE-s, mentored by professor Veljko
>>> Milutinovic: "The New Avenues in Search Engines"
>>> presentation:
>>> http://atisha34.googlepages.com/Searchengines.ppt
>>> abstract:
>>> http://atisha34.googlepages.com/TheNewAvenuesinWebSearch.docx
>>> I should publish an article based on this presentation
>>> in IPSI Magazine.
>>> - The other projects in which I participate aren't related
>>> to machine learning and search engines.
>>>
>>> My Interests:
>>> - Search Engines
>>> - Software Engineering and Test Driven Development
>>> - Machine Learning
>>> - Database Modeling and OO Design
>>> - ERP and Business Processes
>>>
>>> Sincerely Yours,
>>> Marko Novakovic
>>>
>>> --- Karl Wettin <ka...@gmail.com> wrote:
>>>
>>>> Marko Novakovic wrote:
>>>>
>>>> Hi Marko,
>>>>
>>>>> I am applying to implement the SVM algorithm on the Hadoop platform.
>>>>> I hope that I will be accepted by Google and Apache;
>>>>> I am serious in my intention to do this job well.
>>>>
>>>> great news! Feel free to post your proposal here
>>>> too.
>>>>
>>>>      karl

Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

I collaborate with a professor from my faculty
whose PhD thesis was about machine learning in SE-s.
He uses a combination of Naive Bayes and SVM. I didn't
understand his solution well enough.
But I think that SVM is a very useful and deployable
algorithm for SE-s.
Do you think that I should change anything in my
application?

Greetings

--- Ted Dunning <td...@veoh.com> wrote:

> 
> SVM is not the only solution to these problems.  For
> many search engine
> applications, it isn't even likely to be the best. 
> Regularized logistic
> regression is a strong candidate as are random
> forests and boosted trees.
> 
> Beware of any author who claims that their algorithm
> for machine learning
> that claims to be better than all others.  The
> algorithm may well have some
> virtues, but it is unlikely to be universal.  It is
> more likely that the
> author who claims this simply has a limited view of
> the range of things that
> might need to be done.

Re: GSOC

Posted by Ted Dunning <td...@veoh.com>.

SVM is not the only solution to these problems.  For many search engine
applications, it isn't even likely to be the best.  Regularized logistic
regression is a strong candidate as are random forests and boosted trees.

Beware of any author who claims that their machine learning algorithm
is better than all others.  The algorithm may well have some
virtues, but it is unlikely to be universal.  It is more likely that the
author who claims this simply has a limited view of the range of things that
might need to be done.
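
As an illustration of one alternative named above, here is a minimal sketch
of L2-regularized logistic regression trained by batch gradient descent.
The function names, the learning rate, the regularization strength and the
toy data are all assumptions made for this example; none of them come from
the thread or from Mahout.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """Minimize mean log-loss + (l2 / 2) * ||w||^2; labels are 0 or 1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty shrinks the weights; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: two features per example, binary relevance label.
X = [[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
```

Unlike an SVM, the model returns a probability, which can be convenient
when results have to be ranked rather than only classified.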



Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

Implementation of the SVM Algorithm on the Hadoop Platform

Abstract:

I have been researching search engine functionality,
such as ranking and presenting relevant
pages to users.
I noted that the most usable solution for search
engines is the Support Vector Machine (SVM).
The best solution for determining relevant page
rankings for user-based search results is SVM.
A reference for this problem is the article:
T. Joachims, F. Radlinski: "Search Engines that
Learn from Implicit Feedback," IEEE Computer,
August 2007, pp 38
Because SVM is a very complex algorithm that requires
a lot of operations,
I decided to implement the SVM algorithm on the Hadoop
platform.
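
The implicit-feedback idea behind the cited article can be sketched as
follows: a clicked result is treated as preferred over the non-clicked
results ranked above it, and each preference becomes a training example
for a pairwise (ranking) classifier. This is only a sketch under that
assumption; the function names, document ids and feature values are
hypothetical and are not taken from the proposal or the article.

```python
def click_preferences(ranking, clicked):
    """Return (preferred, less_preferred) pairs of document ids.

    Each clicked document is preferred over every non-clicked
    document that was ranked above it.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:pos]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs


def pairwise_examples(pairs, features):
    """Turn each preference into a feature-difference vector.

    The difference (better - worse) gets label +1, and the mirrored
    difference gets label -1, so a binary linear classifier such as
    an SVM can be trained on the result.
    """
    examples = []
    for better, worse in pairs:
        diff = [a - b for a, b in zip(features[better], features[worse])]
        examples.append((diff, 1))
        examples.append(([-v for v in diff], -1))
    return examples


# Hypothetical result list for one query; the user clicked only "d3".
ranking = ["d1", "d2", "d3", "d4"]
clicked = {"d3"}
features = {
    "d1": [0.9, 0.1],
    "d2": [0.8, 0.2],
    "d3": [0.3, 0.9],
    "d4": [0.1, 0.1],
}
pairs = click_preferences(ranking, clicked)
examples = pairwise_examples(pairs, features)
```

Here "d3" is preferred over "d1" and "d2"; nothing is inferred about
"d4", because it was ranked below the click.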

Dear Apache,

My Idea:

I have an idea to implement a model and solution for
retrieving Web pages ranked by relevance, driven by the user's
past behavior.
Because search engines (SEs) hold a lot of crawled Web pages,
this operation must be distributed if we want
to obtain results in real time and keep the learned
database fresh.
So we should parallelize all algorithms that are used
for processing Web pages.
I therefore decided to implement the most used and exploited
machine learning algorithm deployed for processing
Web pages.
I also chose the SVM algorithm because it is very
complex to implement;
I like challenges and I am not afraid of hard
tasks.
I aim to achieve a high degree of performance
through parallelization.
I will use my work on this project to write a new
article about the deployment of clustering in SEs.
I have prepared for this project by reading these articles:
[1] C. Burges, "A Tutorial on Support Vector Machines
for Pattern Recognition," Kluwer Academic Publishers,
Boston
[2] R.-E. Fan, P.-H. Chen, C.-J. Lin, "Working Set
Selection Using Second Order Information for Training
Support Vector Machines," Journal of Machine Learning
Research 6 (2005), pp. 1889-1918
I have also read the Hadoop documentation and examined
your implementation of the k-means algorithm on this
platform.

Development Methodologies:

- Test-Driven Development
- Ant and JUnit
- IDE: Eclipse
- Version control: SVN
- Javadoc

About Me:

My resume is available at
http://atisha34.googlepages.com/.
I also participate in some academic projects at my
college:
- Work on a topic-based search engine called Grain,
which is under construction at my faculty.
- A tutorial about SEs, mentored by Professor Veljko
Milutinovic: "The New Avenues in Search Engines"
presentation:
http://atisha34.googlepages.com/Searchengines.ppt
abstract:
http://atisha34.googlepages.com/TheNewAvenuesinWebSearch.docx
I expect to publish an article based on this presentation
in IPSI Magazine.
- The other projects in which I participate are not related
to machine learning or search engines.

My Interests:
- Search Engines
- Software Engineering and Test Driven Development
- Machine Learning
- Database Modeling and OO Design
- ERP and Business Processes

Sincerely Yours,
Marko Novakovic


Re: GSOC

Posted by Karl Wettin <ka...@gmail.com>.

Marko Novakovic wrote:

Hi Marko,

> I am applying for the SVM algorithm on the Hadoop platform.
> I hope that I will be accepted by Google and Apache;
> I am serious about doing this job well.

Great news! Feel free to post your proposal here too.


     karl

Re: GSOC

Posted by Isabel Drost <ap...@isabel-drost.de>.

On Sunday 30 March 2008, you wrote:
> This is my application, give me feedback, please.

Sorry, I have a slow network connection at the moment and made the mistake of
starting to answer mails before everything had arrived. I saw your extended
application only after replying to your initial mail :(

Isabel

-- 
"You're a creature of the night, Michael.  Wait'll Mom hears about this."		-- 
from the movie "The Lost Boys"
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: GSOC

Posted by Marko Novakovic <at...@yahoo.com>.

> I think it would be great if you could add a little more information to your
> application - if you have not already done so in the GSoC web form. Some
> ideas of useful information, that can help us judge your application:
>
>  - your background
>  - your reason for applying to do this task
>  - your first project plan how you want to proceed
>  - whatever else you think might help us decide that you are a perfect fit
 
This is my application, give me feedback, please.

Implementation of the Support Vector Machine Algorithm
on the Hadoop Platform

Abstract

I have been researching search engine functionality,
such as ranking and presenting relevant
pages to users.
I noted that the SVM algorithm is a good solution for
classifying crawled Web pages in search engines.
After reading and working through the article
[Joachims, 2007],
I decided to implement an SVM optimized for processing
text data and handling relevance feedback.
Because SVM is a very complex algorithm that requires
a lot of operations,
I chose the map-reduce Hadoop platform.

[Joachims, 2007] T. Joachims, F. Radlinski: "Search
Engines that Learn from Implicit Feedback," IEEE
Computer, August 2007, pp 38

Detailed Description

Dear Google and Apache,

Project: Lucene Mahout

My Idea:

I have an idea to implement a model and solution for
retrieving Web pages ranked by relevance according to the
user's recent behavior.
Because search engines (SEs) hold a lot of crawled Web pages,
the machine learning algorithms used by an SE must
be distributed or parallelized if we want
to obtain real-time results and keep the retrieved
database fresh.
I want to implement the Support Vector Machine (SVM)
formulation for optimizing multivariate performance
measures described in [Joachims, 2005]. Furthermore,
I would implement the alternative structural
formulation of the SVM optimization problem for
conventional binary classification with error rate and
ordinal regression described in [Joachims, 2006].
A large parallel cluster is usually not needed
for processing relevance feedback, because
there are only a few training examples in these cases.
Because SVM training cost grows steeply with
the size of the problem (quadratic complexity), I want
to apply this solution to the first 100 pages for each
combination of user and query.
I also chose the SVM algorithm because I see
it as a big challenge for me, and it will be useful to
professors at my college.
I will use my work on this project to write a new
article about deploying SVM algorithm optimization
in SEs.
I have prepared for this project by reading these articles:
[1] C. Burges, "A Tutorial on Support Vector Machines
for Pattern Recognition," Kluwer Academic Publishers,
Boston, 1998
[2] R.-E. Fan, P.-H. Chen, C.-J. Lin, "Working Set
Selection Using Second Order Information for Training
Support Vector Machines," Journal of Machine Learning
Research 6 (2005), pp. 1889-1918
I have also read the Hadoop documentation and examined
your implementation of the k-means algorithm on this
platform.
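
One way to picture how SVM training could be split into map and reduce
steps is sketched below. It uses plain batch subgradient descent on the
hinge loss of a linear SVM, which is a simpler technique swapped in for
illustration and not the method of [Joachims, 2005] or [Joachims, 2006].
It also runs in a single process and does not use the Hadoop API. All
names, parameters and data are assumptions made for this example.

```python
from functools import reduce


def map_partition(w, partition):
    """Mapper: sum the hinge-loss subgradients over one data partition."""
    grad = [0.0] * len(w)
    for x, y in partition:  # y is -1 or +1
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        if margin < 1.0:  # example violates the margin
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    return grad, len(partition)


def reduce_pair(a, b):
    """Reducer: add two partial (gradient, count) results."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [u + v for u, v in zip(grad_a, grad_b)], n_a + n_b


def train_linear_svm(partitions, dim, lam=0.01, lr=0.1, epochs=200):
    w = [0.0] * dim
    for _ in range(epochs):
        # On a cluster each partition would be mapped on a different node.
        mapped = [map_partition(w, p) for p in partitions]
        grad, n = reduce(reduce_pair, mapped)
        # Regularized subgradient step on the averaged hinge loss.
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data; the last feature is a constant 1 acting as a bias.
partitions = [
    [([1.0, 2.0, 1.0], 1), ([2.0, 1.5, 1.0], 1)],
    [([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)],
]
w = train_linear_svm(partitions, dim=3)
```

Each pass needs only one sum over the data, so the per-node work grows
with the partition size while the reducer handles one small vector per
mapper.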

Development Methodologies:

- Test-Driven Development
- Ant and JUnit
- IDE: Eclipse
- Version control: SVN
- Javadoc

About Me:

My resume is available at
http://atisha34.googlepages.com/.
I also participate in some academic projects at my
college:
- Work on a topic-based search engine called Grain,
which is under construction at my faculty.
- A tutorial about SEs, mentored by Professor Veljko
Milutinovic: "The New Avenues in Search Engines"
presentation:
http://atisha34.googlepages.com/Searchengines.ppt
abstract:
http://atisha34.googlepages.com/TheNewAvenuesinWebSearch.docx
I expect to publish an article based on this presentation
in IPSI Magazine.
- The other projects in which I participate are not related
to machine learning or search engines.

My Interests:
- Search Engines
- Software Engineering and Test Driven Development
- Machine Learning
- Database Modeling and OO Design
- ERP and Business Processes

Sincerely Yours,
Marko Novakovic
 
[Joachims, 2006] T. Joachims, Training Linear SVMs in
Linear Time, Proceedings of the ACM Conference on
Knowledge Discovery and Data Mining (KDD), 2006.
[Joachims, 2005] T. Joachims, A Support Vector Method
for Multivariate Performance Measures, Proceedings of
the International Conference on Machine Learning
(ICML), 2005.
 



Re: GSOC

Posted by Isabel Drost <ap...@isabel-drost.de>.

On Saturday 29 March 2008, Marko Novakovic wrote:
> I am applying for the SVM algorithm on the Hadoop platform.
> I hope that I will be accepted by Google and Apache;
> I am serious about doing this job well.

I think it would be great if you could add a little more information to your 
application - if you have not already done so in the GSoC web form. Some 
ideas of useful information, that can help us judge your application:

 - your background
 - your reason for applying to do this task
 - your first project plan how you want to proceed
 - whatever else you think might help us decide that you are a perfect fit

Isabel

-- 
You look like a million dollars.  All green and wrinkled.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>