Posted to common-user@hadoop.apache.org by Tonci Buljan <to...@gmail.com> on 2010/03/01 15:01:56 UTC

Hadoop as master's thesis

Hello everyone,

 I'm thinking of using Hadoop as a subject in my master's thesis in Computer
Science. I'm supposed to solve some kind of a problem with Hadoop, but can't
think of any :)).

 We have a lab with 10-15 computers and I thought of installing Hadoop on
those computers, and now I should write some kind of a program to run on my
cluster.

 I really hope you understood my problem :). I really need any kind of
suggestion.


 P.S. Sorry for my bad English, I'm from Croatia.

Re: Hadoop as master's thesis

Posted by Thomas Koch <th...@koch.ro>.
Hi Tonci,
>  I'm thinking of using Hadoop as a subject in my master's thesis in
>  Computer Science. I'm supposed to solve some kind of a problem with
>  Hadoop, but can't think of any :)).

I have a question that could be a topic for a master's thesis, although it's
more a question about Hadoop than solving a problem with Hadoop.
There are thousands of organizations out there that have powerful but mostly
underused desktop PCs standing on everyone's desk. Now if an organization has
10-50 of those PCs and installed Hadoop on every one of them, what could this
be useful for?

I already have at least two ideas:

- a distributed, encrypted backup system on top of HDFS

- an automated knowledge system
  - crawl all websites and linked websites bookmarked by users
  - build an index on these sites
  - make this index searchable for all users
  - include public, internal documents
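
The "build an index" and "make this index searchable" steps in the list above
can be sketched as a map/reduce-style inverted index. This is a minimal
single-machine Python sketch, not Hadoop code; the function names (`map_page`,
`build_index`, `search`) and the intranet URLs are made up for illustration,
and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict

def map_page(url, text):
    """Map step: emit (word, url) once per distinct word on a page."""
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield word, url

def build_index(pages):
    """Shuffle + reduce: collect the sorted list of URLs for each word."""
    postings = defaultdict(set)
    for url, text in pages.items():
        for word, page_url in map_page(url, text):
            postings[word].add(page_url)
    return {word: sorted(urls) for word, urls in postings.items()}

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Hypothetical crawled pages (URL -> extracted text).
pages = {
    "http://intranet.example/a": "Hadoop cluster setup notes",
    "http://intranet.example/b": "Backup notes for the cluster",
}
index = build_index(pages)
```

On a real cluster the map step would run per crawled page and the framework
would do the grouping by word; the sketch only shows the shape of the data.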

Some questions for the master's thesis could be:

- Is it possible to run Hadoop in such an environment?
- What are the drawbacks?
- What is missing in Hadoop to make it possible?
- What is the ecological impact of using desktop PCs in this way, if this could
substitute for some servers in a datacenter? (The heat emitted by the PCs is
even useful in winter.)

Best regards,

Thomas Koch, http://www.koch.ro

P.S. Did you know that next year's Debian conference will be in Bosnia?
http://wiki.debconf.org/wiki/DebConf11/BanjaLuka

Re: Hadoop as master's thesis

Posted by Amund Tveit <am...@atbrox.com>.
On Mon, Mar 1, 2010 at 3:01 PM, Tonci Buljan <to...@gmail.com> wrote:
> Hello everyone,
>
>  I'm thinking of using Hadoop as a subject in my master's thesis in Computer
> Science. I'm supposed to solve some kind of a problem with Hadoop, but can't
> think of any :)).

Here is an overview of Hadoop/MapReduce algorithms that might serve as
inspiration when looking for a problem to solve:
http://atbrox.com/2010/02/12/mapreduce-hadoop-algorithms-in-academic-papers-updated/

A new dataset related to machine learning:
http://learningtorankchallenge.yahoo.com/

Best regards,
Amund




-- 
http://atbrox.com - +47 416 26 572

Re: Hadoop as master's thesis

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Tonci,

You'll find good dataset pointers here:

  http://www.simpy.com/user/otis/search/dataset


You may find inspiration for Hadoop usage here, assuming you have an ML background:

  http://cwiki.apache.org/MAHOUT/algorithms.html

Oh, and you may also want to look out for GSOC (Google Summer of Code).

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/





Re: Hadoop as master's thesis

Posted by Tonci Buljan <to...@gmail.com>.
Thank you all for your replies.

Matteo, I'm definitely interested in what you did, and I would be very
happy to check it out in detail. Mark Kerzner's link http://infochimps.org/
was very useful. Thank you Mark for that. I'll probably download and work
with some data from there.



For Marko (translated from Croatian)

I had no idea there were other people in Croatia working with Hadoop.
I study at FESB in Split, and the whole department that deals with distributed
computing is "thin". The professor didn't even know what Hadoop was when I
asked him for an idea. Java is an even bigger bogeyman; that same professor
teaches it, so it will be a real struggle to write any kind of thesis on this
topic.

In any case, thanks for the reply.







Re: Hadoop as master's thesis

Posted by Steve Loughran <st...@apache.org>.
Song Liu wrote:
> Hi Tonci, actually, I am doing a Master's thesis on developing algorithms
> on Hadoop.
> 
> My project is to extend algorithms into MapReduce fashion and to discover
> whether there is an optimal choice.  Most of them belong to the Machine
> Learning area. Personally, I think this is a fresh area, and if you search
> the main academic databases, you may find little literature about this.
> 
> I recently made a proposal about my study on Hadoop, and I would like to
> discuss this with you in depth if you wish.
> 
> Another interesting topic is to discover the limits of Hadoop. We have a very
> large cluster at a very high rank among TOP500, so I'm wondering whether
> Hadoop can perform as we expected.
> 

A lot of the big clusters have premium network infrastructure and SAN 
mounted storage whose access times are independent of location. 
MapReduce is designed to work on lower-cost storage/network 
infrastructure, saving money there that you can spend on more servers 
and storage. But it does require algorithms to work on local data only, 
or the LAN becomes a bottleneck, fast.



Re: Hadoop as master's thesis

Posted by Song Liu <la...@gmail.com>.
Hi Tonci, actually, I am doing a Master's thesis on developing algorithms
on Hadoop.

My project is to extend algorithms into MapReduce fashion and to discover
whether there is an optimal choice.  Most of them belong to the Machine
Learning area. Personally, I think this is a fresh area, and if you search
the main academic databases, you may find little literature about this.

I recently made a proposal about my study on Hadoop, and I would like to
discuss this with you in depth if you wish.

Another interesting topic is to discover the limits of Hadoop. We have a very
large cluster at a very high rank among TOP500, so I'm wondering whether
Hadoop can perform as we expected.

Hope this helps.

Regards
Song Liu



Anyone use MapReduce for TSP approximations?

Posted by Raymond Jennings III <ra...@yahoo.com>.
I am interested in seeing how MapReduce could be used to approximate the traveling salesman problem. Does anyone have a pointer?

Thanks.


      

SEQ

Posted by Raymond Jennings III <ra...@yahoo.com>.
Are there any examples that show how to create a SequenceFile (SEQ file) in HDFS?


      

Re: Hadoop as master's thesis

Posted by Stephen Watt <sw...@us.ibm.com>.
Hi Tonci

Public Data Sets - Check out infochimps.org/ or 
aws.amazon.com/publicdatasets/  

I find a lot of the Hadoopified algorithms out there originate from
Linguistics departments; TF-IDF is one example. But have you considered
looking into Information Theory, i.e. entropy analytics using algorithms
like Pointwise Mutual Information? I'd imagine most government security
agencies would be interested in using Hadoop for signal processing/code
breaking, especially given the cost savings of using commodity machines. The
trick will be to find a dataset that suits your algorithm.
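
As a rough illustration of the Pointwise Mutual Information idea, here is a
minimal single-machine Python sketch, not Hadoop code. It assumes the usual
definition PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), estimated from
adjacent-word counts in a toy corpus; the function name `pmi_scores` and the
sample sentences are made up for illustration. In a Hadoop job, the counting
would be the part spread across mappers and reducers.

```python
import math
from collections import Counter

def pmi_scores(sentences):
    """Return {(w1, w2): PMI} for adjacent word pairs in tokenized sentences."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        # Adjacent pairs only; a real job might use a wider window.
        pair_counts.update(zip(tokens, tokens[1:]))
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_xy = n / total_pairs
        p_x = word_counts[w1] / total_words
        p_y = word_counts[w2] / total_words
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus, purely illustrative.
corpus = [
    "hadoop runs map reduce jobs".split(),
    "map reduce jobs read local data".split(),
]
scores = pmi_scores(corpus)
```

A positive score for a pair such as ("map", "reduce") means the two words
occur together more often than their individual frequencies would suggest.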

Kind regards
Steve Watt








Re: Hadoop as master's thesis

Posted by Mark Kerzner <ma...@gmail.com>.
Tonci,

here are the Enron email files that were used in the Enron litigation:
http://edrm.net/resources/data-sets/enron-data-set-files

Here is much more stuff: http://infochimps.org/

Sincerely,
Mark


Re: Hadoop as master's thesis

Posted by Tonci Buljan <to...@gmail.com>.
Thank you for your reply.


 I didn't mention that I already installed Hadoop on 2 machines back at home
(for an essay on Hadoop which I did), one as a namenode and datanode and one
as a datanode only. Everything worked perfectly. I would really like to
install it on more machines to see how a cluster works in more detail. So I
was thinking: "Now I have a cluster, where do I find a large dataset to work
with?"


 I like your idea about publicly available datasets. Do you have any links
on that?

The other idea, about student grades, is also great (thank you for that), and
I might just start with that.


 Thank you very much, you both really helped me.



Re: Hadoop as master's thesis

Posted by Mark Kerzner <ma...@gmail.com>.
Tonci,

to start with, you can run Hadoop on one computer in pseudo-distributed mode.
Installing and configuring will be enough headache on its own. Then you can
think of a problem, such as processing student records and grades to find some
statistics, or relating grades to students' future achievements. Or, you can
look at some publicly available datasets and do something with them.
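
To make the student-grades idea concrete, here is a minimal sketch of the
map, shuffle and reduce steps in plain Python, with no Hadoop involved. It
assumes a made-up input format of "student,course,grade" lines; the function
names and sample records are hypothetical. The same mapper and reducer logic
could later be ported to a real Hadoop job.

```python
from collections import defaultdict

def map_record(line):
    """Map step: parse 'student,course,grade' and emit (student, grade)."""
    student, _course, grade = line.strip().split(",")
    yield student, float(grade)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_grades(student, grades):
    """Reduce step: average of one student's grades."""
    return student, sum(grades) / len(grades)

# Hypothetical input records.
records = [
    "ana,math,5",
    "ana,physics,4",
    "ivan,math,3",
    "ivan,physics,2",
    "ivan,java,4",
]
mapped = (pair for line in records for pair in map_record(line))
averages = dict(reduce_grades(s, g) for s, g in shuffle(mapped).items())
```

Here `averages` maps each student to a mean grade; on a cluster each reducer
would see only the grades for the students assigned to it.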

Cheers,
Mark


Re: Hadoop as master's thesis

Posted by Jeff Zhang <zj...@gmail.com>.
So you do not have a topic for your thesis yet?
I think the topic depends on your background. If you have machine learning
experience, I suggest you try using Hadoop to implement some machine
learning algorithms.






-- 
Best Regards

Jeff Zhang

Re: Hadoop as master's thesis

Posted by Tonci Buljan <to...@gmail.com>.
Sounds great!!!

Thank you.


Re: Hadoop as master's thesis

Posted by Huy Phan <da...@gmail.com>.
Hi Matteo,
it sounds good :)
We will wait for your work.



Re: Hadoop as master's thesis

Posted by Matteo Nasi <ma...@gmail.com>.
hi guys,
sorry for the delay, it's a busy week :-) There's no problem with sharing
my work.
However, there are some issues to consider:
- my final doc is a 135-page description of what I did, but it's written in
Italian ... So what I can try to do is share a sort of English abstract
of each chapter, let's say half a page for all 9 chapters and 4 appendices,
and include the bibliography and web references;
- the web site UI is an intranet site; I can try to write a description of
it, maybe with some screenshots
- finally, I can share all the scripts and the JSP code for the web part
If you agree, I'll work on this by the end of the week.
Let me know,

ciao Matteo


Re: Hadoop as master's thesis

Posted by Steve Loughran <st...@apache.org>.
Matteo Nasi wrote:
> Hi all,
> I just completed my first-level university degree at Politecnico di
> Milano, Italy (Computer Science Engineering) with a thesis on Hadoop: "log
> analysis in the cloud", using and comparing custom log analysis scripts on
> a local private cluster (8 nodes of old computers) and the AWS EMR Hadoop
> implementation. I wrote scripts in Pig and Hive and collected the results in
> a custom web interface.

Stick it up online and we will link to it from the Hadoop wiki



Re: Hadoop as master's thesis

Posted by Matteo Nasi <ma...@gmail.com>.
Hi all,
I just completed my first-level university degree at Politecnico di
Milano, Italy (Computer Science Engineering) with a thesis on Hadoop: "log
analysis in the cloud", using and comparing custom log analysis scripts on
a local private cluster (8 nodes of old computers) and the AWS EMR Hadoop
implementation. I wrote scripts in Pig and Hive and collected the results in
a custom web interface.
If you're interested, feel free to ask.

ciao Matteo


Re: Hadoop as master's thesis

Posted by Steve Loughran <st...@apache.org>.
Tonci Buljan wrote:
> Hello everyone,
> 
>  I'm thinking of using Hadoop as a subject in my master's thesis in Computer
> Science. I'm supposed to solve some kind of a problem with Hadoop, but can't
> think of any :)).

well, you need some interesting data, then mine it. So ask around. 
Physicists often have stuff.