Posted to dev@mahout.apache.org by Florents Tselai <fl...@gmail.com> on 2013/01/14 00:18:17 UTC

Where to start refactoring?

Hello,

In the next weeks/months I'll be using Mahout to analyze some big data
for a start-up, and I'd like my work there to be reflected in Mahout as well.
So I'd like to become a committer. I've already read all the wikis and guidelines
and have browsed through the JIRA issues.

First, I'd like to get a good overview of the codebase and the overall
design.
So my first thought was to start with some refactoring (decomposing
methods and so on).

Is there a specific place in the code that needs "cleaning"?

Re: Where to start refactoring?

Posted by Florents Tselai <ts...@dmst.aueb.gr>.

Well I've not started real work yet - I'm just preparing!
But definitely, once I do I'll try to give back to the community. I guess I
could have some sample data :)

On Mon, Jan 14, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com> wrote:

> That is a pity.
>
> Good use cases with realistic (not necessarily real) data would be very
> helpful.  Probably much more impact than small code fixes.

Re: Where to start refactoring?

Posted by Ted Dunning <te...@gmail.com>.

That is a pity.

Good use cases with realistic (not necessarily real) data would be very
helpful.  Probably much more impact than small code fixes.

On Sun, Jan 13, 2013 at 5:54 PM, Florents Tselai <ts...@dmst.aueb.gr> wrote:

> For now, I'm afraid no, I don't.

Re: Where to start refactoring?

Posted by Florents Tselai <ts...@dmst.aueb.gr>.

For now, I'm afraid no, I don't.

On Mon, Jan 14, 2013 at 3:31 AM, Ted Dunning <te...@gmail.com> wrote:

> Do you have any sample data?

Re: Where to start refactoring?

Posted by Ted Dunning <te...@gmail.com>.

Do you have any sample data?

On Sun, Jan 13, 2013 at 5:13 PM, Florents Tselai <ts...@dmst.aueb.gr> wrote:

> Thanks for the reply!
>
> Yes, you're correct: the data source is a smart meter installed in each
> building.

Re: Where to start refactoring?

Posted by Florents Tselai <ts...@dmst.aueb.gr>.

Thanks for the reply!

Yes, you're correct: the data source is a smart meter installed in each
building.

On Mon, Jan 14, 2013 at 3:07 AM, Ted Dunning <te...@gmail.com> wrote:

> If you have discrete data, then I would think that simple cooccurrence
> mining would be more useful than full-on association mining.
>
> But is your data really a time-series?  Are you extracting discrete
> features from the time series?
>
> In the following, I am assuming that when you say "real-time energy data"
> you actually mean something like smart meter consumption data for
> electricity.  You could probably mean total energy emitted by a particular
> set of three thousand quasars as well, but I assume the former is more
> likely.  Please correct me if you like.
>
>
> One very useful approach that I have seen with time series uses past data
> to predict the next sample (in the sense of regression).  If you have such
> a regression model you can use Bayesian model clustering to find multiple
> patterns for regression.  The output of this clustering is useful as the
> continuous equivalent of association mining.
>
> To be more concrete, suppose that you have several kinds of energy
> customers:
>
> - normal consumers who leave their house empty during the day, but have a
> substantial bump in energy consumption in the late afternoon or evening and
> then have a more spread-out pattern of usage on the weekend
>
> - normal consumers who work a night shift
>
> - light offices which have peak usage during normal working hours
>
> - light industry with shift work that has relatively constant energy usage
>
> If you build models for the energy consumption of these customers
> normalized to their previous week's total consumption and have the
> following features:
>
> - time of day expressed as 4 sinusoids
>
> - day of week expressed as a 1 of 7 indicator
>
> - weekend expressed as a boolean
>
> I think that you will find that Bayesian model clustering will recover your
> original classes very nicely.

Re: Where to start refactoring?

Posted by Ted Dunning <te...@gmail.com>.

If you have discrete data, then I would think that simple cooccurrence
mining would be more useful than full-on association mining.
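
As a rough illustration of what simple cooccurrence mining could look like,
here is a minimal sketch in Python. It assumes each building-day has already
been reduced to a set of discrete event labels; the labels and the
cooccurrence_counts helper are hypothetical and are not part of Mahout.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of discrete events shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Sort the de-duplicated events so each pair has one canonical order.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical discrete features extracted per building per day.
baskets = [
    {"evening_peak", "weekend_flat"},
    {"evening_peak", "weekend_flat", "night_base"},
    {"office_hours_peak"},
    {"evening_peak", "night_base"},
]
counts = cooccurrence_counts(baskets)
```

Pairs with unusually high counts relative to the individual event frequencies
are the candidates worth a closer look; a significance test on those counts
would be the next step and is left out here.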

But is your data really a time-series?  Are you extracting discrete
features from the time series?

In the following, I am assuming that when you say "real-time energy data"
you actually mean something like smart meter consumption data for
electricity.  You could probably mean total energy emitted by a particular
set of three thousand quasars as well, but I assume the former is more
likely.  Please correct me if you like.

One very useful approach that I have seen with time series uses past data
to predict the next sample (in the sense of regression).  If you have such
a regression model you can use Bayesian model clustering to find multiple
patterns for regression.  The output of this clustering is useful as the
continuous equivalent of association mining.
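
The "predict the next sample from past data" step can be sketched in its
simplest form as a lag-1 least-squares regression. This is only an assumed,
minimal instance of the idea (a real model would use more lags and the
calendar features); fit_ar1, predict_next and the readings are made up for
illustration.

```python
def fit_ar1(series):
    """Least-squares fit of series[t] ~ intercept + slope * series[t - 1]."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumes the series is not constant; otherwise sxx is zero.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_next(series, intercept, slope):
    """Predict the sample that follows the last observed one."""
    return intercept + slope * series[-1]


# Made-up readings that follow y[t] = 1 + 2 * y[t - 1] exactly.
readings = [1.0, 3.0, 7.0, 15.0, 31.0]
intercept, slope = fit_ar1(readings)
next_value = predict_next(readings, intercept, slope)
```

Fitting one such model per customer gives a small coefficient vector for each
of them, and it is those vectors that a model-clustering step would group.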

To be more concrete, suppose that you have several kinds of energy
customers:

- normal consumers who leave their house empty during the day, but have a
substantial bump in energy consumption in the late afternoon or evening and
then have a more spread-out pattern of usage on the weekend

- normal consumers who work a night shift

- light offices which have peak usage during normal working hours

- light industry with shift work that has relatively constant energy usage

If you build models for the energy consumption of these customers
normalized to their previous week's total consumption and have the
following features:

- time of day expressed as 4 sinusoids

- day of week expressed as a 1 of 7 indicator

- weekend expressed as a boolean

I think that you will find that Bayesian model clustering will recover your
original classes very nicely.
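
The feature encoding described above might look like the following Python
sketch. It reads "4 sinusoids" as the sine and cosine of the first two daily
harmonics (24-hour and 12-hour periods), which is an assumption, and the
function names are hypothetical.

```python
import math
from datetime import datetime


def time_features(ts):
    """Encode a timestamp as 4 sinusoids, a 1-of-7 day indicator and a weekend flag."""
    hour = ts.hour + ts.minute / 60.0
    angle = 2.0 * math.pi * hour / 24.0
    # Assumed reading of "4 sinusoids": sin/cos at the 24 h and 12 h periods.
    sinusoids = [
        math.sin(angle),
        math.cos(angle),
        math.sin(2.0 * angle),
        math.cos(2.0 * angle),
    ]
    day_of_week = [0.0] * 7
    day_of_week[ts.weekday()] = 1.0  # Monday is index 0, Sunday is index 6
    weekend = 1.0 if ts.weekday() >= 5 else 0.0
    return sinusoids + day_of_week + [weekend]


def normalize(reading, previous_week_total):
    """Scale a reading by the customer's total consumption over the previous week."""
    return reading / previous_week_total


# Sunday, 13 January 2013, 18:00.
features = time_features(datetime(2013, 1, 13, 18, 0))
target = normalize(2.5, 100.0)
```

The sinusoids keep 23:59 and 00:00 close together in feature space, which a
raw hour-of-day number would not.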

On Sun, Jan 13, 2013 at 3:41 PM, Florents Tselai <ts...@dmst.aueb.gr> wrote:

> Real-time energy data.
> Association mining is in fact the core analysis applied (but not the only
> one; e.g., it could be classification as well).

Re: Where to start refactoring?

Posted by Florents Tselai <ts...@dmst.aueb.gr>.

Real-time energy data.
Association mining is in fact the core analysis applied (but not the only
one; e.g., it could be classification as well).

On Mon, Jan 14, 2013 at 1:34 AM, Ted Dunning <te...@gmail.com> wrote:

> Can you say more about what kind of data and what kind of analysis?
>
> It is usually best if the work you do is motivated by your needs.

Re: Where to start refactoring?

Posted by Ted Dunning <te...@gmail.com>.

Can you say more about what kind of data and what kind of analysis?

It is usually best if the work you do is motivated by your needs.

On Sun, Jan 13, 2013 at 3:18 PM, Florents Tselai
<fl...@gmail.com> wrote:

> Hello,
>
> In the next weeks/months I'll be using Mahout to analyze some big data
> for a start-up, and I'd like my work there to be reflected in Mahout as well.
> So I'd like to become a committer. I've already read all the wikis and guidelines
> and have browsed through the JIRA issues.
>
> First, I'd like to get a good overview of the codebase and the overall
> design.
> So my first thought was to start with some refactoring (decomposing
> methods and so on).
>
> Is there a specific place in the code that needs "cleaning"?
>