Posted to droids-dev@incubator.apache.org by Tobias Rübner <to...@apache.org> on 2012/12/14 11:11:31 UTC

Droids Cleanup Branch

Hi all,

At ApacheCon Europe, we decided to clean up the Droids code base.
Currently, Droids is really hard for beginners to use.
You have to write a lot of code before you can get started.

So I created a cleanup branch
https://svn.apache.org/repos/asf/incubator/droids/branches/0.2.x-cleanup/

First I just wanted to remove unused and confusing classes, but I ended up
refactoring the project.
Maybe this is too much, but it would be really nice if you could have a look
and share your opinions.
I did not change any of the core concepts, but followed the principle that
everything should be managed by a Droid.
For simplicity I did not use any @Deprecated annotations. Otherwise the
code would be really hard to read.
So far I have implemented only the core module and the walker to show the way:
- droids-core
- droids-walker

So basically I moved the crawling (currently not implemented) and walking
code into their own separate modules.
I renamed the api package to core and moved some interfaces /
implementations to their corresponding packages.
There are a lot of changes in the Droids API to make it easier to use.

I created some test cases in the droids-walker module to show how easy it
is now to create a new walker.
Here is an example that would run:

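  import java.io.File;
  import java.util.Collection;
  import java.util.LinkedList;

  // Directories (or files) the walker starts from
  Collection<File> initialFiles = new LinkedList<File>();
  initialFiles.add(new File("/home/user/docs"));

  // The queue and the taskmaster are predefined by the droid
  SimpleWalkingDroid droid = new SimpleWalkingDroid();
  droid.setInitialFiles(initialFiles);
  droid.addParsers(new FileNameParser());
  droid.addHandlers(new SysoutHandler());

  droid.start();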

In this example, the queue and the taskmaster are predefined.
For the basic use cases, like walking or crawling, we should define some
conventions.
It would be nice to be able to create a crawling droid with just a URL, with
everything else set up with defaults (which can be overridden).
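
To illustrate the idea of defaults that can be overridden, here is a minimal
sketch. It is not code from the cleanup branch: the class
DefaultedCrawlingDroid and its methods withQueue, withMaxThreads and
maxThreads are hypothetical names, and start() only records the URLs it
takes from the queue instead of fetching anything.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: class and method names are illustrative,
// not part of the Droids API.
final class DefaultedCrawlingDroid {
    private final URI seed;
    // Defaults chosen by convention; both can be overridden below.
    private Queue<URI> queue = new ArrayDeque<>();
    private int maxThreads = 1;

    DefaultedCrawlingDroid(String url) {
        this.seed = URI.create(url);
    }

    // Replaces the default queue with a caller-supplied one.
    DefaultedCrawlingDroid withQueue(Queue<URI> customQueue) {
        this.queue = customQueue;
        return this;
    }

    // Replaces the default thread limit.
    DefaultedCrawlingDroid withMaxThreads(int threads) {
        this.maxThreads = threads;
        return this;
    }

    int maxThreads() {
        return maxThreads;
    }

    // Drains the queue and returns the visited URLs in order.
    // No network access happens here; a real droid would fetch and parse.
    List<String> start() {
        List<String> visited = new ArrayList<>();
        queue.add(seed);
        while (!queue.isEmpty()) {
            visited.add(queue.poll().toString());
        }
        return visited;
    }
}
```

With such a design, new DefaultedCrawlingDroid("http://example.org/").start()
would be enough for the simple case, while withQueue(...) or
withMaxThreads(...) remain available for anyone who needs more control.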

So please test it and share your opinions.

Tobias

Re: Droids Cleanup Branch

Posted by Tobias Rübner <to...@apache.org>.
Hi Thorsten,

Nice to see that it works for you.
Currently I'm rewriting the droids-crawler module to make it work
with the new API.
I think we can treat this as an example for retrieving whole web pages.

I know that you want to remove the protocol stuff completely, but I think we
still need something to get the content of the task.
It could be done with a parser, but that would create dependencies on the
currently used implementation (crawler or walker ...).
So I think the best way is to create a Fetcher that retrieves the content.
It could be used for crawling web pages like the old Protocol,
but it could also be used for more specialized tasks, like crawling a
database or a text file.
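
As a rough sketch of what such a Fetcher could look like (an assumption for
discussion, not code from the branch): the interface Fetcher, its fetch
method and the InMemoryFetcher class below are hypothetical names. The
in-memory implementation stands in for a web, database or file backend.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: retrieves the content behind a task's location,
// independent of whether a crawler or a walker asks for it.
interface Fetcher {
    InputStream fetch(URI location) throws IOException;
}

// Illustrative implementation that serves content from a map.
// A real one might wrap an HTTP client, a JDBC query or a file read.
final class InMemoryFetcher implements Fetcher {
    private final Map<URI, String> documents = new HashMap<>();

    InMemoryFetcher put(String location, String body) {
        documents.put(URI.create(location), body);
        return this;
    }

    @Override
    public InputStream fetch(URI location) throws IOException {
        String body = documents.get(location);
        if (body == null) {
            throw new IOException("No content for " + location);
        }
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}
```

Parsers would then only see an InputStream and would not need to know which
kind of droid or backend produced it.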

It would be really nice if you could share your example,
and it would be really great to see more activity on the project.

Tobias


On Thu, Jan 3, 2013 at 11:06 PM, Thorsten Scherler <sc...@gmail.com> wrote:

> Hi Tobias,
>
> We are currently using the branch to develop a crawling prototype for a
> new project, and so far it is working very well. I am using a crawling
> droid that uses cocoon3 for parsing and handling. To get started, I
> copied droids-crawler, removed all the offending code
> (mostly the protocol stuff), and started clean from that.
>
> I can extract a simple example based on cocoon3 that does not use
> any of the protocol part but may be a good starting point.
>
> I am really happy to see such a slimmed-down version and agree that the
> simpler Droids gets, the more users will pick it up.
>
> BTW if we get the project I will be able to contribute again to the
> project. :)
>
> salu2
>
> --
> Thorsten Scherler <scherler.at.gmail.com>
> codeBusters S.L. - web based systems
> <consulting, training and solutions>
>
> http://www.codebusters.es/
>
>

Re: Droids Cleanup Branch

Posted by Thorsten Scherler <sc...@gmail.com>.
Hi Tobias,

We are currently using the branch to develop a crawling prototype for a
new project, and so far it is working very well. I am using a crawling
droid that uses cocoon3 for parsing and handling. To get started, I
copied droids-crawler, removed all the offending code
(mostly the protocol stuff), and started clean from that.

I can extract a simple example based on cocoon3 that does not use
any of the protocol part but may be a good starting point.

I am really happy to see such a slimmed-down version and agree that the
simpler Droids gets, the more users will pick it up.

BTW if we get the project I will be able to contribute again to the
project. :)

salu2

-- 
Thorsten Scherler <scherler.at.gmail.com>
codeBusters S.L. - web based systems
<consulting, training and solutions>

http://www.codebusters.es/


Re: Droids Cleanup Branch

Posted by Richard Frovarp <rf...@apache.org>.
Thanks for the work. Hopefully I can find time to look at it over the 
next couple of weeks.