Posted to dev@stanbol.apache.org by Olivier Grisel <ol...@ensta.org> on 2011/02/22 15:33:37 UTC

Berlin Buzzword Hackathon?

Hi all,

I will probably attend the next edition of Berlin Buzzwords in June:
http://berlinbuzzwords.de/

Isabel (the organizer) told me earlier that it should be possible to
host a hackathon right after the conference. The topic would be
European R&D projects that involve semantic technologies and open
source projects such as Stanbol.

As Stanbol has a strong dependency on trained statistical models (e.g.
from OpenNLP), and as we will face the same legal issues as OpenNLP
when distributing statistical models derived from copyrighted corpora,
I think it would be great to take the opportunity of such a hackathon
to kick-start an effort to build our own annotated training corpus
from freely redistributable sources such as Wikipedia, Wikinews,
DBpedia and Gutenberg, while collaborating with other interested
projects such as OpenNLP and UIMA.

Jorn Kottman from OpenNLP & UIMA fame already expressed some interest
in attending such a workshop. A practical goal could be to develop
some UI tools to manually refine / correct / complete a tokenized and
semi-annotated NER corpus automatically extracted from Wikipedia /
DBpedia using pignlproc [1].

We could base this work on existing projects such as wordfreak or the
UIMA CasEditor, or start a new web-based UI, for instance.

We could also extend the topic of the hackathon to improving our tools
to build and distribute entity indices from various sources, using
Apache Mahout to build an eigenvector centrality ranking for entities
out of the Wikipedia page links graph, and so on.
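
To make the ranking idea concrete, here is a minimal sketch in plain
Python. It is not Mahout code and does not use any Mahout API. It
assumes a PageRank-style damped power iteration, which is one common
variant of eigenvector centrality; the function name, the damping
value and the toy page titles are all hypothetical.

```python
def centrality(links, damping=0.85, iterations=50):
    """Damped power iteration (PageRank-style) over a page-link graph.

    links maps each page title to the list of pages it links to.
    Returns a dict mapping each page title to its score.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes an equal share of its score to its targets.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph with hypothetical page titles (assumption, not real data).
toy_links = {
    "Berlin": ["Germany"],
    "Munich": ["Germany"],
    "Germany": ["Berlin"],
}
ranks = centrality(toy_links)
```

On the real Wikipedia link graph the same iteration would have to run
as a distributed job, which is where a tool like Mahout would come in.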

Please tell me if you are interested, and on what specific topic you
would like to work.

[1] https://github.com/ogrisel/pignlproc

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Berlin Buzzword Hackathon?

Posted by Olivier Grisel <ol...@ensta.org>.

2011/2/23 Rupert Westenthaler <rw...@apache.org>:
> Hi
>
> On Wed, Feb 23, 2011 at 10:00 AM, Olivier Grisel
> <ol...@ensta.org> wrote:
>>> You say it "should be possible to host a hackathon right after
>>> the conference." Do you mean the day after, on the 8th?
>
> I could make it to Berlin that week.
>
> I am just wondering about my role in such a workshop, as I am not an
> expert in NLP, statistical models, etc.

I think we could have a parallel task on "building entity and topic
indices with popularity ranks". That is basically the data-wizardry
part of your work on Stanbol's entityhub. Both tasks should help us
greatly improve the quality of the Stanbol annotators.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Berlin Buzzword Hackathon?

Posted by Stefane Fermigier <sf...@nuxeo.com>.

On Feb 23, 2011, at 12:00 PM, Rupert Westenthaler wrote:

> Hi
> 
> On Wed, Feb 23, 2011 at 10:00 AM, Olivier Grisel
> <ol...@ensta.org> wrote:
>>> You say it "should be possible to host a hackathon right after
>>> the conference." Do you mean the day after, on the 8th?
> 
> I could make it to Berlin that week.
> 
> I am just wondering about my role in such a workshop, as I am not an
> expert in NLP, statistical models, etc.

Right, since the workshop is supposed to bring different groups together, we have to tune the subject(s) so that this can effectively happen.

  S.

-- 
Stefane Fermigier, Founder and Chairman, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com/ - +33 1 40 33 79 87 - http://twitter.com/sfermigier
Join the Nuxeo Group on LinkedIn: http://linkedin.com/groups?gid=43314
New Nuxeo release: http://nuxeo.com/dm54
"There's no such thing as can't. You always have a choice."

Re: Berlin Buzzword Hackathon?

Posted by Rupert Westenthaler <rw...@apache.org>.

Hi

On Wed, Feb 23, 2011 at 10:00 AM, Olivier Grisel
<ol...@ensta.org> wrote:
>> You say it "should be possible to host a hackathon right after
>> the conference." Do you mean the day after, on the 8th?

I could make it to Berlin that week.

I am just wondering about my role in such a workshop, as I am not an
expert in NLP, statistical models, etc.

best
Rupert



-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Berlin Buzzword Hackathon?

Posted by Olivier Grisel <ol...@ensta.org>.

2011/2/23 florent andré <fl...@4sengines.com>:
> Salut,
>
> You say it "should be possible to host a hackathon right after the
> conference." Do you mean the day after, on the 8th?

Yes.

> I'm a newbie in NER. Last time you spoke about it, I googled a little
> and found it really interesting.
>
> I have some "vague ideas" about a web user interface for this goal. I
> would be interested in refining them.
>
> Do you have any links at hand to existing user-friendly NER corpus
> annotators?

UIMA has a CasEditor:
http://uima.apache.org/d/uimaj-2.3.1/tools.html#ugr.tools.ce
Otherwise there is wordfreak: http://wordfreak.sourceforge.net/screenshots.html

Judging by the screenshots, wordfreak seems closer to what I had in
mind, but last time I checked the OpenNLP plugin was unmaintained.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Berlin Buzzword Hackathon?

Posted by florent andré <fl...@4sengines.com>.

Salut,

You say it "should be possible to host a hackathon right after the
conference." Do you mean the day after, on the 8th?

I'm a newbie in NER. Last time you spoke about it, I googled a little
and found it really interesting.

I have some "vague ideas" about a web user interface for this goal. I
would be interested in refining them.

Do you have any links at hand to existing user-friendly NER corpus
annotators?

Thanks
++

Re: Berlin Buzzword Hackathon?

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Tue, Feb 22, 2011 at 3:33 PM, Olivier Grisel
<ol...@ensta.org> wrote:
>... Isabel (the organizer) told me earlier that it should be possible to
> host a hackathon right after the conference. The topic would be
> European R&D projects that involve semantic technologies and open source
> projects such as Stanbol....

I think it would be great if Stanbol folks could attend such a
hackathon. Personally I'll be unable to join, due to other plans.
-Bertrand