Posted to dev@commons.apache.org by Valentin Waeselynck <va...@yahoo.fr> on 2013/12/04 12:52:50 UTC

[Laboratory Toolkit] proposing a new Apache Commons component

Hello to all,

As part of a small research project (which combined techniques of text-mining, machine-learning and natural language generation, not that it's really relevant) I have come to design a small JavaSE library, which I'm for the moment calling the Laboratory Toolkit, for developing our algorithms in a comfortable and flexible manner.

I have found it to be quite generic and reusable, not tied to any application domain, while still being rather accessible and small enough to understand easily. Therefore, I would like to propose it as a new Apache Commons component. I would be very grateful if one of you could tell me what steps I should follow for that purpose.


I have uploaded it to GitHub: https://github.com/vvvvalvalval/Laboratory-Toolkit.git. There you may find the sources, the Javadoc, and a small guide I have started to write for it (also attached to this mail).

Of course, I am very open to feedback and criticism from you. The last thing I want is to publish an immature or useless component, and I do not take a positive answer from you for granted.


If I have failed to follow the proper procedure to propose a new candidate component, it is not on purpose, and I apologize in advance.

Whatever your reply, and since I have the chance, I would also like to congratulate you for all your work. The Apache Commons components have really been lifesavers to me, on many occasions.

With best wishes,

 
Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waeselynck@polytechnique.edu
+33 6 80 84 99 54

Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Mark Fortner <ph...@gmail.com>.
You might also be interested in Apache UIMA, which is a popular text-mining
platform.

Mark
On Dec 7, 2013 1:49 AM, "Valentin Waeselynck" <va...@yahoo.fr>
wrote:

> Thanks to all for your interest!
>
> The code examples are on their way; I'm trying to make them as diverse as
> possible. I'll let you know as soon as they're ready.
>
>
> Thanks for telling me about Tika, Oliver, it's very interesting! An
> algorithm that tries to extract the meaning of a variety of documents could
> typically be a combination of Tika and the Laboratory Toolkit.
>
> However, the Laboratory Toolkit is less specialized (in fact, it's not
> specialized at all) and less concrete. In its genericity and in the nature
> of its benefits it is similar to, for example, the Executor API in
> java.util.concurrent. Just as the Executor API lets you think about and
> design concurrent algorithms in terms of tasks and executors, the
> Laboratory Toolkit lets you think about and design some other kind of
> algorithms (I haven't found a satisfying description yet) in terms of
> analyses and laboratories.

Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Valentin Waeselynck <va...@yahoo.fr>.
Thanks to all for your interest!

The code examples are on their way; I'm trying to make them as diverse as possible. I'll let you know as soon as they're ready.


Thanks for telling me about Tika, Oliver, it's very interesting! An algorithm that tries to extract the meaning of a variety of documents could typically be a combination of Tika and the Laboratory Toolkit.

However, the Laboratory Toolkit is less specialized (in fact, it's not specialized at all) and less concrete. In its genericity and in the nature of its benefits it is similar to, for example, the Executor API in java.util.concurrent. Just as the Executor API lets you think about and design concurrent algorithms in terms of tasks and executors, the Laboratory Toolkit lets you think about and design some other kind of algorithms (I haven't found a satisfying description yet) in terms of analyses and laboratories.
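
To make the Executor side of this analogy concrete, here is a minimal sketch using only the standard java.util.concurrent API. The class name ExecutorAnalogy, the countWords method and the word-counting task are illustrative assumptions; nothing in it comes from the Laboratory Toolkit itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: shows "tasks and executors", not the Laboratory Toolkit.
final class ExecutorAnalogy {

    /** Counts the words of each document, one independent task per document. */
    static List<Integer> countWords(List<String> documents) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // Describe the work as tasks, without saying how or where they run.
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String doc : documents) {
                tasks.add(() -> doc.trim().isEmpty() ? 0 : doc.trim().split("\\s+").length);
            }
            // The executor decides about threads; results come back in task order.
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : executor.invokeAll(tasks)) {
                counts.add(future.get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting words", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a counting task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("a small library", "tasks and executors")));
    }
}
```

Running main prints [3, 3]. The point of the analogy is the split of responsibilities: the caller only declares tasks, and the executor owns the question of how they are run.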

Bests,

 
Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waeselynck@polytechnique.edu
+33 6 80 84 99 54




On Friday, 6 December 2013 at 21:30, Oliver Heger <ol...@oliver-heger.de> wrote:

On 05.12.2013 at 13:44, Valentin Waeselynck wrote:
> Hello, and pleased to meet you,
> 
> Thank you for your answer.
> 
> I just asked for confirmation, and I do have full intellectual property on this software.
> 
> About the use cases : no problem, I'll include some code samples. As a foreword, let's say it provides a convenient API for creating all sorts of custom "information extraction" algorithms.
If the library is about information extraction, you may also want to
have a look at the Apache Tika project [1].

Oliver

[1] http://tika.apache.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Oliver Heger <ol...@oliver-heger.de>.

On 05.12.2013 at 13:44, Valentin Waeselynck wrote:
> Hello, and pleased to meet you,
> 
> Thank you for your answer.
> 
> I just asked for confirmation, and I do have full intellectual property on this software.
> 
> About the use cases : no problem, I'll include some code samples. As a foreword, let's say it provides a convenient API for creating all sorts of custom "information extraction" algorithms.
If the library is about information extraction, you may also want to
have a look at the Apache Tika project [1].

Oliver

[1] http://tika.apache.org/

> 
> As for the group of persons willing to maintain this : well, for the moment, there is me. As this is a quite small toolkit, I think it's sufficient, at least for a start.
> 
> I'll start working towards the other requirements (maven + test coverage) right away and let you know as soon as it's ready.
> 
>  
> 
> Should I keep answering to the whole ML about this, or only to you?
> 
> Best regards,
> 
> 
> Valentin WAESELYNCK
> Third-year student at École Polytechnique
> valentin.waeselynck@polytechnique.edu
> +33 6 80 84 99 54
> 
> 
> 
> 
> On Thursday, 5 December 2013 at 8:53, Benedikt Ritter <br...@apache.org> wrote:
>  
> Bonjour Valentin,
> 
> welcome to the ML. Good to hear that you've decided to join the open source
> movement.
> 
> First of all, it would really help if you could elaborate on some use cases
> for your library. You're talking about building algorithms. What kind of
> algorithms can be built with Laboratory Toolkit? Can you give some code
> examples (just create some gists on GitHub that show the use of
> Laboratory Toolkit)?
> 
> There is an important requirement for any code to be incorporated into the
> Apache code base:
> - the intellectual property (IP) of the code has to be owned completely by
> the contributor. You said that you've built the Laboratory Toolkit for a
> research project. Are you sure that you own the code? Or is it the result
> of your work and thus is owned by your employer?
> 
> At Commons we have some additional requirements:
> - There should be a group of people who are willing to maintain the code
> - Commons components should in general not depend on any other libraries
> - Commons uses Maven as the main build tool, so there should be a Maven
> build available
> - The code should have good test coverage
> 
> You have to figure the IP issue out on your own first.
> After that, if the community decides to accept this contribution, we can
> work on the commons requirements.
> 
> Best regards and thank you,
> Benedikt

Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Valentin Waeselynck <va...@yahoo.fr>.
Hello to all,

As you asked, I have restructured the Laboratory Toolkit as a Maven project and added some code samples to show its use cases. (Sorry for the delay; I've had a rough couple of weeks.)

In the code samples, you may find the following examples:
    - accounting: the simplest example, in the field of corporate finance. It's an application that takes as input the accounting documents of a company (the Balance Sheet and the Income Statement) and calculates from these a variety of financial quantities, such as the Net Income, and profitability ratios, such as the Return On Equity.

    - integer: a more mathematical example, in the fields of arithmetic and algebra. The base data is simply a positive integer; the application computes things like the set of divisors of this integer, then some more advanced algebraic objects such as the ring of integers modulo this integer, its canonical Chinese factorization and the isomorphism between them, up to a set of generators of its group of invertible elements.

    - search-engine: implements a basic search engine on a corpus of documents, using classical ranking functions such as BM25 or TF-IDF in a Vector Space Model. This one is closer to reality; it is directly inspired by an "Introduction to Big Data" class I just had.

    - text: illustrates the HTML guide in the repository; don't look at this directly.
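
As an illustration of the ranking functions mentioned for the search-engine example, here is a minimal BM25 sketch. It is not taken from the repository: the class Bm25Sketch, the simple ASCII tokenizer, and the parameter values k1 = 1.2 and b = 0.75 are assumptions, and the IDF uses the common non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of BM25 scoring; not code from the Laboratory Toolkit samples.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed default)
    static final double B = 0.75;  // document-length normalization (assumed default)

    /** Lower-cases the text and splits it on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns one BM25 score per document of the corpus, in corpus order. */
    static double[] score(List<String> corpus, String query) {
        List<List<String>> docs = new ArrayList<>();
        double totalLength = 0;
        for (String text : corpus) {
            List<String> tokens = tokenize(text);
            docs.add(tokens);
            totalLength += tokens.size();
        }
        int n = docs.size();
        double avgLength = n == 0 ? 0 : totalLength / n;
        double[] scores = new double[n];
        for (String term : tokenize(query)) {
            // Number of documents containing the term at least once.
            int docFreq = 0;
            for (List<String> doc : docs) {
                if (doc.contains(term)) {
                    docFreq++;
                }
            }
            double idf = Math.log(1 + (n - docFreq + 0.5) / (docFreq + 0.5));
            for (int i = 0; i < n; i++) {
                List<String> doc = docs.get(i);
                int tf = Collections.frequency(doc, term);
                if (tf == 0) {
                    continue;  // the term contributes nothing to this document
                }
                double norm = avgLength == 0 ? 1 : 1 - B + B * doc.size() / avgLength;
                scores[i] += idf * tf * (K1 + 1) / (tf + K1 * norm);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "the cat sat", "the dog barked at the cat", "birds fly");
        System.out.println(Arrays.toString(score(corpus, "cat")));
    }
}
```

With this corpus and the query "cat", the first document ranks above the second because both contain the term once and the first is shorter; the third document scores zero.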
 
Of course, these are only toy examples, and I don't have the ambition of replacing software that already does this very well; but I hope they're informative enough about this API's genericity and possible applications. 

In the real world, this API originated from my developing an application that generates advertisement snippets from HTML product pages, in which I used this toolkit extensively for extracting and ranking keywords from the HTML document; but I can't show you that code. My experience with it is that I find it much easier to look at a sequence of formulas on paper and implement them one by one.

In my opinion, the main features of this API are:
    - making some algorithms easier to develop by expressing their concepts in terms of analyses and results (as an analogy, think of how the Executor API lets you describe concurrent algorithms in terms of tasks and executors);
    - built-in support for the Intercept, Cache and Invoke pattern;
    - enforcing a modular architecture without hindering communication between the modules (i.e. the Analysis objects);
    - through the use of Laboratory objects, emulating a new scope that is an alternative to class scope or method scope;
    - separating the concern of declaring how the steps of an algorithm are computed from that of externally requesting their results;
    - encouraging the exploration of a space of strategies and parameters for the algorithms, by concentrating all these parameters in one place (the Equipment object).
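
The caching and scoping ideas in the list above can be illustrated with a small sketch built on the accounting example. Everything in it is hypothetical: the names LaboratorySketch, Analysis, Laboratory, resultOf and input are invented for illustration and are not the Laboratory Toolkit's actual API, and the Equipment object is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: NOT the Laboratory Toolkit API, only the general idea.
final class LaboratorySketch {

    /** One step of an algorithm; it may request the results of other steps. */
    interface Analysis<T> {
        T compute(Laboratory lab);
    }

    /** Holds the base data and caches the result of each analysis it is asked for. */
    static final class Laboratory {
        private final Map<String, Double> inputs;
        private final Map<Analysis<?>, Object> cache = new HashMap<>();
        int computations = 0;  // counts how many analyses were actually computed

        Laboratory(Map<String, Double> inputs) {
            this.inputs = inputs;
        }

        double input(String name) {
            Double value = inputs.get(name);
            if (value == null) {
                throw new IllegalArgumentException("missing input: " + name);
            }
            return value;
        }

        // Intercept the request, return the cached result if any, else invoke.
        // A plain containsKey/put is used because compute may call resultOf
        // recursively, which Map.computeIfAbsent does not support.
        @SuppressWarnings("unchecked")
        <T> T resultOf(Analysis<T> analysis) {
            if (!cache.containsKey(analysis)) {
                computations++;
                cache.put(analysis, analysis.compute(this));
            }
            return (T) cache.get(analysis);
        }
    }

    // Each formula is declared once, independently of who asks for its result.
    static final Analysis<Double> NET_INCOME =
            lab -> lab.input("revenue") - lab.input("expenses");
    static final Analysis<Double> RETURN_ON_EQUITY =
            lab -> lab.resultOf(NET_INCOME) / lab.input("equity");

    public static void main(String[] args) {
        Map<String, Double> statements = new HashMap<>();
        statements.put("revenue", 200.0);
        statements.put("expenses", 150.0);
        statements.put("equity", 100.0);
        Laboratory lab = new Laboratory(statements);
        System.out.println("ROE = " + lab.resultOf(RETURN_ON_EQUITY));
        System.out.println("Net income = " + lab.resultOf(NET_INCOME));
        System.out.println("Analyses computed: " + lab.computations);
    }
}
```

In this sketch the net income is requested twice (once by the Return On Equity analysis, once directly) but computed only once, and each Laboratory instance acts as its own scope for cached results.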

I hope you'll like it, and I'm always eager for feedback!

With best wishes,


Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waeselynck@polytechnique.edu
+33 6 80 84 99 54




On Friday, 6 December 2013 at 14:21, Benedikt Ritter <br...@apache.org> wrote:
 
2013/12/5 Christian Grobmeier <gr...@gmail.com>

> On 5 Dec 2013, at 13:44, Valentin Waeselynck wrote:
>
>> Should I keep answering to the whole ML about this, or only to you?
>
> Keep the mailing list in the loop. There might be others interested in this.
> In addition, the ML documents the history, which is why we always use it.


Thanks for chiming in on this, Christian!

Valentin: Before you invest a lot of work to get Maven and some tests in
place, let us start with the example code, so that people can decide if
your project fits into Commons.

Benedikt



Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Benedikt Ritter <br...@apache.org>.
2013/12/5 Christian Grobmeier <gr...@gmail.com>

> On 5 Dec 2013, at 13:44, Valentin Waeselynck wrote:
>
>> Should I keep answering to the whole ML about this, or only to you?
>
> Keep the mailing list in the loop. There might be others interested in this.
> In addition, the ML documents the history, which is why we always use it.


Thanks for chiming in on this, Christian!

Valentin: Before you invest a lot of work to get Maven and some tests in
place, let us start with the example code, so that people can decide if
your project fits into Commons.

Benedikt




-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Christian Grobmeier <gr...@gmail.com>.
On 5 Dec 2013, at 13:44, Valentin Waeselynck wrote:

> Should I keep answering to the whole ML about this, or only to you?

Keep the mailing list in the loop. There might be others interested in this.
In addition, the ML documents the history, which is why we always use it.






---
http://www.grobmeier.de
@grobmeier
GPG: 0xA5CC90DB



Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Valentin Waeselynck <va...@yahoo.fr>.
Hello, and pleased to meet you,

Thank you for your answer.

I just asked for confirmation, and I do hold the full intellectual property rights to this software.

About the use cases: no problem, I'll include some code samples. As a foreword, let's say it provides a convenient API for creating all sorts of custom "information extraction" algorithms.

As for the group of people willing to maintain this: well, for the moment, there is only me. As this is quite a small toolkit, I think that is sufficient, at least for a start.

I'll start working towards the other requirements (Maven + test coverage) right away and let you know as soon as it's ready.

Should I keep replying to the whole ML about this, or only to you?

Best regards,


Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waeselynck@polytechnique.edu
+33 6 80 84 99 54





Re: [Laboratory Toolkit] proposing a new Apache Commons component

Posted by Benedikt Ritter <br...@apache.org>.
Bonjour Valentin,

Welcome to the ML. Good to hear that you've decided to join the open source
movement.

First of all, it would really help if you could elaborate on some use cases
for your library. You're talking about building algorithms. What kind of
algorithms can be built with Laboratory Toolkit? Can you give some code
examples (just create some gists on GitHub that show the use of
Laboratory Toolkit)?

There is an important requirement for any code to be incorporated into the
Apache code base:
- the intellectual property (IP) of the code has to be owned completely by
the contributor. You said that you've built the Laboratory Toolkit for a
research project. Are you sure that you own the code? Or is it the result
of your work and thus owned by your employer?

At Commons we have some additional requirements:
- There should be a group of people who are willing to maintain the code
- Commons components should in general not depend on any other libraries
- Commons uses Maven as the main build tool, so there should be a Maven
build available
- The code should have good test coverage

You have to figure out the IP issue on your own first.
After that, if the community decides to accept this contribution, we can
work on the Commons requirements.

Best regards and thank you,
Benedikt





-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter