Posted to dev@opennlp.apache.org by Aliaksandr Autayeu <al...@autayeu.com> on 2012/01/02 21:51:38 UTC

Re: using Scala for opennlp.ml

An interesting post about Scala:
http://goodstuff.im/yes-virginia-scala-is-hard

Jason, by the way, from what I saw following your link, I might love Scala
as much as...

StringJoin @@ (MapThread[#1 <> #2 &, {{"I", "l", "v", " ", "P"}, {" ", "o", "e", "F", "!"}}])

(that's in the Mathematica language), but I would consider the article
above, keeping in mind the audience of the project.

Aliaksandr

On Thu, Dec 29, 2011 at 6:46 PM, Aliaksandr Autayeu
<al...@autayeu.com> wrote:

> IMO, Java's advantage is that of being a de facto standard, with mature
> tools, infrastructure and other things. I'm sure there are many cool toys
> out there (git was a cool toy once, and SVN is still a de facto standard in
> many places), but the ease of use of OpenNLP (standard Java without excessive
> dependencies + Maven) is an enormous advantage in most of my use
> cases.
>
> Aliaksandr
>
> P.S. The link is a nice collection of material!
>
> On Thu, Dec 29, 2011 at 6:27 PM, Jason Baldridge <jasonbaldridge@gmail.com>
> wrote:
>
>> I'd really like to use Scala in the opennlp.ml rewrite, for reasons I've
>> already stated on the list. My thinking on this is to do the first
>> reorganization for opennlp.ml in pure Java, make a release, and then
>> start mixing in Scala. I've been happily mixing Scala and Java on a
>> number of projects without much fuss. However, I do so in the context of
>> using SBT (simple build tool), rather than Maven (SBT can read Maven
>> declarations, FWIW). It is quite straightforward to use, and I'm now using
>> Eclipse with the Scala IDE for Eclipse to build Java/Scala projects - so it
>> should be straightforward for others to get up and running with it.
>>
>> I'd be interested in hearing whether anyone has any particular concerns or
>> objections about this plan. Also interested in hearing whether anyone is
>> particularly keen on the use of Scala.
>>
>> BTW, if you haven't seen much of Scala before, I have some very gentle
>> introductions (aimed at first time programmers) for getting started with it
>> on my blog. You can find links to the posts, plus to lots of other
>> resources here:
>>
>> http://icl-f11.utcompling.com/links
>>
>> --
>> Jason Baldridge
>> Associate Professor, Department of Linguistics
>> The University of Texas at Austin
>> http://www.jasonbaldridge.com
>> http://twitter.com/jasonbaldridge
>>
>
>

Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
It seems that there is a Maven plugin for building Scala and Java
projects. I'm not a fan of Maven myself, and have been quite happy with
SBT. But, I'm open to exploring options to keep things with Maven.

On Sat, Jan 7, 2012 at 6:24 AM, Aliaksandr Autayeu
<al...@autayeu.com> wrote:

> Maven is quite popular and well supported in the Java community. I would be
> cautious about moving away from it. Maybe there is a way to build Scala from
> Maven?
>
> Aliaksandr
>
> On Sat, Jan 7, 2012 at 1:19 PM, Jörn Kottmann <ko...@gmail.com> wrote:
>
> > On 1/7/12 4:40 AM, Jason Baldridge wrote:
> >
> >> Though most people seem focused on Eclipse or Intellij. Until recently,
> >> Intellij seemed to be the undisputed best for Scala, but now that Typesafe
> >> has been working on Eclipse, that has pretty much caught up (and is what
> >> I'm using). It is also worth mentioning that Emacs or Vi along with SBT is
> >> a quite satisfactory development environment for Scala.
> >>
> >
> > Our build is currently still based on maven, and it might take quite
> > some time to migrate that to SBT.
> >
> > Jörn
> >
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
Maven is quite popular and well supported in the Java community. I would be
cautious about moving away from it. Maybe there is a way to build Scala from
Maven?

Aliaksandr

On Sat, Jan 7, 2012 at 1:19 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 1/7/12 4:40 AM, Jason Baldridge wrote:
>
>> Though most people seem focused on Eclipse or Intellij. Until recently,
>> Intellij seemed to be the undisputed best for Scala, but now that Typesafe
>> has been working on Eclipse, that has pretty much caught up (and is what
>> I'm using). It is also worth mentioning that Emacs or Vi along with SBT is
>> a quite satisfactory development environment for Scala.
>>
>
> Our build is currently still based on maven, and it might take quite
> some time to migrate that to SBT.
>
> Jörn
>

Re: using Scala for opennlp.ml

Posted by Jörn Kottmann <ko...@gmail.com>.
On 1/7/12 4:40 AM, Jason Baldridge wrote:
> Though most people seem focused on Eclipse or Intellij. Until recently,
> Intellij seemed to be the undisputed best for Scala, but now that Typesafe
> has been working on Eclipse, that has pretty much caught up (and is what
> I'm using). It is also worth mentioning that Emacs or Vi along with SBT is
> a quite satisfactory development environment for Scala.

Our build is currently still based on maven, and it might take quite
some time to migrate that to SBT.

Jörn

Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
For Scala development with an IDE, you do need to have a plugin. Looks like
this is the one for NetBeans:

http://java.net/projects/nbscala

Though most people seem focused on Eclipse or Intellij. Until recently,
Intellij seemed to be the undisputed best for Scala, but now that Typesafe
has been working on Eclipse, that has pretty much caught up (and is what
I'm using). It is also worth mentioning that Emacs or Vi along with SBT is
a quite satisfactory development environment for Scala.

If you are building your own project in Java and want to use a Scala
package, then you don't have to do anything special since Scala produces
byte code and can be used like the products of any Java package.


On Fri, Jan 6, 2012 at 4:51 PM, James Kosin <ja...@gmail.com> wrote:

> Jason,
>
> I'm also not using Eclipse, does Scala have a plugin for Netbeans?  Or
> can you just use the Scala libraries directly from Java without a plugin
> or install?
>
> James
>
> On 1/6/2012 9:46 AM, Jason Baldridge wrote:
> > Thanks everyone, for your feedback.
> >
> > Doing a mixed Scala/Java build actually doesn't add much complexity, in my
> > experience. Using SBT and/or the Scala IDE for Eclipse, it all works very
> > straightforwardly. Having said that, I'd be happy with a pure Scala package
> > that provides a good Java API.
> >
> > I'm relatively new to Eclipse, but have been using it quite happily for
> > Scala+Java development for the past month. I also find that Scala makes one
> > less dependent on an IDE than Java, mainly because code is much more
> > concise and one can have multiple classes per file - so you don't have to
> > navigate around as much code in so many files.
> >
> > As for developers, it sounds like we actually have some critical mass here,
> > and basically all my students are using Scala. I personally am much more
> > likely to contribute more code if I can do so in Scala, both because I far
> > prefer it and because I'll be creating Scala code for my class this
> > semester that I could ideally do in opennlp.ml.
> >
> > Having some sort of plugin architecture would be possible and probably
> > quite nice, though it probably would not be my first priority.
> >
> > Jason
> >
> > On Fri, Jan 6, 2012 at 6:19 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> >
> >> I still don't like mixing Java and Scala. If we do a complete rewrite
> >> of the perceptron and maxent implementation Scala would be
> >> an option. If we just do a little Scala and transform some of the
> >> existing classes it doesn't seem reasonable to me to add all the
> >> complexity to the build.
> >>
> >> Jörn
> >>
> >>
> >> On 1/6/12 10:50 AM, Olivier Grisel wrote:
> >>
> >>> My 2 cents:
> >>>
> >>> As a potential contributor to the opennlp.ml package with a good
> >>> background in machine learning, I would say Scala is not a barrier for
> >>> me (even if I don't use it often right now). Maybe even the opposite
> >>> as I find coding in Scala more fun than Java. Profiling and perf
> >>> tuning can be a bit harder though.
> >>>
> >>> The most important drawback I had against Scala in the past was the
> >>> poor / buggy Eclipse support for Scala which made it painful to work
> >>> with multi-language (Java + Scala) projects but the situation has very
> >>> much improved over the past 2 years.
> >>>
> >>>
> >
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by James Kosin <ja...@gmail.com>.
Jason,

I'm also not using Eclipse, does Scala have a plugin for Netbeans?  Or
can you just use the Scala libraries directly from Java without a plugin
or install?

James

On 1/6/2012 9:46 AM, Jason Baldridge wrote:
> Thanks everyone, for your feedback.
>
> Doing a mixed Scala/Java build actually doesn't add much complexity, in my
> experience. Using SBT and/or the Scala IDE for Eclipse, it all works very
> straightforwardly. Having said that, I'd be happy with a pure Scala package
> that provides a good Java API.
>
> I'm relatively new to Eclipse, but have been using it quite happily for
> Scala+Java development for the past month. I also find that Scala makes one
> less dependent on an IDE than Java, mainly because code is much more
> concise and one can have multiple classes per file - so you don't have to
> navigate around as much code in so many files.
>
> As for developers, it sounds like we actually have some critical mass here,
> and basically all my students are using Scala. I personally am much more
> likely to contribute more code if I can do so in Scala, both because I far
> prefer it and because I'll be creating Scala code for my class this
> semester that I could ideally do in opennlp.ml.
>
> Having some sort of plugin architecture would be possible and probably
> quite nice, though it probably would not be my first priority.
>
> Jason
>
> On Fri, Jan 6, 2012 at 6:19 AM, Jörn Kottmann <ko...@gmail.com> wrote:
>
>> I still don't like mixing Java and Scala. If we do a complete rewrite
>> of the perceptron and maxent implementation Scala would be
>> an option. If we just do a little Scala and transform some of the
>> existing classes it doesn't seem reasonable to me to add all the
>> complexity to the build.
>>
>> Jörn
>>
>>
>> On 1/6/12 10:50 AM, Olivier Grisel wrote:
>>
>>> My 2 cents:
>>>
>>> As a potential contributor to the opennlp.ml package with a good
>>> background in machine learning, I would say Scala is not a barrier for
>>> me (even if I don't use it often right now). Maybe even the opposite
>>> as I find coding in Scala more fun than Java. Profiling and perf
>>> tuning can be a bit harder though.
>>>
>>> The most important drawback I had against Scala in the past was the
>>> poor / buggy Eclipse support for Scala which made it painful to work
>>> with multi-language (Java + Scala) projects but the situation has very
>>> much improved over the past 2 years.
>>>
>>>
>


Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
The Play framework is a Java/Scala hybrid:  http://www.playframework.org/

Akka is a core Scala library with a Java API: http://akka.io/

I don't know how central Scala is to it, but Camel has a Scala DSL:
http://camel.apache.org/scala-dsl.html
http://java.dzone.com/articles/apache-camel-and-scala

There are some core NLP/ML libraries in Scala that are written by
some very smart and interesting people, e.g.:

ScalaNLP: http://www.scalanlp.org/
Factorie: http://code.google.com/p/factorie/

Both of these are ASL 2.0.

I think it is also worth pointing out that there are some promising, very
recent libraries for using Scala to write MapReduce jobs much more
effectively:

https://github.com/twitter/scalding
https://github.com/NICTA/scoobi
https://github.com/cloudera/crunch/tree/master/scrunch

Spark is really interesting too, and it pwns Hadoop on iterative jobs:

https://github.com/mesos/spark

Though it is worth noting that there are Java alternatives that might have
similar properties, like Peregrine (which was just announced and which I
haven't looked into in detail):

http://peregrine_mapreduce.bitbucket.org/

Though then you don't get some of the benefits of the Scala way of doing
things. ;)


On Sat, Jan 14, 2012 at 7:20 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 1/14/12 12:35 PM, Chris Fournier wrote:
>
>> Are there any examples of widely adopted Java/Scala hybrid open-source
>> projects?  This can't be the only project to have faced this question
>> before; one could learn from another project's success/folly.
>>
>
> I was searching for one but couldn't really find one.
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by Jörn Kottmann <ko...@gmail.com>.
On 1/14/12 12:35 PM, Chris Fournier wrote:
> Are there any examples of widely adopted Java/Scala hybrid open-source
> projects?  This can't be the only project to have faced this question
> before; one could learn from another project's success/folly.

I was searching for one but couldn't really find one.

Jörn

Re: using Scala for opennlp.ml

Posted by Chris Fournier <ch...@gmail.com>.
Are there any examples of widely adopted Java/Scala hybrid open-source
projects?  This can't be the only project to have faced this question
before; one could learn from another project's success/folly.

Chris
On Jan 14, 2012 12:11 AM, "Jason Baldridge" <ja...@gmail.com>
wrote:

> It's a perfectly fair question to ask. My first response is that I've been
> programming primarily in Scala for more than a year and I not only enjoy it
> but find myself far more productive with it. I am actually highly reluctant
> to write Java code given that I have Scala as an alternative. I used to use
> Python for quick scripting and Java for larger applications, but now I can
> happily use Scala for both. But it goes beyond that -- it provides plenty
> of opportunities to program in a very different way than either Python or
> Java supports. Mainly that is using functional programming -- once you are
> used to it, not being able to program functionally becomes painful, and, in
> my experience, far less productive. It isn't a natural thing for many
> people initially, but a very nice thing about Scala is that it actually
> allows you to code in the imperative style you might be accustomed to while
> gradually bringing functional aspects into your programs. And it goes well
> beyond the nice little examples of how a particular bit of Scala code is
> shorter than a given bit of Java that does the same thing -- it leads to
> different, and I would say, generally better design. (Though for what it's
> worth, the significant reduction in boilerplate code over Java is truly
> liberating.)
>
> Other things to like about Scala are type inference, immutable data
> structures, an amazing collections library, much better object orientation
> than Java, and pattern matching (not regex, but switch statements on
> steroids).  The fact that it compiles to Java byte code makes integration
> with Java and use of Java APIs quite straightforward, which was a reason
> for me to prefer it to other alternatives than Java. There's much more,
> including many intangibles that come with experience. Here's an article
> that conveys some of that, from the perspective of coming from Python:
>
> http://www.artima.com/weblogs/viewpost.jsp?thread=328540
>
> As for programmers, there is actually a very strong Scala contingent in the
> NLP and machine learning world, including groups at UMass Amherst,
> Stanford, and UT Austin, and probably elsewhere. Scala is also seeing
> corporate adoption, though of course it has nothing like the numbers of
> Java programmers. Most of my students are now using Scala, so having
> opennlp.ml be in Scala will be convenient for work they could contribute to
> the package.
>
> There have been a lot of reasons in the past not to use Scala, especially
> poor IDE support and problems with backward compatibility that made it
> problematic for enterprise projects. That has changed a great deal in the
> past year, especially with the efforts being made by Typesafe.
>
> Happy to discuss more!
>
> Jason
>
> On Tue, Jan 10, 2012 at 8:53 PM, James Kosin <ja...@gmail.com> wrote:
>
> > Everyone,
> >
> > +1
> > I'm okay with going forward with this; but, I must ask Why?  I know
> > Scala may be a good thing; but, if it generates Java byte code then
> > isn't there an equivalent way to write the same things in Java?
> >
> > What sort of benefit will we get with the code migrated and written in
> > Scala?  Even the author of that article said not many know the inner
> > workings of the language....  He was one of the few.
> >
> > Maybe we could ask or have a poll taken to see how many know Scala in
> > the community?
> >
> > Sorry for my concerns, or if they seem harsh or over-analytical.
> >
> > Just concerned,
> > James
> >
> > On 1/10/2012 12:20 AM, Jason Baldridge wrote:
> > > +1 to this in general, though I'm not into over-architecting things
> > > initially. Would be great to get things humming and then start supporting
> > > more pluggability.
> > >
> > > On Sat, Jan 7, 2012 at 7:33 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> > >
> > >> On 1/7/12 2:22 PM, Grant Ingersoll wrote:
> > >>
> > >>> Being able to take advantage of other classifiers seems like it would be
> > >>> a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout
> > >>> or others.
> > >>>
> > >>> Besides, for testing purposes, if you could plug in the existing
> > >>> capability versus your new rewrite (in Scala) then you could easily compare
> > >>> the two.  I can't imagine the abstraction layer is more than a few
> > >>> interfaces or abstract classes plus a bit of configuration/injection/fill
> > >>> in the blank that allows one to specify the implementation.
> > >>>
> > >> Yes, we need pluggable classifiers and support for extensive
> > >> modification/extension of our existing components. You are welcome to
> > >> help us with that.
> > >>
> > >> One way of implementing this is to specify an (optional) factory class
> > >> during training which is used to create a model (classifier). A second
> > >> type of factory class could be specified to modify a component.
> > >>
> > >> These factory class names will be stored in our zip model package, and can
> > >> then be used to instantiate the extensions which are necessary to run the
> > >> component.
> > >>
> > >> The disadvantage of this approach is that it might not work well with OSGi.
> > >> A big advantage is that OpenNLP itself will take care of configuring
> > >> everything, and the code needed to run an OpenNLP component is identical,
> > >> even if the model uses "custom" extensions. These only need to be on the
> > >> class path.
> > >>
> > >> Jörn
> > >>
> > >
> > >
> >
> >
>
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>

Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
It's a perfectly fair question to ask. My first response is that I've been
programming primarily in Scala for more than a year and I not only enjoy it
but find myself far more productive with it. I am actually highly reluctant
to write Java code given that I have Scala as an alternative. I used to use
Python for quick scripting and Java for larger applications, but now I can
happily use Scala for both. But it goes beyond that -- it provides plenty
of opportunities to program in a very different way than either Python or
Java supports. Mainly that is using functional programming -- once you are
used to it, not being able to program functionally becomes painful, and, in
my experience, far less productive. It isn't a natural thing for many
people initially, but a very nice thing about Scala is that it actually
allows you to code in the imperative style you might be accustomed to while
gradually bringing functional aspects into your programs. And it goes well
beyond the nice little examples of how a particular bit of Scala code is
shorter than a given bit of Java that does the same thing -- it leads to
different, and I would say, generally better design. (Though for what it's
worth, the significant reduction in boilerplate code over Java is truly
liberating.)

Other things to like about Scala are type inference, immutable data
structures, an amazing collections library, much better object orientation
than Java, and pattern matching (not regex, but switch statements on
steroids).  The fact that it compiles to Java byte code makes integration
with Java and use of Java APIs quite straightforward, which was a reason
for me to prefer it to other alternatives than Java. There's much more,
including many intangibles that come with experience. Here's an article
that conveys some of that, from the perspective of coming from Python:

http://www.artima.com/weblogs/viewpost.jsp?thread=328540

As for programmers, there is actually a very strong Scala contingent in the
NLP and machine learning world, including groups at UMass Amherst,
Stanford, and UT Austin, and probably elsewhere. Scala is also seeing
corporate adoption, though of course it has nothing like the numbers of
Java programmers. Most of my students are now using Scala, so having
opennlp.ml be in Scala will be convenient for work they could contribute to
the package.

There have been a lot of reasons in the past not to use Scala, especially
poor IDE support and problems with backward compatibility that made it
problematic for enterprise projects. That has changed a great deal in the
past year, especially with the efforts being made by Typesafe.

Happy to discuss more!

Jason

On Tue, Jan 10, 2012 at 8:53 PM, James Kosin <ja...@gmail.com> wrote:

> Everyone,
>
> +1
> I'm okay with going forward with this; but, I must ask Why?  I know
> Scala may be a good thing; but, if it generates Java byte code then
> isn't there an equivalent way to write the same things in Java?
>
> What sort of benefit will we get with the code migrated and written in
> Scala?  Even the author of that article said not many know the inner
> workings of the language....  He was one of the few.
>
> Maybe we could ask or have a poll taken to see how many know Scala in
> the community?
>
> Sorry for my concerns, or if they seem harsh or over-analytical.
>
> Just concerned,
> James
>
> On 1/10/2012 12:20 AM, Jason Baldridge wrote:
> > +1 to this in general, though I'm not into over-architecting things
> > initially. Would be great to get things humming and then start supporting
> > more pluggability.
> >
> > On Sat, Jan 7, 2012 at 7:33 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> >
> >> On 1/7/12 2:22 PM, Grant Ingersoll wrote:
> >>
> >>> Being able to take advantage of other classifiers seems like it would be
> >>> a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout
> >>> or others.
> >>>
> >>> Besides, for testing purposes, if you could plug in the existing
> >>> capability versus your new rewrite (in Scala) then you could easily compare
> >>> the two.  I can't imagine the abstraction layer is more than a few
> >>> interfaces or abstract classes plus a bit of configuration/injection/fill
> >>> in the blank that allows one to specify the implementation.
> >>>
> >> Yes, we need pluggable classifiers and support for extensive
> >> modification/extension of our existing components. You are welcome to
> >> help us with that.
> >>
> >> One way of implementing this is to specify an (optional) factory class
> >> during training which is used to create a model (classifier). A second
> >> type of factory class could be specified to modify a component.
> >>
> >> These factory class names will be stored in our zip model package, and can
> >> then be used to instantiate the extensions which are necessary to run the
> >> component.
> >>
> >> The disadvantage of this approach is that it might not work well with OSGi.
> >> A big advantage is that OpenNLP itself will take care of configuring
> >> everything, and the code needed to run an OpenNLP component is identical,
> >> even if the model uses "custom" extensions. These only need to be on the
> >> class path.
> >>
> >> Jörn
> >>
> >
> >
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by James Kosin <ja...@gmail.com>.
Everyone,

+1
I'm okay with going forward with this; but, I must ask Why?  I know
Scala may be a good thing; but, if it generates Java byte code then
isn't there an equivalent way to write the same things in Java?

What sort of benefit will we get with the code migrated and written in
Scala?  Even the author of that article said not many know the inner
workings of the language....  He was one of the few.

Maybe we could ask or have a poll taken to see how many know Scala in
the community?

Sorry for my concerns, or if they seem harsh or over-analytical.

Just concerned,
James

On 1/10/2012 12:20 AM, Jason Baldridge wrote:
> +1 to this in general, though I'm not into over-architecting things
> initially. Would be great to get things humming and then start supporting
> more pluggability.
>
> On Sat, Jan 7, 2012 at 7:33 AM, Jörn Kottmann <ko...@gmail.com> wrote:
>
>> On 1/7/12 2:22 PM, Grant Ingersoll wrote:
>>
>>> Being able to take advantage of other classifiers seems like it would be
>>> a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout
>>> or others.
>>>
>>> Besides, for testing purposes, if you could plug in the existing
>>> capability versus your new rewrite (in Scala) then you could easily compare
>>> the two.  I can't imagine the abstraction layer is more than a few
>>> interfaces or abstract classes plus a bit of configuration/injection/fill
>>> in the blank that allows one to specify the implementation.
>>>
>> Yes, we need pluggable classifiers and support for extensive
>> modification/extension of our existing components. You are welcome to
>> help us with that.
>>
>> One way of implementing this is to specify an (optional) factory class
>> during training which is used to create a model (classifier). A second
>> type of factory class could be specified to modify a component.
>>
>> These factory class names will be stored in our zip model package, and can
>> then be used to instantiate the extensions which are necessary to run the
>> component.
>>
>> The disadvantage of this approach is that it might not work well with OSGi.
>> A big advantage is that OpenNLP itself will take care of configuring
>> everything, and the code needed to run an OpenNLP component is identical,
>> even if the model uses "custom" extensions. These only need to be on the
>> class path.
>>
>> Jörn
>>
>
>


Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
+1 to this in general, though I'm not into over-architecting things
initially. Would be great to get things humming and then start supporting
more pluggability.

On Sat, Jan 7, 2012 at 7:33 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 1/7/12 2:22 PM, Grant Ingersoll wrote:
>
>> Being able to take advantage of other classifiers seems like it would be
>> a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout
>> or others.
>>
>> Besides, for testing purposes, if you could plug in the existing
>> capability versus your new rewrite (in Scala) then you could easily compare
>> the two.  I can't imagine the abstraction layer is more than a few
>> interfaces or abstract classes plus a bit of configuration/injection/fill
>> in the blank that allows one to specify the implementation.
>>
>
> Yes, we need pluggable classifiers and support for extensive
> modification/extension of our existing components. You are welcome to
> help us with that.
>
> One way of implementing this is to specify an (optional) factory class
> during training which is used to create a model (classifier). A second
> type of factory class could be specified to modify a component.
>
> These factory class names will be stored in our zip model package, and can
> then be used to instantiate the extensions which are necessary to run the
> component.
>
> The disadvantage of this approach is that it might not work well with OSGi.
> A big advantage is that OpenNLP itself will take care of configuring
> everything, and the code needed to run an OpenNLP component is identical,
> even if the model uses "custom" extensions. These only need to be on the
> class path.
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by Jörn Kottmann <ko...@gmail.com>.
On 1/7/12 2:22 PM, Grant Ingersoll wrote:
> Being able to take advantage of other classifiers seems like it would be a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout or others.
>
> Besides, for testing purposes, if you could plug in the existing capability versus your new rewrite (in Scala) then you could easily compare the two.  I can't imagine the abstraction layer is more than a few interfaces or abstract classes plus a bit of configuration/injection/fill in the blank that allows one to specify the implementation.

Yes, we need pluggable classifiers and support for extensive
modification/extension of our existing components. You are welcome to
help us with that.

One way of implementing this is to specify an (optional) factory class
during training which is used to create a model (classifier). A second
type of factory class could be specified to modify a component.

These factory class names will be stored in our zip model package, and can
then be used to instantiate the extensions which are necessary to run the
component.

The disadvantage of this approach is that it might not work well with OSGi.
A big advantage is that OpenNLP itself will take care of configuring
everything, and the code needed to run an OpenNLP component is identical,
even if the model uses "custom" extensions. These only need to be on the
class path.

Jörn
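
A minimal Java sketch of the factory idea described in the message above,
offered only as an illustration and not as OpenNLP's actual design. Every
name in it (Classifier, ClassifierFactory, the "classifier.factory" key,
the Properties object standing in for the manifest of the zip model
package) is a hypothetical stand-in. It shows the two points made above:
the factory class name travels with the model, and the calling code stays
the same whether or not the model uses a custom extension.

```java
import java.util.Properties;

// All names below are hypothetical; this is not the OpenNLP API.
class FactoryDemo {

    /** Minimal stand-in for a trained model that assigns an outcome to a context. */
    interface Classifier {
        String classify(String[] context);
    }

    /** Factory whose class name would be stored inside the model package. */
    interface ClassifierFactory {
        Classifier create(Properties manifest);
    }

    /** Built-in factory, used when the model names no custom one. */
    static class DefaultFactory implements ClassifierFactory {
        @Override
        public Classifier create(Properties manifest) {
            String outcome = manifest.getProperty("default.outcome", "other");
            return context -> outcome;
        }
    }

    /** A "custom" extension: it only has to be on the class path. */
    static class KeywordFactory implements ClassifierFactory {
        @Override
        public Classifier create(Properties manifest) {
            String keyword = manifest.getProperty("keyword", "");
            return context -> {
                for (String token : context) {
                    if (token.equals(keyword)) {
                        return "match";
                    }
                }
                return "no-match";
            };
        }
    }

    /** Reads the factory class name from the manifest and instantiates it reflectively. */
    static Classifier load(Properties manifest) {
        String factoryName =
            manifest.getProperty("classifier.factory", DefaultFactory.class.getName());
        try {
            ClassifierFactory factory = Class.forName(factoryName)
                .asSubclass(ClassifierFactory.class)
                .getDeclaredConstructor()
                .newInstance();
            return factory.create(manifest);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create factory " + factoryName, e);
        }
    }

    public static void main(String[] args) {
        Properties plain = new Properties();

        Properties custom = new Properties();
        custom.setProperty("classifier.factory", KeywordFactory.class.getName());
        custom.setProperty("keyword", "Scala");

        String[] context = {"I", "like", "Scala"};
        // The calling code is identical for both models.
        System.out.println(load(plain).classify(context));
        System.out.println(load(custom).classify(context));
    }
}
```

The reflective lookup by class name is also where the OSGi concern
mentioned above would likely show up, since Class.forName resolves names
against the caller's class loader.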

Re: using Scala for opennlp.ml

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
>> Having some sort of plugin architecture would be possible and probably
>> quite nice, though it probably would not be my first priority.
>
> Being able to take advantage of other classifiers seems like it would be a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout or others.
>
> Besides, for testing purposes, if you could plug in the existing capability versus your new rewrite (in Scala) then you could easily compare the two.  I can't imagine the abstraction layer is more than a few interfaces or abstract classes plus a bit of configuration/injection/fill in the blank that allows one to specify the implementation.
+1 for all of the above.

Aliaksandr

Re: using Scala for opennlp.ml

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 6, 2012, at 9:46 AM, Jason Baldridge wrote:
> 
> Having some sort of plugin architecture would be possible and probably
> quite nice, though it probably would not be my first priority.

Being able to take advantage of other classifiers seems like it would be a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout or others.

Besides, for testing purposes, if you could plug in the existing capability versus your new rewrite (in Scala) then you could easily compare the two.  I can't imagine the abstraction layer is more than a few interfaces or abstract classes plus a bit of configuration/injection/fill in the blank that allows one to specify the implementation.
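
To make the "few interfaces plus a bit of configuration" idea concrete, here is a minimal Java sketch. All names (PluggableClassifierSketch, Classifier, register, fromConfig, agreement, and the "classifier.impl" property key) are made up for illustration and are not OpenNLP or Mahout API. The two registered implementations are trivial stand-ins for "existing" and "rewrite" back ends.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical sketch; these names are illustrative, not OpenNLP or Mahout API. */
final class PluggableClassifierSketch {

    /** The whole abstraction layer: one interface the toolkit codes against. */
    interface Classifier {
        String classify(String[] features);
    }

    private static final Map<String, Supplier<Classifier>> REGISTRY = new HashMap<>();

    /** Makes an implementation selectable by name. */
    static void register(String name, Supplier<Classifier> impl) {
        REGISTRY.put(name, impl);
    }

    /** Picks the implementation named by the (assumed) "classifier.impl" property. */
    static Classifier fromConfig(Properties config) {
        String name = config.getProperty("classifier.impl", "existing");
        Supplier<Classifier> impl = REGISTRY.get(name);
        if (impl == null) {
            throw new IllegalArgumentException("unknown classifier: " + name);
        }
        return impl.get();
    }

    /** Fraction of examples on which two implementations give the same label. */
    static double agreement(Classifier a, Classifier b, List<String[]> examples) {
        if (examples.isEmpty()) {
            throw new IllegalArgumentException("need at least one example");
        }
        long same = examples.stream()
            .filter(x -> a.classify(x).equals(b.classify(x)))
            .count();
        return (double) same / examples.size();
    }

    public static void main(String[] args) {
        // Stand-ins for the existing implementation and a rewrite.
        register("existing", () -> f -> f.length > 1 ? "long" : "short");
        register("rewrite", () -> f -> f.length > 2 ? "long" : "short");

        Properties oldCfg = new Properties();
        oldCfg.setProperty("classifier.impl", "existing");
        Properties newCfg = new Properties();
        newCfg.setProperty("classifier.impl", "rewrite");

        List<String[]> examples = List.of(
            new String[] {"a"},
            new String[] {"a", "b"},
            new String[] {"a", "b", "c"});
        System.out.println(agreement(fromConfig(oldCfg), fromConfig(newCfg), examples));
    }
}
```

Swapping implementations is then a one-line configuration change, and the same agreement check can be pointed at any pair of registered back ends.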

-Grant

Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
Thanks everyone, for your feedback.

Doing a mixed Scala/Java build actually doesn't add much complexity, in my
experience. Using SBT and/or the Scala IDE for Eclipse, it all works very
straightforwardly. Having said that, I'd be happy with a pure Scala package
that provides a good Java API.

I'm relatively new to Eclipse, but have been using it quite happily for
Scala+Java development for the past month. I also find that Scala makes one
less dependent on an IDE than Java, mainly because code is much more
concise and one can have multiple classes per file - so you don't have to
navigate around as much code in so many files.

As for developers, it sounds like we actually have some critical mass here,
and basically all my students are using Scala. I personally am much more
likely to contribute more code if I can do so in Scala, both because I far
prefer it and because I'll be creating Scala code for my class this
semester that I could ideally do in opennlp.ml.

Having some sort of plugin architecture would be possible and probably
quite nice, though it probably would not be my first priority.

Jason

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: using Scala for opennlp.ml

Posted by Jörn Kottmann <ko...@gmail.com>.
I still don't like mixing Java and Scala. If we do a complete rewrite
of the perceptron and maxent implementation Scala would be
an option. If we just do a little Scala and transform some of the
existing classes it doesn't seem reasonable to me to add all the
complexity to the build.

Jörn


Re: using Scala for opennlp.ml

Posted by Olivier Grisel <ol...@ensta.org>.
My 2 cents:

As a potential contributor to the opennlp.ml package with a good
background in machine learning, I would say Scala is not a barrier for
me (even if I don't use it often right now). Maybe even the opposite
as I find coding in Scala more fun than Java. Profiling and perf
tuning can be a bit harder though.

The most important drawback I had against Scala in the past was the
poor / buggy Eclipse support for Scala which made it painful to work
with multi-language (Java + Scala) projects but the situation has very
much improved over the past 2 years.

-- 
Olivier

Re: using Scala for opennlp.ml

Posted by Tommaso Teofili <to...@gmail.com>.
2012/1/4 Grant Ingersoll <gs...@apache.org>

> The big downside I see to Scala here is one of how many current committers
> know it and how many potential contributors know it.  If you are the only
> committer who knows it, that leaves you to do all the bug fixes, etc. until
> you can attract others.
>

I am not a committer, but if you need it I think I can help with Scala,
as I have some experience with it in Clerezza and with OpenNLP (I have sent
some patches).


>
> A bit off topic, but it seems to me that the ML stuff could be abstracted
> a bit such that different implementations are pluggable.


I think that would be nice too.


> This way, you could go for Scala if you want, but others could plug in
> their own classifiers, etc.  Is that part of this plan?
>

Re: using Scala for opennlp.ml

Posted by Grant Ingersoll <gs...@apache.org>.
The big downside I see to Scala here is one of how many current committers know it and how many potential contributors know it.  If you are the only committer who knows it, that leaves you to do all the bug fixes, etc. until you can attract others.

A bit off topic, but it seems to me that the ML stuff could be abstracted a bit such that different implementations are pluggable.  This way, you could go for Scala if you want, but others could plug in their own classifiers, etc.  Is that part of this plan?

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: using Scala for opennlp.ml

Posted by Jason Baldridge <ja...@gmail.com>.
That is an interesting post that spurred a lot of discussion months back.
David Pollak has a good follow-up to that article that goes into some more
detail about that post:

http://goodstuff.im/scala-use-is-less-good-than-java-use-for-at-l

The focus is really on the culture of programmers and different types of
programmers and which ones Java, PHP, or Scala might be best suited for. He
ends it with this comment:

Oh... and all you wicked smart people who are pushing the boundaries (or
think you will) with data size, event frequency and real-time stuff, you'll
find Scala to be a dream come true and there will be nothing like it that
you've ever used (okay, except maybe Haskell).  So, come, build your cool
thing on Scala and succeed.

That's exactly where opennlp.ml should be. And, it is perfectly possible to
have such a library be written in Scala but provide a Java API to it, like
Akka <http://akka.io/> does. (Incidentally, Akka is a good reason to use
Scala, though one can use it with Java too; it is just more painful that way.) In
fact, opennlp.ml would have to be written that way since the first "user"
of it would be the OpenNLP toolkit.

Also, there is a good discussion involving David Pollak and Dick Wall here:

http://www.infoq.com/articles/barriers-to-scala-adoption

Regarding the title of the original blog post ("Yes, Virginia, Scala is
hard"), Dick Wall notes that:

Yes, but I would choose to expand the title to "Software Development is
hard" or perhaps "Vigorous software development is hard". When you set out
to complete a project or write a system, you have a problem to solve.
Chances are that if it is something good and new, it's going to be pretty
hard. The complexity of the delivered item will be dictated to some degree
by the problem to be solved, and that complexity bar will be about the same
height no matter how you tackle it.

Choosing a language with more power is the first way you can get a boost on
reaching that bar. Choice of libraries is the next, and the remainder you
fill in yourself. In Java, the power is (by modern standards) fairly low,
leaving a larger gap to reach the bar. Most people fill in with libraries,
e.g. JPA, Wicket, Spring or perhaps full blown Java EE. These bring their
own significant complexity to the project (not to mention their own
learning curve). Then the work begins on the final part, the custom work
necessary to reach the bar.

If you are writing something like a web application, the chances are that
the libraries available (of which there are many in Java) will get you
almost all the way there, albeit with a significant investment in learning
the libraries the first time you do it. If the task is something a little
less commonplace, perhaps a scientific or mathematical project, or just
some totally new idea or approach, you have even more to do. At this point
you want the most power, flexibility and expressiveness you can get, and
that comes back to the language you choose.

I have found the value that coding in Scala brings has far outweighed the
effort to learn it and its complexities (which I greatly enjoyed learning
about, and continue to enjoy learning about). I now pretty much have to
stop myself from gagging when forced to write code in Java.

David notes in that discussion:

If you're doing some form of event processing (trading floor, sports
betting, near-real-time data analysis, social networking), Scala is a huge
win over Java. If you've got complex, distributed systems, Scala and
immutability is a huge win. In these scenarios, the costs of using Scala
(learning curve, poor tooling, etc.) are small in comparison to the
benefits of Scala (immutability, composability, good event processing,
excellent libraries/frameworks that provide a starting point for these
kinds of systems.)

Again, the nature of many machine learning algorithms makes this a good
fit. Add to that the existence of systems like Spark and relatively new
front-ends for Hadoop such as Scrunch and Scoobi, which make developing
MapReduce algorithms with Scala much nicer and far, far preferable to the
pain of coding them up in Java.

Note that these discussions are from many months ago. The Scala ecosystem
has continued to evolve, including continual improvements to IDE support
for Scala development with Eclipse (and probably for IntelliJ as well).

I would also note that Java lacks a truly wonderful feature of languages
like Scala, Python, Clojure and others: a REPL that allows you to try out
code snippets interactively. This is a great way of testing example code
before actually putting it into your system, knowing that it will work.
It's also a great teaching tool for people new to the language.

FWIW, here's Aliaksandr's example in Scala, which can be tried out in the
Scala REPL.

(List("I", "l", "v", " ", "P") zip List(" ", "o", "e", "F", "!")).map {
  case (x, y) => x + y }.mkString
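
For comparison, a rough Java rendering of the same zip-and-concatenate
one-liner. This is only a sketch: ZipJoinSketch and zipJoin are made-up
names, and it relies on the streams API and List.of, which arrived in Java
releases after this thread was written. Java has no built-in zip, so the
pairing is done by index here.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper; the names are illustrative only. */
final class ZipJoinSketch {

    /** Pairs elements by index, concatenates each pair, then joins the results. */
    static String zipJoin(List<String> left, List<String> right) {
        // Like Scala's zip, stop at the end of the shorter list.
        int n = Math.min(left.size(), right.size());
        return IntStream.range(0, n)
            .mapToObj(i -> left.get(i) + right.get(i))
            .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(zipJoin(
            List.of("I", "l", "v", " ", "P"),
            List.of(" ", "o", "e", "F", "!")));
    }
}
```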

Once you get functional programming, it is truly painful to do without it!

Jason


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge