Posted to dev@nlpcraft.apache.org by Kamov Sergey <sk...@gmail.com> on 2022/06/04 14:56:30 UTC

Next NlpCraft release

Hi
I want to list the changes planned for the next NlpCraft release.

Main goals of the next release:
  - Simplify usage of the system.
  - Narrow the focus to NLP by deleting all unrelated, auxiliary components.
  - Make custom multi-language support possible.
  - Simplify the code and minimize technical debt.

1. Removed
  - Client-server components and server cluster support.
  - Any database usage.
  - CLI management console.
  - Docker-related stuff.
  - Support for complex semantic components.
After these changes NlpCraft becomes a simple library with a Java API.

2. Added and changed
Pluggability support was added for all components, including basic ones
such as the tokenizer, with default EN implementations of all of them.
Note that testing components also became significantly simpler,
which is especially useful for custom user components.
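
To make the pluggability idea concrete, here is a minimal Java sketch of a
replaceable tokenizer with a default EN implementation. All names in it
(`Tokenizer`, `DefaultEnTokenizer`, `Pipeline`, `PluggableTokenizerDemo`) are
hypothetical and are not the actual NlpCraft API; the real interfaces should
be checked in the `master` branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical component contract (not the real NlpCraft interface).
interface Tokenizer {
    List<String> tokenize(String text);
}

// Illustrative default EN implementation: lower-case, split on non-alphanumerics.
final class DefaultEnTokenizer implements Tokenizer {
    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}

// Hypothetical pipeline that accepts any Tokenizer and falls back to the EN default.
final class Pipeline {
    private final Tokenizer tokenizer;

    Pipeline() {
        this(new DefaultEnTokenizer());
    }

    Pipeline(Tokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    List<String> tokens(String text) {
        return tokenizer.tokenize(text);
    }
}

final class PluggableTokenizerDemo {
    public static void main(String[] args) {
        // Default EN component.
        System.out.println(new Pipeline().tokens("Turn ON the lights!"));
        // Custom component plugged in as a lambda, e.g. a stub for a unit test.
        Pipeline stubbed = new Pipeline(text -> List.of("stub"));
        System.out.println(stubbed.tokens("anything"));
    }
}
```

Because the pipeline in this sketch depends only on the interface, a
component for another language or a test stub can be swapped in without
touching the rest of the code, which is the kind of testability gain
described above.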

As a result, all goals seem to be achieved in general.
The code, including examples in different languages (EN, FR, RU), is
available in the `master` branch.
The best way to look at the code and review the API and how the components
work is to start and debug the 'light-switch' example, EN and FR versions.

Remaining tasks: some additional examples, user API clarification, and
documentation.

Please ask if you have any questions.


Regards,

Sergey Kamov

Re: Next NlpCraft release

Posted by Rahul Padmanabhan <ra...@mail.concordia.ca>.
+1 for the choice of Scala-only support.

I’m in the field, and using Java for NLP is very rare.

-Rahul Padmanabhan

________________________________
From: Kamov Sergey <sk...@gmail.com>
Sent: Tuesday, June 7, 2022 3:49:47 PM
To: dev@nlpcraft.apache.org <de...@nlpcraft.apache.org>
Subject: Re: Next NlpCraft release

Hi!

All Google searches like "NLP libraries" show that Python is by far the
most popular language.

The first results for me:

https://www.upgrad.com/blog/python-nlp-libraries-and-applications/
https://medium.com/nlplanet/awesome-nlp-21-popular-nlp-libraries-of-2022-2e07a914248b

Java is mentioned for Stanford and sometimes for Apache OpenNLP.


Regards,

Sergey

On 07.06.2022 19:21, Furkan KAMACI wrote:
> Hi Sergey,
>
> Is there any survey about which programming languages are popular among NLP
> developers?
>
> Kind Regards,
> Furkan KAMACI
>
> On 7 Jun 2022 Tue at 17:37 Kamov Sergey<sk...@gmail.com>  wrote:
>
>> Hi
>>
>> One more important thing. We want to support only a Scala API in the next
>> version of the library.
>> It now seems better to narrow the technological focus too. The current
>> approach, a Java API with a Scala implementation, forces a lot of technical
>> compromises (collection conversions, performance issues, etc.).
>> At the same time, supporting a Java API doesn’t give us
>> significant benefits, because Java is not that popular among NLP
>> engineers. Focusing on Scala allows a more elegant user API and
>> implementation, and we can also promote this solution to members of the
>> small but active Scala community.
>> If the library is successful, we can always add Java API support again on
>> top of the Scala layer.
>>
>> Regards,
>>
>> Sergey Kamov

Re: Next NlpCraft release

Posted by Rahul Padmanabhan <ra...@mail.concordia.ca>.
I wanted to bring this up to the group regarding how we can possibly
benefit further from the architecture choices that are being made.

Apache Spark is huge at the moment. There is Spark NLP
(https://nlp.johnsnowlabs.com/), which I think we can leverage by making
NLPCraft use Spark NLP as an option (for example, for the tokenizer, NER,
etc.). There are a lot of good features (and a lot of pre-trained models)
currently available in Spark NLP, and it is easily accessible from Scala.
While our current approach can be packaged and distributed in terms of
computing, I think that we may be able to leverage a lot of the features
from Spark NLP, which may help NLPCraft take off in use.

A solid use case (imo) that I have seen, especially in the financial
world, is that a lot of text (emails, chat conversations, etc.) is usually
stored in data lakes and its intent needs to be determined, for example
tracking trader emails to see if there is malicious intent. One of the
ways this can be achieved is via IDL (and more) using NLPCraft.

Just my thoughts on this.

-Rahul Padmanabhan


On 6/10/22 11:20, Nikita Ivanov wrote:
> Let me step in here since I was the one advocating the team for many of
> these changes...
>
> There are three major problems with the current approach of having Java
> APIs and Scala-based implementation:
> - It significantly increased the complexity, especially on the performance
> side (constant conversion between Java and Scala collections, etc.).
> - The Java community is just not interested in NLP in general... It's a bit
> strange to me, but the lack of interest in the project can be, at least
> partially, attributed to the focus on Java.
> - The project needs a focused core target group. For example, many Go and
> Rust projects greatly benefited from the interest of their core language groups.
>
> Apache Spark is a prime example: initial focus on & growth from the core
> Scala community with Java/Python frontends added later.
>
> My two cents,
> --
> Nikita Ivanov


Re: Next NlpCraft release

Posted by Nikita Ivanov <ni...@gmail.com>.
Let me step in here since I was the one advocating the team for many of
these changes...

There are three major problems with the current approach of having Java
APIs and Scala-based implementation:
- It significantly increased the complexity, especially on the performance
side (constant conversion between Java and Scala collections, etc.).
- The Java community is just not interested in NLP in general... It's a bit
strange to me, but the lack of interest in the project can be, at least
partially, attributed to the focus on Java.
- The project needs a focused core target group. For example, many Go and
Rust projects greatly benefited from the interest of their core language groups.

Apache Spark is a prime example: initial focus on & growth from the core
Scala community with Java/Python frontends added later.

My two cents,
--
Nikita Ivanov



On Fri, Jun 10, 2022 at 6:18 AM Paul King <pa...@asert.com.au> wrote:

> Okay, that makes sense. Is there a plan to release the "current
> master" or is it just a stepping stone to the "next step" which is
> when the next release will come?
>
> Cheers, Paul.

Re: Next NlpCraft release

Posted by Paul King <pa...@asert.com.au>.
Okay, that makes sense. Is there a plan to release the "current
master" or is it just a stepping stone to the "next step" which is
when the next release will come?

Cheers, Paul.

On Fri, Jun 10, 2022 at 10:23 PM Kamov Sergey <sk...@gmail.com> wrote:
>
> Sorry for the confusion.
>
> - The last release (0.9.0) is a Java client/server system.
> - The current version in the 'master' branch (still unreleased) is a simple
> Java API library (without client/server).
> - The next step, which we are discussing, is a simple Scala API library (like
> the current master version, but with a Scala API instead of a Java one).
>
> Regards,
> Sergey

Re: Next NlpCraft release

Posted by Kamov Sergey <sk...@gmail.com>.
Sorry for the confusion.

- The last release (0.9.0) is a Java client/server system.
- The current version in the 'master' branch (still unreleased) is a simple
Java API library (without client/server).
- The next step, which we are discussing, is a simple Scala API library (like
the current master version, but with a Scala API instead of a Java one).

Regards,
Sergey


On 10.06.2022 15:13, Paul King wrote:
> So, just for my own understanding, is the server Java but the client
> would be Scala?
> Not questioning the decision but the first email in this thread said:
>
>> After these changes NlpCraft becomes simple library with java API.
> I am actually seeing a bit of a renaissance of Java for Data Science
> with numerous new projects like Amazon's DJL opting for Java as the
> base language.
>
> Disclosure, for data science, I mostly use Groovy as a "Python for the
> JVM", so that probably skews the world I see. Most of the NLP folks I
> speak to use Python these days. And concurring with Sergey, Stanford
> and OpenNLP are probably the two more widely used Java libraries I see
> for those folks on the JVM with Datumbox and Smile occasionally used
> as well.
>
> Cheers, Paul.

Re: Next NlpCraft release

Posted by Paul King <pa...@asert.com.au>.
So, just for my own understanding, is the server Java but the client
would be Scala?
Not questioning the decision but the first email in this thread said:

> After these changes NlpCraft becomes simple library with java API.

I am actually seeing a bit of a renaissance of Java for Data Science
with numerous new projects like Amazon's DJL opting for Java as the
base language.

Disclosure, for data science, I mostly use Groovy as a "Python for the
JVM", so that probably skews the world I see. Most of the NLP folks I
speak to use Python these days. And concurring with Sergey, Stanford
and OpenNLP are probably the two more widely used Java libraries I see
for those folks on the JVM with Datumbox and Smile occasionally used
as well.

Cheers, Paul.

On Wed, Jun 8, 2022 at 5:49 AM Kamov Sergey <sk...@gmail.com> wrote:
>
> Hi!
>
> All Google searches like "NLP libraries" show that Python is by far the
> most popular language.
>
> The first results for me:
>
> https://www.upgrad.com/blog/python-nlp-libraries-and-applications/
> https://medium.com/nlplanet/awesome-nlp-21-popular-nlp-libraries-of-2022-2e07a914248b
>
> Java is mentioned for Stanford and sometimes for Apache OpenNLP.
>
>
> Regards,
>
> Sergey

Re: Next NlpCraft release

Posted by Kamov Sergey <sk...@gmail.com>.
Hi!

Every Google search along the lines of "NLP libraries" shows that Python
is the most popular language by far (no competition).

The first results for me:

https://www.upgrad.com/blog/python-nlp-libraries-and-applications/
https://medium.com/nlplanet/awesome-nlp-21-popular-nlp-libraries-of-2022-2e07a914248b

Java is mentioned for Stanford, and sometimes for Apache OpenNLP.


Regards,

Sergey

On 07.06.2022 19:21, Furkan KAMACI wrote:
> Hi Sergey,
>
> Is there any survey about which programming languages popular among NLP
> developers?
>
> Kind Regards,
> Furkan KAMACI

Re: Next NlpCraft release

Posted by Nikita Ivanov <mk...@gmail.com>.
My take is that after Python everything else is a very distant second,
third, etc. The Java frontend introduced a lot of complexity to NLPCraft,
and given the limited resources I'd vote for further simplification of
the project by concentrating (initially) on just one language for both
frontend and backend.

If necessary, a separate Java frontend API can be added later fairly easily.

Furthermore, the choice of Scala 3 is the right one in my opinion.

My two cents.

On Tue, Jun 7, 2022 at 9:21 AM Furkan KAMACI <fu...@gmail.com> wrote:
>
> Hi Sergey,
>
> Is there any survey about which programming languages popular among NLP
> developers?
>
> Kind Regards,
> Furkan KAMACI



-- 
Nikita Ivanov

Re: Next NlpCraft release

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Sergey,

Is there any survey about which programming languages are popular among
NLP developers?

Kind Regards,
Furkan KAMACI


Re: Next NlpCraft release

Posted by Kamov Sergey <sk...@gmail.com>.
Hi,

One more important thing: we want to support only a Scala API in the
next version of the library.
It now seems better to narrow the technological focus too. The current
approach, a Java API over a Scala implementation, forces a lot of
technical compromises (collection conversions, performance issues, etc.).
At the same time, supporting a Java API doesn't give us significant
benefits, because Java is not very popular among NLP engineers. Focusing
on Scala allows a more elegant user API and implementation, and we can
also promote this solution to members of the not-so-big but active Scala
community.
If the library is successful, we can always add Java API support again
on top of the Scala layer.

Regards,

Sergey Kamov
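
To make the "collection conversions" compromise mentioned above concrete, here is a minimal Scala 3 sketch of a Java-facing facade layered over a Scala implementation. The names ScalaTokenizer and JavaTokenizerFacade are hypothetical and are not part of the NLPCraft API; only scala.jdk.CollectionConverters is a real standard-library module. It only illustrates the kind of boundary code a dual Java/Scala API tends to need.

```scala
import scala.jdk.CollectionConverters.*
import java.util.{List as JList}

// Hypothetical Scala-side component (illustration only, not NLPCraft code).
object ScalaTokenizer:
  // Splits on whitespace and drops empty fragments.
  def tokenize(text: String): Seq[String] =
    text.split("\\s+").filter(_.nonEmpty).toSeq

// Hypothetical Java-facing facade over the Scala implementation.
object JavaTokenizerFacade:
  // Every call has to cross the Scala/Java collection boundary.
  def tokenize(text: String): JList[String] =
    ScalaTokenizer.tokenize(text).asJava

@main def demo(): Unit =
  val tokens: JList[String] = JavaTokenizerFacade.tokenize("turn on the lights")
  println(tokens.size)
```

With a Scala-only API the facade layer disappears and callers work with Seq[String] directly. A Java layer like this one could be added back later as a thin wrapper, which matches the suggestion in the message above.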

