Posted to dev@stanbol.apache.org by harish suvarna <hs...@gmail.com> on 2012/07/27 03:56:26 UTC

What is the right dev environment?

Hi,
I am trying to add Chinese language processing using some open-source
segmenters. I had some communication with Rupert. I am attaching Rupert's
suggestions. This way I may get some more suggestions and help, and
Rupert's ideas get distributed to all.

I am also following Anuj's blog to learn about Stanbol content enhancement
engine development.

I can successfully build Stanbol and play with the default chain.

I am trying to create the Eclipse project now. mvn eclipse:eclipse was
successful too. Then I imported the stanbol directory into the Eclipse
workspace.
In Eclipse, certain Stanbol projects are shown in red.

Description    Resource    Path    Location    Type
The project cannot be built until its prerequisite org.apache.stanbol.enhancer.servicesapi is built. Cleaning and building all projects is recommended    org.apache.stanbol.enhancer.ldpath    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.entityhub.indexing.core is built. Cleaning and building all projects is recommended    org.apache.stanbol.entityhub.indexing.destination.solryard    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.entityhub.core is built. Cleaning and building all projects is recommended    org.apache.stanbol.entityhub.query.clerezza    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.entityhub.core is built. Cleaning and building all projects is recommended    org.apache.stanbol.entityhub.ldpath    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.enhancer.servicesapi is built. Cleaning and building all projects is recommended    org.apache.stanbol.enhancer.rdfentities    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.enhancer.servicesapi is built. Cleaning and building all projects is recommended    org.apache.stanbol.enhancer.test    Unknown    Java Problem
The project cannot be built until its prerequisite org.apache.stanbol.entityhub.core is built. Cleaning and building all projects is recommended    org.apache.stanbol.entityhub.site.managed    Unknown    Java Problem
....
...

Are any extra steps needed?
Should I build and debug inside Eclipse, or build using mvn and debug
in Eclipse? What do developers commonly do?

-harish



================ Previous communication ================
Hi,

There are no NER (Named Entity Recognition) models for Chinese text
available via OpenNLP, so the default configuration of Stanbol will
not process Chinese text. What you can do is configure a
KeywordLinking Engine for Chinese text, as this engine can also process
text in unknown languages (see [1] for details).

However, the KeywordLinking Engine also requires at least a tokenizer
for looking up words. As there is no specific OpenNLP Tokenizer for
Chinese text, it will use the default one, which uses a fixed set of
chars to split words (white spaces, hyphens ...). You may know better how
well this would work with Chinese texts. My assumption would be that
it is not sufficient, so results will be sub-optimal.
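
To illustrate the tokenizer concern above, here is a rough sketch that
splits text on whitespace only. It is a stand-in for the idea of a
fixed-character tokenizer, not the actual OpenNLP default tokenizer; the
function name count_tokens and both sample sentences are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: counts the tokens obtained by splitting the input
# on the shell's default IFS (space, tab, newline).
count_tokens() {
    set -f          # disable globbing so tokens are not expanded as patterns
    set -- $1       # unquoted on purpose: word splitting yields the tokens
    set +f
    echo "$#"
}

# An English sentence splits into words, while a Chinese sentence written
# without spaces stays a single token.
count_tokens "Stanbol enhances English text"
count_tokens "我爱北京天安门"
```

Because the Chinese sentence contains no whitespace, a splitter of this
kind sees it as one long token, which is why a dedicated segmenter is
needed before words can be looked up.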

To apply Chinese optimization I see three possibilities:

1. add support for Chinese to OpenNLP (Tokenizer, Sentence detection,
POS tagging, Named Entity Detection)
2. allow the KeywordLinkingEngine to use other already available tools
for text processing (e.g. stuff that is already available for
Solr/Lucene [2] or the paoding Chinese segmenter referenced in your
mail). Currently the KeywordLinkingEngine is hardwired with OpenNLP,
because representing Tokens, POS ... as RDF would be too much of an
overhead.
3. implement a new EnhancementEngine for processing Chinese text.

Hope this helps to get you started.

best
Rupert

[1] http://incubator.apache.org/stanbol/docs/trunk/multilingual.html
[2] http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean

harish suvarna (6:33 PM) to Rupert:
Thanks a lot Rupert.

I am weighing options 2 and 3. What is the difference? Option 2
sounds like enhancing KeywordLinkingEngine to deal with Chinese text. It
may be like paoding is hardcoded into KeywordLinkingEngine. Option 3 is
like a separate engine. But will I be able to use the Stanbol DBpedia
lookup using option 3?

Btw, I created my own enhancement engine chains and I could see them
yesterday in localhost:8080. But today all of them have vanished and only
the default chain shows up. Can I dig them up somewhere in the stanbol
directory?

-harish

I just created the eclipse project

Re: What is the right dev environment?

Posted by Sergio Fernández <se...@salzburgresearch.at>.
Rupert, the eclipse:* goals have been deprecated in Maven, since m2eclipse
brings much better support (so only a refresh may be required sometimes):

   http://www.sonatype.org/m2eclipse/

Cheers,

On 07/27/2012 07:30 AM, Rupert Westenthaler wrote:
> If I have problems
>
> 1.  mvn eclipse:clean eclipse:eclipse
> 2. refreshing all projects in eclipse
> 3. full project > clean

-- 
Sergio Fernández
Salzburg Research
+43 662 2288 318
Jakob-Haringer Strasse 5/II
A-5020 Salzburg (Austria)
http://www.salzburgresearch.at

Re: What is the right dev environment?

Posted by harish suvarna <hs...@gmail.com>.
I am stuck on integrating external jars into the Stanbol dev environment.
I have two jar files: langdetect.jar for language detection and
paoding.jar.
I wrote a small enhancer engine (just a replica of the existing langid, for
the sake of my learning). I could successfully compile it using mvn, but the
tests inside the engine fail when trying to use a class from
langdetect.jar. I tried various techniques in the pom.xml file of the langid
engine.

    <dependency>
      <groupId>com.adobe.g11n</groupId>
      <artifactId>langdetect</artifactId>
      <version>1.0</version>
    </dependency>

I made sure that my local .m2 repo has this jar file by using the mvn
install:install-file command.
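
For reference, a minimal sketch of that install step. The coordinates are
copied from the dependency snippet above; the function name
install_local_jar, the jar path passed to it, and the MVN override
variable are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: installs a third-party jar into the local Maven
# repository under the coordinates used in the <dependency> above.
# Setting MVN=echo prints the arguments instead of running Maven.
install_local_jar() {
    jar_path="$1"
    "${MVN:-mvn}" install:install-file \
        -Dfile="$jar_path" \
        -DgroupId=com.adobe.g11n \
        -DartifactId=langdetect \
        -Dversion=1.0 \
        -Dpackaging=jar
}

# Usage (not run here; the path is an assumption):
#   install_local_jar src/lib/langdetect.jar
```

After a successful run the jar should be found at
~/.m2/repository/com/adobe/g11n/langdetect/1.0/langdetect-1.0.jar, which
is a quick way to confirm that the coordinates in the pom.xml match what
was installed.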

I also tried the system scope:

    <dependency>
      <groupId>com.adobe.g11n</groupId>
      <artifactId>langdetect</artifactId>
      <version>1.0</version>
      <scope>system</scope>
      <systemPath>${basedir}/src/lib/langdetect.jar</systemPath>
    </dependency>

I do see an artifact-related warning:

Trying to get manifest from artifact /Users/harishs/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[DEBUG] Artifact has no service component entry in manifest /Users/harishs/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[DEBUG] Trying to get scrinfo from artifact /Users/harishs/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[DEBUG] Artifact has no scrinfo file (it's optional): /Users/harishs/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
[DEBUG] Trying to get manifest from artifact /Users/harishs/langdetect.jar
[DEBUG] Artifact has no service component entry in manifest /Users/harishs/langdetect.jar
[DEBUG] Trying to get scrinfo from artifact /Users/harishs/langdetect.jar
[DEBUG] Artifact has no scrinfo file (it's optional): /Users/harishs/langdetect.jar

Any clues would be great.

-harish



Re: What is the right dev environment?

Posted by harish suvarna <hs...@gmail.com>.
I really appreciate your help and thank you very much Rupert.

All the steps helped me and I am up and debugging in Eclipse.

Thanks,
Harish


Re: What is the right dev environment?

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,

I do use Eclipse and I usually do not care about classpath-related
build problems in Eclipse as long as code suggestions still work.

If I have problems, the following usually solves them:

1. mvn eclipse:clean eclipse:eclipse
2. refreshing all projects in Eclipse
3. full project > clean

NOTE that only calling "mvn eclipse:eclipse" may not solve problems, as
it only adds new entries to the project files but does not remove old
ones. Note that I prefer NOT to use any Eclipse Maven plugin, as I had
bad experiences with those. However, those cases were about two years
ago, so such tools might have improved in the meantime.

For Debugging I do use Eclipse:

Unit tests work fine within Eclipse. If I want to debug a component
within a Stanbol Server, I do the following:

1. Start the Stanbol Server in debug mode

    java -Xmx1024m -XX:MaxPermSize=256m \
        -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n \
        -jar org.apache.stanbol.launchers.full-0.10.0-incubating-SNAPSHOT.jar

2. Connect Eclipse to the Stanbol Server:
    * Debug Configurations > Remote Java Application > create new
    * Socket Attach
    * Host: localhost and Port as specified with address (8787 in the
      example above)

3. Use the Sling installer Maven plugin to install/update the module
with the component I am working on:

    mvn clean install -PinstallBundle \
        -Dsling.url=http://localhost:8080/system/console

    * Make sure to "disconnect" the debugger before calling this, as
      the debugging might interfere with the update process of the module
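
As a side note to step 1: on Java 5 and later, the JDWP agent can also be
loaded with -agentlib:jdwp, which takes the same sub-options as the
-Xdebug -Xrunjdwp pair. A small sketch follows; the helper name jdwp_opts
is hypothetical, and the default port simply mirrors the 8787 used above.

```shell
#!/bin/sh
# Hypothetical helper: prints the JDWP agent option for a given port.
#   server=y  -> the JVM listens for a debugger to attach
#   suspend=n -> the JVM starts right away instead of waiting for the debugger
jdwp_opts() {
    port="${1:-8787}"
    printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${port}"
}

# Usage (not run here), with the launcher jar from step 1:
#   java -Xmx1024m -XX:MaxPermSize=256m $(jdwp_opts 8787) \
#       -jar org.apache.stanbol.launchers.full-0.10.0-incubating-SNAPSHOT.jar
```

Keeping the port in one place makes it easier to keep the server start
command and the Eclipse "Remote Java Application" configuration in sync.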

hope this helps
best
Rupert




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen