Posted to dev@stanbol.apache.org by John Pereira <jo...@salzburgresearch.at> on 2012/01/27 11:04:07 UTC

IKS early adopter proposal (job-skills matching service)

Hello IKS team,

Here is the latest IKS early adopter proposal (idea):

I am posting it here for input on how best to ensure a valuable  
experience for this early adopter company. After a short discussion I  
will get back to them with improvements/instructions on how to proceed  
with the validation.

The issue to address is how applicable the current IKS components are
(given their state of maturity) to this proposal, and which missing
features/functionality must come from the early adopter to
realise this use case.

> The context is that of the Human Resources department of a big
> company, or of any recruitment company. The basic goal is to provide
> them with an open source document management system able to deal in
> an intelligent way with unstructured CVs, i.e. CVs which come in
> Microsoft Word, PDF, OpenOffice, etc. Each time a new CV arrives it
> is inserted in the document base. Behind the scenes this is not just
> adding a document but passing it to a (possibly remote) Stanbol
> server which enhances it with structured information. This might
> represent:
>     1) experiences of the candidate
>     2) skills of the candidate
>     3) Education level
>     4) reference data (name, address etc.)
>     5) contact data
> Some of these data might be slightly more structured than just named
> entities, but definitely within the representational power of RDF. Some
> of them could be even more semantically enriched, by providing external
> information on companies, places, specific technologies, etc.
>
> As a result of this intelligent population, personnel at the HR dept.
> would be able to formulate queries such as (just as examples):
> - All CVs of people living in Paris older than 27 years
> - All CVs of people with skills in SQL Server and Java
> - All people who have worked in a high tech company from November 2011
> up to now
> ....
>
> From a technical point of view the most interesting challenge
> consists in integrating the set of Stanbol enhancers with the
> semantic web services provided at www.linguagrid.org. In principle it
> should not be a different integration from what has already been
> done with the OpenCalais WS and Zemanta WS. However there are at least
> two major challenges:
>     1) Multilinguality (French and Italian)
>     2) The fact that we are not dealing with "standard" named entity
> expressions.
>
> As for the CMS to adopt, we could go either for
> Nuxeo or Alfresco, or even a third CMS, depending on three factors:
>     - The adaptability to the specific task of handling CVs;
>     - The level of support from the supporting community;
>     - The acceptance of the CMS by our current prospects for this
> kind of application
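
As a rough illustration of the "pass the CV to a Stanbol server" step in the proposal, the sketch below builds (but does not send) an HTTP request that posts the plain text extracted from a CV to a Stanbol Enhancer endpoint and asks for the enhancements as RDF. The URL http://localhost:8080/enhancer is an assumption about a default local launcher and may differ between Stanbol versions and deployments; the helper name build_enhancer_request and the sample CV text are hypothetical.

```python
import urllib.request

# Assumed endpoint of a local Stanbol launcher; adjust for your deployment.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(cv_text, url=ENHANCER_URL):
    """Build a POST request carrying the CV text; the caller decides when to send it."""
    return urllib.request.Request(
        url,
        data=cv_text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",  # ask for the enhancements as RDF (Turtle)
        },
        method="POST",
    )


request = build_enhancer_request("Java developer living in Paris, 5 years of SQL Server.")
# To actually call the server (requires a running instance):
# with urllib.request.urlopen(request) as response:
#     enhancements_turtle = response.read().decode("utf-8")
```

Text extraction from Word, PDF or OpenOffice files would have to happen before this step, or be delegated to the server if the deployment supports it.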


Best Regards,

John Pereira
Senior Project Manager | www.salzburgresearch.at

Current Projects:

Interactive Knowledge Stack (IKS) is an open source community,
building a flexible technology platform for semantically enhanced
Content Management Systems (CMS)
www.iks-project.eu

ConnectME facilitates a new interactive media experience built on top
of the convergence of TV/video and the Web.
www.connectme.at/

Re: IKS early adopter proposal (job-skills matching service)

Posted by Olivier Grisel <ol...@ensta.org>.

Looks like a great use case for Stanbol. Making it work will
probably require building a dedicated EntityHub with entities of type
"Skill". Unfortunately DBpedia does not seem to provide such an
owl:Class in its ontology.

LinkedIn, on the other hand, does have such a database, with links to
Wikipedia articles for semantic grounding. See for instance:

  http://www.linkedin.com/skills/skill/Java

However it does not have any RDF dump and the REST API does not seem
to provide a skill query API that would be suitable for wrapping as an
EntityHub referenced site.

Scraping LinkedIn skills pages to extract such a graph using tools
such as https://scraperwiki.com/ would be possible but probably
illegal as well :)

So, my opinion is that if the early adopter has a strong technical
background and is ready to invest some effort in building custom
training sets, both for building OpenNLP models and EntityHub indices,
it might yield interesting results.
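
To make the idea of a dedicated index of "Skill" entities more concrete, here is a minimal sketch of label-based skill spotting. It is only an in-memory stand-in written for illustration, not the EntityHub API: the example.org URIs, the SKILLS table and the find_skills helper are all hypothetical.

```python
import re

# Hypothetical skill entities: URI -> known labels (made-up identifiers).
SKILLS = {
    "http://example.org/skill/java": ["Java"],
    "http://example.org/skill/sql-server": ["SQL Server", "MS SQL Server"],
}


def find_skills(text):
    """Return the sorted URIs of skills whose labels occur as whole words in the text."""
    found = set()
    for uri, labels in SKILLS.items():
        for label in labels:
            # Lookarounds keep "Java" from matching inside "JavaScript".
            pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(uri)
    return sorted(found)


print(find_skills("Five years of Java and MS SQL Server administration."))
```

A real index would need many more labels per skill (including French and Italian ones), which is where the custom training and indexing effort mentioned above comes in.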

-- 
Olivier

Re: IKS early adopter proposal (job-skills matching service)

Posted by John Pereira <jo...@salzburgresearch.at>.

Hi,

Please see the final version of the related IKS early adopter contract at: 
http://wiki.iks-project.eu/index.php/CELI_Proposal. I have asked the contractors to send an introductory email to this mailing list.

John 


On Jan 30, 2012, at 9:28 AM, Olivier Grisel wrote:

> 2012/1/29 Andreas Kuckartz <A....@ping.de>:
>> John wrote:
>> 
>>>> As a result of this intelligent population, personnel at the
>>>> HR dept. would be able to formulate queries such as (just as
>>>> examples):
>>>> - All CVs of people living in Paris older than 27 years
>>>> - All CVs of people with skills in SQL Server and Java
>>>> - All people who have worked in a high tech company from November 2011
>>>> up to now
>> 
>> The first two query examples do not seem to require semantic
>> technologies, they are usual filters in applications which are using
>> ordinary relational databases.
> 
> Semantic technologies would still be useful for extracting the latent
> knowledge of the CV documents into a structured representation with
> classes and properties (Skill, Organization, Location...) with links
> to generic linked data knowledge bases (e.g. DBpedia or Freebase or
> Geonames for organizations and places) or domain specific knowledge
> bases (e.g. skills and organizations with the LinkedIn API).
> 
> If the data is linked to rich entities it is possible to do queries
> involving the hierarchical structures of places (World regions >
> Countries > Regions > Cities), of organizations (Types of
> organizations (Gov, Not for profit, Commercial) > Industries) or of
> skills (IT > Software development > Java).
> 
>> But the last example looks more
>> interesting because of the "high tech company". The user interface
>> probably would have to take that into account (most people working in HR
>> departments do not know anything about SPARQL).
> 
> No user at all should ever have to write SPARQL queries. SPARQL is not
> a user interface, it's a machine interface.
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

Re: IKS early adopter proposal (job-skills matching service)

Posted by Olivier Grisel <ol...@ensta.org>.

2012/1/29 Andreas Kuckartz <A....@ping.de>:
> John wrote:
>
>>> As a result of this intelligent population, personnel at the
>>> HR dept. would be able to formulate queries such as (just as
>>> examples):
>>> - All CVs of people living in Paris older than 27 years
>>> - All CVs of people with skills in SQL Server and Java
>>> - All people who have worked in a high tech company from November 2011
>>> up to now
>
> The first two query examples do not seem to require semantic
> technologies, they are usual filters in applications which are using
> ordinary relational databases.

Semantic technologies would still be useful for extracting the latent
knowledge of the CV documents into a structured representation with
classes and properties (Skill, Organization, Location...) with links
to generic linked data knowledge bases (e.g. DBpedia or Freebase or
Geonames for organizations and places) or domain specific knowledge
bases (e.g. skills and organizations with the LinkedIn API).

If the data is linked to rich entities it is possible to do queries
involving the hierarchical structures of places (World regions >
Countries > Regions > Cities), of organizations (Types of
organizations (Gov, Not for profit, Commercial) > Industries) or of
skills (IT > Software development > Java).
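
A small sketch of what such a hierarchical query could look like once CVs are linked to entities. The taxonomy, the CV identifiers and the helper names are invented for illustration; a real deployment would read the broader/narrower relations from the linked knowledge base instead of a hard-coded table.

```python
# Hypothetical skill taxonomy: each entity points to its broader (parent) entity.
BROADER = {
    "Java": "Software development",
    "SQL Server": "Databases",
    "Software development": "IT",
    "Databases": "IT",
    "Accounting": "Finance",
}

# Hypothetical CVs, already linked to skill entities.
CV_SKILLS = {
    "cv-1": {"Java"},
    "cv-2": {"Accounting"},
    "cv-3": {"SQL Server", "Accounting"},
}


def is_under(entity, ancestor):
    """True if `entity` equals `ancestor` or sits below it in the hierarchy."""
    while entity is not None:
        if entity == ancestor:
            return True
        entity = BROADER.get(entity)  # None once the top of the hierarchy is reached
    return False


def cvs_with_skill_under(ancestor):
    """Identifiers of CVs having at least one skill at or below `ancestor`."""
    return sorted(cv for cv, skills in CV_SKILLS.items()
                  if any(is_under(skill, ancestor) for skill in skills))


print(cvs_with_skill_under("IT"))  # ['cv-1', 'cv-3']
```

The same walk up a parent relation applies to places (city to country to world region) and to organizations (company to industry).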

> But the last example looks more
> interesting because of the "high tech company". The user interface
> probably would have to take that into account (most people working in HR
> departments do not know anything about SPARQL).

No user at all should ever have to write SPARQL queries. SPARQL is not
a user interface, it's a machine interface.
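
To illustrate the point, here is a sketch of a query being generated by the application from values picked in a search form, so that the HR user never sees SPARQL. The ex: vocabulary (ex:hasSkill), its namespace and the build_skill_query helper are hypothetical, and matching skills by plain rdfs:label literals is a simplification.

```python
def build_skill_query(skill_labels):
    """Build a SPARQL query selecting CVs that have every requested skill."""
    lines = [
        "PREFIX ex: <http://example.org/cv#>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT DISTINCT ?cv WHERE {",
    ]
    for index, label in enumerate(skill_labels):
        # Escape backslashes and double quotes so the label stays inside the literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"  ?cv ex:hasSkill ?skill{index} .")
        lines.append(f'  ?skill{index} rdfs:label "{escaped}" .')
    lines.append("}")
    return "\n".join(lines)


# Values as they might arrive from two fields of a search form.
print(build_skill_query(["SQL Server", "Java"]))
```

Form input containing line breaks would need extra handling before being placed in a literal; the sketch leaves that out.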

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: IKS early adopter proposal (job-skills matching service)

Posted by Andreas Kuckartz <A....@ping.de>.

John wrote:

>> As a result of this intelligent population, personnel at the
>> HR dept. would be able to formulate queries such as (just as
>> examples):
>> - All CVs of people living in Paris older than 27 years
>> - All CVs of people with skills in SQL Server and Java
>> - All people who have worked in a high tech company from November 2011
>> up to now

The first two query examples do not seem to require semantic
technologies, they are usual filters in applications which are using
ordinary relational databases. But the last example looks more
interesting because of the "high tech company". The user interface
probably would have to take that into account (most people working in HR
departments do not know anything about SPARQL).
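
For comparison, the first two queries can indeed be written as ordinary filters once the fields exist in structured form. The sketch below uses invented records and helper names; it assumes the city, birth date and skills have already been extracted from the CVs, which is the part that is hard with unstructured documents.

```python
from datetime import date

# Hypothetical structured records, as they might look after extraction.
CANDIDATES = [
    {"name": "A", "city": "Paris", "born": date(1980, 5, 1), "skills": {"Java", "SQL Server"}},
    {"name": "B", "city": "Paris", "born": date(1990, 3, 9), "skills": {"Java"}},
    {"name": "C", "city": "Turin", "born": date(1975, 1, 2), "skills": {"SQL Server", "Java"}},
]


def age_on(born, today):
    """Age in full years on the given day."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def in_city_older_than(city, min_age, today):
    return [c["name"] for c in CANDIDATES
            if c["city"] == city and age_on(c["born"], today) > min_age]


def with_all_skills(required):
    # Set inclusion: every required skill must be among the candidate's skills.
    return [c["name"] for c in CANDIDATES if required <= c["skills"]]


today = date(2012, 1, 29)
print(in_city_older_than("Paris", 27, today))   # ['A']
print(with_all_skills({"Java", "SQL Server"}))  # ['A', 'C']
```

The third query has no such direct equivalent, because "high tech company" is not a field of the CV but a property of the organization the CV mentions.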

> we could go either for Nuxeo or Alfresco,
> or even a third CMS, depending on three factors:
> - The adaptability to the specific task of handling CVs;
> - The level of support from the supporting community;
> - The acceptance of the CMS by our current prospects
> for this kind of application

So this early adopter has to learn *both* Apache Stanbol (or other IKS
components) *and* the internals of a CMS?

Both OpenCms and OpenSAGA would satisfy criterion 1. I am currently not
sure about criterion 2 for both and do not know about criterion 3 (which
probably would depend on other unknown requirements or preferences).

The task of creating a document management system for such types of
documents is interesting. It could be generalized.

Cheers,
Andreas