
Posted to dev@stanbol.apache.org by Phillip Rhodes <mo...@gmail.com> on 2013/11/14 01:18:23 UTC

A blog post about Stanbol and the Semantic Web

Hey guys, I thought some of you might find this interesting and/or have some
insightful feedback.  I just wrote a blog post responding to a guy who said
"the semantic web has failed", and I cited Stanbol as part of my explanation of
why he's wrong.

http://fogbeam.blogspot.com/2013/11/dominiek-ter-heide-is-dead-wrong.html

I also say that I think Stanbol might be the most important OSS
project of all, which is something I believe may well be true.
Anyway, any feedback or comments are appreciated.


Thanks,


Phil

This message optimized for indexing by NSA PRISM

Re: A blog post about Stanbol and the Semantic Web

Posted by adasal <ad...@gmail.com>.

> It's also interesting to see the synaptic web that Dominiek talks about in
> his blog post.
> I think it's more about streaming data, and without the semantic web you
> cannot extract meaningful information/semantics from those data streams.

Why do you think this? I assume there is slower-moving and faster-moving
information, and there are larger and smaller streams, so this would depend
on technology and resources.
I think there is another important dimension to this.
You use the word 'meaningful', but the issue is what is meaningful and how
long it remains meaningful.
'Meaningful' should imply some organising principle on which actions
are based. That organising principle could be a financial profit point.
That does not cover all dimensions of meaning, though.
Judgements have to be made quickly as to what is truly of profit to people,
or at least what may be of profit, aside from financial motivations. What
I'm pointing out is that financial motivation is an indifferent and blind
driver.

Adam

On Thursday, November 14, 2013, Dileepa Jayakody wrote:

> Very nice blog post.
> It captures the main functions of the semantic web and linked data.
> IMO the semantic web is a paradigm, not a set of technologies. It's a set of
> concepts for extracting meaningful information/semantics and relationships by
> analyzing related documents (after all, the WWW is a collection of related
> documents).
>
> It's also interesting to see the synaptic web that Dominiek talks about in
> his blog post.
> I think it's more about streaming data, and without the semantic web you
> cannot extract meaningful information/semantics from those data streams.
>
> I'm really interested in your idea of combining data sources (email,
> Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my MSc
> research on email reputation management, which requires semantic analysis of
> email data.
> Please share more info or links on those topics if you have any.
>
>
> Thanks,
> Dileepa

Re: A blog post about Stanbol and the Semantic Web

Posted by Dileepa Jayakody <di...@gmail.com>.

Thanks a lot for the detailed information Phil.


On Fri, Nov 15, 2013 at 1:06 AM, Phillip Rhodes
<mo...@gmail.com> wrote:

> > I'm really interested in your idea of combining data sources (email,
> > Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my
> > MSc research on email reputation management, which requires semantic
> > analysis of email data.
> > Please share more info or links on those topics if you have any.
>
> In the current version, it's a fairly naive approach.  We download the
> respective Tweet, Email, Blog Post, or whatever using the appropriate
> protocol, and then use Tika and Boilerpipe to extract the raw text
> (that is mostly for web content; with emails and tweets, the raw text is
> already available) and then push that text to Stanbol, making no
> distinction between a tweet or an email or whatever.
>
> When we get the entity graph back from Stanbol, we store all the
> triples and add Statements which link the discrete entities with the
> UUID we assign to each piece of content (e.g. tweet, email, blog post,
> etc.), and now we can look for commonality by just using plain old
> SPARQL queries.
>
> What would be more interesting, and what we'll work on eventually, is
> adding more "smarts" to the actual process of doing the enhancement on
> the Stanbol side.  This could be especially useful for something like
> a Tweet, where you don't have much context to work with... but a
> TweetEnhancementEngine could be "smarter" and dereference the profile
> of the user who posted the tweet, any @mentions, any hyperlinks,
> etc., and factor that in.   Likewise for email, where you could factor
> in knowledge about the sender and the recipient(s) of the mail.
>
> Regarding email research...  you probably already know about this, but
> just in case you don't -  a lot of researchers use the "Enron
> Corpus"[1] for doing research on extracting information from email,
> since it's A. large, B. real-world and C. legally available.   I could
> imagine how some social network analysis combined with something like
> the semantic concept extraction that Stanbol does, applied to a body
> of emails, could be part of a system for doing something related to
> reputation.
>
> [1]: https://www.cs.cmu.edu/~enron/
>
>
> Phil
>

Re: A blog post about Stanbol and the Semantic Web

Posted by Phillip Rhodes <mo...@gmail.com>.

> I'm really interested in your idea of combining data sources (email,
> Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my MSc
> research on email reputation management, which requires semantic analysis of
> email data.
> Please share more info or links on those topics if you have any.

In the current version, it's a fairly naive approach.  We download the
respective Tweet, Email, Blog Post, or whatever using the appropriate
protocol, and then use Tika and Boilerpipe to extract the raw text
(that is mostly for web content; with emails and tweets, the raw text is
already available) and then push that text to Stanbol, making no
distinction between a tweet or an email or whatever.
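
For anyone who wants to try the "push that text to Stanbol" step, here is
a minimal sketch in Python. It is an illustration only, not the code
described above. It assumes a Stanbol launcher running locally with the
enhancer reachable at http://localhost:8080/enhancer; that URL and the
function names are assumptions. It posts plain text and asks for the
enhancement graph back as Turtle.

```python
import urllib.request

# Assumption: a local Stanbol launcher with the default enhancement chain
# exposed at /enhancer. Adjust the URL for a real deployment.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to enhance."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(text, url=STANBOL_ENHANCER_URL):
    """Send the text to Stanbol and return the enhancement graph as Turtle."""
    with urllib.request.urlopen(build_enhancer_request(text, url)) as response:
        return response.read().decode("utf-8")
```

The same call works whether the text came from a tweet, an email, or a
web page run through Tika and Boilerpipe, which matches the "no
distinction" approach: only the extraction step differs per source.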

When we get the entity graph back from Stanbol, we store all the
triples and add Statements which link the discrete entities with the
UUID we assign to each piece of content (e.g. tweet, email, blog post,
etc.), and now we can look for commonality by just using plain old
SPARQL queries.
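
The linking step can be sketched roughly as follows. This is a toy
version under stated assumptions: the triple store is just a Python set
of tuples, the predicate http://example.org/schema#mentions is a
made-up name (the real linking predicate is not given here), and
shared_entities() is an in-memory stand-in for what the SPARQL query in
COMMON_ENTITY_QUERY would return from a real store.

```python
import uuid
from collections import defaultdict

# Hypothetical predicate linking a content item to an entity Stanbol found.
MENTIONS = "http://example.org/schema#mentions"

# What the "plain old SPARQL" lookup might look like against a real store:
# pairs of distinct content items that mention the same entity.
COMMON_ENTITY_QUERY = """
SELECT ?entity ?a ?b WHERE {
  ?a <http://example.org/schema#mentions> ?entity .
  ?b <http://example.org/schema#mentions> ?entity .
  FILTER (STR(?a) < STR(?b))
}
"""


def link_content(store, entity_uris):
    """Assign a UUID URN to one content item and link it to its entities."""
    content_id = uuid.uuid4().urn
    for entity in entity_uris:
        store.add((content_id, MENTIONS, entity))
    return content_id


def shared_entities(store):
    """In-memory stand-in for the query: entities mentioned by 2+ items."""
    by_entity = defaultdict(set)
    for subject, predicate, obj in store:
        if predicate == MENTIONS:
            by_entity[obj].add(subject)
    return {e: items for e, items in by_entity.items() if len(items) > 1}
```

Because every piece of content gets the same kind of identifier, a tweet
and an email that both mention the same entity show up together without
any source-specific logic.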

What would be more interesting, and what we'll work on eventually, is
adding more "smarts" to the actual process of doing the enhancement on
the Stanbol side.  This could be especially useful for something like
a Tweet, where you don't have much context to work with... but a
TweetEnhancementEngine could be "smarter" and dereference the profile
of the user who posted the tweet, any @mentions, any hyperlinks,
etc., and factor that in.   Likewise for email, where you could factor
in knowledge about the sender and the recipient(s) of the mail.
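
As a very rough illustration of the first thing such a tweet-specific
engine would need, the sketch below pulls out the @mentions and
hyperlinks that could later be dereferenced for extra context. It is
plain Python rather than a Stanbol engine, the function name is
hypothetical, and the regular expressions are simplifications, not
Twitter's actual tokenization rules.

```python
import re

# "@name" not preceded by a word character, so email addresses are skipped.
MENTION_RE = re.compile(r"(?<![\w@])@(\w+)")
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def tweet_context_hints(text):
    """Collect the @mentions and hyperlinks an engine could dereference."""
    return {
        "mentions": MENTION_RE.findall(text),
        # Strip punctuation that commonly trails a URL in running text.
        "urls": [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)],
    }
```

The dereferencing itself (fetching profiles and linked pages, then
feeding that text into the enhancement) is the harder part and is left
out here.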

Regarding email research...  you probably already know about this, but
just in case you don't -  a lot of researchers use the "Enron
Corpus"[1] for doing research on extracting information from email,
since it's A. large, B. real-world and C. legally available.   I could
imagine how some social network analysis combined with something like
the semantic concept extraction that Stanbol does, applied to a body
of emails, could be part of a system for doing something related to
reputation.

[1]: https://www.cs.cmu.edu/~enron/
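
To make the social-network side a bit more concrete, here is a small
sketch that counts who mails whom across a set of raw messages, using
only Python's standard email package. It assumes each message is
available as an RFC 822 string; the function name is hypothetical, and a
real reputation system would need far more than these counts.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def sender_recipient_counts(raw_messages):
    """Count (sender, recipient) pairs across raw RFC 822 message strings."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [
            addr.lower() for _, addr in getaddresses(msg.get_all("From", []))
        ]
        recipients = [
            addr.lower()
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
        ]
        for sender in senders:
            for recipient in recipients:
                edges[(sender, recipient)] += 1
    return edges
```

The resulting edge counts could be combined with the entities extracted
from each message body, so that "who talks to whom" and "about what"
sit in the same graph.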


Phil

Re: A blog post about Stanbol and the Semantic Web

Posted by Dileepa Jayakody <di...@gmail.com>.

Very nice blog post.
It captures the main functions of the semantic web and linked data.
IMO the semantic web is a paradigm, not a set of technologies. It's a set of
concepts for extracting meaningful information/semantics and relationships by
analyzing related documents (after all, the WWW is a collection of related
documents).

It's also interesting to see the synaptic web that Dominiek talks about in
his blog post.
I think it's more about streaming data, and without the semantic web you
cannot extract meaningful information/semantics from those data streams.

I'm really interested in your idea of combining data sources (email,
Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my MSc
research on email reputation management, which requires semantic analysis of
email data.
Please share more info or links on those topics if you have any.

Thanks,
Dileepa
