Posted to dev@stanbol.apache.org by Mark Loper <ma...@cyrenllc.com> on 2014/02/26 15:57:22 UTC

Use case question and clarification

Hi, this is going to take some clarification so you understand what I’m trying to accomplish. Bear with me.

I’m looking at Stanbol as a solution to some requirements that I have received from my customer. They want dynamic and intelligent content with a social feel across all of their content DBs. I’m looking at around 3PB of data stored in various databases ranging from geospatial imagery, documents, images, and video. I want to show how the concept of the semantic web can help them see, view and find their data faster and more intelligently. I want to be able to feed documents into Stanbol, get a tag cloud based on information in the object and find out which objects are most related based upon the relationships that are found over time.

My background is as a developer mainly dealing with geospatial image processing, large object delivery over low-bandwidth comms, security, and OGC systems like ESRI and OpenGeo. CMS and the semantic web are proving to be a large area of study that I’m trying to get up to speed on. I did, however, get Stanbol up and running quickly and very easily and have run a few documents through the default config and have been happy with the results.

Here is a basic list of what I want to have happen, and I’m having trouble finding a use case that describes anything close.

1. I can’t store the data; it needs to reside at the origin servers, so I’m “enhancing” links to objects/metadata.
2. I don’t have internet access, so this needs to live on a closed network. I could load my own copies of things like DBpedia, maybe.
3. I want to grow the intelligence, not start out with everything. The customer is most interested in what is recent, not what is 15 years old, so I don’t need to consume all their data.
4. I want to push every document a user looks at through the system, and then over a short amount of time I expect I will have a decent and growing library of connections between what is current and important.
5. When looking at an image or video, I’d like the user to be able to tag that object and, based on that tag, add it to the enhancements of that object and other objects in the system.
6. I want to display a tag cloud and/or a list of related documents based on what Stanbol knows.

I’m not looking for a solution to all this, I realize that much of it is custom, but I feel that the Stanbol services are key to the picture. I can’t find a good example of how this would all fit together, and I don’t think I have the semantic / CMS knowledge to just plow forward. I am looking to have a conversation that will get me moving. Any help you can provide would be much appreciated.

Thank you,

Mark Loper
CTO
Cyren LLC
mark.loper@cyrenllc.com

Re: Use case question and clarification

Posted by Rafa Haro <rh...@apache.org>.

Hi Mark,

Sorry for the delayed response. Please, find my comments inline.

On Wednesday, March 26, 2014, mark.loper@cyrenllc.com <mark.loper@cyrenllc.com> wrote:

> Well, just wanted you all to know that I've had a very successful POC demo
> this week and we are going to take a closer look at what Stanbol can do for
> the customer.  The customer did have a few questions that I'm not
> completely sure how to answer and I'm hoping you have some time to
> enlighten me.
>
> 1.  Are there any competitors using this type of semantic technology open
> source or commercial?  I said no, but I couldn't be definitive.


If you take a look at the Stanbol list archive, you will find people from
several companies using Stanbol. I also suppose that most of the committers
use Stanbol in their companies. You can also take a look at Redlink.co,
whose SaaS platform uses Stanbol as one of its main components.

>
> 2.  What document types does the enhancer work on?  I demoed HTML, but
> they're interested in PDF, Word, Excel.
>

You can use any document type parseable by Apache Tika, as long as you
include Tika in your enhancement chain.
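As a sketch of how a PDF, Word or Excel file might be submitted, the snippet below builds (but does not send) an HTTP POST of the raw file bytes to the Enhancer. It assumes a local Stanbol instance at `http://localhost:8080`, the `/enhancer` and `/enhancer/chain/{name}` endpoints, and a chain that includes the Tika engine; the chain name `tika-chain`, the extension table and the helper names are hypothetical, so check the chains configured on your own instance.

```python
import urllib.request
from typing import Optional

# Assumption: a Stanbol launcher running locally on its default port.
STANBOL_URL = "http://localhost:8080"

# A few media types Tika can parse; extend as needed.
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".html": "text/html",
}


def media_type_for(filename: str) -> str:
    """Pick a Content-Type from the file extension (octet-stream if unknown)."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return MEDIA_TYPES.get(ext, "application/octet-stream")


def build_enhance_request(content: bytes, media_type: str,
                          chain: Optional[str] = None,
                          accept: str = "text/turtle") -> urllib.request.Request:
    """Build (but do not send) a POST of raw content to the Stanbol Enhancer.

    With no chain, the default chain at /enhancer is used; otherwise the
    named chain at /enhancer/chain/{chain} (hypothetical name, e.g. "tika-chain").
    """
    path = "/enhancer" if chain is None else "/enhancer/chain/" + chain
    return urllib.request.Request(
        STANBOL_URL + path,
        data=content,
        method="POST",
        headers={"Content-Type": media_type, "Accept": accept},
    )
```

Sending it is then `urllib.request.urlopen(req).read()`, which should return the enhancement results serialized in the requested RDF format. Posting the original bytes with the right Content-Type lets the Tika engine, if it is in the chain, extract the text before the linking engines run.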


> 3.  From an Ontology standpoint, can I somehow weight the enhancement
> results higher based on a customer Ontology than on the built-in
> Ontologies?  Thus the customer ontology results would be weighted higher in
> the Confidence value.


If you mean boosting entities from specific sites, AFAIK it is not
possible.
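Since the Enhancer does not weight suggestions by site, one workaround is to re-rank on the client after the results come back. The sketch below assumes the entity suggestions have already been extracted from the enhancement results into simple records that carry the Entityhub site each one came from; the record type, the site ids ("dbpedia", "customer") and the boost factors are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class EntitySuggestion:
    label: str         # label of the suggested entity
    entity: str        # URI of the suggested entity
    site: str          # Entityhub site the suggestion came from (assumed available)
    confidence: float  # confidence value reported by the enhancer


def boost_by_site(suggestions: List[EntitySuggestion],
                  boosts: Dict[str, float],
                  cap: float = 1.0) -> List[EntitySuggestion]:
    """Multiply each confidence by its site's boost factor (default 1.0),
    cap the result, and return new records sorted best-first."""
    boosted = [
        replace(s, confidence=min(cap, s.confidence * boosts.get(s.site, 1.0)))
        for s in suggestions
    ]
    return sorted(boosted, key=lambda s: s.confidence, reverse=True)
```

The boosted value is a client-side ranking score rather than a confidence computed by Stanbol, so it is worth keeping the original records if you also filter on the reported confidence.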

>
>
> Thank you for your support so far.  When I get to real development there
> will be much more we want to do than what I demoed.  The technology worked
> very well for what I wanted to show.  The next steps will include custom
> configuration and possibly some plugin development and may even include a
> custom enhancer.  Still trying to figure out what the real design will be.
>
> Thank you,
>
> Mark Loper
> CTO
> Cyren LLC
> mark.loper@cyrenllc.com
> 703-951-3116
>
> On Mar 17, 2014, at 5:39 PM, Rafa Haro <rharo@apache.org> wrote:
>
> > Hi Mark,
> >
> > On 16/03/14 14:31, Mark Loper wrote:
> >> To get around this I'm programmatically filtering my results using the
> confidence value.  Is there a way to configure the enhancement chain to do
> this filtering for me?  With a Rule possibly?  I feel like it can do it,
> but I can't figure out how.  Any thoughts?
> > AFAIK this is not possible yet, but it should be more or less
> straightforward to create a "post-processing" engine for performing this
> kind of filtering. Would you like to give it a try?
> >
> > Cheers,
> > Rafa
>
>

Re: Use case question and clarification

Posted by "mark.loper@cyrenllc.com" <ma...@cyrenllc.com>.

Well, just wanted you all to know that I've had a very successful POC demo this week and we are going to take a closer look at what Stanbol can do for the customer.  The customer did have a few questions that I'm not completely sure how to answer and I'm hoping you have some time to enlighten me.

1.  Are there any competitors using this type of semantic technology open source or commercial?  I said no, but I couldn't be definitive.  

2.  What document types does the enhancer work on?  I demoed HTML, but they're interested in PDF, Word, Excel.

3.  From an Ontology standpoint, can I somehow weight the enhancement results higher based on a customer Ontology than on the built-in Ontologies?  Thus the customer ontology results would be weighted higher in the Confidence value.

Thank you for your support so far.  When I get to real development there will be much more we want to do than what I demoed.  The technology worked very well for what I wanted to show.  The next steps will include custom configuration and possibly some plugin development and may even include a custom enhancer.  Still trying to figure out what the real design will be.

Thank you,

Mark Loper
CTO
Cyren LLC
mark.loper@cyrenllc.com
703-951-3116

On Mar 17, 2014, at 5:39 PM, Rafa Haro <rh...@apache.org> wrote:

> Hi Mark,
> 
> On 16/03/14 14:31, Mark Loper wrote:
>> To get around this I’m programmatically filtering my results using the confidence value.  Is there a way to configure the enhancement chain to do this filtering for me?  With a Rule possibly?  I feel like it can do it, but I can’t figure out how.  Any thoughts?
> AFAIK this is not possible yet, but it should be more or less straightforward to create a "post-processing" engine for performing this kind of filtering. Would you like to give it a try?
> 
> Cheers,
> Rafa

Re: Use case question and clarification

Posted by Rafa Haro <rh...@apache.org>.

Hi Mark,

On 16/03/14 14:31, Mark Loper wrote:
> To get around this I’m programmatically filtering my results using the confidence value.  Is there a way to configure the enhancement chain to do this filtering for me?  With a Rule possibly?  I feel like it can do it, but I can’t figure out how.  Any thoughts?
AFAIK this is not possible yet, but it should be more or less
straightforward to create a "post-processing" engine for performing this
kind of filtering. Would you like to give it a try?

Cheers,
Rafa
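Until such a post-processing engine exists, the same filtering can be done in the client, as Mark describes. A minimal sketch, assuming the linked entities have already been pulled out of the enhancement results into (mention, entity, confidence) records: it drops suggestions below a threshold and keeps only the best-scoring entity per mention. The `Suggestion` type, the default threshold and the sample URIs are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Suggestion(NamedTuple):
    mention: str       # text span the entity was linked from
    entity: str        # URI of the suggested entity
    confidence: float  # confidence value reported by the enhancer


def filter_suggestions(suggestions: List[Suggestion],
                       min_confidence: float = 0.5) -> List[Suggestion]:
    """Drop low-confidence suggestions and keep the best one per mention."""
    best: Dict[str, Suggestion] = {}
    for s in suggestions:
        if s.confidence < min_confidence:
            continue  # below threshold: discard
        current = best.get(s.mention)
        if current is None or s.confidence > current.confidence:
            best[s.mention] = s
    return list(best.values())
```

Keeping only the top suggestion per mention helps with cases like a country being linked to a same-named US city, provided the correct entity scores higher.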

Re: Use case question and clarification

Posted by Mark Loper <ma...@cyrenllc.com>.

So, I’ve gotten much of this running and I’m now down to refinement.  I’m pulling documents off of some RSS feeds, enhancing them and storing them in the Content Hub.  The Content Hub portion will need to change for production, but for now it’s good for a POC.  I’m getting pretty good tag results out of the enhancement engines aside from some oddities; for instance, some countries are recognized as US cities.
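For the RSS-driven pipeline described above, a minimal sketch of the first step might look like this: parse an RSS 2.0 feed with the Python standard library, keep each item's link as its identifier (so the content itself stays on the origin server), and turn the title and description into plain text for the Enhancer. The function and type names are hypothetical, and real feeds often carry HTML in `<description>`, which you may want to strip or send as `text/html` instead.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class FeedItem(NamedTuple):
    title: str
    link: str          # used as the document identifier; content stays at the origin
    description: str


def parse_rss(xml_text: str) -> List[FeedItem]:
    """Extract title, link and description from every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing element, so fall back to "".
        items.append(FeedItem(
            title=(item.findtext("title") or "").strip(),
            link=(item.findtext("link") or "").strip(),
            description=(item.findtext("description") or "").strip(),
        ))
    return items


def to_plain_text(item: FeedItem) -> bytes:
    """Build the text/plain body that would be posted to the Enhancer."""
    return f"{item.title}\n\n{item.description}".encode("utf-8")
```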

To get around this I’m programmatically filtering my results using the confidence value.  Is there a way to configure the enhancement chain to do this filtering for me?  With a Rule possibly?  I feel like it can do it, but I can’t figure out how.  Any thoughts?

Thanks,
Mark
On Mar 1, 2014, at 9:34 AM, Mark Loper <ma...@cyrenllc.com> wrote:

> Thanks Rafa!  I’m starting some Java integration now, trying to put the data to some intelligent use.  I’m glad to hear that the entire stack works offline as built.  That’s very helpful.  I’ll bring up more questions as they come; I just got my environment set up for this, so I have a ways to go before I have a good set of questions.
> 
> 
> On Feb 26, 2014, at 11:01 AM, Rafa Haro <rh...@apache.org> wrote:
> 
>> Hi Mark,
>> 
>> On 26/02/14 15:57, Mark Loper wrote:
>>> Hi, this is going to take some clarification so you understand what I’m trying to accomplish.  Bear with me.
>>>
>>> I’m looking at Stanbol as a solution to some requirements that I have received from my customer.  They want dynamic and intelligent content with a social feel across all of their content DBs.  I’m looking at around 3PB of data stored in various databases ranging from geospatial imagery, documents, images, and video.  I want to show how the concept of the semantic web can help them see, view and find their data faster and more intelligently.  I want to be able to feed documents into Stanbol, get a tag cloud based on information in the object and find out which objects are most related based upon the relationships that are found over time.
>> Glad to see you are considering Stanbol for such an interesting use case :-)
>>>
>>> My background is as a developer mainly dealing with geospatial image processing, large object delivery over low-bandwidth comms, security, and OGC systems like ESRI and OpenGeo.  CMS and the semantic web are proving to be a large area of study that I’m trying to get up to speed on.  I did, however, get Stanbol up and running quickly and very easily and have run a few documents through the default config and have been happy with the results.
>>>
>>> Here is a basic list of what I want to have happen, and I’m having trouble finding a use case that describes anything close.
>> Let me try to shed some light on these requirements, and let's wait for more suggestions from the community.
>>>
>>> 1.  I can’t store the data; it needs to reside at the origin servers, so I’m “enhancing” links to objects/metadata.
>> As long as you can gather that content and send it to Stanbol's Enhancer API using one of the supported media types, that is not a problem.
>>> 2.  I don’t have internet access, so this needs to live on a closed network.  I could load my own copies of things like DBpedia, maybe.
>> That is exactly the way Stanbol works out of the box, with local sites, although Stanbol can also use remote sites. For instance, as you might know, a DBpedia site with 43K entities is created by default.
>>> 3.  I want to grow the intelligence, not start out with everything.  The customer is most interested in what is recent, not what is 15 years old, so I don’t need to consume all their data.
>> Initially not relevant from the technical point of view.
>>> 4.  I want to push every document a user looks at through the system, and then over a short amount of time I expect I will have a decent and growing library of connections between what is current and important.
>> Currently, Stanbol doesn't provide services for making sense of the metadata extracted by the enhancer, so that is something you would have to build yourself.
>>> 5.  When looking at an image or video, I’d like the user to be able to tag that object and, based on that tag, add it to the enhancements of that object and other objects in the system.
>> Currently, there aren't engines to enhance images or videos, although this functionality is in the backlog and has, for example, been proposed as a possible GSoC project for this year. So you would have to tag that content manually.
>>> 6.  I want to display a tag cloud and/or a list of related documents based on what Stanbol knows.
>> That could easily be achieved in a custom backend storing the enhancements, and it is also possible, to some extent, within Stanbol by storing them in a Clerezza graph.
>>>
>>>
>>> I’m not looking for a solution to all this, I realize that much of it is custom, but I feel that the Stanbol services are key to the picture.  I can’t find a good example of how this would all fit together, and I don’t think I have the semantic / CMS knowledge to just plow forward.  I am looking to have a conversation that will get me moving.  Any help you can provide would be much appreciated.
>>> 
>>> Thank you,
>>> 
>>> Mark Loper
>>> CTO
>>> Cyren LLC
>>> mark.loper@cyrenllc.com
>> Hope that helps. Cheers,
>> Rafa
>

Re: Use case question and clarification

Posted by Mark Loper <ma...@cyrenllc.com>.

Thanks Rafa!  I’m starting some Java integration now, trying to put the data to some intelligent use.  I’m glad to hear that the entire stack works offline as built.  That’s very helpful.  I’ll bring up more questions as they come; I just got my environment set up for this, so I have a ways to go before I have a good set of questions.


On Feb 26, 2014, at 11:01 AM, Rafa Haro <rh...@apache.org> wrote:

> Hi Mark,
> 
> On 26/02/14 15:57, Mark Loper wrote:
>> Hi, this is going to take some clarification so you understand what I’m trying to accomplish.  Bear with me.
>>
>> I’m looking at Stanbol as a solution to some requirements that I have received from my customer.  They want dynamic and intelligent content with a social feel across all of their content DBs.  I’m looking at around 3PB of data stored in various databases ranging from geospatial imagery, documents, images, and video.  I want to show how the concept of the semantic web can help them see, view and find their data faster and more intelligently.  I want to be able to feed documents into Stanbol, get a tag cloud based on information in the object and find out which objects are most related based upon the relationships that are found over time.
> Glad to see you are considering Stanbol for such an interesting use case :-)
>>
>> My background is as a developer mainly dealing with geospatial image processing, large object delivery over low-bandwidth comms, security, and OGC systems like ESRI and OpenGeo.  CMS and the semantic web are proving to be a large area of study that I’m trying to get up to speed on.  I did, however, get Stanbol up and running quickly and very easily and have run a few documents through the default config and have been happy with the results.
>>
>> Here is a basic list of what I want to have happen, and I’m having trouble finding a use case that describes anything close.
> Let me try to shed some light on these requirements, and let's wait for more suggestions from the community.
>>
>> 1.  I can’t store the data; it needs to reside at the origin servers, so I’m “enhancing” links to objects/metadata.
> As long as you can gather that content and send it to Stanbol's Enhancer API using one of the supported media types, that is not a problem.
>> 2.  I don’t have internet access, so this needs to live on a closed network.  I could load my own copies of things like DBpedia, maybe.
> That is exactly the way Stanbol works out of the box, with local sites, although Stanbol can also use remote sites. For instance, as you might know, a DBpedia site with 43K entities is created by default.
>> 3.  I want to grow the intelligence, not start out with everything.  The customer is most interested in what is recent, not what is 15 years old, so I don’t need to consume all their data.
> Initially not relevant from the technical point of view.
>> 4.  I want to push every document a user looks at through the system, and then over a short amount of time I expect I will have a decent and growing library of connections between what is current and important.
> Currently, Stanbol doesn't provide services for making sense of the metadata extracted by the enhancer, so that is something you would have to build yourself.
>> 5.  When looking at an image or video, I’d like the user to be able to tag that object and, based on that tag, add it to the enhancements of that object and other objects in the system.
> Currently, there aren't engines to enhance images or videos, although this functionality is in the backlog and has, for example, been proposed as a possible GSoC project for this year. So you would have to tag that content manually.
>> 6.  I want to display a tag cloud and/or a list of related documents based on what Stanbol knows.
> That could easily be achieved in a custom backend storing the enhancements, and it is also possible, to some extent, within Stanbol by storing them in a Clerezza graph.
>>
>>
>> I’m not looking for a solution to all this, I realize that much of it is custom, but I feel that the Stanbol services are key to the picture.  I can’t find a good example of how this would all fit together, and I don’t think I have the semantic / CMS knowledge to just plow forward.  I am looking to have a conversation that will get me moving.  Any help you can provide would be much appreciated.
>> 
>> Thank you,
>> 
>> Mark Loper
>> CTO
>> Cyren LLC
>> mark.loper@cyrenllc.com
> Hope that helps. Cheers,
> Rafa

Re: Use case question and clarification

Posted by Rafa Haro <rh...@apache.org>.

Hi Mark,

On 26/02/14 15:57, Mark Loper wrote:
> Hi, this is going to take some clarification so you understand what I’m trying to accomplish.  Bear with me.
>
> I’m looking at Stanbol as a solution to some requirements that I have received from my customer.  They want dynamic and intelligent content with a social feel across all of their content DBs.  I’m looking at around 3PB of data stored in various databases ranging from geospatial imagery, documents, images, and video.  I want to show how the concept of the semantic web can help them see, view and find their data faster and more intelligently.  I want to be able to feed documents into Stanbol, get a tag cloud based on information in the object and find out which objects are most related based upon the relationships that are found over time.
Glad to see you are considering Stanbol for such an interesting use case :-)
>
> My background is as a developer mainly dealing with geospatial image processing, large object delivery over low-bandwidth comms, security, and OGC systems like ESRI and OpenGeo.  CMS and the semantic web are proving to be a large area of study that I’m trying to get up to speed on.  I did, however, get Stanbol up and running quickly and very easily and have run a few documents through the default config and have been happy with the results.
>
> Here is a basic list of what I want to have happen, and I’m having trouble finding a use case that describes anything close.
Let me try to shed some light on these requirements, and let's wait
for more suggestions from the community.
>
> 1.  I can’t store the data; it needs to reside at the origin servers, so I’m “enhancing” links to objects/metadata.
As long as you can gather that content and send it to Stanbol's Enhancer
API using one of the supported media types, that is not a problem.
> 2.  I don’t have internet access, so this needs to live on a closed network.  I could load my own copies of things like DBpedia, maybe.
That is exactly the way Stanbol works out of the box, with local sites,
although Stanbol can also use remote sites. For instance, as you might
know, a DBpedia site with 43K entities is created by default.
> 3.  I want to grow the intelligence, not start out with everything.  The customer is most interested in what is recent, not what is 15 years old, so I don’t need to consume all their data.
Initially not relevant from the technical point of view.
> 4.  I want to push every document a user looks at through the system, and then over a short amount of time I expect I will have a decent and growing library of connections between what is current and important.
Currently, Stanbol doesn't provide services for making sense of the
metadata extracted by the enhancer, so that is something you would
have to build yourself.
> 5.  When looking at an image or video, I’d like the user to be able to tag that object and, based on that tag, add it to the enhancements of that object and other objects in the system.
Currently, there aren't engines to enhance images or videos, although
this functionality is in the backlog and has, for example, been proposed
as a possible GSoC project for this year. So you would have to tag that
content manually.
> 6.  I want to display a tag cloud and/or a list of related documents based on what Stanbol knows.
That could easily be achieved in a custom backend storing the
enhancements, and it is also possible, to some extent, within Stanbol by
storing them in a Clerezza graph.
>
>
> I’m not looking for a solution to all this, I realize that much of it is custom, but I feel that the Stanbol services are key to the picture.  I can’t find a good example of how this would all fit together, and I don’t think I have the semantic / CMS knowledge to just plow forward.  I am looking to have a conversation that will get me moving.  Any help you can provide would be much appreciated.
>
> Thank you,
>
> Mark Loper
> CTO
> Cyren LLC
> mark.loper@cyrenllc.com
Hope that helps. Cheers,
Rafa
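As a rough illustration of the custom backend Rafa mentions for point 6 (and the growing library of connections in point 4), the sketch below assumes you have already stored, per document link, the set of entities the Enhancer linked to it. It derives tag-cloud sizes from how many documents mention each entity and ranks related documents by Jaccard similarity of their entity sets. The data layout, the function names and the choice of Jaccard similarity are assumptions for illustration, not something Stanbol provides.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def tag_cloud(doc_entities: Dict[str, Set[str]],
              min_size: int = 1, max_size: int = 5) -> Dict[str, int]:
    """Map each entity to a size bucket, linear in its document frequency."""
    counts = Counter(e for entities in doc_entities.values() for e in entities)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {e: min_size + round((c - lo) * (max_size - min_size) / span)
            for e, c in counts.items()}


def related_documents(doc_id: str, doc_entities: Dict[str, Set[str]],
                      top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank other documents by Jaccard similarity of their entity sets."""
    target = doc_entities[doc_id]
    scored = []
    for other, entities in doc_entities.items():
        if other == doc_id:
            continue
        shared = target & entities
        if shared:  # skip documents with nothing in common
            scored.append((other, len(shared) / len(target | entities)))
    # Highest similarity first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Limiting `doc_entities` to recently viewed documents would be one way to reflect the customer's focus on what is current.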