
Posted to commits@stanbol.apache.org by "Ezequiel Foncubierta (JIRA)" <ji...@apache.org> on 2011/07/11 11:30:59 UTC

[jira] [Created] (STANBOL-263) Asynchronous calls support for the engines resource

Asynchronous calls support for the engines resource
---------------------------------------------------

Key: STANBOL-263
URL: https://issues.apache.org/jira/browse/STANBOL-263
Project: Stanbol
Issue Type: New Feature
Components: Enhancer
Reporter: Ezequiel Foncubierta

Enable an async option when calling the engines/ resource. If the system is overloaded, or some engines take a long time to process the content, it should be possible to run asynchronous transactions. Some integrations require synchronous calls because they want to tag content in the same transaction. However, there may be other integrations for which synchronous calls are not required.

This proposal arises because almost all current integrations use synchronous calls. This means the content creation process is as follows:

1. Create the content in the CMS
2. Send the content to the enhancer
3. Write the enhancer results
4. Relate the content with the extracted entities

So the CMS performance depends on Apache Stanbol's performance. An alternative would be to create the content and run a background process to extract the enhancements (using separate transactions).

A first way to achieve this is to send a URL parameter (e.g. referer=http://system/listener/service). If this parameter is present, the enhancements run in a background thread and, once finished, the results are sent to the URL given in the referer parameter.
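The referer proposal could look roughly like the following minimal Java sketch. It assumes the enhancement step can be modelled as a plain function from content to serialized results, and that results are delivered to the referer URL with an HTTP POST. The class and method names are hypothetical and are not part of Stanbol's API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of the "referer" option; not Stanbol's actual API.
class RefererEnhancer {
    private final Function<String, String> enhancer;      // assumed: content -> serialized enhancements
    private final BiConsumer<URI, String> callbackSender; // delivers results to the listener URL
    private final ExecutorService background;             // runs asynchronous enhancement jobs

    RefererEnhancer(Function<String, String> enhancer,
                    BiConsumer<URI, String> callbackSender,
                    ExecutorService background) {
        this.enhancer = enhancer;
        this.callbackSender = callbackSender;
        this.background = background;
    }

    CompletableFuture<String> handle(String content, String referer) {
        if (referer == null || referer.isEmpty()) {
            // No referer: keep today's synchronous behaviour.
            return CompletableFuture.completedFuture(enhancer.apply(content));
        }
        URI callback = URI.create(referer);
        // Referer present: enhance on a background thread, then notify the listener.
        return CompletableFuture.supplyAsync(() -> enhancer.apply(content), background)
                .thenApply(result -> {
                    callbackSender.accept(callback, result);
                    return result;
                });
    }

    // Default sender: POST the results to the callback URL using java.net.http.
    static BiConsumer<URI, String> httpPoster(HttpClient client) {
        return (uri, body) -> {
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            // Fire-and-forget; retries and error handling are left out of this sketch.
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding());
        };
    }
}
```

In a real deployment the sender would be `RefererEnhancer.httpPoster(HttpClient.newHttpClient())`, and in the asynchronous case the HTTP resource would reply straight away (for example with 202 Accepted) instead of waiting on the returned future.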

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (STANBOL-263) Asynchronous calls support for the engines resource

Posted by "Ezequiel Foncubierta (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/STANBOL-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063243#comment-13063243 ] 

Ezequiel Foncubierta commented on STANBOL-263:
----------------------------------------------

Bertrand Delacretaz says on the mailing list:

I agree with the need for async enhancement engines, one of the ideas
that we had when discussing the initial FISE design (that led to the
Stanbol enhancer) was to have some metadata that says "we're still
working on some parts of this content and might get some more metadata
later" in the content items to handle status info about asynchronous
enhancement.
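The status-metadata idea could be sketched as a content item that tracks which engines have not yet finished, so clients can distinguish "partial, more may follow" from "complete". This is a hypothetical Java illustration; the engine names and the representation of triples as strings are assumptions, not the FISE or Stanbol data model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical content item carrying "still being enhanced" status information.
class TrackedContentItem {
    private final String uri;
    private final Set<String> pendingEngines = ConcurrentHashMap.newKeySet();
    private final List<String> metadata = Collections.synchronizedList(new ArrayList<>());

    TrackedContentItem(String uri, Set<String> engines) {
        this.uri = uri;
        this.pendingEngines.addAll(engines);
    }

    // Called by an engine when it finishes: store its triples first, then clear its
    // pending flag, so "not enhancing" always implies all results are visible.
    void complete(String engine, List<String> triples) {
        metadata.addAll(triples);
        pendingEngines.remove(engine);
    }

    // True while at least one engine may still add metadata later.
    boolean stillEnhancing() {
        return !pendingEngines.isEmpty();
    }

    List<String> metadataSoFar() {
        synchronized (metadata) {
            return new ArrayList<>(metadata);
        }
    }

    String uri() {
        return uri;
    }
}
```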

Async processing should IMO take things like Mechanical Turk into
account, i.e. extremely slow processing, using things like
http://groups.csail.mit.edu/uid/turkit/ maybe.

Another idea, more complicated but potentially much more powerful, is
to use a kind of tuple space for enhancement engines to collaborate,
something like:

- Content item CI is added to the space
- Engine A sees CI, works on it and as a result adds triple A1 to the space
- Engine B sees triple A1, works on it and adds triple B1 to the space
- Engine A sees triple B1 and adds more metadata, based on it and on CI, to the space

Engines can then work iteratively on finding out more things about
content items, and this would also allow for correlating metadata
supplied by several engines to improve the metadata quality.
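The tuple-space collaboration amounts to running engines repeatedly over a shared set of triples until no engine produces anything new. The single-threaded Java sketch below assumes triples are plain strings and engines are functions from the current space to proposed triples; the names are hypothetical, and a real tuple space would be concurrent and event-driven rather than polled in rounds.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Minimal sketch: engines collaborate through a shared space until a fixpoint.
class TupleSpaceSketch {
    // Hypothetical engine contract: given the current space, propose triples.
    interface Engine extends Function<Set<String>, Set<String>> {}

    private final Set<String> space = new LinkedHashSet<>();
    private final List<Engine> engines = new ArrayList<>();

    void register(Engine engine) {
        engines.add(engine);
    }

    // Runs all engines in rounds until a full round adds nothing new.
    Set<String> run(String contentItem, int maxRounds) {
        space.add(contentItem);
        for (int round = 0; round < maxRounds; round++) {
            boolean changed = false;
            for (Engine engine : engines) {
                // Pass a snapshot so an engine never iterates a set being modified.
                Set<String> proposed = engine.apply(new LinkedHashSet<>(space));
                changed |= space.addAll(proposed);
            }
            if (!changed) {
                break; // fixpoint reached: no engine learned anything new
            }
        }
        return space;
    }
}
```

The `maxRounds` bound is a safeguard, since engines that keep proposing fresh triples would otherwise never reach a fixpoint.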

I'm basically just dreaming out loud here, but I think we should take
this into account if we introduce multiple processing chains in the
enhancer: some chains might use a totally different engine
collaboration mechanism like the above, instead of the current
sequential processing.

[jira] [Commented] (STANBOL-263) Asynchronous calls support for the engines resource

Posted by "Ezequiel Foncubierta (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/STANBOL-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063241#comment-13063241 ] 

Ezequiel Foncubierta commented on STANBOL-263:
----------------------------------------------

Olivier Grisel says on the mailing list:

Sounds like a good approach, but nothing is implemented yet. The async
code that appears in the source is a leftover of early prototyping
that happened during the first sprint and was never carried through
to completion.

We need to extend the JobManager to fork threads for such stuff. We
could use the JDK ThreadPoolExecutor API to do this quite easily (in
memory queued tasks with basic multicore parallelism).
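The suggestion of in-memory queued tasks with multicore parallelism maps directly onto the JDK `ThreadPoolExecutor`. The wrapper below is a hypothetical sketch; its name and methods are assumptions and do not reflect the real Stanbol JobManager interface.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical async job manager backed by a fixed-size thread pool.
class ThreadPoolJobManager implements AutoCloseable {
    private final ThreadPoolExecutor executor;

    ThreadPoolJobManager() {
        int cores = Runtime.getRuntime().availableProcessors();
        // core == max gives one worker per core; extra jobs wait in the unbounded in-memory queue.
        this.executor = new ThreadPoolExecutor(
                cores, cores, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Queues an enhancement job and returns immediately with a handle to its result.
    <T> Future<T> submit(Callable<T> enhancementJob) {
        return executor.submit(enhancementJob);
    }

    // Number of jobs waiting for a free worker.
    int queuedJobs() {
        return executor.getQueue().size();
    }

    // Stops accepting jobs and waits briefly for running ones to finish.
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

An unbounded queue keeps the sketch simple; under sustained overload a bounded queue with a rejection policy would stop memory from growing without limit.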

Later we might also want to provide a way to query for some monitoring
/ progress info: the first query returns a token id that could be used
to query for a description of the progress of the job.
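The token-based progress query could be sketched as follows, assuming a UUID string as the token and a coarse status enum standing in for a detailed progress description. All names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical token-based progress tracking for asynchronous enhancement jobs.
class ProgressTracker {
    enum Status { QUEUED, RUNNING, DONE, FAILED, UNKNOWN }

    private final ExecutorService executor;
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    ProgressTracker(ExecutorService executor) {
        this.executor = executor;
    }

    // First call: schedule the job and hand back the token to poll with.
    String submit(Runnable job) {
        String token = UUID.randomUUID().toString();
        statuses.put(token, Status.QUEUED);
        executor.submit(() -> {
            statuses.put(token, Status.RUNNING);
            try {
                job.run();
                statuses.put(token, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(token, Status.FAILED);
            }
        });
        return token;
    }

    // Later calls: report progress; UNKNOWN for tokens this instance never issued.
    Status status(String token) {
        return statuses.getOrDefault(token, Status.UNKNOWN);
    }
}
```

A real service would also need to expire old tokens, since this map grows with every submitted job.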
