Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/09/18 19:32:23 UTC

[jira] Commented: (NUTCH-368) Message queueing system

    [ http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_12435539 ] 
            
Doug Cutting commented on NUTCH-368:
------------------------------------

How would you compare this to JMS?

http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/jms/package-summary.html

Is it fundamentally different, primarily a simplification, better integrated w/ Nutch/Hadoop, or what?

> Message queueing system
> -----------------------
>
>                 Key: NUTCH-368
>                 URL: http://issues.apache.org/jira/browse/NUTCH-368
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Andrzej Bialecki 
>         Assigned To: Andrzej Bialecki 
>         Attachments: msg.tgz
>
>
> This is an implementation of a filesystem-based message queueing system. The motivation for this functionality is explained in HADOOP-490 - there is nothing Nutch-specific in this implementation, so if it's considered generally useful it could be moved there.
> Below are excerpts from the included javadocs.
> The model of the system is as follows:
>     * applications (including map-reduce jobs) may create their own separate message queueing area. Alternatively, they can specifically ask for a named message queue, belonging to a different application or existing as a system-wide queue. Message queues are created under "/mq", followed by the message queue id (for map-reduce jobs this is the job id, or it can be any other name passed as the job id to the constructor).
>       Please see the example for more information.
>     * a single unit of information passing through queues is a Msg, which has a unique identifier (consisting of creation time and publisher name), string subject, and content (Writable).
>     * a single MsgQueue in fact consists of any number of topics. There are four predefined ones: in, out, err, and ctrl.
>     * messages are published to topics, which present a sequential view of messages, sorted by msgId (which corresponds to their order of arrival).
>     * each message queue may periodically poll for changes (MsgQueue.startPolling()), using a separate thread. Polling updates the list of topics and messages. Poll interval is configurable, and defaults to 5 sec.
>     * each detected change in the queue (add/remove topic, add/remove message) may be communicated to registered listeners. Out-of-band messages are not supported in this version, but it's not too complicated to add them. Applications can create listeners watching queues for newly added messages, or deleted messages, added topics or deleted topics, etc.
>     * each instance of MsgQueue using the same physical queue maintains its own view of the queue, keeping track of topics and messages that it considers "processed and discarded". In other words, multiple readers and creators may modify queues, and each knows which messages it already processed and which ones are new. In a similar fashion, instances may willfully "remove" certain topics from their view, even though these topics still physically exist and are available for other instances (and later on they can "add" them to their view again).
>       This somewhat complicated feature was implemented in order to support multiple readers for the same message (e.g. many tasks per mapred job). Each task needs to register for the same queue, and if the tasks didn't have their own views of the queue, messages would be consumed by the first task that got to them. As it is implemented now, each task may consume messages at its own pace. At the end of the job, applications may elect to keep the queue around or to destroy it (and thus remove all topics and messages in it).
>     * messages, topics and queues may be destroyed by any user, at which point they are physically removed from the filesystem. All users will gradually update their views during their next poll operation.
>     * there is a command-line tool to examine and modify queues, and also to retrieve and send simple text messages. You can run it like this:
>          bin/nutch org.apache.nutch.util.msg.MsgQueueTool ...many options...
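
The per-instance view described in the quoted model above can be illustrated with a small in-memory sketch. This is not the code from msg.tgz and does not use its API: the class names QueueViewSketch, MsgId, Topic and View are hypothetical, a TreeMap stands in for the filesystem directory that would hold a topic's messages, and ordering ids by creation time and then publisher name is an assumption based on the description of the identifier. It is single-threaded and only meant to show why two readers of the same topic do not consume each other's messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class QueueViewSketch {

    /** Hypothetical message id: creation time plus publisher name (assumed ordering: time first). */
    public static class MsgId implements Comparable<MsgId> {
        final long created;
        final String publisher;

        public MsgId(long created, String publisher) {
            this.created = created;
            this.publisher = publisher;
        }

        @Override
        public int compareTo(MsgId other) {
            int byTime = Long.compare(created, other.created);
            return byTime != 0 ? byTime : publisher.compareTo(other.publisher);
        }
    }

    /** Shared topic; stands in for the physical store that all readers see. */
    public static class Topic {
        final TreeMap<MsgId, String> messages = new TreeMap<>();

        public void publish(long created, String publisher, String content) {
            messages.put(new MsgId(created, publisher), content);
        }
    }

    /** One reader's private view: it remembers which ids it has already processed. */
    public static class View {
        private final Topic topic;
        private final Set<MsgId> processed = new TreeSet<>();

        public View(Topic topic) {
            this.topic = topic;
        }

        /** Returns unseen messages in id order and marks them processed in this view only. */
        public List<String> poll() {
            List<String> fresh = new ArrayList<>();
            for (Map.Entry<MsgId, String> e : topic.messages.entrySet()) {
                if (processed.add(e.getKey())) {
                    fresh.add(e.getValue());
                }
            }
            return fresh;
        }
    }

    public static void main(String[] args) {
        Topic in = new Topic();
        View taskA = new View(in);
        View taskB = new View(in);

        in.publish(2L, "pub1", "second");
        in.publish(1L, "pub1", "first");
        System.out.println("A: " + taskA.poll()); // both messages, sorted by id

        in.publish(3L, "pub2", "third");
        System.out.println("A: " + taskA.poll()); // only the newly published message
        System.out.println("B: " + taskB.poll()); // all three; B's view is independent of A's
    }
}
```

Nothing is removed from the shared topic when a view polls it; only the view's own "processed" set changes. Physically destroying a message would correspond to removing it from the shared map, after which every view stops seeing it on its next poll.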
