Dashboard > CI Development > ... > Data Exchange > Messaging patterns in Data Exchange
Log In   View a printable version of the current page.
CI Development
Messaging patterns in Data Exchange
Added by Paul Hubbard , last edited by Paul Hubbard on Apr 16, 2010  (view change) show comment
Labels: 
(None)

In talking with Dorian, one idea that emerged was documenting the messaging patterns in the current DX code, to get design feedback and also to use as requirements for the messaging system.

This also attempts to map the messaging patterns into Magnet/AMQP terminology as sub-items.

Magnet/AMQP patterns

We're moving the messaging requirements and definitions from within each module into a per-group 'bootstrap' module. The idea is that the bootstrap module:

  1. Defines and declares the queues, bindings and exchanges for each module
  2. Starts all the processes (locally for now)

Roughly speaking, something like this (non-working code!)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
dx_components = {
        'cache' : worker_queue_factory,
        'controller' : worker_queue_factory,
        'csv' : worker_queue_factory,
        'distributor' : fanout_factory,
        'fetcher' : worker_queue_factory,
        'notification': fanout_factory,
        'persister' : worker_queue_factory,
        'proxyng' : worker_queue_factory,
        'pub_sub' : worker_queue_factory,
        'user_notification' : worker_queue_factory,
        'user_pub_sub' : worker_queue_factory,
        }

logging.basicConfig(level=logging.DEBUG, \
            format='%(asctime)s %(levelname)s (%(funcName)s) %(message)s')

logging.info('Starting up...')

mon = procmon.ProcessMonitor()

for comp in dx_components:
    logging.info('Declaring ' + comp)
    magnet.declare(comp)

for comp in dx_components:
    logging.info('Adding ' + comp)
    mon.addProcess(comp, ['python', './ooidx/%s.py' % comp], env=environ.data)

Receive-only messaging

  • Persister: Subscribes via pub-sub (below) to all dataset messages, and persists them to disk. This could be worker queue or specific persister, e.g. for 'send a copy to the west coast repository.'
    • The persister uses a pseudo-random listen address via the pub-sub, which complicates the design a bit, but on that random address it should be worker queue pattern.

Request-response (RPC)

  • Attribute store: get/set/query/delete are all RPC.
    • RPC pattern.
  • Persister->pub-sub: subscribe yields a listen address.
    • Worker queue pattern with caveats as noted above, does client-side RPC before listening.
  • Proxy->controller 'get URL' returns a listen address.
    • Worker queue after client-side RPC
  • Web page->controller 'get stats' returns json-encoded usage stats
    • RPC pattern

Send-only (fire and forget)

  • Fetcher->pub-sub. Grabs the data, drops it into pub sub and is done with it.
  • Controller -> fetcher. 'Go get url X and send to address Y.'
  • Controller -> cache , ditto
  • Controller->fetcher 'get get dataset X and send to pub-sub'
  • Pub-sub -> distributor. 'Send this message to this list of listeners'
  • Notification client -> notification 'Send me all events matching this regex'
  • Notification -> notification client 'New dataset event foo'
  • Web page->controller: Purge dataset, register dataset
  • Controller->cache 'purge dataset'

None of these are really declarable messaging patterns as they're sending only.

Third-party coordination

  • Controller tells fetcher to get a URL and send it to the proxy.
    • Fetcher is worker queue model. We will want multiple fetchers running on the same queue of work, which is nicely scalable due to the no-side-effects nature of the work.
  • Controller tells cache to get a URL and send it to the proxy.
    • Worker queue for now. Later on this will change as dataset names become part of the address space.
  • Controller tells fetcher to get a dataset and send it to pub-sub for persister and others.
    • Same as above - worker queue for fetcher

Multicast

  • Distributor and pub-sub combine here. Distributor does the 'send-to-list' function, and pub-sub keeps the subscriber list.
    • Distributor service is fanout. Eventually it should be removed and replaced by a Magnet call inside the pub-sub that creates fanouts as required.
    • Pub-sub is worker queue on its listener, and simply sends to the distributor.
    • Distributor is worker queue on its listener.
  • Event notification and distributor. Same mechanism, different names and smaller messages.
    • Worker queue on the listener, and fanout on the sender side.

Pub-sub and notification

Current code is complicated by the lack of usable multicast, and we thus have a temporary distributor service in place to simulate multicast.

Basic idea: pub-sub (and sister service notification) maintain lists of

Regular expression: address

If a message comes in, it's compared to each regex, and if there's a match the message is forwarded on to the address.

See above for patterns.

Powered by Atlassian Confluence 2.7.1, the Enterprise Wiki. Bug/feature request - Atlassian news - Contact administrators