Thursday, October 15, 2009

Azure: Fabric Controller

In Azure, the Fabric Controller's view of a deployed service is a bit like this:



There is a logical service, which consists of several logical roles (e.g., for our example application, the service was the overall application and the roles included a Web and two Worker roles). There may be multiple instances of any single Logical Role, each of which is a Logical Role Instance (LRI). In the above picture, there are three instances of the Logical Role R_2. Each LRI is mapped to a single Logical Node, which is a hosting environment (e.g.,  VM). There may be multiple Logical Nodes (VMs) running on any given Physical Node or computer.

The purple ovals correspond to information provided by the application developer. That is, in the configuration (service model) for this app, the developer provides the Service Description, a Role Description for each role (which provides attributes of that role, such as, what hosting environment is required, how many resources are needed, and so on), and a Role Instance Description. The Fabric Controller maps each Role Instance Descriptor to its corresponding Logical Role Instance. Similarly, each Role Descriptor is mapped to the appropriate Logical Role. (Note that not all of these connections are shown in the above picture.)

In this way, deploying a service consists of mapping the graph describing an application's topology (as provided in that app's service model) TO the graph describing the inventory managed by the FC.

Driving the logical nodes
When an LRI is bound to a logical node (VM), that logical node's goal state is set to be whatever the LRI's goal state is. For example, Logical Node LN_22 in the above has a goal state which corresponds to LRI_22's goal state. The logical node also has a current state, which is obtained from the physical node on which this VM is running. The current state is kept up-to-date by communicating with the underlying physical hardware.

They don't say how it's constructed but evidently  in addition to knowing the current state and the goal state for each VM (OK)  the state machine for the corresponding LRI is also known. A state machine identifies all the possible states an instance can be in as well as what events cause transitions between what states. There is a stream of events  obtained via polling or interrupts  which provide updates on the physical hardware. These events are used  along with the state machine  to determine the current state. Periodically, the FC figures out what the appropriate next steps are given: the current state, the state machine, and the goal state. Then those actions are performed.

Identifying when there is a problem and handling it (automatically)
The FC maintains a cache of what it believes is the current state of each node. The Fabric Agent (which lives on each node) communicates with the FC to help the FC keep this cache updated. The FC detects when a Role dies. Multiple sources participate in monitoring Role health (and notifying the FC when a Role is not healthy): the Load Balancer issues probes to machines (i.e., pings them), a Role can also notify the FC that it (the Role) is unhealthy. When the FC learns (through whatever channel) that some Role is not healthy, the FC updates the current state for that node. Then the appropriate next steps to take  to get that node closer to its goal state  are determined and undertaken.

When a node goes offline, an attempt is made to recover that node. If the reason was a hardware failure (or, generally, something which cannot be remediated automatically) the role is migrated to another node.

(Note that the above exposition would have benefited greatly from a few concrete examples  as far as what a typical goal state is, what a typical current state is, how the state machine (which identifies what's needed to transition between states) for a logical role instance is created, and so on.)

3 comments:

  1. Thanks for this very informative review. This seems to be very interesting, and very helpful for the readers.
    Keep on posting!
    Crm Services

    ReplyDelete
  2. hello... hapi blogging... have a nice day! just visiting here....

    ReplyDelete
  3. Thank you.Well it was nice post and very helpful information on
    Azure Online Training Hyderabad

    ReplyDelete