Wednesday, October 14, 2009

More on Azure

Below is some basic data on Windows Azure, as well as a brief exploration of relevant background topics (such as managed code and dynamically generated Web page content). This post contains no details on the various Azure Services (such as Live Services and .NET Services) and few details about the Azure OS run-time. It's all pretty high-level and gauzy.

Future posts may fill in more details such as:
  • What does execution look like on the cloud (what kind of isolation is there between web apps, and so on)? Does each machine in the fabric run one instance of Azure OS? Does each instance of Azure OS run an instance of IIS7? And is there a one-to-one mapping between apps and instances of Azure OS (surely not)? Etc.
  • How exactly can one specify the health constraints of his app, given that the Fabric Controller uses these conditions to manage the app/service automatically (scaling it up or down by creating/deleting instances, or restarting it)?
  • What do the various Service APIs look like? (E.g., for SQL Data Services, Live Services, .NET Services, and so on) What operations are available?
  • How will MSFT support web apps written in non-MSFT languages (e.g., Python)? That is, how will it incorporate support for non-managed code?
  • ...
Notes from Manuvir Das, "A Lap Around Azure"


Windows Azure == OS for the cloud.
It's the lowest layer. Services are layered on top of this foundation, including:
  • Live services
  • .NET services
  • SQL services (a.k.a. "Data Services")
  • SharePoint services
  • MSFT Dynamics CRM services
These services are implemented using REST, HTTP, and XML.
What's a cloud? A set of connected servers.
What can you do on the cloud (i.e., on those servers)? Install and run services; store and retrieve data.
A client can access Azure services by calling into a "managed class library" (presumably some code that contains RPC-like stubs which, at run-time, invoke the appropriate code).
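To make the "RPC-like stub" guess concrete, here is a minimal Python sketch of the general pattern: an ordinary method call gets turned into a REST request. The `StorageClientStub` class, its method names, and the base URL are hypothetical; this is not the actual Azure managed class library. The stub only builds `urllib.request.Request` objects, so it runs offline.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


class StorageClientStub:
    """Hypothetical RPC-style stub: each method call becomes an HTTP request.

    The base URL and resource paths are illustrative assumptions, not the
    real Azure client library API.
    """

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def _build(self, method: str, path: str, body: Optional[bytes] = None) -> Request:
        # Percent-encode each path segment so names with spaces etc. stay valid.
        safe_path = "/".join(quote(segment, safe="") for segment in path.split("/"))
        return Request(f"{self.base_url}/{safe_path}", data=body, method=method)

    def get_blob(self, container: str, name: str) -> Request:
        # Looks like a local call to the caller; the stub maps it onto a REST GET.
        return self._build("GET", f"{container}/{name}")

    def put_blob(self, container: str, name: str, data: bytes) -> Request:
        # Same idea for uploads: a REST PUT carrying the blob bytes as the body.
        return self._build("PUT", f"{container}/{name}", body=data)
```

Actually sending a request would just mean passing it to `urllib.request.urlopen`; the point is that the caller never has to think about HTTP.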

In the desktop computing world, an OS provides:
  • An environment in which to run an app (abstracts away the particular underlying hardware config)
  • Access to a shared file system, which provides isolation via access control
  • Resource allocation from a shared pool
  • Support for different programming environments
In a cloud computing OS, you want all of the above plus 24/7 operation, pay-as-you-consume pricing, and transparent administration (i.e., administration that hides the complexity of remote mgmt as much as possible).

What features does Azure provide?
  • Automated service management.
    • The developer defines rules about which code should be executed under which conditions (where a condition might be, "URL x was visited") as well as the code itself. The platform follows these rules in deploying, monitoring, and managing the developer's service.
  • A powerful hosting environment: all of the hardware for actually running and serving your code (servers, load balancers, firewalls?). There are two possible execution modes: direct and virtualized (the latter via a hypervisor).
  • Scalable and available storage in the cloud. Provides abstractions such as: blobs, tables, and queues.
  • A rich, familiar developer experience. This includes a local testing/debugging environment that provides a complete simulation of the cloud, so developers can test their app in an environment as close to the real thing as possible. The simulated enviro lets the application do everything it could do if it were actually running on the Azure OS in the cloud.
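The "rules plus code" idea behind automated service management can be illustrated with a toy dispatcher. This is a minimal sketch assuming a rule is simply a URL pattern (the condition) paired with a handler function (the code to run); the names `make_rules` and `dispatch` are hypothetical, and the real service model is certainly richer than this.

```python
import re
from typing import Callable, List, Tuple

# A hypothetical rule: a condition ("URL x was visited") plus the code to run.
Rule = Tuple[re.Pattern, Callable[[re.Match], str]]


def make_rules() -> List[Rule]:
    """The developer's side: declare conditions and the code attached to them."""
    return [
        (re.compile(r"^/orders/(\d+)$"), lambda m: f"show order {m.group(1)}"),
        (re.compile(r"^/$"), lambda m: "home page"),
    ]


def dispatch(rules: List[Rule], url: str) -> str:
    """The platform's side: run the code of the first rule whose condition holds."""
    for pattern, handler in rules:
        match = pattern.match(url)
        if match:
            return handler(match)
    return "404 not found"
```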
More on Automated Service Management (Is this the so-called Fabric Controller? Yes, I think so.)
The developer writes his code as before. But he also now creates a model, which specifies:
  • What should the service topology look like? How big? Use cloud storage? How many front-end roles? How many back-end roles? Should the front- and back-end roles be able to talk to one another? If so, how? (Note that these various roles can connect to the outside world, which enables the developer to communicate with a particular role directly.) A common idiom is for the different roles (e.g., Web Role, Worker Role) to communicate via shared storage: one adds to a queue (is a producer) and the other takes from that queue (is a consumer).
  • How to define health of my service (i.e., conditions under which I should be alerted because such conditions indicate that the service needs attention of some sort).
    • MS detects failures (of hardware, software, bugs, ...)
    • MS detects violation of health constraints (i.e. detects when your app's execution is outside of the definition of healthy)
    • MS needs a way to fix stuff transparently (ideally w/o a human in the loop), e.g., by rebooting your service or moving it to another server.
    • Achieve this via abstraction. Application code refers to logical resources (rather than to particular IP addresses or underlying CPUs). If the app needs the actual hardware addresses at run-time, it can invoke APIs which return the physical values (e.g., addresses) corresponding to specified logical resources.
  • Configuration settings: what particular values or parameters do I want to be able to change at run-time without having to redeploy the entire service?
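Here is a sketch of the queue idiom between a Web Role and a Worker Role. It uses Python's in-process `queue.Queue` and a thread as stand-ins (an assumption) for a cloud queue and a separate role instance; the function names `web_role`, `worker_role`, and `run_demo` are hypothetical.

```python
import queue
import threading
from typing import List, Optional


def web_role(q: "queue.Queue[Optional[str]]", requests: List[str]) -> None:
    """Producer: the front end accepts requests and enqueues work items."""
    for item in requests:
        q.put(item)
    q.put(None)  # sentinel meaning "no more work" (a convenience for this demo)


def worker_role(q: "queue.Queue[Optional[str]]", results: List[str]) -> None:
    """Consumer: the back end dequeues work items and processes them."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())  # placeholder for real processing


def run_demo(requests: List[str]) -> List[str]:
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=worker_role, args=(q, results))
    worker.start()
    web_role(q, requests)
    worker.join()  # wait until the worker has drained the queue
    return results
```

A cloud queue likely differs in at least one important way: as I understand it, a dequeued message is hidden for a while rather than deleted, and the consumer deletes it explicitly once processing succeeds, so work from a crashed worker reappears. This sketch skips that.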
Note that you don't *have to use* their automated, abstracted thing. They provide a so-called "escape hatch" or Raw mode, which lets the developer build a VM from the ground up and run his service within that VM (in which case the developer is also responsible for managing that service). This offering more closely resembles Amazon's EC2 service, except that even here the developer doesn't supply his own VM (as he would with EC2) but rather configures one of theirs.

More on Azure Storage: massive scale, availability, durability; geo-distribution and geo-replication. This is NOT to be confused with SQL Services, because Azure Storage does not expose a full database-management interface (which would include querying, insertion, schema creation, and so on). One can only upload data (and presumably delete it?), and the available ways to structure data are very simple.
  • Large items of unstructured data: Blobs, file streams
  • Structured data (referred to as "service state"): Tables, caches
  • Service communication: queues, locks
Cloud Storage is accessible from anywhere on the Internet via REST APIs.
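To see how simple these abstractions are compared with a relational database, here is an in-memory toy model of blobs, tables, and queues. The `ToyStorage` class and its method names are hypothetical; the (partition key, row key) addressing for table entities is modeled loosely on Azure tables. Note there is no query language: only puts, point lookups, a partition scan, and FIFO dequeue.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class ToyStorage:
    """In-memory stand-in for blobs, tables, and queues (hypothetical API)."""

    def __init__(self) -> None:
        self.blobs: Dict[Tuple[str, str], bytes] = {}
        self.tables: Dict[str, Dict[Tuple[str, str], Dict[str, Any]]] = {}
        self.queues: Dict[str, Deque[str]] = {}

    # Blobs: large unstructured byte payloads, addressed by container + name.
    def put_blob(self, container: str, name: str, data: bytes) -> None:
        self.blobs[(container, name)] = data

    def get_blob(self, container: str, name: str) -> bytes:
        return self.blobs[(container, name)]

    # Tables: schemaless entities keyed by (partition key, row key); no joins.
    def insert_entity(self, table: str, pk: str, rk: str, props: Dict[str, Any]) -> None:
        self.tables.setdefault(table, {})[(pk, rk)] = dict(props)

    def get_entity(self, table: str, pk: str, rk: str) -> Dict[str, Any]:
        return self.tables[table][(pk, rk)]

    def scan_partition(self, table: str, pk: str) -> List[Dict[str, Any]]:
        # Entities of one partition, ordered by row key.
        rows = sorted(self.tables.get(table, {}).items(), key=lambda kv: kv[0])
        return [entity for (p, _), entity in rows if p == pk]

    # Queues: FIFO messages for role-to-role communication.
    def enqueue(self, name: str, message: str) -> None:
        self.queues.setdefault(name, deque()).append(message)

    def dequeue(self, name: str) -> Optional[str]:
        q = self.queues.get(name)
        return q.popleft() if q else None  # None when missing or empty
```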

More on Developer Experience
Support for a variety of programming languages and frameworks: ASP.NET, .NET, native code, PHP
Bunch of tools and support, including for logging, alerts, tracing, ...
Including the much touted "desktop SDK for full simulation of the cloud"

My own look at things: questions and comments
  • Can I run only web applications on Azure? E.g., if I wanted to run a mail server on Azure, could I? If not, what mechanisms are actually used to prevent this? Restricting traffic that's not on a standard HTTP/HTTPS port?
    • Yes. According to a PDC 2008 session, an input endpoint (an app's externally reachable interface to the world) must use either port 80 or port 443 (note that an app doesn't have to have an input endpoint). Hence you couldn't define an app to run on Azure with an input endpoint on port 25 (for SMTP), for example.
  • For the non-MSFT languages which can run on Azure (e.g., PHP, Python, Ruby), what run-time environment does a program in such a language run in? Does the user choose the particular run-time environment?
  • And then what about sandboxing? What kind of isolation is there among the various apps running on the Azure Fabric?
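The port restriction amounts to a simple check over a role's declared input endpoints. Below is a hypothetical validator in Python. The rule itself (HTTP on 80, HTTPS on 443) comes from the session notes, but `InputEndpoint` and `validate` are my own illustrative names, not anything in the Azure SDK.

```python
from dataclasses import dataclass
from typing import List

# Allowed (protocol, port) pairs for input endpoints, per the PDC 2008 notes.
ALLOWED = {("http", 80), ("https", 443)}


@dataclass
class InputEndpoint:
    name: str
    protocol: str
    port: int


def validate(endpoints: List[InputEndpoint]) -> List[str]:
    """Return one error message per disallowed endpoint (empty list if all are OK)."""
    errors = []
    for ep in endpoints:
        if (ep.protocol.lower(), ep.port) not in ALLOWED:
            errors.append(f"{ep.name}: {ep.protocol}/{ep.port} is not an allowed input endpoint")
    return errors
```

A mail server declaring `InputEndpoint("smtp", "tcp", 25)` would be rejected by a check like this before deployment.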
Since someone will ask, what are the benefits of moving an app to the cloud?
  • If you have a customer-facing web application with customers scattered across the globe, you can use Azure to run instances of your app at various geographic locations. This reduces latency for customers accessing your application from, for example, India.
  • Availability: Relieves the user from having to maintain redundant servers (and infrastructure for fault tolerance); let MSFT handle that.
  • Scalability: Let MS also handle scaling of your application automatically.
    • Frees the app provider from the responsibility of maintaining enough machines to handle the service's expected peak load (even though normal use might be well below peak).
  • Zero-downtime upgrades.
  • If you need huge amounts of storage, batch processing, or the ability to run an application over a very large data set (as with MapReduce, for example), you can get this by running your app / doing your processing in the cloud. This frees you from having to purchase and maintain physical resources, especially since the job may be only temporary (so those resources would otherwise be idle most of the time).
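A quick worked example of the peak-provisioning point, with made-up numbers: suppose load is 100 requests per hour for three hours and then spikes to 1,000 for one hour, and one machine handles 100 requests per hour. Provisioning for the peak means 10 machines for all 4 hours (40 machine-hours), whereas scaling with demand needs 1 + 1 + 1 + 10 = 13 machine-hours. The hypothetical helper below does that arithmetic; it ignores pricing and the time it takes to bring instances up.

```python
import math
from typing import List, Tuple


def machine_hours(hourly_load: List[int], capacity_per_machine: int) -> Tuple[int, int]:
    """Compare fixed peak provisioning with per-hour (elastic) provisioning.

    hourly_load: requests in each hour (hypothetical numbers).
    Returns (fixed_machine_hours, elastic_machine_hours).
    """
    machines_for_peak = math.ceil(max(hourly_load) / capacity_per_machine)
    fixed = machines_for_peak * len(hourly_load)  # peak-sized fleet, every hour
    elastic = sum(math.ceil(load / capacity_per_machine) for load in hourly_load)
    return fixed, elastic
```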
At some point, I will look in more detail at each Azure Service being offered, to understand what one can do with that service, who uses it, and so on. But for starters, here they are:
  • Compute on Azure OS.
  • Store using Azure OS.
    • Cheap, efficient, not necessarily very expressive.
    • That is, this is NOT a full-service relational database interface that enables SELECT, INSERT, and so on. It is a very simple interface that exposes only a couple of storage formats (blob, queue, simple [non-relational] table) and presumably only a few ways to manage (operate on) the stored data.
  • Access Live services for...
  • Access .NET services for...
  • Access SQL services for...
  • Access SharePoint services for...
  • Access MSFT Dynamics CRM services for...
    • CRM: Customer Relationship Management; software and/or processes/strategies. Includes all methods by which a company responds to (or reaches out to) its customers. So call center, sales force, marketing, tech support, field services. Players in this area: Oracle (Siebel, PeopleSoft), SAP, salesforce.com, Amdocs, MSFT, Epiphany, and others.
The so-called "Fabric Controller"
The Windows Azure Fabric is a "scalable hosting environment" that is "built on distributed MS data centers." The Fabric Controller (FC) manages resources, performs load balancing, and observes both the developer-provided constraints/requirements for an app and the real-time conditions for that app/service, responding accordingly (for example, by automatically provisioning additional resources, restarting the service, or taking away unneeded resources). The FC scales app resources automatically as demand rises and falls. It is also used to deploy services and manage upgrades.
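The FC's scaling behavior can be thought of as a control loop: compare observed demand with developer-supplied bounds, compute a desired instance count, and act on the difference. The sketch below assumes a hypothetical `ScalingPolicy` (min/max instances and a target load per instance); I don't know what knobs the real FC exposes, so treat every name here as illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical developer-supplied constraints, not actual Azure settings.
    min_instances: int
    max_instances: int
    target_load_per_instance: float  # e.g., requests/sec one instance should handle


def desired_instances(policy: ScalingPolicy, observed_load: float) -> int:
    """Instance count a controller might aim for, clamped to the policy's bounds."""
    needed = math.ceil(observed_load / policy.target_load_per_instance)
    return max(policy.min_instances, min(policy.max_instances, needed))


def reconcile(current: int, desired: int) -> str:
    """Turn the gap between current and desired instance counts into an action."""
    if desired > current:
        return f"provision {desired - current} instance(s)"
    if desired < current:
        return f"release {current - desired} instance(s)"
    return "no change"
```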

The Azure OS performs advanced tracing and logging on apps that run on it, so that developers can monitor the status of their apps in terms of compute, storage, and bandwidth. Presumably, the constraints that developers specify are in terms of the kinds of things that can be observed using this tracing. Note that since MS is allowing apps written in non-MS languages, the "signals" used to monitor the health of an app are likely OS-level (rather than language-level) things. For example, it is easy to identify the # of open sockets and heap size, but harder for an OS to have visibility into language-level resource usage such as the number of locks created/acquired and so on. Hence, the things one can specify in the application model must be rather generic and observable from the OS level. I wonder whether, for apps written in MS languages that will run in an MS environment (such as the .NET Framework), one can specify a different (richer) set of constraints in the application model.
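Following that reasoning, a health constraint would be phrased over generic, OS-observable metrics. Here is a hypothetical checker in Python. The metric names (`open_sockets`, `heap_mb`), the thresholds, and the restart-on-violation policy are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Constraint:
    metric: str       # an OS-visible signal, e.g. "open_sockets" (hypothetical name)
    max_value: float  # healthy means the observed value stays at or below this


def violations(sample: Dict[str, float], constraints: List[Constraint]) -> List[str]:
    """Return the metrics in this sample that fall outside the healthy range."""
    bad = []
    for c in constraints:
        value = sample.get(c.metric)
        # A metric missing from the sample is treated as unknown, not as a violation.
        if value is not None and value > c.max_value:
            bad.append(c.metric)
    return bad


def action_for(bad: List[str]) -> str:
    # Hypothetical remediation policy: restart the instance on any violation.
    return "restart instance" if bad else "healthy"
```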

  • Managed code == code written in an MSFT language that executes on an MSFT run-time; e.g., .NET, IIS7, WCF
  • Can only run a web application on Azure OS, not an arbitrary network app.
    • Enforced via only allowing an app to have an input endpoint whose port is 80 or 443; this means that the app can only receive traffic on the port for HTTP or the port associated with HTTPS.
  • The cloud on your desktop: complete offline cloud simulation. Actual cloud == set of connected servers or machines, also referred to as the fabric; it's what your app will run on. The "cloud on your desktop" is a set of processes (all of which run locally), where each process simulates a server. So the set of local processes are your "cloud on the desktop"; also referred to as "development fabric."
  • There's a UI for playing with / seeing this development fabric; can see each service deployment. For each service deployment, can see all the roles that this service has defined and, for each role, all of the instances of that role.
  • When you create a new project, two associated files are created: the service definition file (ends in *.csdef) and the service configuration file (ends in *.cscfg). These are XML files containing metadata about your service. The service definition file defines all of the roles and, for each role, the input endpoints for that role (where an endpoint consists of a name, a protocol (e.g., HTTP or HTTPS), and a port (80 or 443, respectively)). Any configuration-specific settings or parameters are declared here (but given values elsewhere). The service configuration file specifies, for each role, the number of instances of that role to create and, for any configuration settings declared in the service definition file, provides values for those settings/parameters.
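To see how the two files split responsibilities, here is a sketch that reads a service configuration file with Python's standard `xml.etree.ElementTree`. The sample XML is modeled on the .cscfg layout (a `Role` element with an `Instances` count and `Setting` name/value pairs), but treat the exact element names as an assumption. Real files also declare an XML namespace, which would have to be added to the `find` paths.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Illustrative sample; role names and the setting are hypothetical.
SAMPLE_CSCFG = """
<ServiceConfiguration serviceName="HelloCloud">
  <Role name="WebRole">
    <Instances count="3" />
    <ConfigurationSettings>
      <Setting name="GreetingText" value="Hello" />
    </ConfigurationSettings>
  </Role>
  <Role name="WorkerRole">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>
"""


def read_config(xml_text: str) -> Dict[str, Tuple[int, Dict[str, str]]]:
    """Map each role name to (instance count, {setting name: value})."""
    root = ET.fromstring(xml_text.strip())
    roles: Dict[str, Tuple[int, Dict[str, str]]] = {}
    for role in root.findall("Role"):
        count = int(role.find("Instances").get("count"))
        settings = {
            s.get("name"): s.get("value")
            for s in role.findall("ConfigurationSettings/Setting")
        }
        roles[role.get("name")] = (count, settings)
    return roles
```

Changing `count="3"` to another number is exactly the kind of edit that, per the notes above, should not require redeploying the service.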
Horizontal scaling: have code running on a single server; to scale == add more servers
But what about state? How is state shared across the various instances? (Why share state? Because a user could interact with a different instance each time, and the fact that there are multiple instances needs to be transparent to that user; i.e., the user can begin a transaction on one server and continue it on another.) Solution: use a single centralized database (a "durable store") to store all state; no server stores any state locally. All servers access state from this store == all servers have the same view of state.

This is available in Azure as the Azure OS storage: blobs, simple tables, queues. This storage is accessed via REST and ADO.NET Data Services.
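Here is a minimal sketch of the stateless-instances design, with a dictionary standing in (hypothetically) for the durable store. Two `WebInstance` objects serve the same user; because neither keeps anything locally, the second one sees the cart the first one built. The class names and the `cart:<user>` key format are my own.

```python
from typing import Dict, List


class DurableStore:
    """Stand-in for the single shared store; in-memory here for illustration."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def get(self, key: str) -> List[str]:
        return list(self._data.get(key, []))  # return a copy, like a fresh read

    def put(self, key: str, value: List[str]) -> None:
        self._data[key] = list(value)


class WebInstance:
    """A stateless front-end instance: it keeps nothing between requests."""

    def __init__(self, name: str, store: DurableStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, user: str, item: str) -> List[str]:
        cart = self.store.get(f"cart:{user}")  # read shared state
        cart.append(item)
        self.store.put(f"cart:{user}", cart)   # write it back before responding
        return cart
```

One thing the sketch glosses over: two instances doing this read-modify-write at the same time could overwrite each other's changes, so a real shared store needs some form of concurrency control (for example, optimistic concurrency with version checks).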

A brief detour: some background on Microsoft languages and run-time environments
Evidently Microsoft has developed an infrastructure which enables code to be written in any one of several different languages and then run on a variety of platforms. Parts of this infrastructure are reminiscent of Java, with its bytecode and platform-specific JVMs. In particular, MSFT created various languages (e.g., C#, J#, VB.NET), each with its own compiler, which takes a program in that language and produces code in an intermediate language. This code is platform-agnostic (much like Java bytecode); the intermediate language is CIL (Common Intermediate Language). Then for each hardware/OS platform there is a Common Language Runtime (much as there are different JVMs for each hw/OS platform). CIL code is executed on the CLR, which compiles the CIL code just-in-time into native instructions for the underlying CPU architecture (e.g., x86) and caches the result.

So the Microsoft languages for which a compiler exists that converts programs into the intermediate language (CIL) are referred to as managed; a program written in C#, J#, or VB.NET is managed code. Such code (after being compiled into CIL) executes on the CLR, which is a managed environment. Unmanaged code, by contrast, is not compiled into CIL and does not run within the CLR. So when we say that something is a CLI language, we mean that there is a compiler that takes a program in that language and produces the corresponding CIL code. (FYI, CIL was "previously known as MSIL," i.e., Microsoft Intermediate Language.) More generally, "managed code" is a term covering any code that runs within a VM (rather than executing directly on the underlying hardware). A C++ program written with Microsoft Visual C++ can be compiled into managed code (to run in the .NET CLR) or into unmanaged native code (e.g., a traditional MFC application). But in general, all code written in a particular language is compiled into either managed or unmanaged code (rather than into one form or the other at will).
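The compile-to-intermediate-language, JIT-on-first-use pipeline can be illustrated with a toy. The sketch below is an analogy only: it defines a tiny stack-based "intermediate language" (not real CIL), and a hypothetical `ToyRuntime` that translates each method into a Python closure the first time it is called and caches the result, roughly the way the CLR caches JIT-compiled native code.

```python
from typing import Callable, Dict, List, Tuple

# A toy stack-based IL (NOT real CIL): (opcode, operand) pairs, platform-neutral.
Instr = Tuple[str, int]


def compile_method(code: List[Instr]) -> Callable[[int], int]:
    """'JIT' step: translate the IL once into a callable of one argument x."""
    stack: List[Callable[[int], int]] = []  # compile-time stack of expressions
    for opcode, operand in code:
        if opcode == "ldarg":    # push the method's argument
            stack.append(lambda x: x)
        elif opcode == "ldc":    # push a constant
            stack.append(lambda x, c=operand: c)
        elif opcode in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            if opcode == "add":
                stack.append(lambda x, l=left, r=right: l(x) + r(x))
            else:
                stack.append(lambda x, l=left, r=right: l(x) * r(x))
        else:
            raise ValueError(f"unknown opcode {opcode}")
    if len(stack) != 1:
        raise ValueError("method must leave exactly one value on the stack")
    return stack[0]


class ToyRuntime:
    """Compiles each method on first call and reuses the cached result afterwards."""

    def __init__(self, methods: Dict[str, List[Instr]]) -> None:
        self.methods = methods
        self.cache: Dict[str, Callable[[int], int]] = {}
        self.compilations = 0

    def call(self, name: str, arg: int) -> int:
        if name not in self.cache:
            self.cache[name] = compile_method(self.methods[name])
            self.compilations += 1
        return self.cache[name](arg)
```

For example, the IL `[("ldarg", 0), ("ldarg", 0), ("mul", 0), ("ldc", 1), ("add", 0)]` computes x * x + 1 and is translated only once, however many times it is called.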



The CLR evidently exports an API offering a lot of functionality that is usually provided by an OS. In particular, the CLR provides memory management, thread management, exception handling, garbage collection, and "security." This functionality is exposed through the class libraries, which implement common functions such as: read/write, render graphics, interact w/DB, manipulate XML documents. The .NET Framework Class Library consists of two libraries:
  1. Base Class Library (BCL): a small subset of the entire class library; the core set of classes that serve as the basic API of the CLR. Includes many of the functions provided by MSCORLIB.DLL and some of those provided in System.DLL and System.Core.DLL. Akin to the standard libraries that come with Java.

  2. Framework Class Library (FCL): superset of BCL classes. Entire class library that ships with .NET Framework. Includes expanded set of libraries: WinForms, ADO.NET, ASP.NET, and others.

[Figure: CLR diagram]

So, as portrayed above, the Common Language Infrastructure (CLI) consists of the intermediate language (CIL) and the environment within which that intermediate language executes (the CLR).

What kind of benefits do you get by using a CLI?
  • Can have a program which combines components written in different high-level languages.
  • Can compile once (into CIL) and run anywhere (for which there exists a CLR). 
  • There are also some benefits of managed code generally: stronger type safety (since the runtime can do type checking) and garbage collection.
Another brief foray away from the main topic here: Web Page Content
A web page can have only static content (the web page's content doesn't change) or it can have dynamic content. In the latter case, where does the dynamism come from? Two possibilities.
  1. There might be client-side scripting which changes the way the page is presented depending upon mouse movements, keyboard input, or timing events. So the dynamism is in how the content is presented. Languages used to achieve this include JavaScript (part of Dynamic HTML) and ActionScript (part of Flash). These scripts might inject sound or animation, or change the text. One can also perform remote scripting using these languages; remote scripting entails a Web page requesting more information from the server without having to reload the page. XMLHttpRequest is one example of this.

  2. Second, server-side scripts might dynamically generate different page content depending upon the data the user provides in an HTML form (e.g., the user enters his name (John) and the generated page says, "Hi John! Welcome to..."), URL parameters, the browser type, or DB/server state. Server-side languages used for this kind of dynamic content generation include PHP, Perl, ASP, ASP.NET, JSP, ColdFusion, and so on. Some of these are run through the Common Gateway Interface (CGI), while others (e.g., ASP.NET, JSP) execute inside the web server or an application server.
Of course, we could explore these topics much more carefully and thoroughly. But the above suffices for our purposes, which are to understand what ASP and ASP.NET are (answer: server-side scripting technologies, like PHP).
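As an illustration of the server-side case, here is the "Hi John!" example as a minimal Python WSGI application. WSGI plays a role similar to CGI (a standard interface between the web server and the page-generating code); it is not ASP or ASP.NET, just a compact way to show a page being built from a URL parameter. The `name` parameter and the "stranger" default are my own choices.

```python
from html import escape
from urllib.parse import parse_qs


def app(environ, start_response):
    """Minimal WSGI app: builds the page from a query-string parameter."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["stranger"])[0]
    # Escape user input so it is rendered as text, not injected as markup.
    body = f"<p>Hi {escape(name)}! Welcome to...</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it at http://localhost:8000/?name=John.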

