Scope
The goal is to make the collaboration tools quickly recoverable by operating them in the cloud. Quickly recoverable means recovery from a non-catastrophic software or hardware failure in less than 2 hours. High availability is currently not a requirement: there is no redundant or clustered deployment of application instances, and no load balancing.
The strategy is to define and author a virtual machine image for every application (Confluence, Jira, Alfresco) and every piece of software infrastructure (LDAP, Crowd, web server, MySQL) involved. The operational environment will be the Amazon Elastic Compute Cloud (EC2). Persistent storage is provided by the Amazon Elastic Block Store (EBS).
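As a rough illustration of the scope above, the sketch below lists one image per component and checks a recovery plan against the 2-hour target. The component names come from this page; the `kind-name` image naming, the step names, and the minute estimates are hypothetical placeholders, not agreed values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "application" or "infrastructure"


# Components named in the scope; each one gets its own VM image.
COMPONENTS: List[Component] = [
    Component("confluence", "application"),
    Component("jira", "application"),
    Component("alfresco", "application"),
    Component("ldap", "infrastructure"),
    Component("crowd", "infrastructure"),
    Component("webserver", "infrastructure"),
    Component("mysql", "infrastructure"),
]

# "Quickly recoverable" is defined as less than 2 hours.
RECOVERY_TARGET_MINUTES = 120


def image_names(components: List[Component]) -> List[str]:
    """Return one image name per component (naming scheme is an assumption)."""
    return sorted(f"{c.kind}-{c.name}" for c in components)


def within_recovery_target(step_minutes: Dict[str, int]) -> bool:
    """True if the estimated recovery steps add up to less than the target."""
    return sum(step_minutes.values()) < RECOVERY_TARGET_MINUTES


if __name__ == "__main__":
    print(image_names(COMPONENTS))
    # Hypothetical estimates for recovering a single instance.
    estimate = {"launch_instance": 10, "attach_volume": 5, "restore_data": 60}
    print(within_recovery_target(estimate))
```

A check like this is only useful once the failure scenario analysis supplies measured step durations; until then the numbers are guesses.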
Tasks
| Task | Comment |
|---|---|
| Provide overall production environment design and documentation | Includes static and dynamic aspects. Document on Confluence. |
| Identify and characterize virtual machine images | Every application should run in its own VM instance. Characterization includes OS, required packages, config files, security concerns, application data snapshot and versioning mechanism, state-of-health monitoring. |
| Identify and characterize block storage partitions | Characterization includes the application instance each partition is bound to, expected size, growth, bounds, snapshot strategy. |
| Select, characterize and author base AMI | This image should be a standard off-the-shelf Linux server installation with a production-quality security setup. Characterization includes packages, ports, security, boot process, contextualization process. Necessary additions include a standard contextualization mechanism, a monitoring solution, etc. |
| Define virtual image development workflow | Use automation where possible, such as scripts that perform the full application setup, or tools such as Fabric. |
| Define and implement standard instance contextualization process | Should be lightweight and based on the metadata and user data provided by EC2. Should be present on all instances. |
| Design and create global configuration repository | A file repository (such as SVN or Git) for versioning config files, install and deploy scripts, contextualization scripts, and archive scripts in a structured way. |
| Design and implement provisioning mechanism | Startup of all instances and contextualization; synchronization of steps (1 init, 2 app startup). This includes the provisioner part and the common contextualization layer on the instances |
| Analyze failure scenarios and identify the likely ones | Failures include application failure, disk full, data corruption, network down, instance unavailable. Also analyze possible threat scenarios. Produce if-then scenarios and recovery plans. |
| Develop system test cases | Documented procedures for testing the system's initial operating capacity and the identified likely failure scenarios with recovery. Also test versioning, migration, attack, etc. |
| Run system test with failure cases | Run tests in staging environment |
| Develop versioning and archiving strategy | Snapshots of block storage, consistent daily application data snapshots, off-site backup. |
| Develop security strategy | If-then scenarios for managing server access and for possible threats. Consider firewalls (Amazon, server), running daemons, ports, passwords, PKI. |
| Develop data migration strategy | Procedures for migrating to non-head versions of data snapshots after a failure or for other reasons. |
| Develop application update strategy | Patches of the OS and application. How to test, where is the information located, how to download packages, required downtime etc. |
| Provide staging and production environment | Provide support for at least two distinct, fully operational environments in the cloud. |
| Develop production system migration strategy and plan | How to take the existing production versions of collaboration tools and data and bring them to the cloud, with the latest versions of the tools. |
| Test production system migration | Perform the complete production system migration in the cloud. |
| Design and implement monitoring | Based on Intermapper. |
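
The contextualization and provisioning tasks above could be prototyped along the following lines. This is a minimal sketch, not a chosen design: it assumes the user data is plain `key=value` text, and the keys `role` and `environment` are hypothetical. On a running instance the text would be read from the EC2 instance metadata service at `http://169.254.169.254/latest/user-data`; here it is passed in as a string so the logic can be tested offline.

```python
from typing import Dict, List, Tuple


def parse_user_data(text: str) -> Dict[str, str]:
    """Parse key=value lines; blank lines and '#' comments are skipped."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def provisioning_steps(roles: List[str]) -> List[Tuple[str, str]]:
    """Order steps so every instance finishes init before any app starts.

    This is one reading of "synchronization of steps (1 init, 2 app startup)".
    """
    steps = [("init", role) for role in roles]
    steps += [("app_startup", role) for role in roles]
    return steps


if __name__ == "__main__":
    sample = "# hypothetical user data\nrole=confluence\nenvironment=staging\n"
    print(parse_user_data(sample))
    for phase, role in provisioning_steps(["mysql", "crowd", "confluence"]):
        print(phase, role)
```

Keeping the parser free of network calls means the same function can run in the provisioner and on the instances, which fits the requirement that the contextualization layer be lightweight and present everywhere.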