CI Development
CT Iteration2
Added by Thomas Im, last edited by Thomas Im on May 15, 2009


Description

This page describes efforts associated with the collaboration tools during the 'Zinc' development cycle. During this cycle, the test deployment of the collaboration tools, along with the monitoring and provisioning components created during the first iteration, will be extended into a production-ready deployment of the collaboration tools on EC2.


Strategy

  • Identify possible failure scenarios, failure modes, and failure hypotheses. This includes documenting dependencies and core system/application parameters to the degree that is practical and useful for our system
  • Identify available actions and their consequences:
    • Operational and administrative actions: e.g., starting a particular application or restarting a node
    • Monitoring and testing actions: e.g., monitoring system health or testing applications
    • Backup and archive procedures
    • Failure mitigation and recovery actions: e.g., reverting to a previous snapshot, performing a coordinated restart of the system, or blocking end-user access to the system
  • Document core actions and procedures: backups, scripts, servers, access lists, contacts, etc.
  • Identify responsibilities and strategy
  • Automate critical, error-prone steps as decided, needed, and practical
  • Provide a staging/test environment similar to the operational environment (without end-user access). Document, and potentially automate, the development -> staging -> operation procedure
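The first two strategy items amount to building a catalog that pairs each failure scenario with how it is detected and which actions address it. A minimal Python sketch of such a catalog follows, assuming the scenarios are kept as plain data so the same table can drive both documentation and automated recovery. The scenario names, detection descriptions, and action strings are hypothetical placeholders drawn from the examples in the list above, not an agreed list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureScenario:
    """One failure scenario: how it shows up and what to do about it."""
    name: str
    detection: str                  # monitoring signal or probe that reveals it
    actions: tuple[str, ...] = ()   # ordered mitigation/recovery actions


# Hypothetical catalog built from the example actions listed in the strategy.
CATALOG = {
    s.name: s
    for s in (
        FailureScenario(
            "app_not_responding",
            "application-level probe fails while the host still answers",
            ("restart application", "coordinated restart of the system"),
        ),
        FailureScenario(
            "node_down",
            "host-level probe fails",
            ("restart node", "launch replacement instance from machine image"),
        ),
        FailureScenario(
            "data_corruption",
            "application errors in logs or a failed integrity check",
            ("block end-user access", "revert to previous snapshot"),
        ),
    )
}


def recovery_plan(scenario_name: str) -> tuple[str, ...]:
    """Return the ordered recovery actions for a scenario; escalate if it is unknown."""
    scenario = CATALOG.get(scenario_name)
    if scenario is None:
        # Hypothetical fallback: anything not catalogued goes to a person.
        return ("escalate to on-call administrator",)
    return scenario.actions
```

Keeping unknown scenarios on an explicit escalation path means the catalog can start small and grow as new failure modes are identified.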

Deliverables

At the end of this iteration, the collaboration tools on EC2 should be in a "production-ready" state. A "production-ready" state includes:

1. Fully functional applications running on EC2 under the staging domain ooici.org (ooici.org/cloud.oceanobservatories.org)

2. Machine images for each component of the collaboration tools that include all applications, dependencies, and scripts necessary for provisioning

3. Provisioning scripts that allow instances to be started and applications to be run without manual configuration of any component

4. Remote monitoring and logging of the collaboration tools (CT) on EC2

5. A tracking repository that records details about the collaboration tool AMIs and running instances, such as AMI IDs, EBS volume IDs, public and private DNS names, and provisioning parameters

6. A means of versioning and archiving application data, machine images, and operational scripts

7. Documentation of operational procedures, including failure recovery, backups, updates and patching, and startup
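The tracking repository in the deliverables above could start as a small relational store. The sketch below assumes a single SQLite file is acceptable, which the project may well not choose; the table and column names, the helper functions, and the IDs and DNS names in the usage example are all hypothetical placeholders.

```python
import json
import sqlite3

# Hypothetical schema: one row per machine image, one row per running instance.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    ami_id    TEXT PRIMARY KEY,
    component TEXT NOT NULL,   -- which collab-tools component the image serves
    version   TEXT NOT NULL,
    created   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS instances (
    instance_id   TEXT PRIMARY KEY,
    ami_id        TEXT NOT NULL REFERENCES images(ami_id),
    ebs_volume_id TEXT,
    public_dns    TEXT,
    private_dns   TEXT,
    params_json   TEXT NOT NULL  -- provisioning parameters, stored as JSON
);
"""


def open_repo(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_image(conn, ami_id, component, version, created):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO images VALUES (?, ?, ?, ?)",
                     (ami_id, component, version, created))


def register_instance(conn, instance_id, ami_id, ebs_volume_id,
                      public_dns, private_dns, params):
    with conn:
        conn.execute("INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)",
                     (instance_id, ami_id, ebs_volume_id, public_dns,
                      private_dns, json.dumps(params, sort_keys=True)))


def instances_for_component(conn, component):
    """List instance records for one component, joined with their image."""
    rows = conn.execute(
        """SELECT i.instance_id, i.ami_id, i.public_dns, i.params_json
           FROM instances i JOIN images m ON i.ami_id = m.ami_id
           WHERE m.component = ? ORDER BY i.instance_id""",
        (component,),
    ).fetchall()
    return [{"instance_id": r[0], "ami_id": r[1], "public_dns": r[2],
             "params": json.loads(r[3])} for r in rows]


if __name__ == "__main__":
    repo = open_repo()
    register_image(repo, "ami-00000001", "webapps", "zinc-1", "2009-05-15")
    register_instance(repo, "i-00000001", "ami-00000001", "vol-00000001",
                      "public.example", "private.example", {"domain": "ooici.org"})
    print(instances_for_component(repo, "webapps")[0]["params"])  # {'domain': 'ooici.org'}
```

Storing provisioning parameters as JSON keeps the schema stable while the set of parameters is still changing; a file path instead of `":memory:"` makes the repository persistent, and that file can itself be versioned and archived.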



Task Outline

Critical Tasks

Week (days) | Description
2 (?)       | Finalize the design for the collab tools deployment, considering domain strategy, backup checking, and recovery procedures
3 (?)       | Create a repository to track machine images and their services
3 (2-3)     | Add mail services (MTA, Mailman, list creation scripts) to the webapps image
3 (4-5)     | Write OS- and application-level test probes that report on the availability and usability of the collaboration tools
4 (2-3)     | Create a means of keeping Intermapper in sync with the tracking repository
4 (?)       | Implement a recovery strategy covering the failure scenarios reported by the test probes (e.g., server is down, app not responding, database corruption)
4 (1-2)     | Document the procedure for switching domains between ooici.org and oceanobservatories.org
5 (4-5)     | Complete testing of the collab tools: migrate data, step through each app and confirm that it functions correctly (monitoring error logs), recreate failure scenarios to test recovery procedures, and gather feedback from users
5 (1-2)     | Work out any remaining migration details
6 (2-3)     | Final data migration (the production server is taken down and the domain is pointed to the cloud)
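The test-probe and recovery-strategy tasks fit together: probes report what they observe, and the recovery strategy maps those observations to a failure scenario. A minimal Python sketch follows, assuming a TCP reachability check for the host followed by an HTTP check for the application. The function names, the 5-second timeout, and the scenario labels are hypothetical, and scenarios such as database corruption would need application-specific checks that are not shown here.

```python
import socket
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    host_reachable: bool        # TCP connection to the service port succeeded
    http_status: Optional[int]  # HTTP status code, or None if no HTTP response
    error: str = ""


def probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> ProbeResult:
    """Check host reachability (TCP), then application health (HTTP GET)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return ProbeResult(False, None, str(exc))
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ProbeResult(True, resp.status)
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return ProbeResult(True, exc.code, str(exc))
    except (urllib.error.URLError, OSError) as exc:  # timeout, reset, etc.
        return ProbeResult(True, None, str(exc))


def classify(result: ProbeResult) -> str:
    """Map a probe result to a (hypothetical) failure-scenario label."""
    if not result.host_reachable:
        return "server_down"
    if result.http_status is None or result.http_status >= 500:
        return "app_not_responding"
    if result.http_status >= 400:
        return "app_error"  # the app answered, but with a client-side error
    return "ok"
```

For example, `classify(probe("ooici.org", 80))` would return one of the labels above. Separating `probe` from `classify` keeps the decision logic testable without a network and lets the same labels be reused by whatever monitoring system runs the probes.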

Non-critical Tasks

Week (days) | Description
-- (1)      | Add the Magnet messaging service to the collab tools AMIs to provide a means of controlling agents
-- (?)      | Create node, process, and service agents to provide contextualization and control of the operating system and applications
4 (1)       | Look into online editing of Office documents in Alfresco
4 (1)       | Provide a means for end users to submit bug reports and feature requests
