CEI Elaboration Planning Meeting
Added by David Stuebe, last edited by David Stuebe on Mar 10, 2010

http://etherpad.ooici.org/ceioverview

Attendees
David Stuebe
John Graybeal
Tim Freeman
David LaBissoniere
Michael Meisinger
Alex
Matt
Kate

Notes
Important to understand Nimbus and its team
trying to gain ear toward Chicago concerns
Nimbus: 3 goals:
infrastructure to deploy clouds for scientific computing (IaaS)
"collection of tools that sit on top of IaaS service": context broker/EPU, for OOI and beyond, and supporting services that are needed
produce a collection of open source implementations, create sustainable project in long run
Nimbus Participants (ANL/UofC staff): Kate Keahey, Tim Freeman, David LaBissoniere, John Bresnahan (for OOI). There are other committers (including Alex).

IaaS - in Nimbus it is the Workspace service
Get Nimbus credentials, use Nimbus as a client; OSS implementation of EC2, for instance for Magellan etc.
KVM vs. VMware: image format is different and not interchangeable. One virtualization impl supports one format. Need also interfaces to the providers (different EC2, Flexiscale/kvm).
libvirt abstracts from execution environments (control commands/start-stop), esp. KVM and Xen, not so much for VMware
(the libvirt API is the same for all, but the API doesn't do everything)
Nimbus has composed additional functionality for end to end use cases (esp. in networks and network security)
For VMware, Nimbus tools exist on VMM nodes that go beyond what libvirt does.
Community moving to KVM? Yes, but Amazon (and HPC shops) are preserving the status of Xen (two very different efforts)
Saying "Virtual Machine Monitor" more accurate to refer to physical node for each system (there are term details that no one should worry about).
Since KVM is in Linux kernel, that gives everyone that access; support is gathering (as opposed to fully there for Xen)

What we as a team have to assess ... is the comparable management environments for Xen and KVM. (What is the downside?)
VMware looks successful because of management software
Citrix XenCenter is not so popular for 'enterprise' shops (where non-remote-user mgmt software is strongly needed)

Nimbus presents a uniform interface to remote users (and handles gateway security tasks, federation, etc.). Nimbus itself adapts to physical resources that it makes available to remote users. Possible strategies:
1. adapt to each VMM impl. and "control" physical nodes directly
2. adapt to whole site resource manager (this may be a good place to integrate with VMware). Condor is another resource manager with VMs. OpenQRM is promising in this space (scheduling not a strong suit yet).
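The two strategies above can be sketched as interchangeable backends behind one remote-facing call. This is a minimal illustration only, not Nimbus code: `ResourceBackend`, `DirectVmmBackend`, `SiteManagerBackend`, and `launch` are hypothetical names, and the slot bookkeeping stands in for whatever a real site would track.

```python
from abc import ABC, abstractmethod


class ResourceBackend(ABC):
    """Hypothetical interface: what the uniform remote-facing service needs from a site."""

    @abstractmethod
    def start_vm(self, image):
        """Start a VM from `image` and return an opaque VM id."""


class DirectVmmBackend(ResourceBackend):
    """Strategy 1: control each VMM node directly and pick the node ourselves."""

    def __init__(self, nodes):
        # node name -> free VM slots (illustrative bookkeeping only)
        self.nodes = dict(nodes)

    def start_vm(self, image):
        for node, free in self.nodes.items():
            if free > 0:
                self.nodes[node] = free - 1
                return node + "/" + image
        raise RuntimeError("no free VMM node")


class SiteManagerBackend(ResourceBackend):
    """Strategy 2: hand the request to a whole-site resource manager, which schedules it."""

    def __init__(self, submit):
        # `submit` is a callable standing in for the site manager's submit call
        self.submit = submit

    def start_vm(self, image):
        return self.submit(image)


def launch(backend, image):
    # The remote user sees the same call regardless of the strategy behind it.
    return backend.start_vm(image)
```

In strategy 1 the scheduling decision lives in the adapter; in strategy 2 it is delegated, which is why the quality of the site manager's scheduler matters.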

This gives enough context for specific questions (which can be handled later).
e.g.:
Can VMware resource manager handle anything behind it besides VMware VMs?
<your question here>

When "Resource managers" is used, usually this refers to manager of a specific execution resource site.

EPU is being built on top of IaaS. Highly Available (HA) service. Elastic responses to demand. Complex/configurable policies.
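As a rough illustration of "elastic responses to demand" under a configurable policy, the sketch below computes how many VM instances to run for a given backlog. It is an assumption-laden example, not the EPU design: `target_instances` and its parameters are hypothetical, and the floor of `min_instances` is only a stand-in for the HA requirement.

```python
import math


def target_instances(queue_length, per_instance_capacity,
                     min_instances=1, max_instances=10):
    """Return how many worker VMs to run for the current demand (hypothetical policy)."""
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    # Enough instances to cover the backlog, rounded up.
    needed = math.ceil(queue_length / per_instance_capacity)
    # Never drop below the availability floor or exceed the budget ceiling.
    return max(min_instances, min(max_instances, needed))
```

A real policy would likely add damping so that short spikes do not start and stop VMs repeatedly.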

OOI is EPU stakeholder. Later on it could be a standalone package.

Part of LCA design period is to determine appropriate approach to EPU development over the next two iterations.

EPU is the core of CEI in the sense that other components are subservient to the main functionality story which is anchored at the EPU.

How does Elastic Block Store (EBS) fit into deployable types and registry of deployable types?
can't say right now, conversation against specific applications would be useful
how fast does it need to be to be interesting? how many VMs need to see the volumes (on EC2, you can only mount one to one VM instance)
EBS is a disk that isn't attached to the store
On Amazon, EBS is the answer to the fact that EC2 VMs have a disk that is destroyed after a VM is shut down or becomes corrupted. EBS volumes are durable. They are also faster. You can also boot from them now.
They can be snapshotted/cloned so that other VMs can also attach (to a 'divergent' copy).
(educated guess is that the reliable/scalable SAN is datacenter/hardware level with LVM volumes/snapshotting system being exported over GNBD and imported on Xen node). Like with S3, the value of EBS is really "behind" the trivial service interfaces.
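The volume semantics described in these notes (one volume attached to one VM instance at a time, data surviving detachment, snapshots giving other VMs a divergent copy) can be modeled in a few lines. This is a toy in-memory model for discussion, not the Amazon API; the `Volume` class and its methods are hypothetical.

```python
class Volume:
    """Toy model of a durable block volume (hypothetical, not the EC2/EBS API)."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.attached_to = None

    def attach(self, instance_id):
        # As noted above, a volume is mounted on at most one VM instance.
        if self.attached_to is not None:
            raise RuntimeError("volume already attached")
        self.attached_to = instance_id

    def detach(self):
        # Data outlives the attachment (and the VM that used it).
        self.attached_to = None

    def snapshot_clone(self):
        # A clone starts from the same contents and then diverges independently.
        return Volume(self.data)
```

To let a second VM see the data, it attaches a clone rather than the original volume, so writes on either side are not shared.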

Deployable Type: "Recipe" that can be generated, adapted, "baked"
Deployable Unit: adapted to a specific exec environment; not alive
Operational Unit: contextualized to a specific exec environment; running instance
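One way to read the three terms above is as successive stages of the same thing. The sketch below is illustrative only: the class fields and the `bake`/`start` method names are assumptions made for this example, not an agreed CEI data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployableType:
    """The "recipe": environment-independent description."""
    name: str
    packages: tuple

    def bake(self, exec_environment):
        # Adapt the recipe to one execution environment; nothing is running yet.
        return DeployableUnit(self, exec_environment)


@dataclass(frozen=True)
class DeployableUnit:
    """Adapted to a specific exec environment; not alive."""
    dtype: DeployableType
    exec_environment: str

    def start(self, context):
        # Contextualize and launch: the result is a running instance.
        return OperationalUnit(self, dict(context))


@dataclass
class OperationalUnit:
    """Contextualized, running instance of a deployable unit."""
    unit: DeployableUnit
    context: dict
    running: bool = True
```

For example, one type could be baked once per environment and each resulting unit started many times with different context.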

<< Conversation at the end about sensitivity to Nimbus components just like with any "3rd party" system. But it is recognized that CEI is a separate entity and e.g. in the case of the EPU in the near term, this is a focused integration effort. >>

<< The call ends, agreement that some of the initial objectives were met, desire to move forward from all sides (Matt to consult with John). Design week will include more of these conversations and we will move towards the "details" of the EPU/COI integration discussion. >>

Process
√ Start Etherpad: http://etherpad.ooici.org/ceioverview (We'll take team notes here)

  • Put any new presented material, if any, into Confluence (your choice of locations, we can always move it if necessary)
  • Provide links to the key URLs (drawings, Confluence pages) ahead of time:
  • Use WebEx to share the view you want people to see (We'll try recording the session with WebEx)

Purposes

  • Understanding Capabilities: Convey the nature of CEI and its capabilities to the rest of the participants
  • Subsystem Relationships: See how the CEI virtual processing capabilities are connected to the other OOI subsystems
  • Collaboration: Identify, and if possible address, collaborative opportunities and challenges
  • Social: Find out who the members of the CEI team are and what each does

Background

  • Let us assume an agreed science scenario, for purposes of conversational background (no need to present the scenario below at the meeting)
  • like the one from LCO (hurricane Gigi coming => more modeling, more data products and computations => more CPU needed),
  • add the following refinements
  • more sensors are deployed and much higher data rates used, so we'll need more resources to ingest 'routine' sensor data
  • more people are logging on and doing more download and annotation of the data streams,
  • we'll need more resources to support those searches, GUIs, and annotations
  • the usual system support is on-line
  • one CI person is providing operational support for the CI infrastructure: John Jones
  • a CGSN marine IO person is monitoring their deployed systems at WHOI using CI-provided infrastructure: Bill Bones
  • another CGSN computer person is monitoring the CI infrastructure and the quickly growing set of users that are accessing our resources,
  • specifically their local Acquisition point and Execution point

Content Suggestions

  • Use architectural content from DOORS to convey the overall components and services of the system (more technically than LCO allowed)
  • Specifically discuss the dependencies (of CEI on other subsystems, of other subsystems on CEI)
  • at the technical level, what services (APIs) have to be provided by other subsystems for use by CEI?
  • what services/APIs will CEI present to other subsystems?
  • what do each of the subsystem leads/developers need to understand about using CEI?
  • walk through the scenario in which a new data ingest capability container is instantiated and goes into a new Processing Unit
  • how do the subsystems in the CC bootstrap themselves into operation?
  • notionally, how does this happen for an application? (don't worry about the actual CI pieces yet)
  • Talk about what the person monitoring resource utilization (let's say that's the CGSN computer person) should see
  • what might be on a user interface
  • what kinds of decisions, or rather judgments, they will be making about the resources vs requirements
  • does any of this become visible to the end science user?
  • associate components to the people developing them – who does what
