These are design notes intended to convey how the various components of Diego communicate and interrelate. They are not comprehensive and, although generally up-to-date, are not guaranteed to be. If you find something that you suspect is out of date, please open an issue on this repository.
Diego schedules and runs Tasks and Long-Running Processes:
A Task is guaranteed to be run at most once.
A Long-Running Process (LRP) may have multiple instances. Diego is told of the desired LRPs. Each desired LRP may desire multiple instances, which Diego represents as actual LRPs. Diego attempts to keep the correct number of instances running in the face of network failures and crashes.
Clients submit Tasks and LRPs to the BBS (Bulletin Board System), and update and retrieve them, via an RPC-style API over HTTP. Diego’s Auctioneer optimally distributes Tasks and LRPs to the cluster of Diego Cells via an Auction that queries the Cell Reps and then sends work to them. Once the auction assigns a Task or LRP to a Cell, the Executor creates a Garden container and executes the work encoded in the Task/LRP. This work is encoded as a generic, platform-independent recipe of composable actions.
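To make the placement step concrete, here is a minimal sketch of a greedy auction. These notes do not describe the real scoring, so the `Cell` and `Work` types, their fields, and the "most free memory, then fewest containers" rule are all assumptions chosen for illustration.

```go
package main

import "fmt"

// Cell is a hypothetical, simplified view of what a Cell Rep might report
// when the Auctioneer queries it. The field names are illustrative only.
type Cell struct {
	ID         string
	FreeMemMB  int
	Containers int
}

// Work is a hypothetical unit of work: a Task or a single LRP instance.
type Work struct {
	Name     string
	MemoryMB int
}

// pickCell returns the ID of the cell with the most free memory that can
// still fit the work, breaking ties in favor of fewer running containers.
// The boolean is false when no cell has room.
func pickCell(cells []Cell, w Work) (string, bool) {
	best := -1
	for i, c := range cells {
		if c.FreeMemMB < w.MemoryMB {
			continue // this cell cannot fit the work at all
		}
		if best == -1 ||
			c.FreeMemMB > cells[best].FreeMemMB ||
			(c.FreeMemMB == cells[best].FreeMemMB && c.Containers < cells[best].Containers) {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return cells[best].ID, true
}

func main() {
	cells := []Cell{
		{ID: "cell-a", FreeMemMB: 512, Containers: 4},
		{ID: "cell-b", FreeMemMB: 2048, Containers: 9},
		{ID: "cell-c", FreeMemMB: 2048, Containers: 3},
	}
	id, ok := pickCell(cells, Work{Name: "web", MemoryMB: 1024})
	fmt.Println(id, ok) // prints "cell-c true"
}
```

In this sketch the query phase is the loop over reported cell state, and the "send work" phase would follow once a winner is chosen.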
The BBS also provides a real-time representation of the state of the Diego cluster (including all desired LRPs, running LRP instances, and in-flight Tasks). The Converger periodically analyzes snapshots of this representation and corrects discrepancies, ensuring that Diego is eventually consistent.
Diego sends real-time streaming logs for Tasks/LRPs to the Loggregator system. Diego also registers its running LRP instances with the Gorouter to route external web traffic to them.
Diego is the next-generation runtime powering Cloud Foundry (CF), but Diego is abstracted away from CF: CF simply acts as another Diego client via the BBS API. For now, there is a translation layer called the CC-Bridge that converts the Cloud Controller’s domain-specific requests to stage and run applications into requests for Tasks and LRPs. Eventually Cloud Controller will be modified to communicate directly with the BBS. The process of staging and running a CF application is complex and filled with platform-specific and implementation-specific details. A collection of binaries known collectively as the App Lifecycle encapsulates these concerns. The Tasks and LRPs produced by the CC-Bridge download the App Lifecycle binaries and execute them to stage, run, and health-check CF applications.
Below is a diagrammatic overview of the major repositories and components in Diego and CF (also PDF · clickable map).
Components in the blue region are part of the Diego core and handle the running and monitoring of Tasks and LRPs. These components all come from the Diego BOSH release.
Components in the yellow region provide infrastructure support to Diego and CF components. At the moment, this primarily includes Consul for DNS-based dynamic service discovery and a consistent key-value store for distributed locks and component discovery.
Components in the orange region support routing HTTP traffic to Diego containers. This includes the Route-Emitter from Diego and the Gorouter from CF.
Components in the red region support log and metric aggregation from Diego containers and CF and Diego components.
The green region brings in Cloud Controller and the CC-Bridge. As the diagram shows, the CC-Bridge merely interfaces with the BBS, translating app-specific messages from the CC to the more generic language of Tasks and LRPs.
The following summarizes the roles and responsibilities of the various components in this diagram.
“User-facing” Components
These “user-facing” components all live in cf-release:
The Gorouter routes incoming HTTP traffic to processes within the CF/Diego deployment.
This includes routing traffic both to developer apps running within Garden containers and to CF components such as CC.
Developers typically interact with CC and the logging system through a client such as the CF CLI.
CC-Bridge Components
The CC-Bridge components interact with the Cloud Controller. They serve primarily to translate app-specific notions into the more general notions of LRPs and Tasks:
The CC-Uploader mediates staging uploads from the Executor to CC, translating the Executor’s simple HTTP POST into the complex multipart-form upload CC requires.
Nsync splits its responsibilities between two independent processes:
The nsync-listener listens for desired app requests and updates/creates the desired LRPs via the BBS.
The nsync-bulker periodically polls CC for all desired apps to ensure the desired state known to Diego is up-to-date.
TPS also splits its responsibilities between two independent processes:
The tps-listener provides the CC with information about running LRP instances for cf apps and cf app X requests.
The tps-watcher monitors ActualLRP activity for crashes and reports them to CC.
Many of the CC-Bridge components are inherently stateless and will eventually be consolidated into Cloud Controller itself.
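The staging-upload translation mentioned above, from a simple HTTP POST body to a multipart-form upload, can be sketched with Go's standard `mime/multipart` package. This is an illustration only: the function name `wrapAsMultipart`, the form field name, and the file name are hypothetical, since these notes do not state what CC expects.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// wrapAsMultipart wraps the raw bytes of a simple upload in a
// multipart/form-data body. It returns the Content-Type header value
// (which carries the boundary) and the encoded body.
// fieldName and fileName are hypothetical placeholders.
func wrapAsMultipart(raw []byte, fieldName, fileName string) (string, []byte, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)

	part, err := w.CreateFormFile(fieldName, fileName)
	if err != nil {
		return "", nil, err
	}
	if _, err := part.Write(raw); err != nil {
		return "", nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return "", nil, err
	}
	return w.FormDataContentType(), buf.Bytes(), nil
}

func main() {
	contentType, body, err := wrapAsMultipart([]byte("droplet-bytes"), "upload", "droplet.tgz")
	if err != nil {
		panic(err)
	}
	fmt.Println(contentType)
	fmt.Println(len(body), "bytes in multipart body")
}
```

A real mediator would stream the incoming request body rather than buffer it in memory; buffering keeps the sketch short.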
Components on the Database VMs
The Database VMs provide Diego’s core components and clients a consistent API to the shared state and operations that manage Tasks and LRPs, as well as the data store for that shared state.
The BBS:
provides an RPC-style API over HTTP to both core Diego components (rep, auctioneer, converger) and external clients (CC-Bridge, route emitter, SSH proxy),
encapsulates access to the backing database and manages data migrations, encoding, and encryption,
performs LRP convergence periodically, comparing DesiredLRPs and their ActualLRPs and taking action to enforce the desired state:
if an instance is missing or unclaimed for too long, a new auction is requested.
if an extra instance is identified, a stop message is sent to the Rep on the Cell hosting the instance.
performs Task convergence periodically, resending auction requests for Tasks that have been pending for too long and completion callbacks for Tasks that have remained completed for too long,
periodically sends aggregate metrics about DesiredLRPs, ActualLRPs, and Tasks to Loggregator,
maintains a lock in Consul to ensure only one BBS handles requests, migrations, and convergence at a time.
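The LRP convergence step described above boils down to comparing a desired instance count with the instance indices that actually exist. The sketch below shows only that comparison. The `converge` function is hypothetical, and it ignores the "unclaimed for too long" timing rule, which would need timestamps this example does not model.

```go
package main

import (
	"fmt"
	"sort"
)

// converge compares the number of instances a DesiredLRP asks for with the
// instance indices that are actually present.
// missing lists indices that would need a new auction;
// extra lists indices whose instances would be sent a stop message.
func converge(desiredInstances int, actualIndices []int) (missing, extra []int) {
	present := make(map[int]bool)
	for _, idx := range actualIndices {
		if idx < 0 || idx >= desiredInstances {
			extra = append(extra, idx) // outside the desired range
			continue
		}
		present[idx] = true
	}
	for idx := 0; idx < desiredInstances; idx++ {
		if !present[idx] {
			missing = append(missing, idx)
		}
	}
	sort.Ints(extra)
	return missing, extra
}

func main() {
	missing, extra := converge(3, []int{0, 2, 3, 5})
	fmt.Println("request auction for indices:", missing) // [1]
	fmt.Println("send stop for indices:", extra)         // [3 5]
}
```

Running the comparison periodically against a snapshot is what makes the system eventually consistent: a missed event is repaired on the next pass.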
The BBS requires a backing persistent data store. MySQL and PostgreSQL are supported on current versions, and historically etcd was supported through Diego v1.0.
Components on the Cell
These Diego components run and monitor Tasks and LRPs in Garden containers:
Metron forwards application logs, application metrics, and component metrics to Doppler.
Note that there is a specificity gradient across the Rep, the Executor, and Garden. The Rep is concerned with Tasks and LRPs and knows details about their lifecycles. The Executor knows only how to manage a collection of containers and to run actions in these containers. Garden knows nothing about actions and simply provides a concrete implementation of a platform-specific containerization technology that can run arbitrary commands in containers.
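The gradient above can be illustrated with a small sketch of composable actions. All names here (`Runner`, `Action`, `RunAction`, `Serial`, `Parallel`, `recorder`) are hypothetical and are not the types Diego defines. `Runner` stands in for the lowest layer, which only knows how to run a command, while the action types stand in for the layer that knows how to combine steps.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Runner is a stand-in for "something that can run a command in a container".
type Runner interface {
	Run(path string, args ...string) error
}

// Action is one step of a recipe.
type Action interface {
	Perform(r Runner) error
}

// RunAction runs a single command.
type RunAction struct {
	Path string
	Args []string
}

func (a RunAction) Perform(r Runner) error { return r.Run(a.Path, a.Args...) }

// Serial performs its actions in order and stops at the first error.
type Serial []Action

func (s Serial) Perform(r Runner) error {
	for _, a := range s {
		if err := a.Perform(r); err != nil {
			return err
		}
	}
	return nil
}

// Parallel performs its actions concurrently, waits for all of them,
// and returns the first error by position, if any.
type Parallel []Action

func (p Parallel) Perform(r Runner) error {
	errs := make([]error, len(p))
	var wg sync.WaitGroup
	for i, a := range p {
		wg.Add(1)
		go func(i int, a Action) {
			defer wg.Done()
			errs[i] = a.Perform(r)
		}(i, a)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// recorder is a fake Runner that records commands instead of running them.
type recorder struct {
	mu     sync.Mutex
	cmds   []string
	failOn string
}

func (rec *recorder) Run(path string, args ...string) error {
	rec.mu.Lock()
	defer rec.mu.Unlock()
	rec.cmds = append(rec.cmds, path)
	if path == rec.failOn {
		return errors.New("command failed: " + path)
	}
	return nil
}

func main() {
	recipe := Serial{
		RunAction{Path: "download"},
		Parallel{RunAction{Path: "extract-a"}, RunAction{Path: "extract-b"}},
		RunAction{Path: "start"},
	}
	rec := &recorder{}
	fmt.Println(recipe.Perform(rec), len(rec.cmds))
}
```

Because `Serial` and `Parallel` are themselves actions, recipes nest to any depth without the runner knowing anything about them.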
The Route-Emitter:
monitors DesiredLRP state and ActualLRP state via the BBS. When a change is detected, it emits route registration and unregistration messages to the Gorouter via the NATS message bus,
periodically emits the entire routing table to the Gorouter,
maintains a lock in Consul to ensure only one Route-Emitter handles route registration at a time.
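The Route-Emitter behavior described above can be pictured as diffing two routing tables. The following is a sketch under assumptions: the `Table` and `Message` types and the `diff` function are hypothetical, and the real message format sent over NATS is not described in these notes.

```go
package main

import (
	"fmt"
	"sort"
)

// Table maps a route hostname to the set of backend addresses serving it.
// A value of true means the address is present.
type Table map[string]map[string]bool

// Message is a hypothetical registration or unregistration message.
type Message struct {
	Kind string // "register" or "unregister"
	Host string
	Addr string
}

// diff returns the messages needed to move a router from prev to next.
// Messages are sorted so the output is deterministic.
func diff(prev, next Table) []Message {
	var msgs []Message
	for host, addrs := range next {
		for addr, up := range addrs {
			if up && !prev[host][addr] {
				msgs = append(msgs, Message{Kind: "register", Host: host, Addr: addr})
			}
		}
	}
	for host, addrs := range prev {
		for addr, up := range addrs {
			if up && !next[host][addr] {
				msgs = append(msgs, Message{Kind: "unregister", Host: host, Addr: addr})
			}
		}
	}
	sort.Slice(msgs, func(i, j int) bool {
		a, b := msgs[i], msgs[j]
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		if a.Host != b.Host {
			return a.Host < b.Host
		}
		return a.Addr < b.Addr
	})
	return msgs
}

func main() {
	prev := Table{"app.example.com": {"10.0.0.1:61001": true, "10.0.0.2:61002": true}}
	next := Table{"app.example.com": {"10.0.0.2:61002": true, "10.0.0.3:61003": true}}
	for _, m := range diff(prev, next) {
		fmt.Println(m.Kind, m.Host, m.Addr)
	}
}
```

In this model, the periodic emission of the entire routing table is the same as `diff(Table{}, next)`: every current route is sent as a registration.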
Locket provides abstractions for locks and service registration that encapsulate interactions with Consul.
Platform-Specific Components
Diego is largely platform-agnostic. All platform-specific concerns are delegated to two types of components: the garden backends and the app lifecycles.
Garden Backends
Garden contains a set of interfaces each platform-specific backend must implement. These interfaces contain methods to perform the following actions:
create/delete containers
apply resource limits to containers
open and attach network ports to containers
copy files into/out of containers
run processes within containers, streaming back stdout and stderr data
Garden-Windows provides a Windows-specific implementation of a Garden interface.
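The list of actions a backend must support can be pictured as a pair of Go interfaces. The sketch below is a simplified illustration and not the real Garden API: `Backend`, `Container`, `Limits`, and the in-memory `fakeBackend` are all hypothetical names, and the fake does not create real containers.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits holds hypothetical resource limits for a container.
type Limits struct {
	MemoryMB int
	DiskMB   int
}

// Backend creates and deletes containers.
type Backend interface {
	Create(handle string) (Container, error)
	Destroy(handle string) error
}

// Container covers the remaining actions from the list above.
type Container interface {
	LimitResources(l Limits) error
	MapPort(containerPort int) (hostPort int, err error)
	CopyIn(path string, data []byte) error
	CopyOut(path string) ([]byte, error)
	Run(path string, args ...string) (stdout, stderr string, err error)
}

// fakeBackend is an in-memory stand-in for a platform-specific backend.
type fakeBackend struct {
	containers map[string]*fakeContainer
	nextPort   int
}

type fakeContainer struct {
	backend *fakeBackend
	limits  Limits
	files   map[string][]byte
}

func newFakeBackend() *fakeBackend {
	return &fakeBackend{containers: map[string]*fakeContainer{}, nextPort: 61000}
}

func (b *fakeBackend) Create(handle string) (Container, error) {
	if _, exists := b.containers[handle]; exists {
		return nil, errors.New("handle already in use: " + handle)
	}
	c := &fakeContainer{backend: b, files: map[string][]byte{}}
	b.containers[handle] = c
	return c, nil
}

func (b *fakeBackend) Destroy(handle string) error {
	if _, exists := b.containers[handle]; !exists {
		return errors.New("unknown handle: " + handle)
	}
	delete(b.containers, handle)
	return nil
}

func (c *fakeContainer) LimitResources(l Limits) error { c.limits = l; return nil }

// MapPort hands out the next free host port; the fake ignores containerPort.
func (c *fakeContainer) MapPort(containerPort int) (int, error) {
	c.backend.nextPort++
	return c.backend.nextPort, nil
}

func (c *fakeContainer) CopyIn(path string, data []byte) error {
	c.files[path] = append([]byte(nil), data...)
	return nil
}

func (c *fakeContainer) CopyOut(path string) ([]byte, error) {
	data, ok := c.files[path]
	if !ok {
		return nil, errors.New("no such file: " + path)
	}
	return data, nil
}

// Run pretends to execute the command and reports what it was asked to run.
func (c *fakeContainer) Run(path string, args ...string) (string, string, error) {
	return "ran " + path, "", nil
}

func main() {
	var backend Backend = newFakeBackend()
	c, err := backend.Create("app-1")
	if err != nil {
		panic(err)
	}
	port, _ := c.MapPort(8080)
	out, _, _ := c.Run("echo", "hello")
	fmt.Println(port, out)
}
```

Code written against `Backend` and `Container` never learns which platform is underneath, which is the point of delegating platform-specific concerns to the backend.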
App Lifecycles
Each App Lifecycle provides a set of binaries that manage a Cloud Foundry-specific application lifecycle. There are three binaries:
The Builder stages a CF application. The CC-Bridge runs the Builder as a Task on every staging request. The Builder performs static analysis on the application code and does any necessary pre-processing before the application is first run.
The Launcher runs a CF application. The CC-Bridge sets the Launcher as the Action on the CF application’s DesiredLRP. The Launcher executes the user’s start command with the correct system context (working directory, environment variables, etc.).
The Healthcheck performs a status check of a running CF application from inside the container. The CC-Bridge sets the Healthcheck as the Monitor action on the CF application’s DesiredLRP.
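How the three binaries map onto Tasks and DesiredLRPs can be sketched as plain data. Everything here is illustrative: the struct shapes, the `lifecycleDir` path, and the binary file names are assumptions and not the definitions the CC-Bridge or BBS use.

```go
package main

import "fmt"

// RunAction is a hypothetical "run this binary with these arguments" step.
type RunAction struct {
	Path string
	Args []string
}

// Task is a simplified one-off unit of work.
type Task struct {
	TaskGuid string
	Action   RunAction
}

// DesiredLRP is a simplified long-running process description.
type DesiredLRP struct {
	ProcessGuid string
	Instances   int
	Action      RunAction // what keeps running
	Monitor     RunAction // how the instance is health-checked
}

// lifecycleDir is an assumed location for the downloaded lifecycle binaries.
const lifecycleDir = "/tmp/lifecycle"

// stagingTask runs the Builder once for a staging request.
func stagingTask(appGuid string) Task {
	return Task{
		TaskGuid: appGuid + "-staging",
		Action:   RunAction{Path: lifecycleDir + "/builder"},
	}
}

// runningLRP sets the Launcher as the Action and the Healthcheck as the Monitor.
func runningLRP(appGuid, startCommand string, instances int) DesiredLRP {
	return DesiredLRP{
		ProcessGuid: appGuid,
		Instances:   instances,
		Action:      RunAction{Path: lifecycleDir + "/launcher", Args: []string{startCommand}},
		Monitor:     RunAction{Path: lifecycleDir + "/healthcheck"},
	}
}

func main() {
	fmt.Printf("%+v\n", stagingTask("my-app"))
	fmt.Printf("%+v\n", runningLRP("my-app", "bundle exec rackup", 2))
}
```

Swapping in a different App Lifecycle would change only the binaries behind these paths, not the shape of the Task or DesiredLRP.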
Inigo is an integration test suite that launches the various Diego components and exercises them through various test cases. As such, Inigo validates that a given set of component versions are mutually compatible.
In addition to exercising ordinary test cases, Inigo can exercise exceptional cases, such as when a component fails or is unavailable for a period, that would be more difficult to orchestrate against a BOSH-deployed Diego cluster.
The auction package encodes the behavioral details around the auction.
It includes a simulation test suite that validates the correctness and performance of the auction algorithm. The simulation can be run for different algorithms and at different scales. It can be run in-process (for quick feedback loops), across multiple processes (to understand the role of communication in the auction), or even across multiple machines in a cloud-like infrastructure (to understand the impact of latency on the auction).
The auctioneer and rep use the auction package to participate in the auction.
The BOSH Release
Diego-Release packages Diego as a BOSH release. Its README includes detailed instructions for deploying CF and Diego to a local BOSH-Lite.
Diego-Release is also the canonical GOPATH for Diego. All Diego development takes place inside the Diego-Release directory.
Diego Design Notes
Migrating to Diego
We’ve put together some guidelines around transitioning applications off of the DEAs and on to Diego. One reason to move your apps to Diego is to try out SSH access to your CF app instances and Diego LRPs.
What does Diego do?
CF Summit Talks on Diego
What are all these repos and what do they do?
Components on the Brain
Components on the Access VMs
Routing Translation Components
Service Registration and Component Coordination
Bringing it all together
CF and Diego consist of many disparate components. Ensuring that these components work together correctly is a challenge addressed by these entities: