BOSH Diego Performance Release

This is a release to measure the performance of Diego. See the proposal here.
Usage
Note: to deploy with a cf-deployment-style manifest using BOSH 2.0, include the ops file operations/add-diego-perf-release.yml. You will also need to modify the ops file to use your local copy of diego-perf-release.
Prerequisites

Deploy cf-release and diego-release. To deploy this release, create a BOSH
deployment manifest with as many pusher instances as you want to use for
testing.
Running Fezzik
1. Run bosh ssh stress_tests 0.
1. Run /var/vcap/jobs/caddy/bin/1_fezzik multiple times.
1. Find the output in /var/vcap/packages/fezzik/src/github.com/cloudfoundry-incubator/fezzik/reports.json.
Running Cedar
Automatically Running 10 Batches of Cedar (Preferred)
The steps mentioned in the previous section are automated by
./cedar_script. The script pushes 10 batches of apps, each batch in its own
space. To run it:
1. Run cd /var/vcap/jobs/cedar/bin.
1. Run the following command to start the experiment:

   ./cedar_script

   To resume the experiment from the nth batch (where n is a number from 1
   to 10), add n as an argument to the script. For example, to run from the
   fourth batch:

   ./cedar_script 4
Note: if the spaces are already present from a previous run of the script,
the script will not fail and will instead continue to push to those existing
spaces. Manually delete spaces or the entire CF org if required.
This script also then pushes an extra batch of apps via cedar
and monitors them with arborist. The file
/var/vcap/sys/log/cedar/cedar-arborist-output.json
contains the results from that cedar run, and the file
/var/vcap/sys/log/arborist/arborist-output.json
contains the arborist results.
The script will also output the min/max timestamp for each batch in
/var/vcap/data/cedar/min-<batch#>.json and
/var/vcap/data/cedar/max-<batch#>.json.
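The batching and resume behavior described above can be sketched as a simple loop. This is an illustrative sketch of the script's structure, not the contents of the real cedar_script:

```bash
# Illustrative sketch of cedar_script's batch loop (not the real script).
run_batches() {
  local start=${1:-1}               # optional argument: batch number to resume from
  for batch in $(seq "$start" 10); do
    # the real script would target a per-batch space and push a batch of apps here
    echo "pushing batch $batch"
  done
}

run_batches 4    # resume from the fourth batch: runs batches 4 through 10
```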
Running Cedar from a BOSH deployment
1. Run ./scripts/generate-deployment-manifest and deploy diego-perf-release
   with the generated manifest. If on BOSH-Lite, you can use
   ./scripts/generate-bosh-lite-manifests.
1. Run bosh ssh to SSH to the cedar VM in the cf-warden-diego-perf deployment.
1. Run sudo su.
Run the following commands:

```bash
# put the CF CLI on the PATH
export PATH=/var/vcap/packages/cf-cli/bin:$PATH

# target CF and create an org and space for the apps
cf api api.bosh-lite.com --skip-ssl-validation
cf auth admin admin
cf create-org o
cf create-space cedar -o o
cf target -o o -s cedar

# run cedar in the background
cd /var/vcap/packages/cedar
/var/vcap/packages/cedar/bin/cedar \
  -n 1 \
  -k 2 \
  -payload /var/vcap/packages/cedar/assets/temp-app \
  -config /var/vcap/packages/cedar/config.json \
  -domain bosh-lite.com \
  &
```
Running Cedar Locally
1. Target a CF deployment.
1. Target a chosen org and space.
1. From the root of this repo, run cd src/code.cloudfoundry.org/diego-stress-tests/cedar/assets/stress-app.
1. Precompile the stress-app to assets/temp-app by running GOOS=linux GOARCH=amd64 go build -o ../temp-app/stress-app.
1. Run cd ../.. to change back to src/code.cloudfoundry.org/diego-stress-tests/cedar.
1. Run go build to build the cedar binary.
1. Run ./cedar -h to see the list of options you can provide to cedar.
One of the most important options is a JSON-encoded config file that
provides the manifest paths for the different apps being pushed. The
default config.json can be found
here.
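For illustration only, a config of this kind might look like the following. The key names and paths below are invented for the example; the actual schema is whatever the default config.json in the repo uses:

```json
[
  { "manifestPath": "assets/manifests/manifest-light.yml" },
  { "manifestPath": "assets/manifests/manifest-medium.yml" }
]
```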
Run Arborist from a BOSH deployment
Note: Arborist depends on a successful cedar run, as it uses the output file from
cedar as an input.
Run the example below to monitor apps on a BOSH-Lite installation:
1. Run ./scripts/generate-bosh-lite-manifests and deploy diego-perf-release with the generated manifest.
1. Run bosh ssh to SSH to the cedar VM in the cf-warden-diego-perf deployment.
1. Run sudo su.
Run the following commands to run arborist from a tmux session:

```bash
# start a new tmux session
/var/vcap/packages/tmux/bin/tmux new -s arborist

# inside the tmux session, start arborist in the background
cd /var/vcap/packages/arborist
/var/vcap/packages/arborist/bin/arborist \
  -app-file <cedar-output-file> \
  -duration 10m \
  -logLevel info \
  -request-interval 10s \
  -result-file output.json \
  &
```

1. To detach from the `tmux` session, send `Ctrl-b d`.
1. To reattach to the `tmux` session, run `/var/vcap/packages/tmux/bin/tmux attach -t arborist`.
Run Arborist Locally

1. cd to `src/code.cloudfoundry.org/diego-stress-tests/arborist`.
1. Build the arborist binary with `go build`.
1. Run the following to start a test:
```bash
./arborist \
  -app-file <cedar-output-file> \
  -duration 10m \
  -logLevel info \
  -request-interval 10s \
  -result-file output.json
```
Arborist has the following usage options:

```
  -app-file string
        path to json application file
  -domain string
        domain where the applications are deployed (default "bosh-lite.com")
  -duration duration
        total duration to check routability of applications (default 10m0s)
  -logLevel string
        log level: debug, info, error or fatal (default "info")
  -request-interval duration
        interval in seconds at which to make requests to each individual app (default 1m0s)
  -result-file string
        path to result file (default "output.json")
```
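Conceptually, these options describe a polling loop: every -request-interval, a request goes to each app, for -duration in total, and routability successes and failures are tallied. The sketch below illustrates only that loop shape; check_app is a stand-in for a real HTTP request, and arborist's internals may differ:

```bash
# Sketch of arborist-style routability polling. check_app stands in for
# issuing an HTTP request to the app's route.
check_app() { [ "$1" != "b" ]; }   # pretend app "b" is unroutable

monitor_apps() {
  local apps="$1" rounds="$2" ok=0 failed=0
  for round in $(seq 1 "$rounds"); do   # one round per request interval
    for app in $apps; do
      if check_app "$app"; then ok=$((ok+1)); else failed=$((failed+1)); fi
    done
    # the real tool would sleep for the request interval here
  done
  echo "ok=$ok failed=$failed"
}

monitor_apps "a b" 3    # three polling rounds over apps "a" and "b"
```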
Monitoring the cluster
The team has created three Grafana dashboards with graphs of interesting
metrics. The name and description of each dashboard are below:
1. aggregation/bosh_influxdb_dashboard.json: system metrics (CPU usage, system load, and disk usage) across the entire cluster
1. aggregation/diego_influxdb_dashboard.json: Diego metrics (e.g. BBS API latency, BBS requests/s)
1. aggregation/golang_stats_influxdb_dashboard.json: Golang runtime metrics (e.g. number of goroutines, GC pause times)
Importing dashboards

To import any of those dashboards, from the home page:

1. Click on Home (or the dashboard search dropdown).
1. Click on Import.
1. Choose a file.
1. Save the dashboard (Ctrl+S, or the drive icon next to the dashboard dropdown).

See the Grafana export/import documentation for more information.

Exporting dashboards

To export a dashboard after editing it:

1. Click the Manage dashboard button (the gear icon next to the dashboard dropdown).
1. Click Export.

See the Grafana export/import documentation for more information.
Aggregating results

Preprocessing using perfchug

perfchug is a tool that ships with diego-perf-release. It takes log
output from cedar, bbs, and auctioneer, processes it, and converts it into
something that can be fed into InfluxDB.

To use perfchug locally:

1. Run cd <path>/diego-perf-release/src/code.cloudfoundry.org/diego-stress-tests/perfchug.
1. Run go install to build the executable.
1. Move the executable into your $PATH.
1. Once it is on the $PATH, supply lager-formatted logs to perfchug on its stdin; it emits InfluxDB-formatted metrics on stdout.
Automatic downloading and aggregation

The diego_results.sh script automates the entire process of downloading BOSH
logs and aggregating results. To use it:

1. Make sure you are on a jump box inside the deployment, e.g. the director.
1. Make sure you are bosh targeted to the right environment.
1. Make sure you have perfchug, veritas, and bosh on your PATH.
1. Create a new directory and cd into it. It will be used as the working
   directory for the script, and BOSH logs will be downloaded into it.
1. Run the script from that directory.

The output file will contain one line per query. All query results are valid
JSON. If there are no data points in InfluxDB for a query (e.g. no failures),
InfluxDB will return an empty result, e.g. {"results":[]}.

If the output file parameter is provided, diego_results.sh will also trigger
a post-processing script that condenses the output into metrics.csv, a more
human-readable format.
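Because the output file holds one JSON object per line and an empty result is exactly {"results":[]}, counting queries with no data points can be done with a one-line grep. The demo file below is fabricated just to show the shape:

```bash
# Count result lines with no data points (the /tmp demo file is fabricated).
count_empty() {
  grep -c '^{"results":\[\]}$' "$1"
}

printf '%s\n' '{"results":[{"series":[]}]}' '{"results":[]}' > /tmp/results-demo.json
count_empty /tmp/results-demo.json    # one of the two sample lines is empty
```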
Snapshotting and Restoring Influxdb (GCP Only)
Snapshotting
1. Go to the Google Cloud Platform dashboard and find the influxdb instance.
1. Find the "Additional disks" section and click on the disk to be snapshotted.
1. Click "Create Snapshot" at the top of the window that opens up.
1. Name the snapshot and click "Create".
Restoring a snapshotted InfluxDB
1. Go to the Google Cloud Platform dashboard and find the influxdb instance.
1. Click "Edit" at the top of the page.
1. Find the "Additional disks" section and add a disk from the snapshot.
1. Click "Save" at the bottom of the page. The new disk will appear as /dev/sd[a-z] (where [a-z] is the next available letter for a disk name).
1. Edit /etc/mtab on the influxdb VM to add the new filesystem from /dev/sd[a-z] to /var/vcap/store2.
1. Run mkdir -p /var/vcap/store2 && cd /var/vcap/store2 && mount /dev/sd[a-z]1.
1. Change all references from /var/vcap/store to /var/vcap/store2 in /var/vcap/jobs/influxdb.
1. Restart influxdb with monit restart influxdb.
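The reference rewrite in the restore steps can be done with sed. A sketch, assuming GNU sed and that plain-text files under /var/vcap/jobs/influxdb hold the references (the demo below uses a temporary file instead):

```bash
# Rewrite /var/vcap/store references to /var/vcap/store2 in the given files.
# \b keeps an already-rewritten /var/vcap/store2 from becoming store22 (GNU sed).
rewrite_store_refs() {
  sed -i 's|/var/vcap/store\b|/var/vcap/store2|g' "$@"
}

printf '%s\n' 'storage_dir: /var/vcap/store/influxdb' > /tmp/influxdb-demo.conf
rewrite_store_refs /tmp/influxdb-demo.conf
cat /tmp/influxdb-demo.conf
```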
Development
These tests are meant to be run against a real IaaS. However, it is possible to
run them against BOSH-Lite during development. A deployment manifest template is
in templates/bosh-lite.yml. Use
spiff to merge it with a
director_uuid stub.
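A minimal director_uuid stub for spiff might look like this; the placeholder value must be replaced with your director's UUID (with the BOSH v1 CLI, bosh status --uuid prints it):

```yaml
---
director_uuid: REPLACE-WITH-YOUR-DIRECTOR-UUID
```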