The rate limit service is a Go/gRPC service designed to enable generic rate limit scenarios from different types of
applications. Applications request a rate limit decision based on a domain and a set of descriptors. The service
reads the configuration from disk via runtime, composes a cache key, and talks to the Redis cache. A
decision is then returned to the caller.
Docker Image
For every commit to main, an image is pushed to Docker Hub. There is currently no versioning (post v1.4.0), and tags are based on the commit SHA.
Distroless Base Image
The Docker image uses Google’s distroless base image (gcr.io/distroless/static-debian12:nonroot) for enhanced security and minimal attack surface. Distroless images contain only the application and its runtime dependencies, omitting unnecessary OS components like package managers, shells, and other utilities.
The image is pinned to a specific SHA digest for deterministic builds and uses the nonroot variant to run as a non-privileged user, following security best practices.
Benefits of Distroless:
Enhanced Security: Minimal attack surface with no unnecessary components
Smaller Image Size: Significantly smaller than traditional base images
Reduced Vulnerabilities: Fewer components means fewer potential security issues
Better Compliance: Meets security requirements for minimal base images
Non-root Execution: Runs as a non-privileged user (UID 65532) for enhanced security
Deterministic Builds: Pinned to specific SHA digest ensures reproducible builds
Debugging with Distroless:
For debugging purposes, you can use the debug variant of the distroless image:
FROM gcr.io/distroless/static-debian12:debug
COPY --from=build /go/bin/ratelimit /bin/ratelimit
This provides shell access and debugging tools while maintaining the security benefits of distroless.
v1.0.0 tagged on commit 0ded92a2af8261d43096eba4132e45b99a3b8b14. Ratelimit has been in production use at Lyft for over 2 years.
v1.1.0 introduces the data-plane-api proto and initiates the deprecation of the legacy ratelimit.proto.
Commit e91321b deleted support for the legacy ratelimit.proto.
The current version of the ratelimit protocol is v3 rls.proto, while v2 rls.proto is still supported as a legacy protocol.
If you are running on an OSX host with a Linux container, make sure you set the correct platform, e.g.:
GOOS=linux make compile
To compile and run tests:
make tests
To run the server locally with some sensible default settings, you can do the following (this sets up the server to read the configuration files from the path you specify):
The docker-compose setup uses a distroless-based container for the ratelimit service. In order to run the docker-compose setup from the root of the repo, run
docker-compose up
The ratelimit service is built using the main Dockerfile which uses Google’s distroless base image for enhanced security and minimal attack surface. The distroless image contains only the application and its runtime dependencies, omitting unnecessary OS components like package managers and shells.
If you want to run with two redis instances, you will need to modify
the docker-compose.yml file to run a second redis container, and change the environment variables
as explained in the two redis instances section.
Full test environment - Configure rate limits through files
To run a fully configured environment to demo Envoy based rate limiting, run:
export CONFIG_TYPE=FILE
docker-compose -f docker-compose-example.yml up --build --remove-orphans
This will run ratelimit, redis, prom-statsd-exporter and two Envoy containers such that you can demo rate limiting by hitting the below endpoints.
curl localhost:8888/test
curl localhost:8888/header -H "foo: foo" # Header based
curl localhost:8888/twoheader -H "foo: foo" -H "bar: bar" # Two headers
curl localhost:8888/twoheader -H "foo: foo" -H "baz: baz" # This will be rate limited
curl localhost:8888/twoheader -H "foo: foo" -H "bar: banned" # Ban a particular header value
curl localhost:8888/twoheader -H "foo: foo" -H "baz: shady" # This will never be ratelimited since "baz" with value "shady" is in shadow_mode
curl localhost:8888/twoheader -H "foo: foo" -H "baz: not-so-shady" # This is subject to rate-limiting because it's not in shadow_mode
Edit examples/ratelimit/config/example.yaml to test different rate limit configs. Hot reloading is enabled.
The descriptors in example.yaml and the actions in examples/envoy/proxy.yaml should give you a good idea on how to configure rate limits.
To see the metrics in the example
# The metrics for the shadow_mode keys
curl http://localhost:9102/metrics | grep -i shadow
Full test environment - Configure rate limits through an xDS Management Server
To run a fully configured environment to demo Envoy based rate limiting, run:
export CONFIG_TYPE=GRPC_XDS_SOTW
docker-compose -f docker-compose-example.yml --profile xds-config up --build --remove-orphans
This uses the xds-config docker-compose profile, which runs an example xDS server, ratelimit, redis, prom-statsd-exporter and two Envoy containers, so that you can demo rate limiting by hitting the endpoints below.
curl localhost:8888/test
curl localhost:8888/header -H "foo: foo" # Header based
curl localhost:8888/twoheader -H "foo: foo" -H "bar: bar" # Two headers
curl localhost:8888/twoheader -H "foo: foo" -H "baz: baz" # This will be rate limited
curl localhost:8888/twoheader -H "foo: foo" -H "bar: banned" # Ban a particular header value
curl localhost:8888/twoheader -H "foo: foo" -H "baz: shady" # This will never be ratelimited since "baz" with value "shady" is in shadow_mode
curl localhost:8888/twoheader -H "foo: foo" -H "baz: not-so-shady" # This is subject to rate-limiting because it's not in shadow_mode
# The metrics for the shadow_mode keys
curl http://localhost:9102/metrics | grep -i shadow
Self-contained end-to-end integration test
Integration tests are coded as bash-scripts in integration-test/scripts.
The test suite will spin up a docker-compose environment from integration-test/docker-compose-integration-test.yml
If the test suite fails it will exit with code 1.
make integration_tests
Configuration
The configuration format
The rate limit configuration file format is YAML (mainly so that comments are supported).
Definitions
Domain: A domain is a container for a set of rate limits. All domains known to the Ratelimit service must be
globally unique. They serve as a way for different teams/projects to have rate limit configurations that don’t conflict.
Descriptor: A descriptor is a list of key/value pairs owned by a domain that the Ratelimit service uses to
select the correct rate limit to use when limiting. Descriptors are case-sensitive. Examples of descriptors are:
Each descriptor in a descriptor list must have a key. It can also optionally have a value to enable a more specific
match. The “rate_limit” block is optional and if present sets up an actual rate limit rule. See below for how the
rule is defined. If the rate limit is not present and there are no nested descriptors, then the descriptor is
effectively whitelisted. Otherwise, nested descriptors allow more complex matching and rate limiting scenarios.
The rate limit block specifies the actual rate limit that will be used when there is a match.
Currently the service supports per second, minute, hour, and day limits. More types of limits may be added in the
future based on user demand.
Replaces
The replaces key indicates that this descriptor will replace the configuration set by another descriptor.
If there is a rule being evaluated, and multiple descriptors can apply, the replaces descriptor will drop evaluation of
the descriptor which it is replacing.
To enable this, any descriptor which should potentially be replaced by another should have a name keyword in the
rate_limit section, and any descriptor which should potentially replace the original descriptor should have a name
keyword in its respective replaces section. Whenever limits match to both rules, only the rule which replaces the
original will take effect, and the limit of the original will not be changed after evaluation.
For example, let’s say you have a bunch of endpoints and each is classified under read or write, with read having a
certain limit and write having another. Each user has a certain limit for both endpoints. However, let’s say that you
want to increase a user’s limit to a single read endpoint. The only option without using replaces would be to increase
their limit for the read category. The replaces keyword allows increasing the limit of a single endpoint in this case.
ShadowMode
A shadow_mode key in a rule indicates that whatever the outcome of the evaluation of the rule, the end-result will always be “OK”.
When a block is in ShadowMode, all functions of the rate limiting service are executed as normal, including cache lookup and statistics.
An additional statistic keeps track of how many times a key with “shadow_mode” has overridden the result.
There is also a Global Shadow Mode.
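The override behaviour described above can be pictured with a minimal Go sketch. It is illustrative only and not the service's implementation: the names Code, shadowStats and applyShadowMode are hypothetical, and the sketch assumes the extra statistic is incremented only when an over-limit result is actually replaced by OK.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for the rate limit response code.
type Code int

const (
	OK Code = iota
	OverLimit
)

// shadowStats is a hypothetical counter for overridden decisions.
type shadowStats struct {
	overridden int
}

// applyShadowMode takes the code produced by the normal evaluation and returns
// the code reported to the caller. The rule is always evaluated first; only
// the final result is replaced when the rule is in shadow mode.
func applyShadowMode(evaluated Code, shadowMode bool, stats *shadowStats) Code {
	if shadowMode && evaluated == OverLimit {
		stats.overridden++ // assumption: counted only when a result is overridden
		return OK
	}
	return evaluated
}

func main() {
	stats := &shadowStats{}
	fmt.Println(applyShadowMode(OverLimit, true, stats) == OK)
	fmt.Println(applyShadowMode(OverLimit, false, stats) == OverLimit)
	fmt.Println("overridden:", stats.overridden)
}
```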
Including detailed metrics for unspecified values
Setting the detailed_metric: true for a descriptor will extend the metrics that are produced. Normally a descriptor that matches a value that is not explicitly listed in the configuration will from a metrics point-of-view be rolled-up into the base entry. This can be problematic if you want to have those details available for analysis.
NB! This should only be enabled in situations where the potentially large cardinality of metrics that this can lead to is acceptable.
Including descriptor values in metrics
Setting value_to_metric: true (default: false) for a descriptor will include the descriptor’s runtime value in the metric key, even when the descriptor value is not explicitly defined in the configuration. This allows you to track metrics per descriptor value when the value comes from the runtime request, providing visibility into different rate limit scenarios without needing to pre-define every possible value.
Note: If a value is explicitly specified in a descriptor (e.g., value: "GET"), that value is always included in the metric key regardless of the value_to_metric setting. The value_to_metric flag only affects descriptors where the value is not explicitly defined in the configuration.
When combined with wildcard matching, the full runtime value is included in the metric key, not just the wildcard prefix. This feature works independently of detailed_metric - when detailed_metric is set, it takes precedence and value_to_metric is ignored.
Sharing thresholds for wildcard matches
Setting share_threshold: true (default: false) for a descriptor with a wildcard value (ending with *) allows all values matching that wildcard to share the same rate limit threshold, instead of using isolated thresholds for each matching value.
This is useful when you want to apply a single rate limit across multiple resources that match a wildcard pattern. For example, if you have a rule for files/*, both files/a.pdf and files/b.csv will share the same threshold when share_threshold: true is set.
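One way to picture the difference between shared and isolated thresholds is to look at which key a hit is counted under. The Go sketch below is an assumption-laden illustration, not the service's cache key format: counterKey is a hypothetical helper, and using the wildcard pattern itself as the shared key is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// counterKey returns the hypothetical key under which a hit is counted for a
// descriptor whose configured value is a trailing-wildcard pattern.
func counterKey(pattern, runtimeValue string, shareThreshold bool) string {
	if shareThreshold && strings.HasSuffix(pattern, "*") {
		// Every matching value collapses onto one key, so one counter is shared.
		return pattern
	}
	// Each runtime value keeps its own key, so each has an isolated counter.
	return runtimeValue
}

func main() {
	hits := map[string]int{}
	for _, v := range []string{"files/a.pdf", "files/b.csv"} {
		hits[counterKey("files/*", v, true)]++
	}
	fmt.Println(hits) // both requests were counted under the same key
}
```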
Important notes:
share_threshold can only be used with wildcard values (values ending with *)
When share_threshold: true is enabled, all matching values share the same cache key and rate limit counter
When share_threshold: false (or not set), each matching value has its own isolated threshold
When combined with value_to_metric: true, the metric key includes the wildcard prefix (the part before *) instead of the full runtime value, to reflect that values are sharing a threshold
When combined with detailed_metric: true, the metric key also includes the wildcard prefix for entries with share_threshold enabled
In the configuration above,
the domain is “mongo_cps” and we set up 2 different rate limits in the top level descriptor list. Each of the limits
has the same key (“database”). They have different values (“users” and “default”), and each of them sets up a 500
requests per second rate limit.
Example 2
A slightly more complex example:
domain: messaging
descriptors:
  # Only allow 5 marketing messages a day
  - key: message_type
    value: marketing
    descriptors:
      - key: to_number
        rate_limit:
          unit: day
          requests_per_unit: 5
  # Only allow 100 messages a day to any unique phone number
  - key: to_number
    rate_limit:
      unit: day
      requests_per_unit: 100
In the preceding example, the domain is “messaging” and we set up two different scenarios that illustrate more
complex functionality. First, we want to limit on marketing messages to a specific number. To enable this, we make
use of nested descriptor lists. The top level descriptor is (“message_type”, “marketing”). However this descriptor
does not have a limit assigned so it’s just a placeholder. Contained within this entry we have another descriptor list
that includes an entry with key “to_number”. However, notice that no value is provided. This means that the service
will match against any value supplied for “to_number” and generate a unique limit. Thus, (“message_type”, “marketing”),
(“to_number”, “2061111111”) and (“message_type”, “marketing”),(“to_number”, “2062222222”) will each get 5 requests
per day.
The configuration also sets up another rule without a value. This one creates an overall limit for messages sent to
any particular number during a 1 day period. Thus, (“to_number”, “2061111111”) and (“to_number”, “2062222222”) both
get 100 requests per day.
When calling the rate limit service, the client can specify multiple descriptors to limit on in a single call. This
limits round trips and allows limiting on aggregate rule definitions. For example, using the preceding configuration,
the client could send this complete request (in pseudo IDL):
And the service will rate limit against all matching rules and return an aggregate result; a logical OR of all
the individual rate limit decisions.
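The aggregation can be sketched in a few lines of Go. This is an illustration only: Code and aggregate are hypothetical names, and the sketch reads "logical OR" as "the overall result is over limit if any individual decision is over limit".

```go
package main

import "fmt"

// Code is a hypothetical stand-in for a per-descriptor decision.
type Code int

const (
	OK Code = iota
	OverLimit
)

// aggregate combines the per-descriptor decisions of one request.
// A single over-limit descriptor makes the whole request over limit.
func aggregate(statuses []Code) Code {
	for _, s := range statuses {
		if s == OverLimit {
			return OverLimit
		}
	}
	return OK
}

func main() {
	// First descriptor is within its limit, second one is exhausted.
	fmt.Println(aggregate([]Code{OK, OverLimit}) == OverLimit)
	fmt.Println(aggregate([]Code{OK, OK}) == OK)
}
```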
Example 3
An example to illustrate matching order.
domain: edge_proxy_per_ip
descriptors:
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 10
  # Black list IP
  - key: remote_address
    value: 50.0.0.5
    rate_limit:
      unit: second
      requests_per_unit: 0
In the preceding example, we set up a generic rate limit for individual IP addresses. The architecture’s edge proxy can
be configured to make a rate limit service call with the descriptor ("remote_address", "50.0.0.1"), for example. This IP would
get 10 requests per second, as would any other IP. However, the configuration also contains a second entry that explicitly defines a
value along with the same key. If the descriptor ("remote_address", "50.0.0.5") is received, the service
will attempt the most specific match possible, that is, the most specific descriptor at the same level as the request.
Thus, key/value is always attempted as a match before just key.
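The matching order can be sketched as a two-step lookup in Go. The types entryKey and rule and the function lookup are hypothetical and only model a single descriptor level; they are not the service's internal data structures.

```go
package main

import "fmt"

// entryKey identifies one configuration entry at a single level.
// An empty value means the entry was configured with a key only.
type entryKey struct {
	key, value string
}

// rule is a hypothetical, simplified rate limit rule.
type rule struct {
	requestsPerUnit int
	unit            string
}

// lookup tries the most specific entry (key and value) first and falls back
// to the key-only entry when no exact entry exists.
func lookup(level map[entryKey]rule, key, value string) (rule, bool) {
	if r, ok := level[entryKey{key, value}]; ok {
		return r, true
	}
	r, ok := level[entryKey{key, ""}]
	return r, ok
}

func main() {
	// Mirrors the edge_proxy_per_ip configuration.
	level := map[entryKey]rule{
		{"remote_address", ""}:         {10, "second"},
		{"remote_address", "50.0.0.5"}: {0, "second"},
	}
	fmt.Println(lookup(level, "remote_address", "50.0.0.1"))
	fmt.Println(lookup(level, "remote_address", "50.0.0.5"))
}
```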
Example 4
The Ratelimit service matches requests to configuration entries at the same level, i.e., with the
same number of tuples in the request’s descriptor as nested levels of descriptors
in the configuration file. For instance, the following request:
Would not match the following configuration. Even though the first descriptor in
the request matches the 1st level descriptor in the configuration, the request has
two tuples in the descriptor.
domain: example4
descriptors:
  - key: key
    value: value
    rate_limit:
      requests_per_unit: 300
      unit: second
However, it would match the following configuration:
domain: example4
descriptors:
  - key: key
    value: value
    descriptors:
      - key: subkey
        rate_limit:
          requests_per_unit: 300
          unit: second
Example 5
We can also define unlimited rate limit descriptors:
For an unlimited descriptor, the request will not be sent to the underlying cache (Redis/Memcached), but will be quickly returned locally by the ratelimit instance.
This can be useful for collecting statistics, or if one wants to define a descriptor that has no limit but the client wants to distinguish between such descriptor and one that does not exist.
The return value for unlimited descriptors will be an OK status code with the LimitRemaining field set to MaxUint32 value.
Example 6
A rule using shadow_mode is useful for soft-launching rate limiting. In this example,
user-a of the auth-service would not get rate-limited regardless of the rate of requests. There would, however, be statistics related to the breach of the configured limit of 10 req / sec.
user-b, by contrast, would be limited to 20 req / sec.
domain: example6
descriptors:
  - key: service
    descriptors:
      - key: user
        value: user-a
        rate_limit:
          requests_per_unit: 10
          unit: second
        shadow_mode: true
      - key: user
        value: user-b
        rate_limit:
          requests_per_unit: 20
          unit: second
Example 7
When the replaces keyword is used, that limit will replace any limit which has the name being replaced as its name, and
the original descriptor’s limit will not be affected.
In the example below, the following limits will apply:
(key_1, value_1), (user, bkthomps): 5 / sec
(key_2, value_2), (user, bkthomps): 10 / sec
(key_1, value_1), (key_2, value_2), (user, bkthomps): 10 / sec since the (key_1, value_1), (user, bkthomps) rule was replaced and this will not affect the 5 / sec limit that would take effect with (key_2, value_2), (user, bkthomps)
In this example we demonstrate how a descriptor without a specified value is configured to override the default behavior and include the matched-value in the metrics.
Rate limiting configuration and tracking work as normal.
Value supports wildcard matching using *, which can appear at any position — trailing, middle, or multiple times. Each * matches zero or more characters.
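The matching rule stated above (each * matches zero or more characters, at any position) can be sketched in Go as follows. wildcardMatch is a hypothetical helper written for illustration; it is not taken from the service and treats every character other than * literally.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardMatch reports whether value matches pattern, where each '*' in the
// pattern matches zero or more characters and all other characters are literal.
func wildcardMatch(pattern, value string) bool {
	parts := strings.Split(pattern, "*")
	if len(parts) == 1 {
		return pattern == value // no wildcard: exact match only
	}
	// The text before the first '*' must be a prefix of the value.
	if !strings.HasPrefix(value, parts[0]) {
		return false
	}
	rest := value[len(parts[0]):]
	// Literal pieces between wildcards must appear in order; taking the
	// leftmost occurrence leaves as much room as possible for later pieces.
	for _, p := range parts[1 : len(parts)-1] {
		idx := strings.Index(rest, p)
		if idx < 0 {
			return false
		}
		rest = rest[idx+len(p):]
	}
	// The text after the last '*' must be a suffix of what is left.
	return strings.HasSuffix(rest, parts[len(parts)-1])
}

func main() {
	fmt.Println(wildcardMatch("files/*", "files/a.pdf"))
	fmt.Println(wildcardMatch("a*c", "abc"))
	fmt.Println(wildcardMatch("a*a", "a"))
}
```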
Trailing wildcard — matches any value starting with the given prefix:
Note: When detailed_metric: true is set on a descriptor, it takes precedence and value_to_metric is ignored for that descriptor.
Example 11
Using share_threshold: true to share rate limits across wildcard matches:
domain: example11
descriptors:
  # With share_threshold: true, all files/* matches share the same threshold
  - key: files
    value: files/*
    share_threshold: true
    rate_limit:
      unit: hour
      requests_per_unit: 10
  # Without share_threshold, each files_no_share/* match has its own isolated threshold
  - key: files_no_share
    value: files_no_share/*
    share_threshold: false
    rate_limit:
      unit: hour
      requests_per_unit: 10
With this configuration:
Requests for files/a.pdf, files/b.csv, and files/c.txt all share the same threshold of 10 requests per hour
If 5 requests are made for files/a.pdf and 5 requests for files/b.csv, a request for files/c.txt will be rate limited (OVER_LIMIT) because the shared threshold of 10 has been reached
Requests for files_no_share/a.pdf and files_no_share/b.csv each have their own isolated threshold of 10 requests per hour
If 10 requests are made for files_no_share/a.pdf (exhausting its quota), requests for files_no_share/b.csv will still be allowed (up to 10 requests)
Metric key: example11_metrics.route_api.method_GET (includes the wildcard prefix api instead of the full value api/v1)
This reflects that all api/* routes share the same threshold, while still providing visibility into which API routes are being accessed.
Loading Configuration
The Rate limit service supports the following configuration loading methods. You can select the method to use with the environment variable CONFIG_TYPE.
When the environment variable FORCE_START_WITHOUT_INITIAL_CONFIG is set to false, the Rate limit service waits for the initial rate limit configuration before
starting the server (gRPC and REST server endpoints). When it is set to true, the server starts even without an initial configuration.
File Based Configuration Loading
The Ratelimit service uses a library written by Lyft called goruntime to do configuration loading. Goruntime monitors
a designated path, and watches for symlink swaps to files in the directory tree to reload configuration files.
The path to watch can be configured via the settings
package with the following environment variables:
Configuration files are loaded from RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/*.yaml
There are two methods for triggering a configuration reload:
Symlink RUNTIME_ROOT to a different directory.
Update the contents inside RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/ directly.
The former is the default behavior. To use the latter method, set the RUNTIME_WATCH_ROOT environment variable to false.
The following filesystem operations on configuration files inside RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/ will force a reload of all config files:
Write
Create
Chmod
Remove
For more information on how runtime works you can read its README.
By default it is not possible to define multiple configuration files within RUNTIME_SUBDIRECTORY referencing the same domain.
To enable this behavior set MERGE_DOMAIN_CONFIG to true.
The xDS client in the Rate limit service configures the Rate limit service with the provided configuration.
In case of connection failures, the xDS Client retries the connection to the xDS server with exponential backoff and the backoff parameters are configurable.
XDS_CLIENT_BACKOFF_JITTER: set to "true" to add jitter to the exponential backoff.
XDS_CLIENT_BACKOFF_INITIAL_INTERVAL: The base amount of time the xDS client waits before retrying the connection after failure. Default: “10s”
XDS_CLIENT_BACKOFF_MAX_INTERVAL: The max backoff interval is the upper limit on the amount of time the xDS client will wait between retries. After reaching the max backoff interval, the next retries will continue using the max interval. Default: “60s”
XDS_CLIENT_BACKOFF_RANDOM_FACTOR: This is a factor by which the initial interval is multiplied to calculate the next backoff interval. Default: “0.5”
The following are the gRPC connection options.
XDS_CLIENT_MAX_MSG_SIZE_IN_BYTES: The maximum message size in bytes that the xDS client can receive.
Ratelimit also supports TLS connections. These can be configured using the following environment variables:
CONFIG_GRPC_XDS_SERVER_USE_TLS: set to "true" to enable a TLS connection with the xDS configuration management server.
CONFIG_GRPC_XDS_CLIENT_TLS_CERT, CONFIG_GRPC_XDS_CLIENT_TLS_KEY, and CONFIG_GRPC_XDS_SERVER_TLS_CACERT: provide the files that specify the TLS connection configuration for the xDS configuration management server.
CONFIG_GRPC_XDS_SERVER_TLS_SAN: (Optional) Override the SAN value to validate from the server certificate.
When using xDS you can configure extra headers that will be added to GRPC requests to the xDS Management server.
Extra headers can be useful for providing additional authorization information. This can be configured using the following environment variable:
CONFIG_GRPC_XDS_CLIENT_ADDITIONAL_HEADERS - set to "<k1:v1>,<k2:v2>" to add multiple headers to GRPC requests.
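To make the value format concrete, here is a minimal Go sketch that parses a comma-separated list of key:value pairs. It assumes the angle brackets in "<k1:v1>,<k2:v2>" are placeholder notation rather than literal characters; parseHeaders is a hypothetical helper, not the service's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaders turns "k1:v1,k2:v2" into a map. The key is everything before
// the first colon of a pair, the value is everything after it.
func parseHeaders(s string) (map[string]string, error) {
	headers := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return headers, nil
	}
	for _, pair := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(pair, ":")
		if !ok {
			return nil, fmt.Errorf("malformed header %q, want key:value", pair)
		}
		headers[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return headers, nil
}

func main() {
	// Hypothetical header names and values.
	h, err := parseHeaders("authorization:token-123,x-team:edge")
	fmt.Println(h, err)
}
```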
Log Format
A centralized log collection system works better with logs in JSON format, because JSON avoids the need for custom parsing rules.
The Ratelimit service produces logs in a text format by default. For example:
time="2020-09-10T17:22:35Z" level=debug msg="loading domain: messaging"
time="2020-09-10T17:22:35Z" level=debug msg="loading descriptor: key=messaging.message_type_marketing"
time="2020-09-10T17:22:35Z" level=debug msg="loading descriptor: key=messaging.message_type_marketing.to_number ratelimit={requests_per_unit=5, unit=DAY}"
time="2020-09-10T17:22:35Z" level=debug msg="loading descriptor: key=messaging.to_number ratelimit={requests_per_unit=100, unit=DAY}"
time="2020-09-10T17:21:55Z" level=warning msg="Listening for debug on ':6070'"
time="2020-09-10T17:21:55Z" level=warning msg="Listening for HTTP on ':8080'"
time="2020-09-10T17:21:55Z" level=debug msg="waiting for runtime update"
time="2020-09-10T17:21:55Z" level=warning msg="Listening for gRPC on ':8081'"
The JSON log format can be enabled using the following environment variable:
LOG_FORMAT=json
Output example:
{"@message":"loading domain: messaging","@timestamp":"2020-09-10T17:22:44.926010192Z","level":"debug"}
{"@message":"loading descriptor: key=messaging.message_type_marketing","@timestamp":"2020-09-10T17:22:44.926019315Z","level":"debug"}
{"@message":"loading descriptor: key=messaging.message_type_marketing.to_number ratelimit={requests_per_unit=5, unit=DAY}","@timestamp":"2020-09-10T17:22:44.926037174Z","level":"debug"}
{"@message":"loading descriptor: key=messaging.to_number ratelimit={requests_per_unit=100, unit=DAY}","@timestamp":"2020-09-10T17:22:44.926048993Z","level":"debug"}
{"@message":"Listening for debug on ':6070'","@timestamp":"2020-09-10T17:22:44.926113905Z","level":"warning"}
{"@message":"Listening for gRPC on ':8081'","@timestamp":"2020-09-10T17:22:44.926182006Z","level":"warning"}
{"@message":"Listening for HTTP on ':8080'","@timestamp":"2020-09-10T17:22:44.926227031Z","level":"warning"}
{"@message":"waiting for runtime update","@timestamp":"2020-09-10T17:22:44.926267808Z","level":"debug"}
GRPC Keepalive
Client-side GRPC DNS re-resolution in scenarios with auto scaling enabled might not work as expected and the current workaround is to configure connection keepalive on server-side.
The behavior can be fixed by configuring the following env variables for the ratelimit server:
GRPC_MAX_CONNECTION_AGE: a duration for the maximum amount of time a connection may exist before it will be closed by sending a GoAway. A random jitter of +/-10% will be added to MaxConnectionAge to spread out connection storms.
GRPC_MAX_CONNECTION_AGE_GRACE: an additive period after MaxConnectionAge after which the connection will be forcibly closed.
Health-check
Health check status is determined internally by individual components.
Currently, three components determine the overall health status of the rate limit service.
Each individual component needs to be healthy for the overall status to report healthy.
Some components may be turned OFF via configuration, so that the overall health is not affected by that component’s health status.
Redis health (Turned ON. Defaults to healthy)
Configuration status (Turned OFF unless configured to be ON via HEALTHY_WITH_AT_LEAST_ONE_CONFIG_LOADED, see the section below. Defaults to unhealthy)
If the environment variable is enabled, the service starts in an unhealthy state and becomes healthy when at least one config is loaded. If it later fails to load any configs, it goes unhealthy again.
Sigterm (Turned ON. Defaults to healthy)
Turns unhealthy if it receives a SIGTERM signal
All components need to be healthy for the overall health to be healthy.
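The way component states combine into the overall status can be sketched in Go. The component type and overallHealthy function are hypothetical and only model the rule stated above: components that are turned off are ignored, and every remaining component must be healthy.

```go
package main

import "fmt"

// component is a hypothetical model of one health check component.
type component struct {
	name    string
	enabled bool // turned ON or OFF via configuration
	healthy bool
}

// overallHealthy reports healthy only when every enabled component is healthy.
func overallHealthy(components []component) bool {
	for _, c := range components {
		if c.enabled && !c.healthy {
			return false
		}
	}
	return true
}

func main() {
	components := []component{
		{name: "redis", enabled: true, healthy: true},
		{name: "config", enabled: false, healthy: false}, // OFF: does not count
		{name: "sigterm", enabled: true, healthy: true},
	}
	fmt.Println(overallHealthy(components))

	// Enabling the configuration check before any config is loaded
	// makes the overall status unhealthy.
	components[1].enabled = true
	fmt.Println(overallHealthy(components))
}
```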
Health-check configurations
Health check can be configured to check if rate-limit configurations are loaded using the following environment variable.
If HEALTHY_WITH_AT_LEAST_ONE_CONFIG_LOADED is enabled, the health check starts as unhealthy and becomes healthy once
it detects that at least one domain is loaded with the config. If it later detects that no config is loaded, it changes back to unhealthy.
GRPC server
By default the ratelimit gRPC server binds to 0.0.0.0:8081. To change this set
GRPC_HOST and/or GRPC_PORT. If you want to run the server on a unix domain
socket then set GRPC_UDS, e.g. GRPC_UDS=/<dir>/ratelimit.sock and leave
GRPC_HOST and GRPC_PORT unmodified.
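The choice between a TCP address and a unix domain socket can be sketched in Go as follows. listenTarget and envOr are hypothetical helpers; the sketch assumes that a non-empty GRPC_UDS takes precedence over host and port, and it only computes the arguments one would pass to net.Listen without opening a listener.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenTarget returns the network and address to listen on.
// Assumption: a non-empty unix domain socket path wins over host and port.
func listenTarget(host, port, uds string) (network, address string) {
	if uds != "" {
		return "unix", uds
	}
	return "tcp", net.JoinHostPort(host, port)
}

// envOr reads an environment variable and falls back to a default value.
func envOr(name, def string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return def
}

func main() {
	network, address := listenTarget(
		envOr("GRPC_HOST", "0.0.0.0"),
		envOr("GRPC_PORT", "8081"),
		os.Getenv("GRPC_UDS"),
	)
	fmt.Printf("would listen on %s %s\n", network, address)
}
```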
Request Fields
For information on the fields of a Ratelimit gRPC request please read the information
on the RateLimitRequest message type in the Ratelimit proto file.
GRPC Client
The gRPC client will interact with ratelimit server and tell you if the requests are over limit.
Commandline flags
-dial_string: used to specify the address of ratelimit server. It defaults to localhost:8081.
-domain: used to specify the domain.
-descriptors: used to specify one descriptor. You can pass multiple descriptors as follows:
go run main.go -domain test \
-descriptors name=foo,age=14 -descriptors name=bar,age=18
Global ShadowMode
There is a global shadow-mode which can make it easier to introduce rate limiting into an existing service landscape. It will override whatever result is returned by the regular rate limiting process.
Configuration
The global shadow mode is configured with an environment variable.
Setting the environment variable SHADOW_MODE to true will enable the feature.
Statistics
An additional service-level statistic is generated that increments whenever the global shadow mode has overridden a rate limiting result.
Statistics
The rate limit service generates various statistics for each configured rate limit rule that will be useful for end
users both for visibility and for setting alarms. Ratelimit uses gostats as its statistics library. Please refer
to gostats’ documentation for more information on the library.
Statistics are sent to StatsD by default and are configured via the env vars from gostats.
To output statistics to stdout instead, set env var USE_STATSD to false.
Configure the statistics output frequency with STATS_FLUSH_INTERVAL, which is parsed as a time.Duration. The default value is 10s.
To disable statistics entirely, set env var DISABLE_STATS to true.
DOMAIN:
As specified in the domain value in the YAML runtime file
KEY_VALUE:
A combination of the key value
Nested descriptors would be suffixed in the stats path
The default mode is that the value-part is omitted if the rule that matches is a descriptor without a value. Specifying the detailed_metric configuration parameter changes this behavior and creates a unique metric even in this situation.
STAT:
near_limit: Number of rule hits over the NearLimit ratio threshold (currently 80%) but under the threshold rate.
over_limit: Number of rule hits exceeding the threshold rate
total_hits: Number of rule hits in total
shadow_mode: Number of rule hits where shadow_mode would trigger and override the over_limit result
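How a single hit might be classified against these counters can be sketched in Go. This is a simplified illustration, not the service's accounting: classify is a hypothetical function, the 0.8 ratio is the default stated above, and the exact boundary handling (whether a hit that lands exactly on a threshold counts) is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// classify returns which bucket a hit falls into, given the number of hits
// counted in the current window (including this one) and the configured limit.
func classify(hitsInWindow, limit uint32, nearRatio float64) string {
	nearThreshold := uint32(math.Floor(float64(limit) * nearRatio))
	switch {
	case hitsInWindow > limit:
		return "over_limit"
	case hitsInWindow > nearThreshold:
		return "near_limit" // above the ratio threshold, not above the limit
	default:
		return "within_limit"
	}
}

func main() {
	for _, hits := range []uint32{8, 9, 11} {
		fmt.Println(hits, classify(hits, 10, 0.8))
	}
}
```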
To use a custom near_limit ratio threshold, you can specify with NEAR_LIMIT_RATIO environment variable. It defaults to 0.8 (0-1 scale). These are examples of generated stats for some configured rate limit rules from the above examples:
EXTRA_TAGS: set to "<k1:v1>,<k2:v2>" to tag all emitted stats with the provided tags. You might want to tag build commit or release version, for example.
PROMETHEUS_ADDR: The address to listen on for Prometheus metrics. Defaults to :9090
PROMETHEUS_PATH: The path to listen on for Prometheus metrics. Defaults to /metrics
PROMETHEUS_MAPPER_YAML: The path to the YAML file that defines the mapping from statsd to prometheus metrics.
PROMETHEUS_RESPONSE_TIME_AS_MILLISECONDS: true to keep the legacy millisecond behavior for ratelimit_server.*.response_time in the built-in mapper. Ignored when PROMETHEUS_MAPPER_YAML is set.
Define the mapping from statsd to prometheus metrics in a YAML file.
Find more information about the mapping in the Metric Mapping and Configuration.
The default setting is:
The debug port can be used to interact with the running process.
$ curl 0:6070/
/debug/pprof/: root of various pprof endpoints. hit for help.
/rlconfig: print out the currently loaded configuration for debugging
/stats: print out stats
You can specify the debug server address with the DEBUG_HOST and DEBUG_PORT environment variables. They currently default to 0.0.0.0 and 6070 respectively.
Local Cache
Ratelimit optionally uses freecache as its local caching layer, which stores the over-the-limit cache keys, and thus avoids reading the
redis cache again for the already over-the-limit keys. The local cache size can be configured via LocalCacheSizeInBytes in the settings.
If LocalCacheSizeInBytes is 0, local cache is disabled.
Redis
Ratelimit uses Redis as its caching layer. Ratelimit supports two operation modes:
One Redis server for all limits.
Two Redis instances: one for per second limits and another one for all other limits.
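The split between the two operation modes can be pictured as a small routing decision in Go. instanceFor is a hypothetical function and the returned labels are illustrative names, not identifiers used by the service.

```go
package main

import "fmt"

// instanceFor returns which Redis instance a limit with the given unit would
// use. When no per-second instance is configured, everything goes to the
// single default instance.
func instanceFor(unit string, perSecondConfigured bool) string {
	if perSecondConfigured && unit == "second" {
		return "persecond"
	}
	return "default"
}

func main() {
	for _, unit := range []string{"second", "minute", "hour", "day"} {
		fmt.Println(unit, "->", instanceFor(unit, true))
	}
}
```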
Ratelimit also supports TLS connections and authentication. These can be configured using the following environment variables:
REDIS_TLS & REDIS_PERSECOND_TLS: set to "true" to enable a TLS connection for the specific connection type.
REDIS_TLS_CLIENT_CERT, REDIS_TLS_CLIENT_KEY, and REDIS_TLS_CACERT: provide the files that specify the TLS connection configuration for a Redis server that requires client certificate verification. (This is effective when REDIS_TLS or REDIS_PERSECOND_TLS is set to "true".)
REDIS_TLS_SKIP_HOSTNAME_VERIFICATION set to "true" will skip hostname verification in environments where the certificate has an invalid hostname, such as GCP Memorystore.
REDIS_AUTH & REDIS_PERSECOND_AUTH: set to "password" to enable password-only authentication to the Redis master/replica nodes.
REDIS_AUTH & REDIS_PERSECOND_AUTH: set to "username:password" to enable username-password authentication to the Redis master/replica nodes.
REDIS_SENTINEL_AUTH & REDIS_PERSECOND_SENTINEL_AUTH: set to "password" or "username:password" to enable authentication to Redis Sentinel nodes. This is separate from REDIS_AUTH/REDIS_PERSECOND_AUTH which authenticate to the Redis master/replica nodes. Only used when REDIS_TYPE or REDIS_PERSECOND_TYPE is set to "sentinel". If not set, no authentication will be attempted when connecting to Sentinel nodes.
CACHE_KEY_PREFIX: a string to prepend to all cache keys
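The variables above might be combined as in the following sketch for a TLS-protected Redis with username-password authentication. The file paths, credentials, and key prefix are hypothetical placeholders, not values the service ships with.

```shell
# Enable TLS for the main Redis connection.
export REDIS_TLS=true
# Hypothetical paths: point these at your own client certificate, key, and CA bundle.
export REDIS_TLS_CLIENT_CERT=/etc/ratelimit/tls/client.crt
export REDIS_TLS_CLIENT_KEY=/etc/ratelimit/tls/client.key
export REDIS_TLS_CACERT=/etc/ratelimit/tls/ca.crt
# Username-password authentication uses the "username:password" form.
# Placeholder credentials: in practice, inject them from a secret store.
export REDIS_AUTH="ratelimit:example-password"
# Illustrative prefix prepended to every cache key.
export CACHE_KEY_PREFIX="rl_"
```

For password-only authentication, REDIS_AUTH would hold just the password, without the `username:` part.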
For controlling the behavior of cache key incrementation when any of them is already over the limit, you can use the following configuration:
STOP_CACHE_KEY_INCREMENT_WHEN_OVERLIMIT: Set this configuration to true to disallow key incrementation when one of the keys is already over the limit.
STOP_CACHE_KEY_INCREMENT_WHEN_OVERLIMIT is useful when multiple descriptors are included in a single request. Setting this to true can prevent the incrementation of other descriptors’ counters if any of the descriptors is already over the limit.
Redis type
Ratelimit supports different types of redis deployments:
The deployment type can be specified with the REDIS_TYPE / REDIS_PERSECOND_TYPE environment variables. Depending on the type defined, the REDIS_URL and REDIS_PERSECOND_URL are expected to have the following formats:
“single”: Depending on the socket type defined, either a single hostname:port pair or a unix domain socket reference.
“sentinel”: A comma separated list with the first string as the master name of the sentinel cluster followed by hostname:port pairs. The list size should be >= 2. The first item is the name of the master and the rest are the sentinels.
“cluster”: A comma separated list of hostname:port pairs with all the nodes in the cluster.
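The three URL formats can be compared side by side in the sketch below. DEPLOYMENT is a hypothetical helper variable used only to pick a branch here, and all hostnames and the master name `mymaster` are placeholders.

```shell
# DEPLOYMENT is a hypothetical helper for this sketch, not a ratelimit setting.
DEPLOYMENT="${DEPLOYMENT:-sentinel}"
case "$DEPLOYMENT" in
  single)
    # A single hostname:port pair.
    export REDIS_TYPE=single
    export REDIS_URL="redis.example.internal:6379"
    ;;
  sentinel)
    # First item is the master name; the rest are sentinel hostname:port pairs.
    export REDIS_TYPE=sentinel
    export REDIS_URL="mymaster,sentinel-0.example.internal:26379,sentinel-1.example.internal:26379"
    ;;
  cluster)
    # Every node in the cluster as hostname:port pairs.
    export REDIS_TYPE=cluster
    export REDIS_URL="node-0.example.internal:6379,node-1.example.internal:6379,node-2.example.internal:6379"
    ;;
  *)
    echo "unknown deployment: $DEPLOYMENT" >&2
    exit 1
    ;;
esac
echo "REDIS_TYPE=$REDIS_TYPE REDIS_URL=$REDIS_URL"
```

The same formats apply to REDIS_PERSECOND_TYPE and REDIS_PERSECOND_URL when a separate per-second Redis is used.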
Connection Pool Settings
Pool Size
REDIS_POOL_SIZE: the number of connections to keep in the pool. Default: 10
REDIS_PERSECOND_POOL_SIZE: pool size for per-second Redis. Default: 10
Connection Timeout
Controls the maximum duration for Redis connection establishment, read operations, and write operations.
REDIS_TIMEOUT: sets the timeout for Redis connection and I/O operations. Default: 10s
REDIS_PERSECOND_TIMEOUT: sets the timeout for per-second Redis connection and I/O operations. Default: 10s
Pool On-Empty Behavior
Controls what happens when all connections in the pool are in use and a new request arrives.
REDIS_POOL_ON_EMPTY_BEHAVIOR: controls what happens when the pool is empty. Default: CREATE
CREATE: create a new overflow connection after waiting for REDIS_POOL_ON_EMPTY_WAIT_DURATION. This is the default radix behavior.
ERROR: return an error after waiting for REDIS_POOL_ON_EMPTY_WAIT_DURATION. This enforces a strict pool size limit.
WAIT: block until a connection becomes available. This enforces a strict pool size limit but may cause goroutine buildup.
REDIS_POOL_ON_EMPTY_WAIT_DURATION: the duration to wait before taking the configured action (CREATE or ERROR). Default: 1s
REDIS_PERSECOND_POOL_ON_EMPTY_BEHAVIOR: same as above for per-second Redis pool. Default: CREATE
REDIS_PERSECOND_POOL_ON_EMPTY_WAIT_DURATION: same as above for per-second Redis pool. Default: 1s
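As an illustration, a deployment that prefers failing fast over opening overflow connections might use settings like the following. The numeric values are arbitrary examples chosen for this sketch, not recommendations.

```shell
# Keep 20 connections in the pool (illustrative size).
export REDIS_POOL_SIZE=20
# Enforce a strict pool size: return an error instead of creating overflow connections.
export REDIS_POOL_ON_EMPTY_BEHAVIOR=ERROR
# Wait up to 250ms for a free connection before returning the error (illustrative value).
export REDIS_POOL_ON_EMPTY_WAIT_DURATION=250ms
# Tighter connection and I/O timeout than the 10s default (illustrative value).
export REDIS_TIMEOUT=2s
```

With ERROR, callers see a failure once the pool has been empty for the wait duration; with WAIT, they would block instead, which keeps the pool size strict but can build up goroutines under load.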
Pipelining
By default, for each request, ratelimit picks up a connection from the pool, writes multiple redis commands in a single write, then reads their responses in a single read. This reduces network delay.
For high throughput scenarios, ratelimit supports write buffering via radix v4’s WriteFlushInterval. It can be configured using the following environment variables:
REDIS_PIPELINE_WINDOW & REDIS_PERSECOND_PIPELINE_WINDOW: controls how often buffered writes are flushed to the network connection. When set to a non-zero value (e.g., 150us-500us), radix v4 will buffer multiple concurrent write operations and flush them together, reducing system calls and improving throughput. If zero, each write is flushed immediately. Required for Redis Cluster mode.
REDIS_PIPELINE_LIMIT & REDIS_PERSECOND_PIPELINE_LIMIT: DEPRECATED - These settings have no effect in radix v4. Write buffering is controlled solely by the window settings above.
Write buffering is disabled by default (window = 0). For optimal performance, set REDIS_PIPELINE_WINDOW to 150us-500us depending on your latency requirements and load patterns.
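A minimal sketch of enabling write buffering for both Redis connections follows. The 150us window is the low end of the range suggested above, chosen here only for illustration.

```shell
# Flush buffered writes every 150 microseconds (low end of the suggested range).
export REDIS_PIPELINE_WINDOW=150us
export REDIS_PERSECOND_PIPELINE_WINDOW=150us
# The deprecated limit settings have no effect, so make sure they are not set.
unset REDIS_PIPELINE_LIMIT REDIS_PERSECOND_PIPELINE_LIMIT
```

A larger window batches more writes per flush at the cost of added latency per request.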
One Redis Instance
To configure one Redis instance use the following environment variables:
REDIS_SOCKET_TYPE
REDIS_URL
REDIS_POOL_SIZE
REDIS_TYPE (optional)
This setup will use the same Redis server for all limits.
Two Redis Instances
To configure two Redis instances use the following environment variables:
REDIS_SOCKET_TYPE
REDIS_URL
REDIS_POOL_SIZE
REDIS_PERSECOND: set this to "true".
REDIS_PERSECOND_SOCKET_TYPE
REDIS_PERSECOND_URL
REDIS_PERSECOND_POOL_SIZE
REDIS_PERSECOND_TYPE (optional)
This setup will use the Redis server configured with the _PERSECOND_ vars for
per second limits, and the other Redis server for all other limits.
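The two-instance setup could be expressed as in the sketch below. The hostnames are placeholders, and the socket type value `tcp` is an assumption for a hostname:port connection.

```shell
# Main Redis: serves every limit except per-second ones.
export REDIS_SOCKET_TYPE=tcp   # assumption: "tcp" selects a hostname:port connection
export REDIS_URL="redis-main.example.internal:6379"
export REDIS_POOL_SIZE=10

# Per-second Redis: serves only per-second limits.
export REDIS_PERSECOND=true
export REDIS_PERSECOND_SOCKET_TYPE=tcp
export REDIS_PERSECOND_URL="redis-persecond.example.internal:6379"
export REDIS_PERSECOND_POOL_SIZE=10
```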
Health Checking for Redis Active Connection
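To control whether the health check fails when there is no active Redis connection, use the following environment variable:
REDIS_HEALTH_CHECK_ACTIVE_CONNECTION: set to "true" to report a health check failure when there is no active Redis connection. Default: “false”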
Memcache
Experimental Memcache support has been added as an alternative to Redis in v1.5.
To configure a Memcache instance use the following environment variables instead of the Redis variables:
MEMCACHE_HOST_PORT=: a comma separated list of hostname:port pairs for memcache nodes (mutually exclusive with MEMCACHE_SRV)
MEMCACHE_SRV=: an SRV record to lookup hosts from (mutually exclusive with MEMCACHE_HOST_PORT)
MEMCACHE_SRV_REFRESH=0: refresh the list of hosts every n seconds, if 0 no refreshing will happen, supports duration suffixes: “ns”, “us” (or “µs”), “ms”, “s”, “m”, “h”.
BACKEND_TYPE=memcache
CACHE_KEY_PREFIX: a string to prepend to all cache keys
MEMCACHE_MAX_IDLE_CONNS=2: the maximum number of idle TCP connections per memcache node, 2 is the default of the underlying library
MEMCACHE_TLS: set to "true" to connect to the server with TLS.
MEMCACHE_TLS_CLIENT_CERT, MEMCACHE_TLS_CLIENT_KEY, and MEMCACHE_TLS_CACERT to provide files that parameterize the memcache client TLS connection configuration.
MEMCACHE_TLS_SKIP_HOSTNAME_VERIFICATION set to "true" will skip hostname verification in environments where the certificate has an invalid hostname.
With memcache mode, increments happen asynchronously, so it’s technically possible for
a client to exceed quota briefly if multiple requests happen at exactly the same time.
Note that Memcache has a max key length of 250 characters, so operations referencing very long
descriptors will fail. Descriptors sent to Memcache should not contain whitespace or control characters.
When using multiple memcache nodes in MEMCACHE_HOST_PORT=, one should provide the identical list of memcache nodes
to all ratelimiter instances to ensure that a particular cache key is always hashed to the same memcache node.
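A sketch of a two-node Memcache configuration is shown below. The hostnames and prefix are placeholders; the point is that every ratelimit instance receives the same node list in the same form.

```shell
# Switch the backend from Redis to Memcache.
export BACKEND_TYPE=memcache
# Identical node list for every ratelimit instance (placeholder hostnames).
export MEMCACHE_HOST_PORT="memcache-0.example.internal:11211,memcache-1.example.internal:11211"
# Library default, stated explicitly here for visibility.
export MEMCACHE_MAX_IDLE_CONNS=2
# Illustrative prefix; it counts toward the 250-character key limit.
export CACHE_KEY_PREFIX="rl_"
```

MEMCACHE_SRV is left unset because it is mutually exclusive with MEMCACHE_HOST_PORT.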
Custom headers
Ratelimit service can be configured to return custom headers with the ratelimit information. It will populate the response_headers_to_add as part of the RateLimitResponse.
The following environment variables control the custom response feature:
LIMIT_RESPONSE_HEADERS_ENABLED - Enables the custom response headers
LIMIT_LIMIT_HEADER - The default value is “RateLimit-Limit”, setting the environment variable will specify an alternative header name
LIMIT_REMAINING_HEADER - The default value is “RateLimit-Remaining”, setting the environment variable will specify an alternative header name
LIMIT_RESET_HEADER - The default value is “RateLimit-Reset”, setting the environment variable will specify an alternative header name
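For instance, a deployment that prefers `X-` prefixed header names might set the variables as follows. The alternative names are illustrative, and the value `true` for the enable flag is an assumption about the accepted boolean form.

```shell
# Assumption: the flag accepts "true" as its enabled value.
export LIMIT_RESPONSE_HEADERS_ENABLED=true
# Illustrative alternative header names replacing the RateLimit-* defaults.
export LIMIT_LIMIT_HEADER="X-RateLimit-Limit"
export LIMIT_REMAINING_HEADER="X-RateLimit-Remaining"
export LIMIT_RESET_HEADER="X-RateLimit-Reset"
```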
Tracing
Ratelimit service supports exporting spans in OTLP format. See OpenTelemetry for more information.
The following environment variables control the tracing feature:
TRACING_ENABLED - Enables the tracing feature. Only “true” and “false”(default) are allowed in this field.
TRACING_EXPORTER_PROTOCOL - Controls the protocol of exporter in tracing feature. Only “http”(default) and “grpc” are allowed in this field.
TRACING_SERVICE_NAME - Controls the service name that appears in tracing spans. The default value is “RateLimit”.
TRACING_SERVICE_NAMESPACE - Controls the service namespace that appears in tracing spans. The default value is empty.
TRACING_SERVICE_INSTANCE_ID - Controls the service instance ID that appears in tracing spans. It is recommended to put the pod name or container name in this field. If unspecified, the default value is a randomly generated version 4 UUID.
Other fields are described in the OTLP Exporter Documentation. These fields need to be configured correctly for the exporter to send spans to the correct destination.
TRACING_SAMPLING_RATE - Controls the sampling rate, defaults to 1 which means always sample. Valid range: 0.0-1.0. For high volume services, adjusting the sampling rate is recommended.
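The tracing variables might be combined as in this sketch for a high-volume service that samples 10% of requests over gRPC. OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry exporter variable; the collector address and the fallback instance ID are hypothetical.

```shell
export TRACING_ENABLED=true
export TRACING_EXPORTER_PROTOCOL=grpc
export TRACING_SERVICE_NAME=RateLimit
# Use the host (pod or container) name when available; the fallback is a placeholder.
export TRACING_SERVICE_INSTANCE_ID="${HOSTNAME:-ratelimit-local}"
# Sample 10% of requests instead of the default of 1 (always sample).
export TRACING_SAMPLING_RATE=0.1
# Standard OTLP exporter variable; the collector address is hypothetical.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector.example.internal:4317"
```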
You may use the following commands to quickly set up an OpenTelemetry collector together with a Jaeger all-in-one binary:
envoy-announce: Low frequency mailing
list where we will email announcements only.
envoy-users: General user discussion.
Please add [ratelimit] to the email subject.
envoy-dev: Envoy developer discussion (APIs,
feature design, etc.). Please add [ratelimit] to the email subject.
Slack: Slack, to get invited go here.
We have the IRC/XMPP gateways enabled if you prefer either of those. Once an account is created,
connection instructions for IRC/XMPP can be found here.
The #ratelimit-users channel is used for discussions about the ratelimit service.
Supported Envoy APIs
v3 rls.proto is currently supported. Support for v2 rls proto is now deprecated.
API Deprecation History
v1.0.0 tagged on commit 0ded92a2af8261d43096eba4132e45b99a3b8b14. Ratelimit has been in production use at Lyft for over 2 years.
v1.1.0 introduces the data-plane-api proto and initiates the deprecation of the legacy ratelimit.proto.
e91321b commit deleted support for the legacy ratelimit.proto. The current version of ratelimit protocol is changed to v3 rls.proto while v2 rls.proto is still supported as a legacy protocol.
4bb32826 deleted support for legacy v2 rls.proto
Building and Testing
Install Redis-server.
Make sure Go is set up correctly and check out the rate limit service into your Go path. More information about installing Go here.
In order to run the integration tests using a local Redis server, please run two Redis-server instances: one on port 6379 and another on port 6380.
To set up for the first time (only done once):
To compile:
Ensure you set the correct platform if you are running an OSX host with a Linux container, e.g.
To compile and run tests:
To run the server locally using some sensible default settings you can do this (this will set up the server to read the configuration files from the path you specify):
Docker-compose setup
The docker-compose setup uses a distroless-based container for the ratelimit service. In order to run the docker-compose setup from the root of the repo, run
The ratelimit service is built using the main Dockerfile which uses Google’s distroless base image for enhanced security and minimal attack surface. The distroless image contains only the application and its runtime dependencies, omitting unnecessary OS components like package managers and shells.
If you want to run with two redis instances, you will need to modify the docker-compose.yml file to run a second redis container, and change the environment variables as explained in the two redis instances section.
Full test environment - Configure rate limits through files
To run a fully configured environment to demo Envoy based rate limiting, run:
This will run ratelimit, redis, prom-statsd-exporter and two Envoy containers such that you can demo rate limiting by hitting the below endpoints.
Edit examples/ratelimit/config/example.yaml to test different rate limit configs. Hot reloading is enabled.
The descriptors in example.yaml and the actions in examples/envoy/proxy.yaml should give you a good idea on how to configure rate limits.
To see the metrics in the example
Full test environment - Configure rate limits through an xDS Management Server
To run a fully configured environment to demo Envoy based rate limiting, run:
This will run in the xds-config docker-compose profile, which will run an example xDS-Server, ratelimit, redis, prom-statsd-exporter and two Envoy containers such that you can demo rate limiting by hitting the below endpoints.
Edit examples/xds-sotw-config-server/resource.go to test different rate limit configs.
To see the metrics in the example
Self-contained end-to-end integration test
Integration tests are coded as bash-scripts in integration-test/scripts.
The test suite will spin up a docker-compose environment from integration-test/docker-compose-integration-test.yml.
If the test suite fails it will exit with code 1.
Configuration
The configuration format
The rate limit configuration file format is YAML (mainly so that comments are supported).
Definitions
Descriptor list definition
Each configuration contains a top level descriptor list and potentially multiple nested lists beneath that. The format is:
Each descriptor in a descriptor list must have a key. It can also optionally have a value to enable a more specific match. The “rate_limit” block is optional and if present sets up an actual rate limit rule. See below for how the rule is defined. If the rate limit is not present and there are no nested descriptors, then the descriptor is effectively whitelisted. Otherwise, nested descriptors allow more complex matching and rate limiting scenarios.
Rate limit definition
The rate limit block specifies the actual rate limit that will be used when there is a match. Currently the service supports per second, minute, hour, and day limits. More types of limits may be added in the future based on user demand.
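To make the two definitions above concrete, the sketch below writes a small configuration file that exercises a key/value match, a key-only match, and a nested descriptor list. The domain, keys, and limits are hypothetical, and the `unit` and `requests_per_unit` field names are assumed from the service's YAML schema rather than taken from the text above.

```shell
# Hypothetical scratch directory for this sketch.
CONFIG_DIR="$(mktemp -d)"
cat > "$CONFIG_DIR/example.yaml" <<'EOF'
domain: example_domain
descriptors:
  # Key plus value: a specific match with its own limit.
  - key: api_key
    value: premium
    rate_limit:
      unit: minute
      requests_per_unit: 1000
  # Key only: matches any other value supplied for api_key.
  - key: api_key
    rate_limit:
      unit: minute
      requests_per_unit: 100
  # No rate_limit here: the nested descriptor carries the rule.
  - key: service
    value: billing
    descriptors:
      - key: user_id
        rate_limit:
          unit: hour
          requests_per_unit: 50
EOF
echo "wrote $CONFIG_DIR/example.yaml"
```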
Replaces
The replaces key indicates that this descriptor will replace the configuration set by another descriptor.
If there is a rule being evaluated, and multiple descriptors can apply, the replaces descriptor will drop evaluation of the descriptor which it is replacing.
To enable this, any descriptor which should potentially be replaced by another should have a name keyword in the rate_limit section, and any descriptor which should potentially replace the original descriptor should have a name keyword in its respective replaces section. Whenever limits match to both rules, only the rule which replaces the original will take effect, and the limit of the original will not be changed after evaluation.
For example, let’s say you have a bunch of endpoints and each is classified under read or write, with read having a certain limit and write having another. Each user has a certain limit for both endpoints. However, let’s say that you want to increase a user’s limit to a single read endpoint. The only option without using replaces would be to increase their limit for the read category. The replaces keyword allows increasing the limit of a single endpoint in this case.
ShadowMode
A shadow_mode key in a rule indicates that whatever the outcome of the evaluation of the rule, the end-result will always be “OK”.
When a block is in ShadowMode all functions of the rate limiting service are executed as normal, with cache-lookup and statistics.
An additional statistic is added to keep track of how many times a key with “shadow_mode” has overridden a result.
There is also a Global Shadow Mode.
Including detailed metrics for unspecified values
Setting detailed_metric: true for a descriptor will extend the metrics that are produced. Normally a descriptor that matches a value that is not explicitly listed in the configuration will from a metrics point-of-view be rolled-up into the base entry. This can be problematic if you want to have those details available for analysis.
NB! This should only be enabled in situations where the potentially large cardinality of metrics that this can lead to is acceptable.
Including descriptor values in metrics
Setting value_to_metric: true (default: false) for a descriptor will include the descriptor’s runtime value in the metric key, even when the descriptor value is not explicitly defined in the configuration. This allows you to track metrics per descriptor value when the value comes from the runtime request, providing visibility into different rate limit scenarios without needing to pre-define every possible value.
Note: If a value is explicitly specified in a descriptor (e.g., value: "GET"), that value is always included in the metric key regardless of the value_to_metric setting. The value_to_metric flag only affects descriptors where the value is not explicitly defined in the configuration.
When combined with wildcard matching, the full runtime value is included in the metric key, not just the wildcard prefix. This feature works independently of detailed_metric - when detailed_metric is set, it takes precedence and value_to_metric is ignored.
Sharing thresholds for wildcard matches
Setting share_threshold: true (default: false) for a descriptor with a wildcard value (ending with *) allows all values matching that wildcard to share the same rate limit threshold, instead of using isolated thresholds for each matching value.
This is useful when you want to apply a single rate limit across multiple resources that match a wildcard pattern. For example, if you have a rule for files/*, both files/a.pdf and files/b.csv will share the same threshold when share_threshold: true is set.
Important notes:
share_threshold can only be used with wildcard values (values ending with *)
When share_threshold: true is enabled, all matching values share the same cache key and rate limit counter
When share_threshold: false (or not set), each matching value has its own isolated threshold
When combined with value_to_metric: true, the metric key includes the wildcard prefix (the part before *) instead of the full runtime value, to reflect that values are sharing a threshold
When combined with detailed_metric: true, the metric key also includes the wildcard prefix for entries with share_threshold enabled
Examples
Example 1
Let’s start with a simple example:
In the configuration above the domain is “mongo_cps” and we set up 2 different rate limits in the top level descriptor list. Each of the limits has the same key (“database”). They have different values (“users” and “default”), and each of them sets up a 500 requests per second rate limit.
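A configuration matching this description can be sketched as follows. It is a reconstruction from the prose (domain, key, values, and the 500 requests per second limit), written to a hypothetical scratch directory; the `unit` and `requests_per_unit` field names are assumed from the service's YAML schema.

```shell
# Hypothetical scratch directory for this sketch.
CONFIG_DIR="$(mktemp -d)"
cat > "$CONFIG_DIR/mongo_cps.yaml" <<'EOF'
domain: mongo_cps
descriptors:
  - key: database
    value: users
    rate_limit:
      unit: second
      requests_per_unit: 500
  - key: database
    value: default
    rate_limit:
      unit: second
      requests_per_unit: 500
EOF
echo "wrote $CONFIG_DIR/mongo_cps.yaml"
```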
Example 2
A slightly more complex example:
In the preceding example, the domain is “messaging” and we set up two different scenarios that illustrate more complex functionality. First, we want to limit on marketing messages to a specific number. To enable this, we make use of nested descriptor lists. The top level descriptor is (“message_type”, “marketing”). However, this descriptor does not have a limit assigned, so it’s just a placeholder. Contained within this entry we have another descriptor list that includes an entry with key “to_number”. However, notice that no value is provided. This means that the service will match against any value supplied for “to_number” and generate a unique limit. Thus, (“message_type”, “marketing”), (“to_number”, “2061111111”) and (“message_type”, “marketing”), (“to_number”, “2062222222”) will each get 5 requests per day.
The configuration also sets up another rule without a value. This one creates an overall limit for messages sent to any particular number during a 1 day period. Thus, (“to_number”, “2061111111”) and (“to_number”, “2062222222”) both get 100 requests per day.
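The two rules described for the “messaging” domain can be sketched as the configuration below. It is a reconstruction from the prose, written to a hypothetical scratch directory; the `unit` and `requests_per_unit` field names are assumed from the service's YAML schema.

```shell
# Hypothetical scratch directory for this sketch.
CONFIG_DIR="$(mktemp -d)"
cat > "$CONFIG_DIR/messaging.yaml" <<'EOF'
domain: messaging
descriptors:
  # Placeholder entry: no limit of its own, only a nested list.
  - key: message_type
    value: marketing
    descriptors:
      # No value: each to_number gets its own 5 per day marketing limit.
      - key: to_number
        rate_limit:
          unit: day
          requests_per_unit: 5
  # No value: each to_number gets an overall limit of 100 per day.
  - key: to_number
    rate_limit:
      unit: day
      requests_per_unit: 100
EOF
echo "wrote $CONFIG_DIR/messaging.yaml"
```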
When calling the rate limit service, the client can specify multiple descriptors to limit on in a single call. This limits round trips and allows limiting on aggregate rule definitions. For example, using the preceding configuration, the client could send this complete request (in pseudo IDL):
And the service will rate limit against all matching rules and return an aggregate result; a logical OR of all the individual rate limit decisions.
Example 3
An example to illustrate matching order.
In the preceding example, we set up a generic rate limit for individual IP addresses. The architecture’s edge proxy can be configured to make a rate limit service call with the descriptor ("remote_address", "50.0.0.1") for example. This IP would get 10 requests per second as would any other IP. However, the configuration also contains a second configuration that explicitly defines a value along with the same key. If the descriptor ("remote_address", "50.0.0.5") is received, the service will attempt the most specific match possible. This means the most specific descriptor at the same level as your request. Thus, key/value is always attempted as a match before just key.
Example 4
The Ratelimit service matches requests to configuration entries at the same level, i.e., the number of tuples in the request’s descriptor must equal the number of nested descriptor levels in the configuration file. For instance, the following request:
Would not match the following configuration. Even though the first descriptor in the request matches the 1st level descriptor in the configuration, the request has two tuples in the descriptor.
However, it would match the following configuration:
Example 5
We can also define unlimited rate limit descriptors:
For an unlimited descriptor, the request will not be sent to the underlying cache (Redis/Memcached), but will be quickly returned locally by the ratelimit instance. This can be useful for collecting statistics, or if one wants to define a descriptor that has no limit but the client wants to distinguish between such descriptor and one that does not exist.
The return value for unlimited descriptors will be an OK status code with the LimitRemaining field set to MaxUint32 value.
Example 6
A rule using shadow_mode is useful for soft-launching rate limiting. In this example user-a of the auth-service would not get rate-limited regardless of the rate of requests, there would however be statistics related to the breach of the configured limit of 10 req / sec. user-b would be limited to 20 req / sec however.
Example 7
When the replaces keyword is used, that limit will replace any limit which has the name being replaced as its name, and the original descriptor’s limit will not be affected.
In the example below, the following limits will apply:
Example 8
In this example we demonstrate how a descriptor without a specified value is configured to override the default behavior and include the matched-value in the metrics.
Rate limiting configuration and tracking work as normal.
The metrics keys will be the following:
“key1_unspecified_value” “key1_unspecified_value2” “key1_value1”
rather than the normal “key1” “key1_value1”
Example 9
Value supports wildcard matching using *, which can appear at any position: trailing, middle, or multiple times. Each * matches zero or more characters.
Trailing wildcard: matches any value starting with the given prefix:
Matches value1, value2, valueXYZ, etc.
Middle wildcard: matches values with a fixed prefix and suffix:
Matches /api/123/action, /api/user-id/action. Does not match /api/123/other.
Multiple wildcards: each * matches an independent segment, in order:
Matches /api/v1/resource/123/action, /api/v2/resource/456/action.
Example 10
Using value_to_metric: true to include descriptor values in metrics when values are not explicitly defined in the configuration:
With this configuration, requests with different runtime values for route and http_method will generate separate metrics:
Request: route=api,http_method=GET,subject_id=123
Metric key: example10.route_api.http_method_GET.subject_id
Request: route=web,http_method=POST,subject_id=456
Metric key: example10.route_web.http_method_POST.subject_id
Without value_to_metric: true, both requests would use the same metric key: example10.route.http_method.subject_id.
When combined with wildcard matching, the full runtime value is included:
Request: user=alice,action=readfile,resource=documents
Metric key: example10_wildcard.user_alice.action_readfile.resource
Note: When detailed_metric: true is set on a descriptor, it takes precedence and value_to_metric is ignored for that descriptor.
Example 11
Using share_threshold: true to share rate limits across wildcard matches:
With this configuration:
files/a.pdf, files/b.csv, and files/c.txt all share the same threshold of 10 requests per hour
If 5 requests are made for files/a.pdf and 5 requests for files/b.csv, a request for files/c.txt will be rate limited (OVER_LIMIT) because the shared threshold of 10 has been reached
files_no_share/a.pdf and files_no_share/b.csv each have their own isolated threshold of 10 requests per hour
If 10 requests are made for files_no_share/a.pdf (exhausting its quota), requests for files_no_share/b.csv will still be allowed (up to 10 requests)
Combining share_threshold with value_to_metric:
Request: route=api/v1,method=GET
Metric key: example11_metrics.route_api.method_GET (includes the wildcard prefix api instead of the full value api/v1)
This reflects that all api/* routes share the same threshold, while still providing visibility into which API routes are being accessed.
Loading Configuration
Rate limit service supports the following configuration loading methods. You can define which method to use by configuring the environment variable CONFIG_TYPE.
CONFIG_TYPE values:
FILE (Default)
GRPC_XDS_SOTW
When the environment variable FORCE_START_WITHOUT_INITIAL_CONFIG is set to false, the Rate limit service will wait for initial rate limit configuration before starting the server (gRPC, Rest server endpoints). When set to true the server will start even without initial configuration.
File Based Configuration Loading
The Ratelimit service uses a library written by Lyft called goruntime to do configuration loading. Goruntime monitors a designated path, and watches for symlink swaps to files in the directory tree to reload configuration files.
The path to watch can be configured via the settings package with the following environment variables:
Configuration files are loaded from RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/*.yaml
There are two methods for triggering a configuration reload:
Swap the symlink at RUNTIME_ROOT so that it points to a different directory.
Update the contents inside RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/ directly.
The former is the default behavior. To use the latter method, set the RUNTIME_WATCH_ROOT environment variable to false.
The following filesystem operations on configuration files inside RUNTIME_ROOT/RUNTIME_SUBDIRECTORY/RUNTIME_APPDIRECTORY/ will force a reload of all config files:
For more information on how runtime works you can read its README.
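The symlink-swap method can be sketched as follows: stage a complete configuration tree in a new directory, then repoint the RUNTIME_ROOT symlink at it. All paths, the directory names `ratelimit` and `config`, and the `release-2` name are hypothetical choices for this example, and the `unit` and `requests_per_unit` field names are assumed from the service's YAML schema.

```shell
# Hypothetical scratch location standing in for the real runtime directory.
BASE="$(mktemp -d)"
export RUNTIME_ROOT="$BASE/current"     # the symlink the service would watch
export RUNTIME_SUBDIRECTORY=ratelimit
export RUNTIME_APPDIRECTORY=config

# Stage a new release directory containing the full config tree.
RELEASE="$BASE/release-2"
mkdir -p "$RELEASE/$RUNTIME_SUBDIRECTORY/$RUNTIME_APPDIRECTORY"
cat > "$RELEASE/$RUNTIME_SUBDIRECTORY/$RUNTIME_APPDIRECTORY/example.yaml" <<'EOF'
domain: example_domain
descriptors:
  - key: database
    rate_limit:
      unit: second
      requests_per_unit: 500
EOF

# Repoint the symlink; -n replaces the link itself instead of descending into it.
ln -sfn "$RELEASE" "$RUNTIME_ROOT"
ls "$RUNTIME_ROOT/$RUNTIME_SUBDIRECTORY/$RUNTIME_APPDIRECTORY"
```

Staging the whole tree first means the service never observes a half-written configuration directory through the symlink.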
By default it is not possible to define multiple configuration files within RUNTIME_SUBDIRECTORY referencing the same domain. To enable this behavior set MERGE_DOMAIN_CONFIG to true.
xDS Management Server Based Configuration Loading
xDS Management Server is a gRPC server which implements the Aggregated Discovery Service (ADS). The xDS Management server serves Discovery Response with Ratelimit Configuration Resources and with Type URL "type.googleapis.com/ratelimit.config.ratelimit.v3.RateLimitConfig".
The xDS client in the Rate limit service configures the Rate limit service with the provided configuration. In case of connection failures, the xDS client retries the connection to the xDS server with exponential backoff, and the backoff parameters are configurable.
XDS_CLIENT_BACKOFF_JITTER: set to "true" to add jitter to the exponential backoff.
XDS_CLIENT_BACKOFF_INITIAL_INTERVAL: The base amount of time the xDS client waits before retrying the connection after failure. Default: “10s”
XDS_CLIENT_BACKOFF_MAX_INTERVAL: The max backoff interval is the upper limit on the amount of time the xDS client will wait between retries. After reaching the max backoff interval, the next retries will continue using the max interval. Default: “60s”
XDS_CLIENT_BACKOFF_RANDOM_FACTOR: This is a factor by which the initial interval is multiplied to calculate the next backoff interval. Default: “0.5”
The following are the gRPC connection options.
XDS_CLIENT_MAX_MSG_SIZE_IN_BYTES: The maximum message size in bytes that the xDS client can receive.
For more information on xDS protocol please refer to the envoy proxy documentation.
You can refer to the sample xDS configuration management server.
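As a sketch, switching the service to xDS-based loading with a faster retry schedule than the defaults might look like this. The interval values are illustrative, and the variables that hold the xDS server address are deliberately not shown here.

```shell
# Load configuration from an xDS Management Server instead of files.
export CONFIG_TYPE=GRPC_XDS_SOTW
# Wait for the initial configuration before starting the gRPC and REST endpoints.
export FORCE_START_WITHOUT_INITIAL_CONFIG=false
# Retry sooner than the 10s default and cap waits below the 60s default (illustrative values).
export XDS_CLIENT_BACKOFF_INITIAL_INTERVAL=5s
export XDS_CLIENT_BACKOFF_MAX_INTERVAL=30s
# Add jitter so many instances do not reconnect at the same moment.
export XDS_CLIENT_BACKOFF_JITTER=true
```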
The xDS server for listening for configuration can be set via settings package with the following environment variables:
Ratelimit also supports TLS connections to the xDS server. These can be configured using the following environment variables:
CONFIG_GRPC_XDS_SERVER_USE_TLS: set to "true" to enable a TLS connection with the xDS configuration management server.
CONFIG_GRPC_XDS_CLIENT_TLS_CERT, CONFIG_GRPC_XDS_CLIENT_TLS_KEY, and CONFIG_GRPC_XDS_SERVER_TLS_CACERT: paths to the files that specify the TLS connection configuration to the xDS configuration management server.
CONFIG_GRPC_XDS_SERVER_TLS_SAN: (Optional) Override the SAN value to validate from the server certificate.
When using xDS you can configure extra headers that will be added to GRPC requests to the xDS Management server. Extra headers can be useful for providing additional authorization information. This can be configured using the following environment variable:
CONFIG_GRPC_XDS_CLIENT_ADDITIONAL_HEADERS - set to "<k1:v1>,<k2:v2>" to add multiple headers to GRPC requests.
Log Format
A centralized log collection system works better with logs in JSON format, which avoids the need for custom parsing rules. The Ratelimit service produces logs in a text format by default. For example:
JSON Log format can be configured using the following environment variables:
Output example:
GRPC Keepalive
Client-side GRPC DNS re-resolution in scenarios with auto scaling enabled might not work as expected and the current workaround is to configure connection keepalive on server-side. The behavior can be fixed by configuring the following env variables for the ratelimit server:
GRPC_MAX_CONNECTION_AGE: a duration for the maximum amount of time a connection may exist before it will be closed by sending a GoAway. A random jitter of +/-10% will be added to MaxConnectionAge to spread out connection storms.
GRPC_MAX_CONNECTION_AGE_GRACE: an additive period after MaxConnectionAge after which the connection will be forcibly closed.
Health-check
Health check status is determined internally by individual components. Currently, we have three components that determine the overall health status of the rate limit service. Each individual component needs to be healthy for the overall status to report healthy. Some components may be turned off via configuration so that the overall health is not affected by that component’s health status.
HEALTHY_WITH_AT_LEAST_ONE_CONFIG_LOADED: see the section below. Defaults to unhealthy.
Health-check configurations
Health check can be configured to check if rate-limit configurations are loaded using the following environment variable.
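A minimal sketch of opting in, assuming the variable takes the boolean string true (the exact accepted value is an assumption; the text here only says the check can be enabled):

```shell
# Assumed value: "true" turns on the config-loaded health check component.
export HEALTHY_WITH_AT_LEAST_ONE_CONFIG_LOADED=true
```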
If HEALTHY_WITH_AT_LEAST_ONE_CONFIG_LOADED is enabled, the health check starts as unhealthy and becomes healthy once it detects that at least one domain is loaded with the config. If it later detects no config again, it changes back to unhealthy.
GRPC server
By default the ratelimit gRPC server binds to 0.0.0.0:8081. To change this set GRPC_HOST and/or GRPC_PORT. If you want to run the server on a unix domain socket then set GRPC_UDS, e.g. GRPC_UDS=/<dir>/ratelimit.sock and leave GRPC_HOST and GRPC_PORT unmodified.
Request Fields
For information on the fields of a Ratelimit gRPC request please read the information on the RateLimitRequest message type in the Ratelimit proto file.
GRPC Client
The gRPC client will interact with ratelimit server and tell you if the requests are over limit.
Commandline flags
-dial_string: used to specify the address of ratelimit server. It defaults to localhost:8081.
-domain: used to specify the domain.
-descriptors: used to specify one descriptor. You can pass multiple descriptors like following:
Global ShadowMode
There is a global shadow-mode which can make it easier to introduce rate limiting into an existing service landscape. It will override whatever result is returned by the regular rate limiting process.
Configuration
The global shadow mode is configured with an environment variable
Setting environment variable SHADOW_MODE to true will enable the feature.
Statistics
An additional service-level statistic is generated that increments whenever the global shadow mode has overridden a rate limiting result.
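The override described in this section can be sketched in shell. This is only an illustration, under the assumption that an over-limit decision is replaced with an OK decision when shadow mode is on. The names final_decision, OVER_LIMIT, OK, and shadow_overrides are hypothetical stand-ins and not identifiers from the service.

```shell
# SHADOW_MODE=true enables the feature, as described above.
SHADOW_MODE=true

# Hypothetical stand-in for the service-level statistic.
shadow_overrides=0

# final_decision DECISION
# Stores the decision the caller would see in $final.
# Assumption: only over-limit results are overridden.
final_decision() {
  final=$1
  if [ "$SHADOW_MODE" = "true" ] && [ "$1" = "OVER_LIMIT" ]; then
    final="OK"
    shadow_overrides=$((shadow_overrides + 1))
  fi
}
```

Calling `final_decision OVER_LIMIT` with shadow mode enabled leaves OK in `final` and bumps the counter, while `final_decision OK` passes through untouched.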
Statistics
The rate limit service generates various statistics for each configured rate limit rule that will be useful for end users both for visibility and for setting alarms. Ratelimit uses gostats as its statistics library. Please refer to gostats’ documentation for more information on the library.
Statistics default to using StatsD and are configured via the env vars from gostats.
To output statistics to stdout instead, set env var USE_STATSD to false.
Configure statistics output frequency with STATS_FLUSH_INTERVAL, where the type is time.Duration, e.g. 10s is the default value.
To disable statistics entirely, set env var DISABLE_STATS to true.
Rate Limit Statistic Path:
DOMAIN:
KEY_VALUE:
The default mode is that the value-part is omitted if the rule that matches is a descriptor without a value. Specifying the detailed_metric configuration parameter changes this behavior and creates a unique metric even in this situation.
STAT:
To use a custom near_limit ratio threshold, set the NEAR_LIMIT_RATIO environment variable. It defaults to 0.8 (0-1 scale). These are examples of generated stats for some configured rate limit rules from the above examples:
Statistics options
EXTRA_TAGS: set to "<k1:v1>,<k2:v2>" to tag all emitted stats with the provided tags. You might want to tag build commit or release version, for example.
DogStatsD
To enable dogstatsd integration set:
USE_DOG_STATSD: true to use DogStatsD
dogstatsd also enables so-called mogrifiers, which can convert from traditional stats tags into a combination of stat name and tags.
To enable mogrifiers, set a comma-separated list of them in DOG_STATSD_MOGRIFIERS.
e.g. USE_DOG_STATSD_MOGRIFIERS: FOO,BAR
For each mogrifier, define variables that declare the mogrification:
DOG_STATSD_MOGRIFIERS_%s_PATTERN: The regex pattern to match on
DOG_STATSD_MOGRIFIERS_%s_NAME: The name of the metric to emit. Can contain variables.
DOG_STATSD_MOGRIFIERS_%s_TAGS: Comma-separated list of tags to emit. Can contain variables.
Variables within mogrifiers are strings such as $1, $2, $3 which can be used to reference a match group from the regex pattern.
Example
In the example below we will set mogrifier DOMAIN to adjust some.original.metric.TAG to some.original.metric with tag domain:TAG
First enable a single mogrifier:
USE_DOG_STATSD_MOGRIFIERS: DOMAIN
Then, declare the rules for the DOMAIN mogrifier:
DOG_STATSD_MOGRIFIER_DOMAIN_PATTERN: ^some\.original\.metric\.(.*)$
DOG_STATSD_MOGRIFIER_DOMAIN_NAME: some.original.metric
DOG_STATSD_MOGRIFIER_DOMAIN_TAGS: domain:$1
Continued example:
Let’s also set another mogrifier which outputs the hits metrics with a domain and descriptor tag.
First, enable an extra mogrifier:
USE_DOG_STATSD_MOGRIFIERS: DOMAIN,HITS
Then, declare additional rules for the HITS mogrifier:
DOG_STATSD_MOGRIFIER_HITS_PATTERN: ^ratelimit\.service\.rate_limit\.([^.]+)\.(.*)\.([^.]+)$
DOG_STATSD_MOGRIFIER_HITS_NAME: ratelimit.service.rate_limit.$3
DOG_STATSD_MOGRIFIER_HITS_TAGS: domain:$1,descriptor:$2
Prometheus
To enable Prometheus integration set:
USE_PROMETHEUS: true to use Prometheus
PROMETHEUS_ADDR: The port to listen on for Prometheus metrics. Defaults to :9090
PROMETHEUS_PATH: The path to listen on for Prometheus metrics. Defaults to /metrics
PROMETHEUS_MAPPER_YAML: The path to the YAML file that defines the mapping from statsd to prometheus metrics.
PROMETHEUS_RESPONSE_TIME_AS_MILLISECONDS: true to keep the legacy millisecond behavior for ratelimit_server.*.response_time in the built-in mapper. Ignored when PROMETHEUS_MAPPER_YAML is set.
Define the mapping from statsd to prometheus metrics in a YAML file. Find more information about the mapping in the Metric Mapping and Configuration. The default setting is:
HTTP Port
The ratelimit service listens to HTTP 1.1 (by default on port 8080) with two endpoints:
/json endpoint
Takes an HTTP POST with a JSON body of the form e.g.
The service will return an HTTP 200 if the request is allowed (no rate limits were exceeded) or a 429 if one or more rate limits were exceeded.
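A minimal sketch of calling the endpoint and interpreting the status code is shown below. It assumes the JSON body mirrors the RateLimitRequest fields (a domain plus descriptors made of key/value entries). The host, domain, key, and value are placeholders, and the helper names check_limit and interpret_status are hypothetical.

```shell
# Placeholder address; 8080 is the default HTTP port mentioned above.
RATELIMIT_HTTP_ADDR="${RATELIMIT_HTTP_ADDR:-localhost:8080}"

# Assumed request shape: domain + descriptors, each with key/value entries.
payload='{"domain": "example_domain", "descriptors": [{"entries": [{"key": "example_key", "value": "example_value"}]}]}'

# Maps the HTTP status code to the outcome described above.
interpret_status() {
  case $1 in
    200) echo "allowed" ;;
    429) echo "over limit" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# POSTs the payload to /json and prints the interpreted outcome.
# Not invoked here because it needs a running ratelimit instance.
check_limit() {
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H 'Content-Type: application/json' \
    -d "$payload" "http://${RATELIMIT_HTTP_ADDR}/json")
  interpret_status "$status"
}
```

With a ratelimit instance running, `check_limit` prints the outcome for the placeholder descriptor.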
The response is a RateLimitResponse encoded with proto3-to-json mapping:
Debug Port
The debug port can be used to interact with the running process.
You can specify the debug server address with the DEBUG_HOST and DEBUG_PORT environment variables. They currently default to 0.0.0.0 and 6070 respectively.
Local Cache
Ratelimit optionally uses freecache as its local caching layer, which stores the over-the-limit cache keys and thus avoids reading the Redis cache again for keys that are already over the limit. The local cache size can be configured via LocalCacheSizeInBytes in the settings. If LocalCacheSizeInBytes is 0, local cache is disabled.
Redis
Ratelimit uses Redis as its caching layer. Ratelimit supports two operation modes:
Ratelimit also supports TLS connections and authentication. These can be configured using the following environment variables:
REDIS_TLS & REDIS_PERSECOND_TLS: set to "true" to enable a TLS connection for the specific connection type.
REDIS_TLS_CLIENT_CERT, REDIS_TLS_CLIENT_KEY, and REDIS_TLS_CACERT: provide the files that specify a TLS connection configuration to a Redis server that requires client certificate verification. (This is effective when REDIS_TLS or REDIS_PERSECOND_TLS is set to "true").
REDIS_TLS_SKIP_HOSTNAME_VERIFICATION: set to "true" to skip hostname verification in environments where the certificate has an invalid hostname, such as GCP Memorystore.
REDIS_AUTH & REDIS_PERSECOND_AUTH: set to "password" to enable password-only authentication to the Redis master/replica nodes.
REDIS_AUTH & REDIS_PERSECOND_AUTH: set to "username:password" to enable username-password authentication to the Redis master/replica nodes.
REDIS_SENTINEL_AUTH & REDIS_PERSECOND_SENTINEL_AUTH: set to "password" or "username:password" to enable authentication to Redis Sentinel nodes. This is separate from REDIS_AUTH/REDIS_PERSECOND_AUTH, which authenticate to the Redis master/replica nodes. Only used when REDIS_TYPE or REDIS_PERSECOND_TYPE is set to "sentinel". If not set, no authentication will be attempted when connecting to Sentinel nodes.
CACHE_KEY_PREFIX: a string to prepend to all cache keys
To control whether cache keys are still incremented when any of them is already over the limit, you can use the following configuration:
STOP_CACHE_KEY_INCREMENT_WHEN_OVERLIMIT: Set this configuration to true to disallow key incrementation when one of the keys is already over the limit.
STOP_CACHE_KEY_INCREMENT_WHEN_OVERLIMIT is useful when multiple descriptors are included in a single request. Setting this to true can prevent the incrementation of other descriptors’ counters if any of the descriptors is already over the limit.
Redis type
Ratelimit supports different types of redis deployments:
The deployment type can be specified with the REDIS_TYPE / REDIS_PERSECOND_TYPE environment variables. Depending on the type defined, the REDIS_URL and REDIS_PERSECOND_URL are expected to have the following formats:
Connection Pool Settings
Pool Size
REDIS_POOL_SIZE: the number of connections to keep in the pool. Default: 10
REDIS_PERSECOND_POOL_SIZE: pool size for per-second Redis. Default: 10
Connection Timeout
Controls the maximum duration for Redis connection establishment, read operations, and write operations.
REDIS_TIMEOUT: sets the timeout for Redis connection and I/O operations. Default: 10s
REDIS_PERSECOND_TIMEOUT: sets the timeout for per-second Redis connection and I/O operations. Default: 10s
Pool On-Empty Behavior
Controls what happens when all connections in the pool are in use and a new request arrives.
REDIS_POOL_ON_EMPTY_BEHAVIOR: controls what happens when the pool is empty. Default: CREATE
  CREATE: create a new overflow connection after waiting for REDIS_POOL_ON_EMPTY_WAIT_DURATION. This is the default radix behavior.
  ERROR: return an error after waiting for REDIS_POOL_ON_EMPTY_WAIT_DURATION. This enforces a strict pool size limit.
  WAIT: block until a connection becomes available. This enforces a strict pool size limit but may cause goroutine buildup.
REDIS_POOL_ON_EMPTY_WAIT_DURATION: the duration to wait before taking the configured action (CREATE or ERROR). Default: 1s
REDIS_PERSECOND_POOL_ON_EMPTY_BEHAVIOR: same as above for per-second Redis pool. Default: CREATE
REDIS_PERSECOND_POOL_ON_EMPTY_WAIT_DURATION: same as above for per-second Redis pool. Default: 1s
Pipelining
By default, for each request, ratelimit picks a connection from the pool, writes multiple Redis commands in a single write, then reads their responses in a single read. This reduces network delay.
For high throughput scenarios, ratelimit supports write buffering via radix v4’s WriteFlushInterval. It can be configured using the following environment variables:
REDIS_PIPELINE_WINDOW & REDIS_PERSECOND_PIPELINE_WINDOW: controls how often buffered writes are flushed to the network connection. When set to a non-zero value (e.g., 150us-500us), radix v4 will buffer multiple concurrent write operations and flush them together, reducing system calls and improving throughput. If zero, each write is flushed immediately. Required for Redis Cluster mode.
REDIS_PIPELINE_LIMIT & REDIS_PERSECOND_PIPELINE_LIMIT: DEPRECATED - These settings have no effect in radix v4. Write buffering is controlled solely by the window settings above.
Write buffering is disabled by default (window = 0). For optimal performance, set REDIS_PIPELINE_WINDOW to 150us-500us depending on your latency requirements and load patterns.
One Redis Instance
To configure one Redis instance use the following environment variables:
REDIS_SOCKET_TYPE
REDIS_URL
REDIS_POOL_SIZE
REDIS_TYPE (optional)
This setup will use the same Redis server for all limits.
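As a rough sketch, a single-instance setup might be exported as follows. The socket type and address are assumed values for a local Redis and are not taken from this document; the pool size matches the documented default.

```shell
# Assumed values for a local, single-instance setup.
export REDIS_SOCKET_TYPE=tcp      # assumption: plain TCP socket
export REDIS_URL=localhost:6379   # hypothetical address
export REDIS_POOL_SIZE=10         # same as the documented default
# REDIS_TYPE is optional and left unset here.
```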
Two Redis Instances
To configure two Redis instances use the following environment variables:
REDIS_SOCKET_TYPE
REDIS_URL
REDIS_POOL_SIZE
REDIS_PERSECOND: set this to "true".
REDIS_PERSECOND_SOCKET_TYPE
REDIS_PERSECOND_URL
REDIS_PERSECOND_POOL_SIZE
REDIS_PERSECOND_TYPE (optional)
This setup will use the Redis server configured with the _PERSECOND_ vars for per second limits, and the other Redis server for all other limits.
Health Checking for Redis Active Connection
To configure whether the health check fails when there is no active Redis connection, use:
REDIS_HEALTH_CHECK_ACTIVE_CONNECTION: (default is “false”)
Memcache
Experimental Memcache support has been added as an alternative to Redis in v1.5.
To configure a Memcache instance use the following environment variables instead of the Redis variables:
MEMCACHE_HOST_PORT=: a comma separated list of hostname:port pairs for memcache nodes (mutually exclusive with MEMCACHE_SRV)
MEMCACHE_SRV=: an SRV record to lookup hosts from (mutually exclusive with MEMCACHE_HOST_PORT)
MEMCACHE_SRV_REFRESH=0: refresh the list of hosts every n seconds, if 0 no refreshing will happen, supports duration suffixes: “ns”, “us” (or “µs”), “ms”, “s”, “m”, “h”.
BACKEND_TYPE=memcache
CACHE_KEY_PREFIX: a string to prepend to all cache keys
MEMCACHE_MAX_IDLE_CONNS=2: the maximum number of idle TCP connections per memcache node, 2 is the default of the underlying library
MEMCACHE_TLS: set to "true" to connect to the server with TLS.
MEMCACHE_TLS_CLIENT_CERT, MEMCACHE_TLS_CLIENT_KEY, and MEMCACHE_TLS_CACERT to provide files that parameterize the memcache client TLS connection configuration.
MEMCACHE_TLS_SKIP_HOSTNAME_VERIFICATION set to "true" will skip hostname verification in environments where the certificate has an invalid hostname.
With memcache mode increments will happen asynchronously, so it’s technically possible for a client to exceed quota briefly if multiple requests happen at exactly the same time.
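The Memcache variables listed above could be combined roughly as follows. The node addresses and key prefix are hypothetical placeholders; only BACKEND_TYPE=memcache and the idle-connection default come from the list itself.

```shell
# Select the Memcache backend instead of Redis.
export BACKEND_TYPE=memcache
# Hypothetical node addresses (mutually exclusive with MEMCACHE_SRV).
export MEMCACHE_HOST_PORT=memcache-1.example.internal:11211,memcache-2.example.internal:11211
# Library default, stated explicitly for clarity.
export MEMCACHE_MAX_IDLE_CONNS=2
# Hypothetical prefix prepended to every cache key.
export CACHE_KEY_PREFIX=rl_
```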
Note that Memcache has a max key length of 250 characters, so operations referencing very long descriptors will fail. Descriptors sent to Memcache should not contain whitespaces or control characters.
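These constraints can be checked ahead of time with a small helper. The sketch below is illustrative only: memcache_key_ok is a hypothetical function, and it inspects a candidate key string supplied by the caller rather than the key the service actually composes.

```shell
# memcache_key_ok KEY
# Returns 0 when KEY is at most 250 characters long and contains no
# whitespace or control characters, 1 otherwise.
memcache_key_ok() {
  key=$1
  # Reject keys longer than the 250 character limit.
  [ "${#key}" -le 250 ] || return 1
  # Reject whitespace and control characters.
  case $key in
    *[[:space:][:cntrl:]]*) return 1 ;;
  esac
  return 0
}
```

For instance, `memcache_key_ok "some_domain_some_key"` succeeds, while a key containing a space fails.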
When using multiple memcache nodes in MEMCACHE_HOST_PORT=, one should provide the identical list of memcache nodes to all ratelimiter instances to ensure that a particular cache key is always hashed to the same memcache node.
Custom headers
Ratelimit service can be configured to return custom headers with the ratelimit information. It will populate the response_headers_to_add as part of the RateLimitResponse.
The following environment variables control the custom response feature:
LIMIT_RESPONSE_HEADERS_ENABLED - Enables the custom response headers
LIMIT_LIMIT_HEADER - The default value is “RateLimit-Limit”, setting the environment variable will specify an alternative header name
LIMIT_REMAINING_HEADER - The default value is “RateLimit-Remaining”, setting the environment variable will specify an alternative header name
LIMIT_RESET_HEADER - The default value is “RateLimit-Reset”, setting the environment variable will specify an alternative header name
Tracing
Ratelimit service supports exporting spans in OTLP format. See OpenTelemetry for more information.
The following environment variables control the tracing feature:
TRACING_ENABLED - Enables the tracing feature. Only “true” and “false” (default) are allowed in this field.
TRACING_EXPORTER_PROTOCOL - Controls the protocol of the exporter in the tracing feature. Only “http” (default) and “grpc” are allowed in this field.
TRACING_SERVICE_NAME - Controls the service name that appears in tracing spans. The default value is “RateLimit”.
TRACING_SERVICE_NAMESPACE - Controls the service namespace that appears in tracing spans. The default value is empty.
TRACING_SERVICE_INSTANCE_ID - Controls the service instance id that appears in tracing spans. It is recommended to put the pod name or container name in this field. The default value is a randomly generated version 4 uuid if unspecified.
TRACING_SAMPLING_RATE - Controls the sampling rate, defaults to 1 which means always sample. Valid range: 0.0-1.0. For high volume services, adjusting the sampling rate is recommended.
You may use the following commands to quickly set up an OpenTelemetry collector together with a Jaeger all-in-one binary for quickstart:
TLS
Ratelimit supports TLS for its gRPC endpoint.
The following environment variables control the TLS feature:
GRPC_SERVER_USE_TLS - Enables gRPC connections to server over TLS
GRPC_SERVER_TLS_CERT - Path to the file containing the server cert chain
GRPC_SERVER_TLS_KEY - Path to the file containing the server private key
Ratelimit uses goruntime to watch the TLS certificate and key and will hot reload them on changes.
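A minimal sketch of enabling TLS on the gRPC endpoint is shown below. The file paths are hypothetical, and the value true for GRPC_SERVER_USE_TLS is an assumption based on how other boolean variables in this document are set.

```shell
# Assumed boolean value enabling TLS on the gRPC endpoint.
export GRPC_SERVER_USE_TLS=true
# Hypothetical locations of the server cert chain and private key.
export GRPC_SERVER_TLS_CERT=/etc/ratelimit/tls/server.crt
export GRPC_SERVER_TLS_KEY=/etc/ratelimit/tls/server.key
```

Because the certificate and key are watched and hot reloaded, replacing the files at these paths should not require a restart.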
mTLS
Ratelimit supports mTLS when Envoy sends requests to the service.
TLS must be enabled on the gRPC endpoint in order for mTLS to work; see TLS.
The following variables can be set to enable mTLS on the Ratelimit service.
GRPC_CLIENT_TLS_CACERT - Path to the file containing the client CA certificate.
GRPC_CLIENT_TLS_SAN - (Optional) DNS Name to validate from the client cert during mTLS auth
In the Envoy config, add the transport_socket section to the ratelimit service cluster config.
Contact
[ratelimit] to the email subject.
[ratelimit] to the email subject.
#ratelimit-users channel is used for discussions about the ratelimit service.