amazon-vpc-cni-k8s
Networking plugin for pod networking in Kubernetes using Elastic Network Interfaces on AWS.
Setup
Download the latest version of the yaml and apply it to the cluster.
Launch kubelet with network plugins set to cni (--network-plugin=cni), the CNI directories configured (--cni-conf-dir and --cni-bin-dir) and node IP set to the primary IPv4 address of the primary ENI for the instance (--node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)). It is also recommended that you set --max-pods equal to (the number of ENIs for the instance type × (the number of IPs per ENI - 1)) + 2; for details, see vpc_ip_resource_limit.go. Setting --max-pods will prevent scheduling that exceeds the IP address resources available to the kubelet.
The default manifest expects --cni-conf-dir=/etc/cni/net.d and --cni-bin-dir=/opt/cni/bin.
Alternatively there is also a Helm chart: eks/aws-vpc-cni
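The --max-pods recommendation above is plain arithmetic, so a small sketch may help when sizing nodes by hand. The function name is hypothetical and is not part of the plugin; the ENI and IP counts used in the example are the m4.4xlarge figures quoted in the ENI Allocation section of this README.

```python
def recommended_max_pods(eni_count: int, ips_per_eni: int) -> int:
    """Apply the README formula: (ENIs x (IPs per ENI - 1)) + 2.

    Hypothetical helper for illustration only; the authoritative
    per-instance limits live in vpc_ip_resource_limit.go.
    """
    if eni_count < 1 or ips_per_eni < 1:
        raise ValueError("eni_count and ips_per_eni must be positive")
    return eni_count * (ips_per_eni - 1) + 2


if __name__ == "__main__":
    # m4.4xlarge: up to 8 ENIs, up to 30 IP addresses per ENI.
    print(recommended_max_pods(8, 30))
```

The result would then be passed to kubelet as the value of --max-pods.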
IAM Policy
See here for required IAM policies.
Building
make defaults to make build-linux, which builds the Linux binaries.
unit-test, format, lint and vet provide ways to run the respective tests/tools and should be run before submitting a PR.
make docker will create a docker container using docker buildx that contains the finished binaries, with a tag of amazon/amazon-k8s-cni:latest.
make docker-unit-tests uses a docker container to run all unit tests.
The Go version is taken from .go-version unless a different GOLANG_IMAGE tag is passed in.
Components
There are 2 components:
ipamd, a long-running node-local IP Address Management (IPAM) daemon, is responsible for:
The details can be found in Proposal: CNI plugin for Kubernetes networking over AWS VPC.
Help & Feedback
For help, please consider the following venues (in order):
#aws-vpc-cni channel in the Kubernetes Slack community.
Recommended Version
For all Kubernetes releases, we recommend installing the latest VPC CNI release. The following table denotes our oldest recommended VPC CNI version for each actively supported Kubernetes release.
Version Upgrade
Upgrading (or downgrading) the VPC CNI version should result in no downtime. Existing pods should not be affected and will not lose network connectivity. New pods will be in pending state until the VPC CNI is fully initialized and can assign pod IP addresses. In v1.12.0+, VPC CNI state is restored via an on-disk file: /var/run/aws-node/ipam.json. In lower versions, state is restored via calls to the container runtime.
ENI Allocation
When a worker node first joins the cluster, there is only 1 ENI along with all of the addresses on the ENI. Without any configuration, ipamd always tries to keep one extra ENI.
When the number of pods running on the node exceeds the number of addresses on a single ENI, the CNI backend starts allocating a new ENI using the following allocation scheme:
For example, an m4.4xlarge node can have up to 8 ENIs, and each ENI can have up to 30 IP addresses. See Elastic Network Interfaces documentation for details.
For a detailed explanation, see WARM_ENI_TARGET, WARM_IP_TARGET and MINIMUM_IP_TARGET.
Privileged mode
VPC CNI makes use of privileged mode (privileged: true) in the manifest for its aws-vpc-cni-init and aws-eks-nodeagent containers. The aws-vpc-cni-init container requires elevated privilege to set the networking kernel parameters, while the aws-eks-nodeagent container requires these privileges for attaching BPF probes to enforce network policy.
Network Policies
In Kubernetes, by default, all pod-to-pod communication is allowed. Communication can be restricted with Kubernetes NetworkPolicy objects.
VPC CNI versions v1.14 and greater support Kubernetes Network Policies. Network Policies specify how pods can communicate over the network, at the IP address or port level. The VPC CNI implements the Kubernetes NetworkPolicy API. Network policies generally include a pod selector, and Ingress/Egress rules.
For EKS clusters, review Configure your cluster for Kubernetes network policies in the Amazon EKS User Guide.
The AWS VPC CNI implementation of network policies may be enabled in self-managed clusters. This requires the VPC CNI agent, the Network Policy Controller, and Network Policy Node Agent.
Review the Network Policy FAQ for more information.
Network Policy Related Components
ConfigMap
If the VPC CNI is installed as an Amazon EKS add-on (also known as a managed add-on), configure it using AWS APIs as described in the EKS User Guide.
If the VPC CNI is installed with a Helm Chart, the ConfigMap is installed in your cluster. Review the Helm Chart information.
Otherwise, the VPC CNI may be configured with a ConfigMap, as shown below:
enable-network-policy-controller
Helm Charts
AWS publishes a Helm chart to install the VPC CNI. Review how to install the helm chart, and the configuration parameters for the chart.
CNI Configuration Variables
The Amazon VPC CNI plugin for Kubernetes supports a number of configuration options, which are set through environment variables. The following environment variables are available, and all of them are optional.
AWS_MANAGE_ENIS_NON_SCHEDULABLE (v1.12.6+)
Type: Boolean as a String
Default: false
Specifies whether IPAMD should allocate or deallocate ENIs on a non-schedulable node.
AWS_VPC_CNI_NODE_PORT_SUPPORT
Type: Boolean as a String
Default: true
Specifies whether NodePort services are enabled on a worker node’s primary network interface. This requires additional iptables rules, and the kernel’s reverse path filter on the primary interface is set to loose.
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG
Type: Boolean as a String
Default: false
Specifies that your pods may use subnets and security groups that are independent of your worker node’s VPC configuration. By default, pods share the same subnet and security groups as the worker node’s primary interface. Setting this variable to true causes ipamd to use the security groups and VPC subnet in a worker node’s ENIConfig for elastic network interface allocation. You must create an ENIConfig custom resource for each subnet that your pods will reside in, and then annotate or label each worker node to use a specific ENIConfig. Multiple worker nodes can be annotated or labelled with the same ENIConfig, but each worker node can be annotated with only a single ENIConfig at a time. Further, the subnet in the ENIConfig must belong to the same Availability Zone that the worker node resides in. For more information, see CNI Custom Networking in the Amazon EKS User Guide.
ENI_CONFIG_ANNOTATION_DEF
Type: String
Default: k8s.amazonaws.com/eniConfig
Specifies the node annotation key name. This should be used when AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true. The annotation value will be used to set the ENIConfig name. Note that annotations take precedence over labels.
ENI_CONFIG_LABEL_DEF
Type: String
Default: k8s.amazonaws.com/eniConfig
Specifies the node label key name. This should be used when AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true. The label value will be used to set the ENIConfig name. Note that annotations will take precedence over labels. To use labels, ensure there is no annotation with key k8s.amazonaws.com/eniConfig or defined key (in ENI_CONFIG_ANNOTATION_DEF) set on the node. To select an ENIConfig based upon availability zone, set this to topology.kubernetes.io/zone and create an ENIConfig custom resource for each availability zone (e.g. us-east-1a). Note that tag failure-domain.beta.kubernetes.io/zone is deprecated and replaced with the tag topology.kubernetes.io/zone.
HOST_CNI_BIN_PATH
Type: String
Default: /host/opt/cni/bin
Specifies the location to install CNI binaries. Note that the aws-node daemonset mounts /opt/cni/bin to /host/opt/cni/bin. The value you choose must be a location that the aws-node pod can write to.
HOST_CNI_CONFDIR_PATH
Type: String
Default: /host/etc/cni/net.d
Specifies the location to install the VPC CNI conflist. Note that the aws-node daemonset mounts /etc/cni/net.d to /host/etc/cni/net.d. The value you choose must be a location that the aws-node pod can write to.
AWS_VPC_ENI_MTU (v1.6.0+)
Type: Integer as a String
Default: 9001
Used to configure the MTU size for attached ENIs. The valid range for IPv4 is from 576 to 9001, while the valid range for IPv6 is from 1280 to 9001.
AWS_VPC_K8S_CNI_EXTERNALSNAT
Type: Boolean as a String
Default: false
Specifies whether an external NAT gateway should be used to provide SNAT of secondary ENI IP addresses. If set to true, the SNAT iptables rule and off-VPC IP rule are not applied, and these rules are removed if they have already been applied. SNAT can be disabled in scenarios where pods need direct access to external networks (such as VPN, Direct Connect, or other VPCs) without NAT translation, and where pods are not expected to require direct Internet access via an Internet Gateway. When SNAT is disabled, nodes are typically placed in private subnets, with outbound Internet connectivity provided through an AWS NAT Gateway or another external NAT device.
AWS_VPC_K8S_CNI_RANDOMIZESNAT
Type: String
Default: prng
Valid Values: hashrandom, prng, none
Specifies whether the SNAT iptables rule should randomize the outgoing ports for connections. This setting takes effect when AWS_VPC_K8S_CNI_EXTERNALSNAT=false, which is the default setting. The default setting for AWS_VPC_K8S_CNI_RANDOMIZESNAT is prng, meaning that --random-fully will be added to the SNAT iptables rule. For old versions of iptables that do not support --random-fully, this option will fall back to --random. To disable random port allocation, for example if you rely on sequential port allocation for outgoing connections, set it to none.
Note: Any option other than none will cause outbound connections to be assigned a source port that is not necessarily part of the ephemeral port range set at the OS level (/proc/sys/net/ipv4/ip_local_port_range). This is relevant for any customers that might have NACLs restricting traffic based on the port range found in ip_local_port_range.
AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS (v1.6.0+)
Type: String
Default: empty
Specify a comma-separated list of IPv4 CIDRs to exclude from SNAT. For every item in the list an iptables rule and off-VPC IP rule will be applied. If an item is not a valid IPv4 range it will be skipped. This should be used when AWS_VPC_K8S_CNI_EXTERNALSNAT=false.
POD_MTU (v1.16.4+)
Type: Integer as a String
Note: If unset, the default value is derived from AWS_VPC_ENI_MTU, which defaults to 9001.
Default: 9001
Used to configure the MTU size for pod virtual interfaces. The valid range for IPv4 is from 576 to 9001, while the valid range for IPv6 is from 1280 to 9001.
WARM_ENI_TARGET
Type: Integer as a String
Default: 1
Specifies the number of free elastic network interfaces (and all of their available IP addresses) that the ipamd daemon should attempt to keep available for pod assignment on the node. By default, ipamd attempts to keep 1 elastic network interface and all of its IP addresses available for pod assignment. The number of IP addresses per network interface varies by instance type. For more information, see IP Addresses Per Network Interface Per Instance Type in the Amazon EC2 User Guide for Linux Instances.
For example, an m4.4xlarge launches with 1 network interface and 30 IP addresses. If 5 pods are placed on the node and 5 free IP addresses are removed from the IP address warm pool, then ipamd attempts to allocate more interfaces until WARM_ENI_TARGET free interfaces are available on the node.
NOTE! If WARM_IP_TARGET is set, then this environment variable is ignored and the WARM_IP_TARGET behavior is used instead.
WARM_IP_TARGET
Type: Integer
Default: None
Specifies the number of free IP addresses that the ipamd daemon should attempt to keep available for pod assignment on the node. Setting this to a non-positive value is the same as setting this to 0 or not setting the variable. With ENABLE_PREFIX_DELEGATION set to true, the ipamd daemon checks whether the existing (/28) prefixes are enough to maintain the WARM_IP_TARGET; if they are not, more prefixes will be attached.
For example:
If WARM_IP_TARGET is set to 5, then ipamd attempts to keep 5 free IP addresses available at all times. If the elastic network interfaces on the node are unable to provide these free addresses, ipamd attempts to allocate more interfaces until WARM_IP_TARGET free IP addresses are available.
If ENABLE_PREFIX_DELEGATION is set to true and WARM_IP_TARGET is 16, then initially 1 (/28) prefix is sufficient, but once a single pod is assigned an IP the remaining free IPs number 15, so IPAMD will allocate 1 more prefix to reach the WARM_IP_TARGET of 16.
NOTE! Avoid this setting for large clusters, or if the cluster has high pod churn. Setting it will cause additional calls to the EC2 API and that might cause throttling of the requests. It is strongly suggested to set MINIMUM_IP_TARGET when using WARM_IP_TARGET.
If both WARM_IP_TARGET and MINIMUM_IP_TARGET are set, ipamd will attempt to meet both constraints. This environment variable overrides WARM_ENI_TARGET behavior. For a detailed explanation, see WARM_ENI_TARGET, WARM_IP_TARGET and MINIMUM_IP_TARGET.
If ENABLE_PREFIX_DELEGATION is set to true, WARM_IP_TARGET overrides WARM_PREFIX_TARGET behavior. For a detailed explanation, see WARM_PREFIX_TARGET, WARM_IP_TARGET and MINIMUM_IP_TARGET.
MINIMUM_IP_TARGET (v1.6.0+)
Type: Integer
Default: None
Specifies the number of total IP addresses that the ipamd daemon should attempt to allocate for pod assignment on the node. MINIMUM_IP_TARGET behaves identically to WARM_IP_TARGET except that instead of setting a target number of free IP addresses to keep available at all times, it sets a target number for a floor on how many total IP addresses are allocated. Setting this to a non-positive value is the same as setting this to 0 or not setting the variable.
MINIMUM_IP_TARGET is for pre-scaling, WARM_IP_TARGET is for dynamic scaling. For example, suppose a cluster has an expected pod density of approximately 30 pods per node. If WARM_IP_TARGET is set to 30 to ensure there are enough IPs allocated up front by the CNI, then once 30 pods are deployed to the node, the CNI will allocate an additional 30 IPs, for a total of 60, accelerating IP exhaustion in the relevant subnets. If instead MINIMUM_IP_TARGET is set to 30 and WARM_IP_TARGET to 2, after the 30 pods are deployed the CNI would allocate an additional 2 IPs. This still provides elasticity, but uses roughly half as many IPs as using WARM_IP_TARGET alone (32 IPs vs 60 IPs).
This also improves the reliability of the EKS cluster by reducing the number of calls necessary to allocate or deallocate private IPs, which may be throttled, especially at scaling-related times.
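The 60-versus-32 comparison above can be reproduced with a deliberately simplified model. The sketch assumes ipamd aims for the larger of "assigned plus warm" and the minimum floor; it ignores ENI and instance limits, prefix delegation and cooldown, and the function name is hypothetical rather than anything in ipamd.

```python
def target_total_ips(assigned: int,
                     warm_ip_target: int = 0,
                     minimum_ip_target: int = 0) -> int:
    """Simplified model of how many IPs a node would hold in total.

    Assumption: both constraints are met by taking the larger of
    (assigned + warm) and the minimum floor. Non-positive targets are
    treated as 0, as the variable descriptions state.
    """
    warm = max(warm_ip_target, 0)
    minimum = max(minimum_ip_target, 0)
    return max(assigned + warm, minimum)


if __name__ == "__main__":
    # WARM_IP_TARGET=30 alone, 30 pods running.
    print(target_total_ips(30, warm_ip_target=30))
    # MINIMUM_IP_TARGET=30 with WARM_IP_TARGET=2, 30 pods running.
    print(target_total_ips(30, warm_ip_target=2, minimum_ip_target=30))
```

With no pods running, the second configuration would still hold 30 IPs in this model, which is the pre-scaling effect the text describes.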
NOTE!
If MINIMUM_IP_TARGET is set, WARM_ENI_TARGET will be ignored. Please utilize WARM_IP_TARGET instead.
If MINIMUM_IP_TARGET is set and WARM_IP_TARGET is not set, WARM_IP_TARGET is assumed to be 0, so the number of IPs attached to the node will be the value of MINIMUM_IP_TARGET. This configuration will prevent future ENIs/IPs from being allocated. It is strongly recommended that WARM_IP_TARGET be set greater than 0 when MINIMUM_IP_TARGET is set.
MAX_ENI
Type: Integer
Default: None
Specifies the maximum number of ENIs that will be attached to the node. When MAX_ENI is unset or 0 (or lower), the setting is not used, and the maximum number of ENIs is always equal to the maximum number for the instance type in question. Even when MAX_ENI is a positive number, it is limited by the maximum number for the instance type.
AWS_VPC_K8S_CNI_LOGLEVEL
Type: String
Default: DEBUG
Valid Values: DEBUG, INFO, WARN, ERROR, FATAL. (Not case sensitive)
Specifies the log level for ipamd and cni-metric-helper.
AWS_VPC_K8S_CNI_LOG_FILE
Type: String
Default: /host/var/log/aws-routed-eni/ipamd.log
Valid Values: stdout, stderr, or a file path
Specifies where to write the logging output of ipamd: stdout, stderr, or a file path other than the default (/var/log/aws-routed-eni/ipamd.log).
Note: /host/var/log/... is the container file-system path, which maps to /var/log/... on the node.
Note: The IPAMD process runs within the aws-node pod, so writing to stdout or stderr will write to aws-node pod logs.
AWS_VPC_K8S_PLUGIN_LOG_FILE
Type: String
Default: /var/log/aws-routed-eni/plugin.log
Valid Values: stderr or a file path. Note that setting to the empty string is an alias for stderr, and this comes from upstream kubernetes best practices.
Specifies where to write the logging output for the aws-cni plugin: stderr or a file path other than the default (/var/log/aws-routed-eni/plugin.log).
Note: stdout cannot be supported for plugin log. Please refer to #1248 for more details.
Note: In EKS 1.24+, the CNI plugin is exec’ed by the container runtime, so stderr is for the container-runtime process, NOT the aws-node pod. In older versions, the CNI plugin was exec’ed by kubelet, so stderr is for the kubelet process.
Note: If chaining an external plugin (e.g. Cilium) that does not provide a pluginLogFile in its config file, the CNI plugin will by default write to os.Stderr.
AWS_VPC_K8S_PLUGIN_LOG_LEVEL
Type: String
Default: DEBUG
Valid Values: DEBUG, INFO, WARN, ERROR, FATAL. (Not case sensitive)
Specifies the log level for the aws-cni plugin.
INTROSPECTION_BIND_ADDRESS
Type: String
Default: 127.0.0.1:61679
Specifies the bind address for the introspection endpoint.
A Unix Domain Socket can be specified with the unix: prefix before the socket path.
DISABLE_INTROSPECTION
Type: Boolean as a String
Default: false
Specifies whether introspection endpoints are disabled on a worker node. Setting this to true will reduce the debugging information we can get from the node when running the aws-cni-support.sh script.
DISABLE_METRICS
Type: Boolean as a String
Default: false
Specifies whether the prometheus metrics endpoint is disabled or not for ipamd. By default metrics are published on :61678/metrics.
AWS_VPC_K8S_CNI_VETHPREFIX
Type: String
Default: eni
Specifies the veth prefix used to generate the host-side veth device name for the CNI. The prefix can be at most 4 characters long. The prefixes eth, vlan, and lo are reserved by the CNI plugin and cannot be specified. We recommend using a prefix name not shared by any other network interfaces on the worker node instance.
ADDITIONAL_ENI_TAGS (v1.6.0+)
Type: String
Default: {}
Example values: {"tag_key": "tag_val"}
Metadata applied to ENIs helps you categorize and organize your resources for billing or other purposes. Each tag consists of a custom-defined key and an optional value. Tag keys can have a maximum character length of 128 characters. Tag values can have a maximum length of 256 characters. These tags will be added to all ENIs on the host.
Important: Custom tags should not contain the k8s.amazonaws.com prefix as it is reserved. If the tag has the k8s.amazonaws.com string, tag addition will be ignored.
AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER (deprecated v1.12.1+)
Type: Boolean as a String
Default: true
Specifies whether ipamd should configure rp filter for primary interface. Setting this to false will require rp filter to be configured through init container.
NOTE! AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER has been deprecated, so setting this environment variable results in a no-op. The init container unconditionally configures the rp filter for the primary interface.
CLUSTER_NAME
Type: String
Default: ""
Specifies the cluster name to tag allocated ENIs with. See the “Cluster Name tag” section below.
CLUSTER_ENDPOINT (v1.12.1+)
Type: String
Default: ""
Specifies the cluster endpoint to use for connecting to the api-server without relying on kube-proxy. This is an optional configuration parameter that can improve the initialization time of the AWS VPC CNI.
NOTE! When setting CLUSTER_ENDPOINT, it is STRONGLY RECOMMENDED that you enable private endpoint access for your API server, otherwise VPC CNI requests can traverse the public NAT gateway and may result in additional charges.
ENABLE_POD_ENI (v1.7.0+)
Type: Boolean as a String
Default: false
To enable security groups for pods you need to have at least an EKS 1.17 eks.3 cluster.
Setting ENABLE_POD_ENI to true will allow IPAMD to add the vpc.amazonaws.com/has-trunk-attached label to the node if the instance has the capacity to attach an additional ENI.
The label notifies vpc-resource-controller to attach a Trunk ENI to the instance. The label value is initially set to false and is marked to true by IPAMD when vpc-resource-controller attaches a Trunk ENI to the instance. However, there might be cases where the label value will remain false if the instance doesn’t support ENI Trunking.
Once enabled, the VPC resource controller will then advertise branch network interfaces as extended resources on these nodes in your cluster. Branch interface capacity is additive to existing instance type limits for secondary IP addresses and prefixes. For example, a c5.4xlarge can continue to have up to 234 secondary IP addresses or 234 /28 prefixes assigned to standard network interfaces and up to 54 branch network interfaces. Each branch network interface only receives a single primary IP address and this IP address will be allocated to pods with a security group (branch ENI pods). The maximum number of branch ENIs per instance type is defined in the vpc-resource-controller limits and is only supported on Nitro-based instances.
None of the WARM targets affect the scale of the branch ENI pods, so you will have to set WARM_{ENI/IP/PREFIX}_TARGET based on the number of non-branch ENI pods. If the cluster mostly runs pods with a security group, consider setting WARM_IP_TARGET to a very low value instead of the default WARM_ENI_TARGET or WARM_PREFIX_TARGET to reduce wastage of IPs/ENIs.
NOTE! Toggling ENABLE_POD_ENI from true to false will not detach the Trunk ENI from an instance. To delete/detach the Trunk ENI from an instance, you need to recycle the instance.
POD_SECURITY_GROUP_ENFORCING_MODE (v1.11.0+)
Type: String
Default: strict
Valid Values: strict, standard
Once ENABLE_POD_ENI is set to true, this value controls how the traffic of pods with the security group behaves.
strict mode: all inbound/outbound traffic from a pod with a security group will be enforced by security group rules. This is the default mode if POD_SECURITY_GROUP_ENFORCING_MODE is not set. strict mode is supported when kube-proxy is configured in iptables mode (default with EKS). If kube-proxy is configured in ipvs mode, please set POD_SECURITY_GROUP_ENFORCING_MODE to standard.
standard mode: the traffic of a pod with a security group behaves the same as that of pods without a security group, except that each pod occupies a dedicated branch ENI.
NOTE! For the new behavior to take effect after switching the mode, existing pods with a security group must be recycled. Alternatively, you can restart the nodes as well.
DISABLE_TCP_EARLY_DEMUX (v1.7.3+)
Type: Boolean as a String
Default: false
If ENABLE_POD_ENI is set to true, for the kubelet to connect via TCP (for liveness or readiness probes) to pods that are using per pod security groups, DISABLE_TCP_EARLY_DEMUX should be set to true for the amazon-k8s-cni-init container under initContainers. This will increase the local TCP connection latency slightly. Details on why this is needed can be found in this #1212 comment. To use this setting, a Linux kernel version of at least 4.6 is needed on the worker node.
You can use the below command to set DISABLE_TCP_EARLY_DEMUX to true -
ENABLE_SUBNET_DISCOVERY (v1.18.0+)
Type: Boolean as a String
Default: true
Subnet discovery is enabled by default. VPC-CNI will pick the subnet with the most free IPs from the node’s VPC/AZ to create the secondary ENIs. The subnets considered are the subnet the node is created in and subnets tagged with kubernetes.io/role/cni. If ENABLE_SUBNET_DISCOVERY is set to false or if DescribeSubnets fails due to IAM permissions, all secondary ENIs will be created in the subnet the node is created in.
ENABLE_PREFIX_DELEGATION (v1.9.0+)
Type: Boolean as a String
Default: false
Enables prefix delegation on Nitro instances. Setting ENABLE_PREFIX_DELEGATION to true will start allocating a prefix (/28 for IPv4 and /80 for IPv6) instead of a secondary IP in the ENI’s subnet. The total number of prefixes and private IP addresses will be less than the limit on private IPs allowed by your instance. Setting or resetting ENABLE_PREFIX_DELEGATION while pods are running or while ENIs are attached is supported, and newly allocated pods will get IPs based on the mode of IPAMD, but the max pods of kubelet should be updated, which requires either a kubelet restart or a node recycle.
Setting ENABLE_PREFIX_DELEGATION to true will not increase the density of branch ENI pods. The limit on the number of branch network interfaces per instance type will remain the same. Each branch network interface will be allocated a primary IP and this IP will be allocated for the branch ENI pods.
Please refer to VPC CNI Feature Matrix section below for additional information around using Prefix delegation with Custom Networking and Security Groups Per Pod features.
Note: ENABLE_PREFIX_DELEGATION needs to be set to true when VPC CNI is configured to operate in IPv6 mode (supported in v1.10.0+). Prefix Delegation in IPv4 and IPv6 modes is supported on Nitro-based Bare Metal instances as well from v1.11+. If you’re using the Prefix Delegation feature on Bare Metal instances, downgrading to an earlier version of VPC CNI from v1.11+ will be disruptive and is not supported.
WARM_PREFIX_TARGET (v1.9.0+)
Type: Integer
Default: None
Specifies the number of free IPv4 (/28) prefixes that the ipamd daemon should attempt to keep available for pod assignment on the node. Setting this to a non-positive value is the same as setting this to 0 or not setting the variable. This environment variable works when ENABLE_PREFIX_DELEGATION is set to true and is overridden when WARM_IP_TARGET and MINIMUM_IP_TARGET are configured.
DISABLE_NETWORK_RESOURCE_PROVISIONING (v1.9.1+)
Type: Boolean as a String
Default: false
Setting DISABLE_NETWORK_RESOURCE_PROVISIONING to true will make IPAMD depend only on IMDS to get attached ENIs and IPs/prefixes.
ENABLE_BANDWIDTH_PLUGIN (v1.10.0+)
Type: Boolean as a String
Default: false
Setting ENABLE_BANDWIDTH_PLUGIN to true will update 10-aws.conflist to include the upstream bandwidth plugin as a chained plugin.
NOTE: Kubernetes Network Policy is supported in Amazon VPC CNI starting with version v1.14.0. Note that the bandwidth plugin is not compatible with Amazon VPC CNI based Network policy. The Network Policy agent uses the TC (traffic classifier) system to enforce configured network policies for the pods. The policy enforcement will fail if the bandwidth plugin is enabled, due to a conflict between the TC configuration of the bandwidth plugin and the Network policy agent. We’re exploring options to support the bandwidth plugin along with the Network policy feature and the issue is tracked here
ANNOTATE_POD_IP (v1.9.3+)
Type: Boolean as a String
Default: false
Setting ANNOTATE_POD_IP to true will allow IPAMD to add an annotation vpc.amazonaws.com/pod-ips to the pod with the pod IP.
There is a known issue with kubelet taking time to update Pod.Status.PodIP, leading to calico being blocked on programming the policy. Setting ANNOTATE_POD_IP to true will enable the AWS VPC CNI plugin to add the Pod IP as an annotation to the pod spec to address this race condition.
To annotate the pod with the pod IP, you will have to add patch permission for the pods resource in the aws-node clusterrole. You can use the below command -
NOTE: Adding patch permissions to the aws-node Daemonset increases the security scope for the plugin, so add this permission only after performing a proper security assessment of the tradeoffs.
ENABLE_IPv4 (v1.10.0+)
Type: Boolean as a String
Default: true
VPC CNI can operate in either IPv4 or IPv6 mode. Setting ENABLE_IPv4 to true will configure it in IPv4 mode (default mode).
Note: Dual-stack mode isn’t yet supported. So, enabling both IPv4 and IPv6 will be treated as an invalid configuration.
ENABLE_IPv6 (v1.10.0+)
Type: Boolean as a String
Default: false
VPC CNI can operate in either IPv4 or IPv6 mode. Setting ENABLE_IPv6 to true (both under aws-node and aws-vpc-cni-init containers in the manifest) will configure it in IPv6 mode. IPv6 is only supported in Prefix Delegation mode, so ENABLE_PREFIX_DELEGATION needs to be set to true if VPC CNI is configured to operate in IPv6 mode. Prefix delegation is only supported on Nitro instances.
Note: Please make sure that the required IPv6 IAM policy is applied (refer to the IAM Policy section above). Dual-stack mode isn’t yet supported. So, enabling both IPv4 and IPv6 will be treated as an invalid configuration. Please refer to the VPC CNI Feature Matrix section below for additional information.
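The constraints stated for ENABLE_IPv4, ENABLE_IPv6 and ENABLE_PREFIX_DELEGATION can be checked before a manifest is applied. The following is a minimal pre-flight sketch, not the plugin's own validation: the function names are hypothetical, the boolean parsing is an assumption, and rejecting a configuration with both modes disabled is an assumption that the text does not state.

```python
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumption: "Boolean as a String" is compared case-insensitively.
    return value.strip().lower() == "true"


def resolve_ip_mode(env: Mapping[str, str]) -> str:
    """Return "ipv4" or "ipv6", or raise ValueError for invalid combinations."""
    v4 = _as_bool(env.get("ENABLE_IPv4", "true"))    # documented default: true
    v6 = _as_bool(env.get("ENABLE_IPv6", "false"))   # documented default: false
    prefix_delegation = _as_bool(env.get("ENABLE_PREFIX_DELEGATION", "false"))

    if v4 and v6:
        raise ValueError("dual-stack is not supported: enable only one of IPv4/IPv6")
    if not v4 and not v6:
        # Assumption: treated as invalid here; the README does not cover this case.
        raise ValueError("neither IPv4 nor IPv6 is enabled")
    if v6 and not prefix_delegation:
        raise ValueError("IPv6 mode requires ENABLE_PREFIX_DELEGATION=true")
    return "ipv6" if v6 else "ipv4"


if __name__ == "__main__":
    print(resolve_ip_mode({}))
    print(resolve_ip_mode({
        "ENABLE_IPv4": "false",
        "ENABLE_IPv6": "true",
        "ENABLE_PREFIX_DELEGATION": "true",
    }))
```

Since the same values must be set on both the aws-node and aws-vpc-cni-init containers, a check like this could be run once per container environment.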
ENABLE_NFTABLES (introduced in v1.12.1, deprecated in v1.13.2+)
Type: Boolean as a String
Default: false
VPC CNI uses iptables-legacy by default. Setting ENABLE_NFTABLES to true will update VPC CNI to use iptables-nft.
Note: The VPC CNI image contains iptables-legacy and iptables-nft. Switching between them is done via update-alternatives. It is strongly recommended that the iptables mode matches that which is used by the base OS and kube-proxy. Switching modes while pods are running or rules are installed will not trigger reconciliation. It is recommended that rules are manually updated or nodes are drained and cordoned before updating. If reloading a node, ensure that previous rules are not set to be persisted.
AWS_EXTERNAL_SERVICE_CIDRS (v1.12.6+)
Type: String
Default: empty
Specify a comma-separated list of IPv4 CIDRs that must be routed via the main routing table. This is required for secondary ENIs to reach endpoints outside of the VPC that are backed by a service. For every item in the list, an ip rule will be created with a priority greater than the ip rule capturing egress traffic from the container. If an item is not a valid IPv4 CIDR, it will be skipped.
AWS_EC2_ENDPOINT (v1.13.0+)
Type: String
Default: empty
Specify the EC2 endpoint to use. This is useful if you are using a custom endpoint for EC2. For example, if you are using a proxy for EC2, you can set this to the proxy endpoint. Any kind of URL or IP address is valid, such as https://localhost:8080 or http://ec2.us-west-2.customaws.com. If this is not set, the default EC2 endpoint will be used.
DISABLE_LEAKED_ENI_CLEANUP (v1.13.0+)
Type: Boolean as a String
Default: false
On IPv4 clusters, IPAMD schedules an hourly background task per node that cleans up leaked ENIs. Setting this environment variable to true disables that job. The primary motivation to disable this task is to decrease the amount of EC2 API calls made from each node. Note that disabling this task should be considered carefully, as it requires users to manually clean up ENIs leaked in their account. See #1223 for a related discussion.
ENABLE_V6_EGRESS (v1.13.0+)
Type: Boolean as a String
Default: false
Specifies whether pods in an IPv4 cluster support IPv6 egress. If the variable is set to true, range fd00::ac:00/118 is reserved for IPv6 egress.
This environment variable must be set for both the aws-vpc-cni-init and aws-node containers in order for this feature to work properly. This feature also requires that the node has an IPv6 address assigned to its primary ENI, as this address is used for SNAT to IPv6 endpoints outside of the cluster. If the configuration prerequisites are not met, the egress-cni plugin is not enabled and an error log is printed in the aws-node container.
Note that enabling/disabling this feature only affects whether newly created pods have an IPv6 interface created. Therefore, it is recommended that you reboot existing nodes after enabling/disabling this feature.
The value set in POD_MTU / AWS_VPC_ENI_MTU is used to configure the MTU size of the egress interface.
ENABLE_V4_EGRESS (v1.15.1+)
Type: Boolean as a String
Default: true
Specifies whether pods in an IPv6 cluster support IPv4 egress. If the variable is set to true, range 169.254.172.0/22 is reserved for IPv4 egress. When enabled, traffic egressing an IPv6 pod destined to an IPv4 endpoint will be SNAT’ed via the node IPv4 address.
Note that enabling/disabling this feature only affects whether newly created pods have an IPv4 interface created. Therefore, it is recommended that you reboot existing nodes after enabling/disabling this feature.
The value set in
POD_MTU/AWS_VPC_ENI_MTUis used to configure the MTU size of egress interface.IP_COOLDOWN_PERIOD(v1.15.0+)Type: Integer as a String
Default: 30
Specifies the number of seconds an IP address is in cooldown after pod deletion. The cooldown period gives network proxies, such as kube-proxy, time to update node iptables rules when the IP was registered as a valid endpoint, such as for a service. Modify this value with caution, as kube-proxy update time scales with the number of nodes and services.
Note: 0 is a supported value, but it is highly discouraged.
Note: Higher cooldown periods may lead to a higher number of EC2 API calls, because IPs held in the cooldown cache cannot be reused.
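
To make the cooldown behaviour concrete, here is a minimal Python sketch of a cooldown cache. It is not the plugin's Go implementation: the class CooldownCache and its method names are hypothetical, and the only value taken from this document is the default of 30 seconds.

```python
import time


class CooldownCache:
    """Toy model of an IP cooldown cache (hypothetical, for illustration only)."""

    def __init__(self, cooldown_seconds=30, clock=time.monotonic):
        # 30 mirrors the documented default of IP_COOLDOWN_PERIOD.
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._released_at = {}  # ip -> time at which the owning pod was deleted

    def release(self, ip):
        """Record that the pod using `ip` was deleted."""
        self._released_at[ip] = self._clock()

    def is_assignable(self, ip):
        """An IP may be handed to a new pod once its cooldown has elapsed."""
        released = self._released_at.get(ip)
        if released is None:
            return True  # never released, so it is not cooling down
        return self._clock() - released >= self.cooldown_seconds
```

In this model an address released at time t stays unavailable until t + cooldown_seconds, which suggests why a longer period can leave fewer reusable IPs on the node and trigger more EC2 API calls.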
DISABLE_POD_V6 (v1.15.0+)
Type: Boolean as a String
Default: false
When DISABLE_POD_V6 is set, the tuning plugin is chained and configured to disable IPv6 networking in each newly created pod network namespace. Set this variable when you have an IPv4 cluster and containerized applications that cannot tolerate IPv6 being enabled. Container runtimes such as containerd will enable IPv6 in newly created container network namespaces regardless of host settings.
Note that if you set this while using Multus, you must ensure that any chained plugins do not depend on IPv6 networking. You must also ensure that chained plugins do not also modify these sysctls.
NETWORK_POLICY_ENFORCING_MODE (v1.17.1+)
Type: String
Default: standard
The Network Policy agent supports two modes for Network Policy enforcement: strict and standard. By default, the Amazon VPC CNI plugin for Kubernetes configures network policies for pods in parallel with pod provisioning. In standard mode, until all of the policies are configured for the new pod, containers in the new pod start with a default allow policy. A default allow policy means that all ingress and egress traffic is allowed to and from the new pod. In strict mode, however, a new pod starts with a default deny policy, and all ingress and egress connections are blocked until network policies are configured. In strict mode, you must have a network policy defined for every pod in your cluster. Host networking pods are exempted from this requirement.
In standard mode, return traffic is always allowed for any packets that were initially sent under the default allow policy. However, once network policies are applied, the next outgoing packet is evaluated against the active policies and allowed or denied accordingly.
If you remove the Network Policy Agent container from the aws-node DaemonSet, you must also ensure that the NETWORK_POLICY_ENFORCING_MODE environment variable is not set. Setting this value while the NP agent is absent can lead to failures during pod creation.
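
The difference between the two modes can be summarised in a small Python sketch. It only models the default behaviour described above; the function name default_verdict and the string return values are hypothetical and do not correspond to anything in the Network Policy agent.

```python
def default_verdict(mode, policies_configured):
    """Return how a new pod's traffic is treated (illustrative model only).

    mode: "standard" or "strict", as accepted by NETWORK_POLICY_ENFORCING_MODE.
    policies_configured: True once the pod's network policies are in place.
    """
    if mode not in ("standard", "strict"):
        raise ValueError(f"unknown enforcing mode: {mode!r}")
    if policies_configured:
        # From this point on, packets are checked against the active policies.
        return "evaluate-policies"
    # Before policies are configured, the mode decides the default.
    return "allow" if mode == "standard" else "deny"
```

For example, default_verdict("strict", False) models a freshly started pod in strict mode, whose connections stay blocked until its policies are configured.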
ENABLE_IMDS_ONLY_MODE (v1.19.6+)
Type: Boolean as a String
Default: false
Setting ENABLE_IMDS_ONLY_MODE to true enables the CNI plugin to operate in environments with strict VPC or IAM restrictions where EC2 API access is limited or unavailable. In this mode, the CNI plugin relies solely on the Instance Metadata Service (IMDS) to retrieve information about ENIs (Elastic Network Interfaces) and determine IP addresses to assign. These ENIs are only discovered at startup, so ENIs and IPs must be pre-attached and pre-assigned before the CNI plugin starts up. Enabling this mode automatically sets DISABLE_NETWORK_RESOURCE_PROVISIONING and DISABLE_LEAKED_ENI_CLEANUP to true, as the CNI plugin will not make any EC2 API calls during operation.

ENABLE_MULTI_NIC (v1.20.0+)
Type: Boolean as a String
Default: false
The CNI plugin by default only manages network card 0 and assigns a single IP address to each Pod. Setting ENABLE_MULTI_NIC to true enables the Amazon VPC CNI plugin to manage all eligible network cards on supported multi-card instance types.
A network card will be managed if at least one of the following conditions is met:
a. The network card does not have any devices attached to it
b. The network card has an efa OR an ena device attached to it
c. The network card has an efa-only AND an ena device attached to it

Annotations
Multi Homed Pods (v1.20.0+)
The k8s.amazonaws.com/nicConfig: multi-nic-attachment annotation enables multi-homing for a pod, allowing it to receive an IP address from each managed network card on the node. While this provides multiple network paths, applications must explicitly utilize these interfaces to take advantage of the additional bandwidth. To enable this feature, set ENABLE_MULTI_NIC to true in the Amazon VPC CNI configuration and schedule the pod on an instance type that supports multiple network cards. If you are using the AWS VPC CNI implementation of network policies, these policies are applied symmetrically to all interfaces of the pod.
Note - Downgrade considerations
ENABLE_MULTI_NIC to false.

VPC CNI Feature Matrix
IPv4 IPv6

ENI tags related to Allocation
This plugin interacts with the following tags on ENIs:
- cluster.k8s.amazonaws.com/name
- node.k8s.amazonaws.com/instance_id
- node.k8s.amazonaws.com/no_manage

Cluster Name tag
The tag cluster.k8s.amazonaws.com/name will be set to the cluster name of the aws-node daemonset which created the ENI.

Instance ID tag
The tag node.k8s.amazonaws.com/instance_id will be set to the instance ID of the aws-node instance that allocated this ENI.

No Manage tag
The tag node.k8s.amazonaws.com/no_manage is read by the aws-node daemonset to determine whether an ENI attached to the machine should not be configured or used for private IPs.
This tag is not set by the cni plugin itself, but rather may be set by a user to indicate that an ENI is intended for host networking pods, or for some other process unrelated to Kubernetes.
Note: Attaching an ENI with the no_manage tag will result in an incorrect value for the Kubelet’s --max-pods configuration option. Consider also updating the MAX_ENI and --max-pods configuration options on this plugin and the kubelet respectively if you are making use of this tag.

Subnet tags related to Allocation
This plugin additionally interacts with the kubernetes.io/role/cni tag on subnets when ENABLE_SUBNET_DISCOVERY is set to true.

CNI role tag
The tag kubernetes.io/role/cni is read by the aws-node daemonset to determine if a secondary subnet can be used for creating secondary ENIs.
This tag is not set by the cni plugin itself, but rather must be set by a user to indicate that a subnet can be used for secondary ENIs. Secondary subnets to be used must have this tag. The primary subnet (node’s subnet) is not required to be tagged.
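
The tagging rule above can be sketched in Python as a simple filter. This is an illustration rather than the plugin's logic: the function eligible_subnets and the dictionary shape used for subnets are hypothetical, and the sketch assumes that only the presence of the tag matters, since this section does not describe how tag values are interpreted.

```python
CNI_ROLE_TAG = "kubernetes.io/role/cni"


def eligible_subnets(subnets, primary_subnet_id, subnet_discovery_enabled):
    """Return the IDs of subnets that may be used for ENIs (illustrative only).

    subnets: list of dicts such as {"id": "subnet-a", "tags": {...}}
             (a hypothetical shape, not an AWS API response).
    """
    eligible = []
    for subnet in subnets:
        if subnet["id"] == primary_subnet_id:
            # The node's own subnet does not need to be tagged.
            eligible.append(subnet["id"])
        elif subnet_discovery_enabled and CNI_ROLE_TAG in subnet.get("tags", {}):
            # Assumption: tag presence alone marks a secondary subnet as usable.
            eligible.append(subnet["id"])
    return eligible
```

With subnet discovery disabled, the sketch returns only the primary subnet, which matches the statement that the tag is consulted only when ENABLE_SUBNET_DISCOVERY is true.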
Container Runtime
For VPC CNI >=v1.12.0, IPAMD uses an on-disk file, /var/run/aws-node/ipam.json, to track IP allocations. It is therefore container runtime agnostic and no longer requires access to the Container Runtime Interface (CRI) socket.
cri.hostPath.path. If you need to install a VPC CNI <v1.12.0 with the Helm chart, use a Helm chart version <v1.2.0.
For VPC CNI <v1.12.0, IPAMD still depends on CRI to track IP allocations using pod sandbox information at startup.
/var/run/cri.sock and hostPath should be pointed to the CRI socket used by kubelet, such as /var/run/containerd/containerd.sock for containerd.
--set cri.hostPath.path=/var/run/containerd/containerd.sock can set the above for you.

Notes
L-IPAMD (aws-node daemonSet) running on every worker node requires access to the Kubernetes API server. If it cannot reach the Kubernetes API server, ipamd will exit and CNI will not be able to get any IP address for Pods. Here is a way to confirm if aws-node has access to the Kubernetes API server.

Security disclosures
If you think you’ve found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions here or email AWS security directly.
Contributing
See CONTRIBUTING.md