The EKS Node Monitoring Agent detects health issues on Amazon EKS worker nodes by parsing system logs and surfacing status information through Kubernetes NodeConditions. When paired with Amazon EKS node auto repair, detected issues can trigger automatic node replacement or reboot.
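To make this concrete, the excerpt below sketches how such conditions could appear in a node's status, for example in the output of kubectl get node <node-name> -o yaml. The field names follow the standard Kubernetes NodeCondition schema. The reason, message, and timestamp values are hypothetical placeholders rather than actual agent output, and reading "True" as healthy is an assumption based on the Ready-style condition names.

```yaml
# Hypothetical excerpt of a Node object's status block.
status:
  conditions:
    - type: KernelReady                 # condition type applied by the agent
      status: "True"                    # assumed to mean the subsystem is healthy
      reason: ExampleHealthyReason      # placeholder, not a real agent reason code
      message: example healthy message  # placeholder text
      lastHeartbeatTime: "2025-01-01T00:00:00Z"
      lastTransitionTime: "2025-01-01T00:00:00Z"
    - type: NetworkingReady
      status: "False"                   # assumed to mean an issue was detected
      reason: ExampleIssueReason        # placeholder, not a real agent reason code
      message: example issue message    # placeholder text
      lastHeartbeatTime: "2025-01-01T00:05:00Z"
      lastTransitionTime: "2025-01-01T00:04:00Z"
```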
Project Layout
.
├── api/ # API definitions and CRDs
├── charts/ # Helm chart for deployment
├── cmd/ # Application entry point
├── examples/ # Integration examples
├── hack/ # Build and utility scripts
├── monitors/ # Health monitoring plugins
├── pkg/ # Core packages
└── test/ # Integration tests
By default all monitors are enabled. Individual monitors can be disabled via the Helm chart’s nodeAgent.monitors configuration or by providing a config file at /etc/nma/config.yaml.
Helm Values
Each monitor supports enabled: true/false to enable or disable it:
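As a minimal sketch, a Helm values override that disables one monitor might look like the following. It assumes that the keys under nodeAgent.monitors are the plugin names (kernel-monitor, networking, storage-monitor, nvidia, neuron, runtime). That nesting is an assumption, so check the chart's values.yaml for the exact structure.

```yaml
# values-override.yaml (hypothetical file name)
# Assumption: monitors are keyed by plugin name under nodeAgent.monitors.
nodeAgent:
  monitors:
    kernel-monitor:
      enabled: true    # explicit, although monitors default to enabled
    nvidia:
      enabled: false   # turn off the nvidia monitor
```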
The networking monitor additionally supports allowedIPTablesChains to suppress UnexpectedRejectRule warnings for rules in custom chains. Entries must use table/chain format:
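For illustration, an allowlist entry could be written as shown here. The chain name MY-CUSTOM-CHAIN is hypothetical, and placing allowedIPTablesChains directly under the networking monitor's key is an assumption about the values layout.

```yaml
# Assumption: allowedIPTablesChains sits next to "enabled" under the networking monitor.
nodeAgent:
  monitors:
    networking:
      enabled: true
      allowedIPTablesChains:
        - filter/MY-CUSTOM-CHAIN   # table/chain format: "filter" table, hypothetical chain name
```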
Building
# Build the binary
make build
# Run tests
make test
# Build container image
make docker-build
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on:
Reporting bugs and feature requests
Submitting pull requests
Code of conduct
Security issue notifications
EKS Node Monitoring Agent
Overview
The agent runs as a DaemonSet on each node and monitors for issues across several categories:
For each category, the agent applies a dedicated NodeCondition to worker nodes (e.g., KernelReady, NetworkingReady, StorageReady, AcceleratedHardwareReady). These conditions integrate with Amazon EKS node auto repair to automatically remediate unhealthy nodes.
Installation
We recommend installing the EKS Node Monitoring Agent as an EKS add-on. For Helm installation instructions, see charts/eks-node-monitoring-agent/README.md.
For detailed configuration options and usage documentation, refer to the Amazon EKS Node Health documentation.
Configuring Monitors
Config File Format
The agent reads a YAML config file mounted at /etc/nma/config.yaml. Omitted monitors default to enabled.
Valid plugin names: kernel-monitor, networking, storage-monitor, nvidia, neuron, runtime.
When a monitor is disabled, the corresponding NodeCondition (e.g., NetworkingReady) is not set on the node, avoiding false-positive healthy status for unmonitored subsystems.
Security
If you discover a potential security issue, please report it via the AWS vulnerability reporting page. Do not create a public GitHub issue for security vulnerabilities.
See CONTRIBUTING.md for more information.
License
This project is licensed under the Apache-2.0 License. See LICENSE for the full license text.