Create Airflow 3.2.0 Image (#450)
Description of changes:
- This PR creates the Airflow 3.2.0 image with the necessary changes (for clarity, each change is in a separate standalone commit):
  - Statsd: Updated the import from `airflow.metrics.statsd_logger` to `airflow._shared.observability.metrics.statsd_logger` because the module was relocated in Airflow 3.2.0.
  - Celery Config: Added `AIRFLOW__CELERY__EXTRA_CELERY_CONFIG` with a JSON-serialized broker config, because Airflow 3.2.0's `create_celery_app()` no longer reads `celery_config_options`, and `getsection()` returns all values as strings, breaking nested dict types like `predefined_queues`.
  - Disable Triggerer queue config: Disabled triggerer queues (`AIRFLOW__TRIGGERER__QUEUES_ENABLED=False`) because MWAA runs a single triggerer without `--queues` CLI support, and enabling them would cause all deferrable tasks to hang indefinitely.
- Unit tests are also imported to 3.2.0 with changes applied accordingly.
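To make the Celery change above concrete, here is a hedged sketch of what the JSON-serialized broker config could look like. The queue name, region, and account ID are placeholders, and the exact keys accepted under `AIRFLOW__CELERY__EXTRA_CELERY_CONFIG` should be checked against your Celery/SQS configuration:

```shell
# Illustration only: nested broker options (e.g. predefined_queues) are passed
# as one JSON-serialized environment variable, since getsection() would return
# them as plain strings. The queue URL and name below are placeholders.
export AIRFLOW__CELERY__EXTRA_CELERY_CONFIG='{
  "broker_transport_options": {
    "predefined_queues": {
      "default": {
        "url": "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue"
      }
    }
  }
}'
# Sanity check that the value is valid JSON before handing it to the workers:
echo "$AIRFLOW__CELERY__EXTRA_CELERY_CONFIG" | python3 -m json.tool > /dev/null && echo "valid JSON"
```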
Testing:
- Passed all unit tests.
- Deployed the image to MWAA and tested out basic functionality working.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Co-authored-by: Yiwen Wang wwangyw@amazon.com
aws-mwaa-docker-images
Overview
This repository contains the Docker Images that Amazon MWAA uses to run Airflow.
You can also use it locally if you want to run an MWAA-like environment for testing, experimentation, and development purposes.
Currently, Airflow v2.9.2 and above are supported, and future versions will be added in parity with Amazon MWAA. Notice, however, that we do not plan to support earlier Airflow versions, even those supported by MWAA.
Using the Airflow Image
To experiment with the image using a vanilla Docker setup, follow these steps:
1. `cd <amazon-mwaa-docker-images path>/images/airflow/2.9.2`
2. Update the `run.sh` file with your account ID, environment name, account credentials, and api-server URL (`http://host_name:8080`). The permissions associated with the provided credentials will be assigned to the Airflow components started in the next step, so if you receive any error message indicating a lack of permissions, grant those permissions to the identity whose credentials were used.
3. Run `./run.sh`. This will build and run all the necessary containers and automatically create the following CloudWatch log groups:
   - `{ENV_NAME}-DAGProcessing`
   - `{ENV_NAME}-Scheduler`
   - `{ENV_NAME}-Worker`
   - `{ENV_NAME}-Task`
   - `{ENV_NAME}-WebServer`

Airflow should be up and running now. You can access the web server on your localhost on port 8080.
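The log-group naming pattern above can be illustrated with a small snippet; `my-env` is a placeholder environment name:

```shell
# Print the CloudWatch log group names created by run.sh, for a
# placeholder environment name.
ENV_NAME="my-env"
for component in DAGProcessing Scheduler Worker Task WebServer; do
  echo "${ENV_NAME}-${component}"
done
```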
Authentication from version 3.0.1 onward
For environments created using this repository starting with version 3.0.1, we default to using
`SimpleAuthManager`, which is also the default auth manager in Airflow 3.0.0+. By default, `SIMPLE_AUTH_MANAGER_ALL_ADMINS` is set to true, which means no username/password is required and all users have admin access. You can specify users and their roles using the `SIMPLE_AUTH_MANAGER_USERS` environment variable. To enforce authentication with explicit user passwords and roles, set `SIMPLE_AUTH_MANAGER_ALL_ADMINS` to false.
In this mode, a password will be automatically generated for each user and printed in the webserver logs as soon as the webserver starts.
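As a hedged example of the settings described above — the comma-separated `username:role` list format is an assumption to verify against the Airflow SimpleAuthManager documentation, and the usernames and roles are placeholders:

```shell
# Require login instead of granting everyone admin access:
export SIMPLE_AUTH_MANAGER_ALL_ADMINS='False'

# Assumed format: comma-separated "username:role" pairs (placeholder users):
export SIMPLE_AUTH_MANAGER_USERS='alice:admin,bob:viewer'
```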
Generated Docker Images
When you build the Docker images of a certain Airflow version, using either
`build.sh` or `run.sh` (which automatically calls `build.sh` for you), multiple Docker images will actually be generated. For example, for Airflow 2.9, you will notice several images whose tags share a common base. Each postfix added to the image tag represents a certain build type, as explained below:
- explorer: The 'explorer' build type is almost identical to the default build type, except that it doesn't include an entrypoint, meaning that if you run this image locally, it will not actually start Airflow. This is useful for debugging: you can run the image and look around its contents without starting Airflow, e.g. explore the file system and see what is available where.
- privileged: Privileged images are the same as their non-privileged counterparts except that they run as the `root` user. This gives the user of the image elevated permissions, which can be useful for experiments as the root user, e.g. installing DNF packages or creating new folders outside the airflow user's folder.
- dev: These images have extra packages installed for debugging purposes. For example, you typically wouldn't install a text editor in a production Docker image, but during debugging you might want to open some files, inspect their contents, and make changes, so we install an editor in the dev images to aid such use cases. Similarly, we install tools like `wget` to make it possible to fetch web pages. For a complete listing of what is installed in `dev` images, see the `bootstrap-dev` folders.

Extra commands
Requirements
For details on installing Python dependencies, and optionally bundling wheel files, see Managing Python dependencies in requirements.txt in the Amazon MWAA user guide.
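For illustration, a minimal requirements file could look like the following; the package names and version pins are placeholders, not recommendations, and should be chosen to match the constraints file for your Airflow version:

```shell
# Create an example requirements file (contents are illustrative placeholders;
# pin versions against the Airflow constraints file for your version).
mkdir -p requirements
cat > requirements/requirements.txt <<'EOF'
boto3>=1.34
apache-airflow-providers-http
EOF
cat requirements/requirements.txt
```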
Add your dependencies to `requirements/requirements.txt`. You can also test your `requirements.txt` without running Apache Airflow.

Startup script
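A minimal sketch of a `startup.sh`, assuming it runs as a plain bash script before Airflow starts; the variable and package names below are placeholders:

```shell
#!/bin/bash
# Minimal startup.sh sketch; everything below is illustrative.
set -euo pipefail

echo "Running startup script..."

# Example: export an environment variable for your DAGs (placeholder name):
export MY_COMPANY_ENV="staging"

# Example: install an OS package (commented out; requires a privileged image):
# sudo dnf install -y libaio

echo "Startup script finished."
```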
Create a `startup_script` folder and add your script there as `startup.sh`. The Airflow containers will execute this `startup.sh` script on startup. You can also test your `startup.sh` without running Apache Airflow.

Reset database
If the process fails with "dag_stats_table already exists", you'll need to reset your database. Just restart your container by exiting and rerunning the `run.sh` script.

Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.