Lifeguard for Lazy Imports
A fast static analysis tool to aid adoption of Lazy Imports in Python.
What are Lazy Imports?
In Python, every `import` statement executes immediately when a module is loaded. This overhead is incurred regardless of whether that import is actually used. PEP 810 introduces explicit Lazy Imports to Python, which defer the actual loading of a module until the imported name is first accessed. Lazy Imports can significantly reduce memory usage, startup times, and import overhead, especially in large codebases with deep dependency trees.
However, some Python patterns depend on imports executing immediately. For example:
- `sys.modules` manipulation: code that reads or writes `sys.modules` assumes prior imports have already executed.
- `__init_subclass__`: class creation side effects may depend on imports being resolved.
Adapting an existing codebase to use Lazy Imports can be a daunting task, especially at scale. Lifeguard identifies these incompatible patterns so you can adopt Lazy Imports with confidence.
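To make the `__init_subclass__` case concrete, here is a minimal sketch of a registry that is populated as a side effect of class creation. The `Plugin` and `CsvPlugin` names are hypothetical and exist only for illustration; they are not part of Lifeguard.

```python
# Hypothetical plugin registry: subclasses register themselves as a side
# effect of class creation, which happens when their defining code runs.
class Plugin:
    registry: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = cls


def available_plugins() -> list[str]:
    """Return the names of all plugins registered so far."""
    return sorted(Plugin.registry)


# Before any subclass body has executed, the registry is empty.
before = available_plugins()


# In a real project this class would typically live in its own module and
# would only be created once that module is actually executed.
class CsvPlugin(Plugin):
    pass


after = available_plugins()
print(before, after)  # [] ['CsvPlugin']
```

If the module defining `CsvPlugin` were imported lazily and its name never accessed, the class body would never run and the registry would stay empty. That is the kind of hidden dependency on eager execution described above.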
How does Lifeguard work?
Lifeguard analyzes Python source files for a given project in parallel. It walks each module’s AST to detect effects and maps Lazy-Imports-incompatible effects to errors. The analyzer takes a conservative approach towards its analysis: any module that cannot be programmatically determined to be safe to import lazily is marked unsafe by default. This means Lifeguard will err on the side of marking potentially compatible modules as incompatible, leaving potential performance optimizations on the table in favor of production safety.
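As a rough illustration of this kind of conservative AST walk, the sketch below uses Python's standard `ast` module to flag a few of the patterns this README mentions. It is not Lifeguard's implementation (which is written in Rust and tracks far more effects); the function names and the reason strings are assumptions made for the example.

```python
import ast


def find_eager_only_patterns(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, reason) pairs for a few illustrative patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # A direct call to the builtin exec(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "exec"):
            findings.append((node.lineno, "exec() call"))
        # An attribute access of the form sys.modules
        elif (isinstance(node, ast.Attribute)
                and node.attr == "modules"
                and isinstance(node.value, ast.Name)
                and node.value.id == "sys"):
            findings.append((node.lineno, "sys.modules access"))
        # Definitions of __del__ or __init_subclass__
        elif (isinstance(node, ast.FunctionDef)
                and node.name in ("__del__", "__init_subclass__")):
            findings.append((node.lineno, f"{node.name} definition"))
    return sorted(findings)


def is_lazy_safe(source: str) -> bool:
    """Conservative default: anything flagged or unparsable counts as unsafe."""
    try:
        return not find_eager_only_patterns(source)
    except SyntaxError:
        return False


SAMPLE = '''import sys
import json

cache = sys.modules.get("json")

class Resource:
    def __del__(self):
        pass
'''

findings = find_eager_only_patterns(SAMPLE)
for line, reason in findings:
    print(f"line {line}: {reason}")
```

Note how `is_lazy_safe` returns `False` when the source cannot be parsed. This mirrors the "unsafe unless proven safe" stance described above, at the cost of rejecting some code that would in fact be fine.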
For a deeper look at the analysis pipeline and architecture, see docs/architecture.md.
Project Stage: Beta
Lifeguard is in active development. We are aiming to be ready for general use by the Python 3.15 final release.
Items on our roadmap
`lazy` keyword added in PEP 810, but we fully intend to support this ahead of the 3.15 release.
Prerequisites
`rustup default nightly`.
`git clone --recurse-submodules https://github.com/facebook/Lifeguard.git`
If you already cloned without `--recurse-submodules`, run `git submodule update --init --recursive`.
Quick Start
The fastest way to try Lifeguard is the `run-tree` subcommand, which analyzes every `.py` file under a directory. No additional setup needed.
For example, using the bundled sample project:
For a full walkthrough including interpreting the output, see GETTING_STARTED.md.
Running Lifeguard
For larger projects where you need more control, you can generate a source DB — a JSON file that tells Lifeguard the full set of Python files in your project and their module paths (see Input Format for details). Follow these steps:
Optionally, if your project has library dependencies, you can point Lifeguard at your site-packages by adding a `lifeguard` section to your `pyproject.toml`:
You can find out your site-packages path via `python -m site`. The `gen-source-db` subcommand reads this section automatically when generating the source DB.
Note: The script may not discover all of your project’s dependencies. If Lifeguard reports missing modules, you may need to manually add entries to the generated source DB.
OUTPUT_PATH.
Example Verbose Output:
Input Format
In some modes, Lifeguard requires a source DB — a JSON file mapping Python module paths to their locations on disk. The format is:
You can generate this automatically using `cargo run -- gen-source-db` (see Running Lifeguard), or create it by hand.
Output Format
Lifeguard writes a JSON file with two fields:
`LAZY_ELIGIBLE`
A dictionary mapping modules that are safe for Lazy Imports to a list of their dependencies that must be imported eagerly. For example:
- `"module1": []`: `module1` is fully safe for Lazy Imports with no restrictions.
- `"module2": ["module3", "module4"]`: `module2` is safe for Lazy Imports, but only if `module3` and `module4` have already been imported.
Important: Modules that do not appear as keys in this dictionary have been analyzed as unsafe for Lazy Imports.
`LOAD_IMPORTS_EAGERLY`
A set of modules where all imports within the module must be loaded eagerly. Lazy Imports is essentially temporarily disabled for these modules. Note the distinction: other modules can still lazily import a module in the `LOAD_IMPORTS_EAGERLY` set, but when that module does load, its own `import` statements must execute immediately rather than being deferred.
This set is only used for specific corner cases:
- `__del__`: unpredictable execution timing means imports must be available at finalization.
- `exec()` calls: dynamic code execution negates static analysis guarantees.
- `sys.modules` access: reading or writing `sys.modules` could depend on prior imports having already executed.
For more details, see docs/load_imports_eagerly.md.
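The two fields can be read with a few lines of Python. The sketch below is illustrative only: the sample JSON is hypothetical, `module5` is an invented name, and it assumes that the `LOAD_IMPORTS_EAGERLY` set is serialized as a JSON array and that a module may appear in both fields.

```python
import json

# Hypothetical output, shaped after the field descriptions above.
EXAMPLE_OUTPUT = """
{
  "LAZY_ELIGIBLE": {
    "module1": [],
    "module2": ["module3", "module4"],
    "module5": []
  },
  "LOAD_IMPORTS_EAGERLY": ["module5"]
}
"""


def describe(module: str, report: dict) -> str:
    """Summarize what the report says about a single module."""
    eligible = report["LAZY_ELIGIBLE"]
    eager_bodies = set(report["LOAD_IMPORTS_EAGERLY"])

    if module not in eligible:
        # Absent keys were analyzed as unsafe.
        status = "unsafe for Lazy Imports"
    elif eligible[module]:
        status = "lazy-eligible after eagerly importing " + ", ".join(eligible[module])
    else:
        status = "lazy-eligible with no restrictions"

    if module in eager_bodies:
        status += "; its own imports must load eagerly"
    return status


report = json.loads(EXAMPLE_OUTPUT)
for name in ("module1", "module2", "module3", "module5"):
    print(name, "->", describe(name, report))
```

In this sample, `module3` is reported as unsafe because it is not a key of `LAZY_ELIGIBLE`, even though it is listed as a dependency of `module2`.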
Using the Output
As a standalone linter
Lifeguard can be used as a standalone linter to identify which specific lines in your codebase are incompatible with Lazy Imports. Run the analyzer with `--verbose-output` to get a human-readable report showing per-module errors with line numbers (see Running Lifeguard). This lets you treat Lifeguard like a linter: run it in CI or locally, review the flagged lines, and fix them. In this manner, Lifeguard is used as a guide to safely enable Lazy Imports.
To drive a lazy import loader
The JSON output is designed to drive a lazy import loader’s filter function. In Python 3.15, `importlib.util.lazy_import` accepts a filter callback that controls which imports are deferred and which are loaded eagerly. Lifeguard’s output provides the data needed to build this filter, using `LAZY_ELIGIBLE` to identify safe modules and their constraints, and `LOAD_IMPORTS_EAGERLY` to identify modules that need all imports resolved upfront.
We plan to provide tooling for easy ingestion of Lifeguard’s output ahead of the Python 3.15 release. This is a work in progress.
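Until that tooling exists, one possible shape for such a filter is sketched below. Everything here is an assumption: the `(importer, imported) -> bool` signature is hypothetical and may not match the real Python 3.15 hook, the sketch does not call any interpreter API, and it takes the simplest reading of the constraints by keeping every module listed as a required dependency eager everywhere.

```python
from typing import Callable


def make_lazy_filter(report: dict) -> Callable[[str, str], bool]:
    """Build filter(importer, imported) -> True if the import may be deferred.

    The signature is hypothetical; adapt it to whatever the loader expects.
    """
    eligible = report.get("LAZY_ELIGIBLE", {})
    eager_bodies = set(report.get("LOAD_IMPORTS_EAGERLY", []))
    # Simplifying assumption: any module named as a required dependency of
    # an eligible module is always imported eagerly.
    must_be_eager = {dep for deps in eligible.values() for dep in deps}

    def may_defer(importer: str, imported: str) -> bool:
        if importer in eager_bodies:
            return False  # every import inside this module stays eager
        if imported in must_be_eager:
            return False  # some eligible module needs it loaded already
        return imported in eligible  # unknown modules are treated as unsafe

    return may_defer


# Hypothetical report for demonstration.
sample_report = {
    "LAZY_ELIGIBLE": {
        "module1": [],
        "module2": ["module3", "module4"],
        "module3": [],
    },
    "LOAD_IMPORTS_EAGERLY": ["module5"],
}

may_defer = make_lazy_filter(sample_report)
print(may_defer("app", "module1"))      # True
print(may_defer("app", "module3"))      # False
print(may_defer("module5", "module1"))  # False
print(may_defer("app", "unlisted"))     # False
```

Treating unlisted modules as eager keeps the filter aligned with the analyzer's conservative default: a module is deferred only when the report explicitly marks it as eligible.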
Implementation
Lifeguard is implemented in Rust. We leverage ruff for AST traversal and re-use several crates from pyrefly. We also extend `.pyi` stub files to annotate known side effects in third-party libraries, for example marking that a particular module-level function call in a dependency has observable behavior. These stubs are stored in the `resources/` folder. See resources/stubs/stubs.md for details on how effect annotations work alongside standard type stubs.
License
By contributing to Lifeguard, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.