AWS CodeArtifact is a fully managed artifact repository service
that makes it easy for organizations to securely store and share software packages used for application development.
On [launch date] we introduced a feature called “Package Origin Control” which allows customers to protect themselves
against “dependency substitution” or “dependency confusion” attacks.
While this feature protects new packages by default, packages which lived in CodeArtifact repositories prior to the
feature release are not protected without explicit configuration.
The purpose of this toolkit is to provide repository administrators with an easy way to set Origin Control policies
in bulk on packages that have not received the default protection because they pre-date the feature release.
For internal packages, this is achieved by blocking upstream versions. For external packages, the toolkit also supports
blocking the publishing of package versions, which avoids creating a potentially vulnerable mixed state.
Structure
The toolkit consists of two scripts: generate_package_configurations.py, which creates a manifest
file listing the packages in a domain alongside the origin configuration proposed for each, and
apply_package_configurations.py, which reads the manifest file and applies the configuration within.
generate_package_configurations.py can operate on a whole repository, or on a subset of packages
(specified either via filters or through a list), and supports two origin control resolution modes:
Manual: Supply the origin configuration yourself via a manifest file. This is an appropriate option if you
already maintain a list of internal packages, or if they are published in a consistent internal namespace which
allows for them to be easily selected.
Automated: The script identifies which packages should have their upstreams blocked by analyzing the upstream repository
graph and external connections, looking for evidence that package versions are only available from the repository at
hand. In that case it determines that sourcing of upstream versions can be disabled without risk of breaking
builds. This is a good option if you want a quick way to tighten your security posture without having to manually
analyze your whole repository.
apply_package_configurations.py takes the manifest file generated by generate_package_configurations.py as input,
and applies the origin control changes by calling the new PutPackageOriginConfiguration API.
Precisely because it is meant to set these values in bulk,
this script supports backup and revert operations by default, as well as dry-run and step-by-step confirmation options.
If you identify an issue after applying origin control changes, you will be able to safely revert to the original,
working configuration before trying again. See the Backup and restore section for details.
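As a rough illustration of the API call described above, the following sketch wraps a single PutPackageOriginConfiguration request with a dry-run switch. It is not the toolkit's actual code: the function name `apply_origin_configuration` and its parameters are hypothetical, and `client` is assumed to be a boto3 CodeArtifact client (or any object exposing the same method).

```python
def apply_origin_configuration(client, domain, repository, package_format, package,
                               restrictions, namespace=None, dry_run=False):
    """Set the origin configuration of one package (hypothetical helper).

    `restrictions` is a dict such as {"publish": "ALLOW", "upstream": "BLOCK"}.
    """
    request = {
        "domain": domain,
        "repository": repository,
        "format": package_format,
        "package": package,
        "restrictions": restrictions,
    }
    # The namespace is only sent when the package actually has one.
    if namespace:
        request["namespace"] = namespace

    if dry_run:
        # Report what would happen without touching the repository.
        print(f"[dry-run] would call PutPackageOriginConfiguration with {request}")
        return request

    client.put_package_origin_configuration(**request)
    return request
```

With boto3 installed and credentials configured, a client could be created with `boto3.client("codeartifact", region_name=...)` and passed in as `client`.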
Installing
The toolkit only depends on the boto3 and tqdm packages. In order to install, simply run:
pip install -r requirements.txt
Configuring
The toolkit uses the same configuration as the AWS CLI to run. This means that you can either set one of the following
two environment variable sets:
Alternatively, you can use the --profile flag to indicate which AWS CLI profile you want to use. Please note
that this flag is only used for authentication purposes: even if you have specified a region parameter in your
AWS CLI profile, you will still be required to pass the region to the script explicitly.
Additionally, the account you are using to authenticate must have at least repository-level read permissions to run
the first stage in manual mode, and read permissions on all repositories upstream of the target repository in auto mode.
Stage 2 requires repository read permissions if the backup feature is enabled (the default), as well as write
permissions unless you run it in dry-run mode (see the “More options” section below).
Using
The toolkit works on a per-repository basis and is structured in two stages. The first stage produces a manifest:
a CSV file listing the selected packages in the target repository alongside their desired origin
control configuration. The second stage takes the generated CSV and sets the desired origin
control configuration on every package listed within.
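To make the hand-off between the two stages concrete, here is a minimal sketch of writing and reading such a manifest with Python's `csv` module. The text only states that region, domain, and repository appear as CSV columns, so the remaining column names (`format`, `namespace`, `package`, `publish`, `upstream`) and both function names are assumptions for illustration, not the toolkit's real layout.

```python
import csv
import io

# Hypothetical column layout; only region, domain and repository are
# confirmed by the documentation.
COLUMNS = ["region", "domain", "repository", "format",
           "namespace", "package", "publish", "upstream"]


def write_manifest(rows, stream):
    """Write one CSV row per package, preceded by a header line."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_manifest(stream):
    """Read the manifest back, rejecting files with an unexpected header."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected manifest header: {reader.fieldnames}")
    return list(reader)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_manifest([{
        "region": "us-west-2", "domain": "my-domain", "repository": "my-repo",
        "format": "npm", "namespace": "", "package": "internal-lib",
        "publish": "ALLOW", "upstream": "BLOCK",
    }], buffer)
    print(buffer.getvalue())
```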
Stage 1: Generating the changes manifest
The first stage is invoked through generate_package_configurations.py. It requires values for
domain, repository as well as region to be supplied.
Specifying origin configuration
Origin configuration is always supplied as a string like
publish=[value],upstream=[value]
where [value] can be either ALLOW or BLOCK. So by default all existing packages will have
publish=ALLOW,upstream=ALLOW
In order to tighten security for an internally-published package, you would want to disable upstream versions like
publish=ALLOW,upstream=BLOCK
Conversely, if you wanted to prevent users from publishing new versions to a package, you would set:
publish=BLOCK,upstream=ALLOW
These settings are always supplied as a tuple and should be thought of as working in concert.
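The string format above is simple enough to validate mechanically. The sketch below shows one way to parse it into a dictionary and back; the helper names `parse_restrictions` and `format_restrictions` are hypothetical, and the strictness (both keys required, uppercase values only) is an assumption rather than documented toolkit behavior.

```python
VALID_VALUES = {"ALLOW", "BLOCK"}
REQUIRED_KEYS = ("publish", "upstream")


def parse_restrictions(text):
    """Turn 'publish=ALLOW,upstream=BLOCK' into a dict, validating both keys."""
    restrictions = {}
    for part in text.split(","):
        key, separator, value = part.strip().partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {part!r}")
        key, value = key.strip(), value.strip()
        if key not in REQUIRED_KEYS:
            raise ValueError(f"unknown restriction {key!r}")
        if value not in VALID_VALUES:
            raise ValueError(f"{key} must be ALLOW or BLOCK, got {value!r}")
        if key in restrictions:
            raise ValueError(f"duplicate restriction {key!r}")
        restrictions[key] = value
    # Both settings work in concert, so neither may be omitted.
    missing = [key for key in REQUIRED_KEYS if key not in restrictions]
    if missing:
        raise ValueError(f"missing restriction(s): {', '.join(missing)}")
    return restrictions


def format_restrictions(restrictions):
    """Render a restrictions dict back into the canonical string form."""
    return ",".join(f"{key}={restrictions[key]}" for key in REQUIRED_KEYS)
```

Parsing into a dictionary keyed by `publish` and `upstream` is convenient because that is also the shape the `restrictions` parameter takes in the boto3 `put_package_origin_configuration` call.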
Generating from list vs generating from query
Once the repository, domain, and region values have been supplied, you must select which packages to generate origin
control configurations for. It is possible to select either all packages within the repository, or a subset.
In order to select a subset of packages available in the repository, two options are available: either by supplying
a list of package names or through a query. Please note that multiple namespaces and package formats aren’t supported at once and
you will have to repeat this operation explicitly for each one.
Working with a supplied packages list is as easy as specifying the input file name, which should have one name per line.
For example, if you wanted to BLOCK upstreams for some internal npm packages as listed in an inputfile.log
file:
Automatic vs. manual origin control setting
Once you have selected a package set, you have two ways of bulk-setting the origin configuration for each package in it.
The simplest is to set the policy explicitly via the --set-restrictions flag, which we refer to as “manual” mode.
Otherwise, you can use “automatic” mode simply by omitting that flag. This mode is meant for administrators who
want the most hassle-free experience: the toolkit will try to identify packages which can have their upstreams blocked
safely, and will otherwise fall back on allowing upstreams.
The heuristic will block acquisition of new versions from upstreams if and only if the target repository doesn’t have
direct access to an external connection, and no versions of the package are available via any of the upstreams,
either because the target repository doesn’t have any upstreams or because none of the upstreams have the package.
Therefore, we assume there isn’t an immediate external connection attached to the repository for the package format(s)
you are trying to run this script against.
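The rule stated above can be sketched as a small pure function over an in-memory description of the repository graph. This is an illustration of the stated rule only, not the toolkit's implementation: the function names and the three dictionaries are hypothetical inputs, external connections on upstream repositories are not considered because the text only describes the direct case, and leaving `publish` at `ALLOW` is an assumption.

```python
def package_reachable_upstream(repo, package, upstreams, packages, seen=None):
    """Return True if any repository upstream of `repo` (transitively) holds `package`."""
    seen = set() if seen is None else seen
    for upstream in upstreams.get(repo, ()):
        if upstream in seen:
            continue  # already visited through another path
        seen.add(upstream)
        if package in packages.get(upstream, set()):
            return True
        if package_reachable_upstream(upstream, package, upstreams, packages, seen):
            return True
    return False


def propose_restrictions(repo, package, upstreams, packages, external_connections):
    """Block upstreams only when the repository has no direct external
    connection and no upstream repository holds the package."""
    if external_connections.get(repo):
        block = False
    else:
        block = not package_reachable_upstream(repo, package, upstreams, packages)
    # Assumption: publishing is left at ALLOW in this sketch.
    return {"publish": "ALLOW", "upstream": "BLOCK" if block else "ALLOW"}


if __name__ == "__main__":
    upstreams = {"team": ["shared"], "shared": ["mirror"], "mirror": []}
    packages = {"team": {"internal-lib", "lodash"}, "shared": set(), "mirror": {"lodash"}}
    external = {"mirror": ["public:npmjs"]}
    for name in ("internal-lib", "lodash"):
        print(name, propose_restrictions("team", name, upstreams, packages, external))
```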
In order to generate the list of new origin control configurations for the same subset of packages as in the
previous example, simply omit the --set-restrictions flag and run:
Saving to a file
By default the script will save its results to a file called origin_configuration.csv. You can use the --output flag
to change this to a path of your liking.
Stage 2: Applying changes
Once a well-formed CSV has been produced, it can be fed to the second stage, apply_package_configurations.py.
The same parameters as before (region, domain, repository) must be provided even though they are also present
in the CSV columns. This is to ensure there is no ambiguity and to confirm you are operating on the right repository.
Invoking the second stage on origin_configuration.csv therefore looks like this:
More options
--validate-only: Verifies that the CSV is well-formed.
--dry-run: Doesn’t actually call the API, but shows what the script would do.
--trace: Enables a more verbose mode.
--list-failed: In case of failure, lists packages that have failed to update the origin control configuration.
--retry-failed: Tries again to set the origin control configuration for packages that have failed to do so.
--ask-confirmation: Requires step-by-step confirmation for all write actions.
--num-workers: Controls the number of parallel workers making calls to CodeArtifact (default: 4).
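To give a feel for how parallel workers and failure tracking can fit together, here is a minimal sketch using a thread pool. It is an assumption about one reasonable design, not a description of the script's internals: `apply_all` and `apply_one` are hypothetical names, and `apply_one` stands in for whatever performs the per-package update.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_all(rows, apply_one, num_workers=4):
    """Run `apply_one(row)` for every manifest row using a pool of workers.

    Returns (succeeded, failed), where `failed` pairs each row with the
    exception it raised so the rows can be listed or retried later.
    """
    def attempt(row):
        try:
            apply_one(row)
            return row, None
        except Exception as error:  # keep going; report failures at the end
            return row, error

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order, which keeps reports predictable.
        results = list(pool.map(attempt, rows))

    succeeded = [row for row, error in results if error is None]
    failed = [(row, error) for row, error in results if error is not None]
    return succeeded, failed
```

A retry pass in this design is simply a second call on the failed rows: `apply_all([row for row, _ in failed], apply_one)`.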
Backup and restore
By default, before changing any origin control configurations the script will back up the existing configuration for
every package it touches (this behavior can be disabled with the --no-backup flag). Should you want to revert to the
previous configuration, you can simply use the --revert flag on the same input file.
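The backup-then-revert idea can be illustrated with two small helpers. This is a sketch under assumptions, not the script's real backup format: `snapshot` and `revert` are hypothetical names, and `client` is assumed to behave like a boto3 CodeArtifact client, whose `describe_package` response carries the current restrictions under `package.originConfiguration.restrictions`.

```python
def snapshot(client, domain, repository, packages):
    """Record the current restrictions of each (format, namespace, name)
    tuple before any change is made."""
    backups = []
    for package_format, namespace, name in packages:
        request = {"domain": domain, "repository": repository,
                   "format": package_format, "package": name}
        if namespace:
            request["namespace"] = namespace
        described = client.describe_package(**request)
        restrictions = described["package"]["originConfiguration"]["restrictions"]
        # Store everything needed to replay the original configuration.
        backups.append({**request, "restrictions": dict(restrictions)})
    return backups


def revert(client, backups):
    """Re-apply the restrictions captured by `snapshot`."""
    for entry in backups:
        client.put_package_origin_configuration(**entry)
```

Taking the snapshot requires read access to the repository, which matches the permission note in the Configuring section.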
AWS CodeArtifact Package Origin Control toolkit
More information can be found on the origin control feature documentation as well as in the blog post announcing the availability of this toolkit.
License
This software is released under the Apache 2.0 license.