SILENCE
This repository contains the source code and data of the paper “Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem”, published at the 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE).
- Weiwei Xu, Hao He, Kai Gao, Minghui Zhou. Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem. ASE’2023
📢 A tool based on SILENCE for compliance analysis and incompatibility remediation for Python projects is now available at https://licenserec.com/#/compliance.
Introduction
In this paper, we first conduct a large-scale empirical study of license incompatibilities in the PyPI ecosystem. Inspired by our findings, we propose SILENCE, an SMT-solver-based incompatibility remediator for licenses in the dependency graph. Given a release and its dependency graph with one or more license incompatibilities, SILENCE:
- finds alternative licenses that are compatible with the dependency graph, and
- searches for alternative graphs with no license incompatibilities and minimal difference with the original graph.
The results are aggregated as a report of recommended remediations (i.e., migrations, removals, version pinnings, or license changes) for developers to consider.
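The two steps above can be illustrated with a minimal sketch. It is not the implementation in remediator.py or relicenser.py: it replaces the SMT solver with brute-force enumeration, and the compatibility table, the function names, and the cost model (every change to a dependency costs 1) are assumptions made only for illustration.

```python
from itertools import product

# Toy compatibility relation (an assumption for illustration, not the
# repository's knowledge base): for each dependency license, the set of
# project licenses under which that dependency may be used.
ALLOWED_PROJECT_LICENSES = {
    "MIT": {"MIT", "Apache-2.0", "GPL-3.0-only"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0-only"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


def is_compatible(dep_license, project_license):
    return project_license in ALLOWED_PROJECT_LICENSES.get(dep_license, set())


def alternative_licenses(dep_licenses, candidates):
    """Candidate project licenses that are compatible with every dependency."""
    return [c for c in candidates
            if all(is_compatible(d, c) for d in dep_licenses)]


def cheapest_graph(project_license, options):
    """Pick one choice per dependency so that no incompatibility remains.

    options maps each dependency name to a list of (action, license) choices;
    license is None when the dependency is removed. The first choice is
    "keep" and costs 0; every other choice costs 1 (hypothetical cost model).
    Returns (cost, {dependency: action}) or None if nothing works.
    """
    names = list(options)
    best = None
    for picks in product(*(range(len(options[n])) for n in names)):
        chosen = [options[n][i] for n, i in zip(names, picks)]
        if any(lic is not None and not is_compatible(lic, project_license)
               for _, lic in chosen):
            continue  # this graph still contains an incompatibility
        cost = sum(1 for i in picks if i != 0)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, (action for action, _ in chosen))))
    return best


# Hypothetical project under MIT with one GPL-licensed dependency.
deps = {
    "pkg_a": [("keep", "GPL-3.0-only"), ("pin 1.0", "MIT"), ("remove", None)],
    "pkg_b": [("keep", "MIT")],
}
print(alternative_licenses(["GPL-3.0-only", "MIT"],
                           ["MIT", "Apache-2.0", "GPL-3.0-only"]))
print(cheapest_graph("MIT", deps))
```

Enumeration grows exponentially with the number of dependencies, which is one reason a solver-based formulation is attractive for real dependency graphs.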
Dirs and files
licensing_data_collection
: licensing information collection
dep_resolve
: Python dependency tree resolution
knowledge_base
: license compatibility matrix, migration patterns, license keywords, and so on
analysis.py
: data analysis of empirical study
RQ1_license_distribution.ipynb
: results of RQ1
RQ1_license_evolution.ipynb
: results of RQ1
RQ2_license_incompatibility.ipynb
: results of RQ2
RQ3_license_remediation_practice.md
: notes on RQ3
remediator.py
: implementation of SILENCE’s SMT-based part
relicenser.py
: implementation of SILENCE’s relicenser
res
: results and evaluation of SILENCE
data
: our dataset
SILENCE.py
: the main function of SILENCE
How to start?
The dataset is stored in the package collection of the license database. It belongs in the data directory; because of the file size limit on GitHub, please download the dataset here. You then need to import it into MongoDB. Run:
mongorestore --db=license --gzip data/package.bson.gz
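After the import, a quick sanity check can confirm that the collection is populated. The sketch below assumes the pymongo driver is installed and that MongoDB listens on the default local address; the helper names are hypothetical and the snippet is not part of the repository. It makes no assumption about the document schema and only reports the top-level field names of one document.

```python
def open_package_collection(uri="mongodb://localhost:27017"):
    # Assumed connection URI (the MongoDB default); adjust to your setup.
    # pymongo is imported lazily so summarize() works without it installed.
    from pymongo import MongoClient
    return MongoClient(uri)["license"]["package"]


def summarize(collection):
    """Return the document count and the sorted top-level field names
    of one arbitrary document in the collection."""
    count = collection.estimated_document_count()
    sample = collection.find_one() or {}
    return count, sorted(sample.keys())
```

Calling `summarize(open_package_collection())` should return a non-zero count if `mongorestore` succeeded.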
To reproduce the results of the empirical study, run the following notebooks:
RQ1_license_distribution.ipynb
RQ1_license_evolution.ipynb
RQ2_license_incompatibility.ipynb
To get remediations for all incompatibilities in the top 5,000 most downloaded packages, run:
python remediator.py all
python relicenser.py
To get remediations for the dependency graph of a specific package version, run:
python SILENCE.py -n name -v version
For example, to get remediations for fiftyone 0.18.0, the example from the paper, run:
python SILENCE.py -n fiftyone -v 0.18.0
You will get the following output:
Possible Remediations for fiftyone 0.18.0:
1. Change project license to GPL-3.0-only, GPL-3.0-or-later, or AGPL-3.0-only;
2. Or make the following dependency changes:
a) Remove ndjson;
b) Pin voxel51-eta to 0.1.9;
c) Pin pillow to 6.2.2;
d) Pin imageio to 2.9.0;
e) Pin h11 to 0.11.0.
3. Or make the following dependency changes:
a) Remove voxel51-eta;
b) Remove ndjson;
c) Pin h11 to 0.11.0.
License
The project is licensed under MulanPubL-2.0.
Citation
To cite this work, please use the following BibTeX entry:
@inproceedings{SILENCE2023,
title={Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem},
author={Xu, Weiwei and He, Hao and Gao, Kai and Zhou, Minghui},
booktitle={2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
pages={178--190},
year={2023},
organization={IEEE}
}