DOC Rework LinearRegression documentation (#7218)
Reference Issues/PRs
Towards #1447
What does this implement/fix? Explain your changes.
Rework the Linear Regression documentation.
Additionally, adds cudf, numpy, and sklearn to the Sphinx configuration file to enable cross-library references in the future.
Any other comments?
N/A
Authors:
- Virgil Chan (https://github.com/virchan)
- Simon Adorf (https://github.com/csadorf)
Approvers:
- Divye Gala (https://github.com/divyegala)
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions that share compatible APIs with other RAPIDS projects.
cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML’s Python API matches the API from scikit-learn.
For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
As an example, the following Python snippet loads input and computes DBSCAN clusters, all on GPU, using cuDF:
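A minimal sketch of such a snippet, offered as an illustration rather than the project's canonical example. The helper name `dbscan_labels`, its parameters, and the column-dictionary input are hypothetical. It assumes `cudf` and `cuml` are installed on a machine with a supported GPU, and that `labels_` comes back as a cuDF Series because the input is a cuDF DataFrame:

```python
def dbscan_labels(columns, eps=1.0, min_samples=1):
    """Cluster rows on the GPU; `columns` maps column name -> list of floats."""
    # Imported lazily because cudf and cuml need a CUDA-capable machine.
    import cudf
    from cuml.cluster import DBSCAN

    # Build a GPU DataFrame; the data lives in device memory from here on.
    gdf = cudf.DataFrame(columns)

    # Same constructor arguments and fit() call as scikit-learn's DBSCAN.
    model = DBSCAN(eps=eps, min_samples=min_samples)
    model.fit(gdf)

    # Assumption: labels_ is a cuDF Series here; copy it back to the host.
    return model.labels_.to_pandas().tolist()
```

On a GPU machine, calling `dbscan_labels({"0": [1.0, 2.0, 5.0], "1": [4.0, 2.0, 1.0], "2": [4.0, 2.0, 1.0]})` clusters three 3-D points that are each farther than `eps` from the others, so with `min_samples=1` each point should end up in its own cluster.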
Output:
cuML also features multi-GPU and multi-node-multi-GPU operation, using Dask, for a growing list of algorithms. The following Python snippet reads input from a CSV file and performs a NearestNeighbors query across a cluster of Dask workers, using multiple GPUs on a single node:
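The workflow can be sketched roughly as follows, under stated assumptions. The helper name `knn_across_gpus` and its parameters are hypothetical, the CSV path is supplied by the caller, and the transport protocol is left at the `dask_cuda` default unless one is passed in. Which protocol string selects UCXX in a given environment is not specified here and should be checked against the `dask_cuda` documentation:

```python
def knn_across_gpus(csv_path, n_neighbors=10, protocol=None):
    """Query nearest neighbors using one Dask worker per local GPU."""
    # Imported lazily because these packages need a CUDA-capable machine.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    import dask_cudf
    from cuml.dask.neighbors import NearestNeighbors

    # Forward `protocol` only when the caller chose one explicitly.
    cluster_kwargs = {} if protocol is None else {"protocol": protocol}
    cluster = LocalCUDACluster(**cluster_kwargs)
    client = Client(cluster)
    try:
        # Read the CSV file in parallel across the workers.
        df = dask_cudf.read_csv(csv_path)

        # Fit a distributed NearestNeighbors model and query it.
        nn = NearestNeighbors(n_neighbors=n_neighbors, client=client)
        nn.fit(df)
        distances, indices = nn.kneighbors(df)

        # Gather the distributed results before the cluster shuts down.
        return distances.compute(), indices.compute()
    finally:
        client.close()
        cluster.close()
```

Because Dask starts worker processes, a call such as `knn_across_gpus("/path/to/csv")` is best placed under an `if __name__ == "__main__":` guard when run as a script.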
Initialize a LocalCUDACluster configured with UCXX for fast transport of CUDA arrays
Load data and perform k-Nearest Neighbors search. cuml.dask estimators also support Dask.Array as input:
For additional examples, browse our complete API documentation, or check out our example walkthrough notebooks. Finally, you can find complete end-to-end examples in the notebooks-contrib repo.
Supported Algorithms
Installation
See the RAPIDS Release Selector for the command line to install either nightly or official release cuML packages via conda, pip, or Docker.
Build/Install from Source
See the build guide.
Scikit-learn Compatibility
cuML is compatible with scikit-learn version 1.4 or higher.
Contributing
Please see our guide for contributing to cuML.
References
The RAPIDS team has a number of blogs with deeper technical dives and examples. You can find them here on Medium.
For additional details on the technologies behind cuML, as well as a broader overview of the Python Machine Learning landscape, see Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence (2020) by Sebastian Raschka, Joshua Patterson, and Corey Nolet.
Please consider citing this when using cuML in a project. You can use the citation BibTeX:
Contact
Find out more details on the RAPIDS site
The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.