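Federated Learning and Incremental Learning on K3S
@version: 1.0.0 @author:Tianjian Jiang
Introduction
This experiment sets up a federated-learning and incremental-learning simulation on a K3S cluster to evaluate the role of federated learning and incremental learning in edge computing and the Internet of Things.
This program is not based on any open-source code; parts of the code and comments were generated by DeepSeek.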
Prerequisites
1. Install NFS
Master side
Create the shared directory:
sudo mkdir -p /mnt/shared
Install the NFS server:
sudo apt install nfs-kernel-server
Edit the configuration file:
sudo vim /etc/exports
and add the following line:
/mnt/shared *(rw,sync,no_subtree_check)
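If you prefer to script this step rather than edit the file in vim, the entry can be appended idempotently. This is a minimal sketch, not part of the original setup: `add_nfs_export` is a hypothetical helper that appends the line only when an identical line is not already present, so rerunning the setup does not duplicate the export.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original repository):
# append an NFS export entry unless the exact line already exists.
#   rw               clients may read and write
#   sync             the server commits writes before replying
#   no_subtree_check disables subtree checking on the exported directory
add_nfs_export() {
  local exports_file="$1" entry="$2"
  # -x: match the whole line, -F: treat the entry as a fixed string
  if grep -qxF -- "$entry" "$exports_file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$exports_file"
}

# Usage (as root, e.g. inside `sudo bash`):
#   add_nfs_export /etc/exports '/mnt/shared *(rw,sync,no_subtree_check)'
#   exportfs -r
```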
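Export the configuration (-a exports every entry in /etc/exports; -r re-exports them so later edits take effect):
sudo exportfs -a
sudo exportfs -r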
Start the NFS server and enable it at boot:
sudo systemctl start nfs-kernel-server
sudo systemctl enable nfs-kernel-server
Check the status:
sudo systemctl status nfs-kernel-server
and resolve any errors it reports.
Node side
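Create the mount point:
sudo mkdir -p /mnt/shared
Install NFS (a Node only needs the client tools from nfs-common, which nfs-kernel-server also installs):
sudo apt install nfs-kernel-server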
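Mount the shared directory (10.3.0.15 is the Master's NFS address in this setup; replace it with your own):
sudo mount -t nfs 10.3.0.15:/mnt/shared /mnt/shared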
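A mount made this way does not survive a reboot. One common way to persist it is an /etc/fstab entry. This is an assumption added here, not a step from the original setup. `nfs_fstab_entry` is a hypothetical helper that prints such a line. The `_netdev` option marks the filesystem as network-dependent, so it is mounted only after networking is up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/fstab line for an NFS share.
# Fields: <server>:<export> <mount point> <type> <options> <dump> <pass>
nfs_fstab_entry() {
  local server="$1" export_dir="$2" mount_point="$3"
  # _netdev: mount only once the network is available (relevant for NFS at boot)
  printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$server" "$export_dir" "$mount_point"
}

# Usage on a Node (10.3.0.15 is the NFS server address used above):
#   nfs_fstab_entry 10.3.0.15 /mnt/shared /mnt/shared | sudo tee -a /etc/fstab
#   sudo mount -a
```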
2. Install Docker
Update the package index:
sudo apt-get update
Install the CA certificates and curl packages:
sudo apt-get install ca-certificates curl
Create the keyring directory with mode 0755:
sudo install -m 0755 -d /etc/apt/keyrings
Download Docker's public GPG key:
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
Make the key readable by all users:
sudo chmod a+r /etc/apt/keyrings/docker.asc
Also store a dearmored (binary) copy of the key as /etc/apt/keyrings/docker.gpg:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
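Add the Docker repository and refresh the package index:
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update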
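The Docker repository is configured by a single apt source line. Its architecture and release codename come from `dpkg --print-architecture` and `lsb_release -cs`. The hypothetical helper below takes those two values as arguments, so you can inspect the resulting line before writing it. `amd64` and `jammy` in the usage comment are example values only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Docker apt source line for a given
# CPU architecture and Ubuntu release codename.
docker_apt_source() {
  local arch="$1" codename="$2"
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}

# Usage: preview with example values, then write the line with detected values
#   docker_apt_source amd64 jammy
#   docker_apt_source "$(dpkg --print-architecture)" "$(lsb_release -cs)" \
#     | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```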
Install Docker:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
3. Install K3S
Master node
curl -sfL https://get.k3s.io | sh -
Mirror for faster downloads in mainland China:
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn sh -
Node (worker) nodes
curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -
Mirror for faster downloads in mainland China:
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -
Replace myserver with the IP address of the Server (the Master node), and replace mynodetoken with the node token. You can read the token on the Server with:
sudo cat /var/lib/rancher/k3s/server/node-token
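The Node install command needs only two values from the Master: its address and the node token. The sketch below uses hypothetical helpers that are not part of the original repository. It assembles the join command from those two values and refuses to run with empty input. Port 6443 is the K3S API server port used in the commands above. The helper prints the command instead of executing it, so you can review it first. The usage comment assumes, as in the NFS step, that 10.3.0.15 is the Master.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for assembling the K3S agent join command.

k3s_server_url() {
  # Agents register with the server's API endpoint on port 6443
  printf 'https://%s:6443\n' "$1"
}

k3s_join_command() {
  local server_ip="$1" token="$2"
  if [ -z "$server_ip" ] || [ -z "$token" ]; then
    echo "usage: k3s_join_command SERVER_IP NODE_TOKEN" >&2
    return 1
  fi
  # Print (do not run) the command so it can be checked before use on a Node
  printf 'curl -sfL https://get.k3s.io | K3S_URL=%s K3S_TOKEN=%s sh -\n' \
    "$(k3s_server_url "$server_ip")" "$token"
}

# Usage: read the token on the Master, then print the command for a Node
#   TOKEN=$(sudo cat /var/lib/rancher/k3s/server/node-token)
#   k3s_join_command 10.3.0.15 "$TOKEN"
```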
4. Set up the registry and build the Docker images
Pull the local Docker registry image:
docker pull registry:2
Run the local Docker registry:
docker run -d -p 5000:5000 --restart=always --name registry registry:2
Build the image (the aggregator is shown as an example; the training images for the Nodes must be built as well):
sudo docker build -t aggregator-learner_new:9 -f aggregator/Dockerfile .
Tag it for the local registry:
sudo docker tag docker.io/library/aggregator-learner_new:9 localhost:5000/aggregator-learner_new:9
Push it to the local registry:
docker push localhost:5000/aggregator-learner_new:9
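Note: the aggregator image is deployed on the Master, and the training images are deployed on the Nodes. Set the tag (9 above) manually and avoid reusing a tag from an earlier build. After changing a tag, update the corresponding YAML file in the k8s folder.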
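Following this note means a tag change has to be made in two places: the image pushed to the local registry, and the `image:` field of the matching YAML in the k8s folder. The sketch below keeps the two in step. `build_and_push` and `set_image_tag` are hypothetical helpers. `set_image_tag` assumes the YAML refers to images as `localhost:5000/<name>:<tag>` and relies on GNU sed. `list_tags` queries the Docker Registry HTTP API (`/v2/<name>/tags/list`) so you can see which tags are already taken. Building directly under the registry name makes the separate `docker tag` step unnecessary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the registry address matches the commands above.
REGISTRY="localhost:5000"

# Show the tags already pushed for an image (Docker Registry HTTP API v2)
list_tags() {
  curl -s "http://${REGISTRY}/v2/$1/tags/list"
}

# Build an image directly under its registry name and push it
build_and_push() {
  local name="$1" tag="$2" dockerfile="$3"
  local ref="${REGISTRY}/${name}:${tag}"
  sudo docker build -t "$ref" -f "$dockerfile" .
  sudo docker push "$ref"
}

# Point every REGISTRY/name:<tag> reference in a YAML file at a new tag (GNU sed)
set_image_tag() {
  local yaml="$1" name="$2" new_tag="$3"
  sed -i -E "s#(${REGISTRY}/${name}):[A-Za-z0-9._-]+#\1:${new_tag}#g" "$yaml"
}

# Usage (10 is an example tag; <job>.yaml stands for the matching file in k8s/):
#   list_tags aggregator-learner_new
#   build_and_push aggregator-learner_new 10 aggregator/Dockerfile
#   set_image_tag k8s/<job>.yaml aggregator-learner_new 10
```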
5. Install the Python libraries (for local testing)
Install pandas, scikit-learn, and numpy:
pip install pandas scikit-learn numpy
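A quick check confirms that the libraries can be imported by the interpreter used for local testing. Note that the pip package `scikit-learn` is imported as `sklearn`. `check_python_modules` is a hypothetical helper, and it assumes the interpreter is `python3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report Python modules that cannot be imported.
# Returns 0 only if every module imports successfully.
check_python_modules() {
  local mod missing=0
  for mod in "$@"; do
    if ! python3 -c "import ${mod}" 2>/dev/null; then
      echo "missing: ${mod}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_python_modules pandas sklearn numpy && echo "all libraries found"
```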
6. Create the namespace
kubectl create namespace federated
7. Set up the PVC
kubectl apply -f pv-shared.yaml
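The contents of pv-shared.yaml are not shown here. As an assumption only, one way to expose the NFS share from step 1 to pods is an NFS-backed PersistentVolume plus a claim in the federated namespace. The repository's file may differ; for example, it may use hostPath, since every node mounts /mnt/shared. The names `shared-pv` and `shared-pvc` and the 10Gi size are hypothetical. Setting `storageClassName: ""` stops K3S's default local-path provisioner from creating a different volume for the claim.

```shell
#!/usr/bin/env bash
# Hypothetical renderer for an NFS-backed PV/PVC pair (names and size are examples).
render_nfs_pv() {
  local server="$1" path="${2:-/mnt/shared}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany          # pods on several nodes may mount it read-write
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: ${server}
    path: ${path}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
  namespace: federated
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind statically, no dynamic provisioning
  volumeName: shared-pv
  resources:
    requests:
      storage: 10Gi
EOF
}

# Usage (10.3.0.15 is the NFS server used in step 1):
#   render_nfs_pv 10.3.0.15 | kubectl apply -f -
#   kubectl get pv,pvc -n federated
```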
Experiment
1. Start federated training
./run_federation.sh
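run_federation.sh launches the training Jobs; its internals are not described here. To block until a particular Job has finished, `kubectl wait` can watch the Job's `complete` condition. The helper below is a hypothetical sketch. The job name in the usage comment is the one used in the Tools section, and the 600s default timeout is an arbitrary example. You can set `KUBECTL` to `echo` to preview the command without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper around kubectl wait; override KUBECTL to dry-run.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="federated"

wait_for_job() {
  local job="$1" timeout="${2:-600s}"
  # Returns once the Job reports condition=complete, or fails when the timeout expires
  "$KUBECTL" wait --for=condition=complete "job/${job}" -n "$NAMESPACE" --timeout="$timeout"
}

# Usage:
#   ./run_federation.sh
#   wait_for_job fed-train-node-1-batch-1 30m
```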
Tools
Delete a job:
kubectl delete job fed-train-node-1-batch-1 -n federated
Deploy a test job:
kubectl apply -f k8s/node-job_node1.yaml
View the training logs:
kubectl logs <pod-name> -n federated
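`kubectl logs` needs a pod name, but the pods created by a Job carry a `job-name` label, so you can look them up from the Job name. `kubectl logs job/<name>` also works and picks one of the Job's pods. These helpers are a hypothetical sketch. You can set `KUBECTL` to `echo` to preview the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; override KUBECTL to dry-run.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="federated"

# List the pods a Job created (the Job controller labels them job-name=<job>)
job_pods() {
  "$KUBECTL" get pods -n "$NAMESPACE" -l "job-name=$1" -o name
}

# Print logs from one of the Job's pods without looking up its name first
job_logs() {
  "$KUBECTL" logs "job/$1" -n "$NAMESPACE"
}

# Usage:
#   job_pods fed-train-node-1-batch-1
#   job_logs fed-train-node-1-batch-1
```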
Delete all jobs:
./scripts/delete_jobs.sh
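delete_jobs.sh is not shown here. If it simply removes every Job in the namespace (an assumption about what the script does), `kubectl delete jobs --all` is the one-line equivalent. kubectl's default cascading deletion also removes the pods those Jobs created. The helper below is a hypothetical sketch. You can set `KUBECTL` to `echo` to preview the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper; override KUBECTL to dry-run.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="federated"

# Delete every Job in the namespace; default cascading deletion
# also removes the pods those Jobs created
delete_all_jobs() {
  "$KUBECTL" delete jobs --all -n "$NAMESPACE"
}

# Usage:
#   delete_all_jobs
#   kubectl get jobs -n federated
```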