User Guide for Ascend Devices in Volcano

Introduction

Volcano supports the vNPU feature for both Ascend 310 and Ascend 910 devices through the ascend-device-plugin. It can also manage heterogeneous Ascend clusters, that is, clusters containing multiple Ascend chip types (for example 910A, 910B2, 910B3, and 310P).

Use cases:

  • NPU and vNPU cluster for Ascend 910 series
  • NPU and vNPU cluster for Ascend 310 series
  • Heterogeneous Ascend cluster

This feature is only available in Volcano v1.14 and later.

Quick Start

Prerequisites

ascend-docker-runtime

Install Volcano

helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace

Additional installation methods are described in the Volcano installation documentation.

Label the Node with ascend=on

kubectl label node {ascend-node} ascend=on

Deploy the hami-scheduler-device ConfigMap

kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-configmap.yaml

Deploy ascend-device-plugin

kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-plugin.yaml

For more information, refer to the ascend-device-plugin documentation.
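
Once the plugin is deployed, it can be useful to confirm that the labeled nodes actually report Ascend resources. The helpers below are a minimal sketch rather than part of Volcano or ascend-device-plugin: the function names are hypothetical, and the grep pattern assumes the plugin publishes resources under the huawei.com/Ascend prefix used in the Usage section.

```shell
# Hypothetical helpers, not shipped with Volcano or ascend-device-plugin.

# List the nodes that carry the ascend=on label.
list_ascend_nodes() {
  kubectl get nodes -l ascend=on
}

# Print the Ascend resource lines reported for one node.
# Assumption: resource names start with "huawei.com/Ascend".
show_ascend_resources() {
  kubectl describe node "$1" | grep 'huawei.com/Ascend'
}
```

For example, `show_ascend_resources {ascend-node}` prints nothing and returns a non-zero status if the node reports no matching resources, which usually means the plugin pod on that node is worth inspecting.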

Scheduler Config Update

Update the scheduler configuration:

kubectl edit cm -n volcano-system volcano-scheduler-configmap

kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: predicates
      - name: deviceshare
        arguments:
          deviceshare.AscendHAMiVNPUEnable: true # enable ascend vnpu
          deviceshare.SchedulePolicy: binpack # scheduling policy. binpack / spread
          deviceshare.KnownGeometriesCMNamespace: kube-system
          deviceshare.KnownGeometriesCMName: hami-scheduler-device

Note: volcano-vgpu has its own KnownGeometriesCMName and KnownGeometriesCMNamespace settings. To use both vNPU and vGPU in the same Volcano cluster, merge the two ConfigMaps into one and reference the merged ConfigMap in these two arguments.
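
After saving the ConfigMap, a quick check can confirm that the vNPU switch is present in the stored configuration. This is only a sketch: the function name vnpu_enabled is hypothetical, and it only inspects the ConfigMap text, not whether the scheduler has reloaded it.

```shell
# Hypothetical helper: succeeds if the stored scheduler configuration
# contains the vNPU switch set to true.
vnpu_enabled() {
  kubectl get cm -n volcano-system volcano-scheduler-configmap -o yaml \
    | grep -qF 'deviceshare.AscendHAMiVNPUEnable: true'
}
```

It can be used as `vnpu_enabled && echo "vNPU enabled in ConfigMap"`.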

Usage

apiVersion: v1
kind: Pod
metadata:
  name: ascend-pod
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          huawei.com/Ascend310P: "1"
          huawei.com/Ascend310P-memory: "4096"

The supported Ascend chips and their ResourceNames are shown in the following table:

ChipName   ResourceName                ResourceMemoryName
910A       huawei.com/Ascend910A       huawei.com/Ascend910A-memory
910B2      huawei.com/Ascend910B2      huawei.com/Ascend910B2-memory
910B3      huawei.com/Ascend910B3      huawei.com/Ascend910B3-memory
910B4      huawei.com/Ascend910B4      huawei.com/Ascend910B4-memory
910B4-1    huawei.com/Ascend910B4-1    huawei.com/Ascend910B4-1-memory
310P3      huawei.com/Ascend310P       huawei.com/Ascend310P-memory
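
In a heterogeneous cluster, the same Pod template can target different chips by swapping the two resource names. The sketch below encodes the table above in a small lookup and prints a manifest for a chosen chip. The function names ascend_resource_name and render_ascend_pod are hypothetical, the image is simply reused from the Usage example and may not suit every chip, and the memory value you pass must be one the target device supports.

```shell
# Hypothetical helpers; the mapping mirrors the table above.

# Map a chip name to its ResourceName.
ascend_resource_name() {
  case "$1" in
    910A|910B2|910B3|910B4|910B4-1) echo "huawei.com/Ascend$1" ;;
    310P3) echo "huawei.com/Ascend310P" ;;  # 310P3 drops the trailing "3"
    *) echo "unsupported chip: $1" >&2; return 1 ;;
  esac
}

# Print a Pod manifest requesting one device of the given chip ($1)
# and the given amount of device memory ($2).
render_ascend_pod() {
  res=$(ascend_resource_name "$1") || return 1
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ascend-pod
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container
      image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          ${res}: "1"
          ${res}-memory: "$2"
EOF
}
```

For example, `render_ascend_pod 910B3 4096 | kubectl apply -f -` would submit a Pod that requests one Ascend 910B3 device. Keeping the lookup in one place avoids the easy mistake of requesting huawei.com/Ascend310P3, which is not a resource name listed in the table.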