Conversation

@sjmiller609 sjmiller609 commented Jan 22, 2026

Note

Introduces VRAM-aware vGPU allocation with TTL-cached profile metadata and a configurable cache TTL.

  • Adds TTL-based caching for GPU profile metadata in devices/mdev.go (SetGPUProfileCacheTTL, getCachedProfiles) and parses framebuffer sizes for profiles
  • Implements VRAM usage calculation and least-loaded GPU selection (calculateGPUVRAMUsage, selectLeastLoadedVF); CreateMdev now picks a VF from the least-loaded GPU
  • Speeds up and refines profile availability counting by grouping VFs per parent and summing per-VF available_instances
  • Adds GPU_PROFILE_CACHE_TTL to config and wires it in main.go via devices.SetGPUProfileCacheTTL
  • Minor import/order/test formatting tweaks

Written by Cursor Bugbot for commit d7e7aaa. This will update automatically on new commits.
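
For reference, a minimal Go sketch of the cache shape described in the note above. Only SetGPUProfileCacheTTL and getCachedProfiles are names from the PR; the GPUProfile type, the variable names, and the readProfilesFromSysfs stub are assumptions for illustration, not the merged code.

// Hypothetical sketch of TTL-cached GPU profile metadata with double-checked locking.
package devices

import (
	"sync"
	"time"
)

type GPUProfile struct {
	Name           string
	FramebufferMiB uint64 // framebuffer size parsed from the profile description
}

var (
	profileCacheMu  sync.RWMutex
	profileCache    map[string][]GPUProfile // keyed by parent GPU PCI address (assumed)
	profileCachedAt time.Time
	profileCacheTTL = 30 * time.Second // default; overridden via SetGPUProfileCacheTTL
)

// SetGPUProfileCacheTTL is what main.go would call with GPU_PROFILE_CACHE_TTL.
func SetGPUProfileCacheTTL(ttl time.Duration) {
	profileCacheMu.Lock()
	defer profileCacheMu.Unlock()
	profileCacheTTL = ttl
}

// getCachedProfiles re-reads sysfs at most once per TTL. Fast path uses a read
// lock; losers of the race re-check under the write lock (double-checked locking)
// so concurrent callers don't all re-scan sysfs.
func getCachedProfiles() (map[string][]GPUProfile, error) {
	profileCacheMu.RLock()
	cached, at, ttl := profileCache, profileCachedAt, profileCacheTTL
	profileCacheMu.RUnlock()
	if cached != nil && time.Since(at) < ttl {
		return cached, nil
	}

	profileCacheMu.Lock()
	defer profileCacheMu.Unlock()
	if profileCache != nil && time.Since(profileCachedAt) < profileCacheTTL {
		return profileCache, nil
	}
	profiles, err := readProfilesFromSysfs()
	if err != nil {
		return nil, err
	}
	profileCache, profileCachedAt = profiles, time.Now()
	return profileCache, nil
}

// readProfilesFromSysfs stands in for the real scan of mdev_supported_types.
func readProfilesFromSysfs() (map[string][]GPUProfile, error) {
	return map[string][]GPUProfile{}, nil
}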

@sjmiller609 sjmiller609 (Collaborator, Author) commented:

looks good on staging:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.06             Driver Version: 580.105.06     CUDA Version: N/A      |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    On  |   00000000:82:00.0 Off |                    0 |
| N/A   18C    P8             36W /  350W |    3649MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    On  |   00000000:E3:00.0 Off |                    0 |
| N/A   18C    P8             35W /  350W |    3649MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           13921    C+G   vgpu                                   1824MiB |
|    0   N/A  N/A           14018    C+G   vgpu                                   1824MiB |
|    1   N/A  N/A           13966    C+G   vgpu                                   1824MiB |
|    1   N/A  N/A           14068    C+G   vgpu                                   1824MiB |
+-----------------------------------------------------------------------------------------+
root@dev-yul-hypeman-1:~# hypeman resources
RESOURCE   CAPACITY       EFFECTIVE      ALLOCATED      AVAILABLE      OVERSUB
---------------------------------------------------------------------------
cpu        128            512            32             480            4.0x
memory     377.6 GB       377.6 GB       44.0 GB        333.6 GB       1.0x
disk       1.7 TB         1.7 TB         45.2 GB        1.7 TB         1.0x
network    1.2 Gbps       2.5 Gbps       15 Mbps        2.5 Gbps       2.0x

GPU: vgpu mode (4/64 slots used)
PROFILE        VRAM       AVAILABLE
----------------------------------------
NVIDIA L40S-1B 1.0 GB     0
NVIDIA L40S-2B 2.0 GB     60
NVIDIA L40S-1Q 1.0 GB     0
NVIDIA L40S-2Q 2.0 GB     60
NVIDIA L40S-3Q 3.0 GB     0
NVIDIA L40S-4Q 4.0 GB     0
NVIDIA L40S-6Q 6.0 GB     0
NVIDIA L40S-8Q 8.0 GB     0
NVIDIA L40S-12Q 12.0 GB    0
NVIDIA L40S-16Q 16.0 GB    0
NVIDIA L40S-24Q 24.0 GB    0
NVIDIA L40S-48Q 48.0 GB    0
NVIDIA L40S-1A 1.0 GB     0
NVIDIA L40S-2A 2.0 GB     60
  • Above: we see 3649MiB / 46068MiB on both GPUs, so new vgpus are being scheduled evenly across both (the selection heuristic is sketched after these notes).
  • Also, only the 2 GB VRAM profiles show as available after we deploy one, as expected for "non-heterogeneous" vGPU mode, which doesn't allow mixing profile sizes.
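
For context, a minimal sketch of the VRAM-aware selection these observations exercise. calculateGPUVRAMUsage and selectLeastLoadedVF are the names from the PR; the Mdev shape, the vfToParent map, and freeVFsByGPU are assumptions for illustration, not the merged code.

// Hypothetical sketch of least-loaded GPU selection as observed above.
package devices

type Mdev struct {
	VFAddress string // PCI address of the VF backing this mdev
	VRAMMiB   uint64 // framebuffer size of the mdev's profile
}

// calculateGPUVRAMUsage sums active mdev framebuffer per parent GPU.
func calculateGPUVRAMUsage(mdevs []Mdev, vfToParent map[string]string) map[string]uint64 {
	usage := make(map[string]uint64)
	for _, m := range mdevs {
		parent := vfToParent[m.VFAddress]
		if parent == "" {
			continue // VF with no resolvable parent GPU (see the Bugbot finding below)
		}
		usage[parent] += m.VRAMMiB
	}
	return usage
}

// selectLeastLoadedVF returns a free VF on the parent GPU with the least VRAM
// in use, which is why the two L40S GPUs above end up with identical usage.
func selectLeastLoadedVF(freeVFsByGPU map[string][]string, usage map[string]uint64) (string, bool) {
	var bestGPU string
	var bestUsage uint64
	first := true
	for gpu, vfs := range freeVFsByGPU {
		if len(vfs) == 0 {
			continue // no free VFs left on this GPU
		}
		if first || usage[gpu] < bestUsage {
			bestGPU, bestUsage, first = gpu, usage[gpu], false
		}
	}
	if first {
		return "", false // no free VF anywhere
	}
	return freeVFsByGPU[bestGPU][0], true
}

Ranking parents by allocated framebuffer rather than by mdev count keeps placement balanced even when profile sizes differ, which matches the evenly split memory usage in the nvidia-smi output above.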

@hiroTamada hiroTamada (Contributor) left a comment:

Solid implementation of GPU-aware load balancing. The TTL-based caching with double-checked locking is well done, and the VRAM-based selection heuristic is a reasonable approach. One minor nit about the config wiring, but nothing blocking.

@sjmiller609 sjmiller609 merged commit 1616beb into main Jan 22, 2026
4 checks passed
@sjmiller609 sjmiller609 deleted the load-balance-gpu branch January 22, 2026 19:35
@cursor cursor bot left a comment:

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

parentGPU := vfToParent[mdev.VFAddress]
if parentGPU == "" {
continue
}

VFs without parent GPU have VRAM usage ignored

Medium Severity

In calculateGPUVRAMUsage, mdevs on VFs with empty ParentGPU are skipped (if parentGPU == "" { continue }), so their VRAM is never counted. However, in selectLeastLoadedVF, these same VFs ARE included in allGPUs and freeVFsByGPU for selection. This means VFs without a physfn symlink are grouped under an empty-string "GPU" that always appears to have 0 VRAM usage, making them preferentially selected even when they already have active mdevs. This could cause load imbalance.

Additional Locations (1)
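
One possible shape for a fix, sketched under the assumption that selection and usage accounting should treat unparented VFs consistently; dropUnparentedVFs is a hypothetical helper, not code from the PR. An alternative would be to attribute those mdevs' VRAM under a sentinel key in calculateGPUVRAMUsage instead of dropping the VFs from selection.

// Hypothetical fix sketch: since calculateGPUVRAMUsage skips mdevs whose VF has
// no resolvable parent GPU, those VFs must not compete in selectLeastLoadedVF
// with an artificial 0 VRAM usage.
package devices

func dropUnparentedVFs(freeVFsByGPU map[string][]string) {
	delete(freeVFsByGPU, "") // VFs grouped under "" have no physfn symlink
}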
