Command Buffers

Motivation

In OpenGL, calling glDrawArrays immediately sends work to the GPU (or at least, the driver pretends it does). In Vulkan, you record commands into a buffer, then submit that buffer to a queue. The GPU processes the queue asynchronously while your CPU moves on.

This separation exists for three reasons:

Batching. One submission of many commands is cheaper than many individual calls. Each submission has overhead (kernel transitions, driver bookkeeping), so bundling hundreds of draw calls into a single command buffer and submitting once is dramatically faster.
Reuse. You can record a command buffer once and submit it many times. If a scene doesn’t change, why re-record every frame?
Multi-threading. Different CPU threads can record into different command buffers simultaneously, then submit them all on one thread. This is how modern engines scale across CPU cores.

Intuition

The shopping list analogy

A command buffer is a shopping list. You write down everything you need (“bind this pipeline”, “draw 36 vertices”, “copy this image”), then hand the list to someone else (a GPU queue) who goes and does it all. You don’t stand in the store waiting for each item; you hand off the list and do other work.

The lifecycle looks like this:

┌────────────┐     ┌────────────┐     ┌────────────┐
│   Record   │────>│   Submit   │────>│  Execute   │
│  (CPU)     │     │  (CPU→GPU) │     │  (GPU)     │
│            │     │            │     │            │
│ "bind X"   │     │ hand off   │     │ GPU reads  │
│ "draw Y"   │     │ to queue   │     │ the list   │
│ "copy Z"   │     │            │     │ and acts   │
└────────────┘     └────────────┘     └────────────┘

The CPU is free to do other work (including recording the next frame’s command buffer) while the GPU executes.
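
The record, submit, execute split can be sketched without any Vulkan calls. The following is a toy model that uses only the Rust standard library: `Command`, `ToyCommandBuffer`, and `record_submit_execute` are names invented for this illustration, and a worker thread stands in for the GPU queue. It is meant to show that recording only builds a list, not how a driver actually works.

```rust
use std::sync::mpsc;
use std::thread;

/// Toy stand-ins for recorded GPU commands (not real Vulkan types).
#[derive(Debug, Clone, PartialEq)]
enum Command {
    BindPipeline(&'static str),
    Draw { vertices: u32 },
    CopyBuffer { bytes: u64 },
}

/// A "command buffer" here is just a list built on the CPU.
#[derive(Default)]
struct ToyCommandBuffer {
    commands: Vec<Command>,
}

impl ToyCommandBuffer {
    /// Recording only appends to the list; nothing is executed here.
    fn record(&mut self, cmd: Command) {
        self.commands.push(cmd);
    }
}

/// Records a short list, hands it to a worker thread that plays the GPU,
/// and returns the log of commands the worker executed.
fn record_submit_execute() -> Vec<String> {
    // Record (CPU).
    let mut cb = ToyCommandBuffer::default();
    cb.record(Command::BindPipeline("opaque"));
    cb.record(Command::Draw { vertices: 36 });
    cb.record(Command::CopyBuffer { bytes: 1024 });

    // The channel plays the role of the queue.
    let (queue, gpu_side) = mpsc::channel::<ToyCommandBuffer>();
    let gpu = thread::spawn(move || {
        let mut log = Vec::new();
        // Execute (GPU): walk each submitted list in order.
        for submitted in gpu_side {
            for cmd in &submitted.commands {
                log.push(format!("{cmd:?}"));
            }
        }
        log
    });

    // Submit (CPU to GPU): the whole list goes over in one call.
    queue.send(cb).expect("worker thread is alive");
    // The CPU could record the next buffer here while the worker runs.
    drop(queue); // closing the queue lets the worker loop end
    gpu.join().expect("worker thread panicked")
}
```

Calling `record_submit_execute()` returns three log entries in recording order, which mirrors the guarantee that commands in one buffer start in the order they were recorded.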

Command pools: why they exist

Allocating command buffers one at a time would be like allocating individual bytes from the OS. It’s correct, but the overhead per allocation is huge. Command pools solve this by pre-allocating a chunk of memory, then handing out command buffers from that pool cheaply.

┌──────────── Command Pool ────────────┐
│                                      │
│  ┌──────────┐  ┌──────────┐          │
│  │ CmdBuf 0 │  │ CmdBuf 1 │  ...     │
│  └──────────┘  └──────────┘          │
│                                      │
│  (all allocated from one pool)       │
│  (pool is tied to one queue family)  │
└──────────────────────────────────────┘

Each pool is tied to a single queue family. This lets the driver optimize the memory layout for that queue type.
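
To make the amortization idea concrete, here is a minimal sketch of a pool that recycles storage. It is a toy model in plain Rust, not the Vulkan API: `ToyPool`, `ToyCommandBuffer`, and their methods are hypothetical names, and a `Vec<String>` stands in for driver-managed memory.

```rust
/// Toy model of a command pool: it owns recycled storage and hands out
/// command buffers tagged with the pool's queue family.
struct ToyPool {
    queue_family: u32,
    recycled: Vec<Vec<String>>,
}

struct ToyCommandBuffer {
    queue_family: u32,
    commands: Vec<String>,
}

impl ToyPool {
    fn new(queue_family: u32) -> Self {
        Self { queue_family, recycled: Vec::new() }
    }

    /// Reuse storage from a previously freed buffer when there is some;
    /// otherwise start with an empty list.
    fn allocate(&mut self) -> ToyCommandBuffer {
        let commands = self.recycled.pop().unwrap_or_default();
        ToyCommandBuffer { queue_family: self.queue_family, commands }
    }

    /// Return the buffer's storage to the pool. `clear` drops the
    /// contents but keeps the allocated capacity for the next user.
    fn free(&mut self, mut cb: ToyCommandBuffer) {
        cb.commands.clear();
        self.recycled.push(cb.commands);
    }
}
```

In this sketch the second `allocate` after a `free` gets back a list that is empty but already has capacity, so recording into it does not need to grow the allocation again. Real drivers are free to manage pool memory differently.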

Before reading on: if command pools are tied to a single queue family, and you want to record commands for both a graphics queue and a transfer queue, how many pools do you need?

Primary vs secondary command buffers

Primary command buffers are what you submit to queues. They can contain any command.

Secondary command buffers cannot be submitted directly. Instead, they are executed from within a primary command buffer using cmd_execute_commands. Think of them as subroutines: you record reusable chunks of work (like “render the UI”) into secondary buffers, then call them from your primary buffer.

Primary command buffer:
  begin render pass
  bind pipeline A
  draw meshes
  execute_commands(secondary_ui_buffer)   ← calls the secondary
  end render pass

Most applications start with primary buffers only and add secondary buffers when they need multi-threaded recording or reusable chunks of rendering work.
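
The subroutine idea can be modeled in a few lines. This is a toy sketch in plain Rust, not vulkan_rust code: the `Command` enum, `flatten`, and `frame` are invented for illustration, and `ExecuteCommands` stands in for cmd_execute_commands. In this sketch the primary list contains only calls to secondaries between the begin and end of the render pass.

```rust
/// Toy command list (not Vulkan types). A primary list may "call" a
/// secondary list through `ExecuteCommands`.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    BeginRenderPass,
    BindPipeline(&'static str),
    Draw(&'static str),
    ExecuteCommands(Vec<Command>),
    EndRenderPass,
}

/// Walks a primary list in order and expands each secondary in place,
/// returning the sequence of work the GPU would see. Only one level of
/// expansion is done, which is all this sketch needs.
fn flatten(primary: &[Command]) -> Vec<Command> {
    let mut out = Vec::new();
    for cmd in primary {
        match cmd {
            Command::ExecuteCommands(secondary) => {
                out.extend(secondary.iter().cloned());
            }
            other => out.push(other.clone()),
        }
    }
    out
}

/// Builds two reusable secondaries and a primary that calls both.
fn frame() -> Vec<Command> {
    let scene = vec![Command::BindPipeline("scene"), Command::Draw("meshes")];
    let ui = vec![Command::BindPipeline("ui"), Command::Draw("ui quads")];
    let primary = vec![
        Command::BeginRenderPass,
        Command::ExecuteCommands(scene),
        Command::ExecuteCommands(ui),
        Command::EndRenderPass,
    ];
    flatten(&primary)
}
```

Because `scene` and `ui` are independent lists, they could be recorded on different threads or kept around and called again from the next frame's primary list.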

Worked example: record and submit

This example creates a command pool, allocates a command buffer, records a simple buffer copy, and submits it.

Step 1: Create a command pool

use vulkan_rust::vk;
use vk::*;

// Create a pool for the graphics queue family.
// RESET_COMMAND_BUFFER lets us reset individual command buffers
// instead of resetting the entire pool.
let pool_info = CommandPoolCreateInfo::builder()
    .flags(CommandPoolCreateFlags::RESET_COMMAND_BUFFER)
    .queue_family_index(graphics_queue_family);

let command_pool = unsafe {
    device.create_command_pool(&pool_info, None)?
};

Step 2: Allocate a command buffer

use vulkan_rust::vk;
use vk::*;

// Allocate one primary command buffer from the pool.
let alloc_info = CommandBufferAllocateInfo::builder()
    .command_pool(command_pool)
    .level(CommandBufferLevel::PRIMARY)
    .command_buffer_count(1);

// allocate_command_buffers returns a Vec of handles.
let command_buffer = unsafe {
    device.allocate_command_buffers(&alloc_info)?
}[0];

Step 3: Record commands

use vulkan_rust::vk;
use vk::*;

// Begin recording. ONE_TIME_SUBMIT tells the driver this buffer
// will be submitted once and then reset or freed, enabling
// driver-side optimizations.
let begin_info = CommandBufferBeginInfo::builder()
    .flags(CommandBufferUsageFlags::ONE_TIME_SUBMIT);

unsafe {
    device.begin_command_buffer(command_buffer, &begin_info)?;
};

// Record a buffer copy command.
// This does NOT execute the copy. It records the instruction
// into the command buffer for later execution.
let copy_region = BufferCopy {
    src_offset: 0,
    dst_offset: 0,
    size: 1024,
};

unsafe {
    device.cmd_copy_buffer(
        command_buffer,
        src_buffer,
        dst_buffer,
        &[copy_region],
    );
};

// Finish recording.
unsafe { device.end_command_buffer(command_buffer)? };

Before reading on: between begin_command_buffer and end_command_buffer, the command buffer is in the “recording” state. What do you think happens if you try to submit a command buffer that is still in the recording state?

Step 4: Submit to a queue

use vulkan_rust::vk;
use vk::*;

// Build a submit info. This describes:
//   - which command buffers to execute
//   - which semaphores to wait on before starting
//   - which semaphores to signal when done
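// Keep the array in a local so it outlives the builder that
// borrows it (a temporary would be dropped at the end of the
// statement).
let command_buffers = [command_buffer];
let submit_info = SubmitInfo::builder()
    .command_buffers(&command_buffers);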

// Submit to the graphics queue.
// The Fence (here Fence::null()) will be signaled when the GPU
// finishes all commands in this submission. Passing null means
// "I don't need to know when it's done from the CPU."
unsafe {
    device.queue_submit(
        graphics_queue,
        &[*submit_info],
        Fence::null(),
    )?;
};

// For this example, we simply wait for the whole queue to go idle.
// In a real application, you would usually pass a fence to
// queue_submit and wait on it only when you need the result,
// rather than stalling on everything in the queue.
unsafe { device.queue_wait_idle(graphics_queue)? };
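
The idea behind waiting on a fence can be sketched without Vulkan. The following toy model uses only the Rust standard library; `ToyFence`, `signal`, `wait`, and `submit_with_fence` are names made up for this illustration and are not part of vulkan_rust. A worker thread plays the GPU, and the CPU does unrelated work before it blocks on the fence.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Toy fence: a flag the "GPU" thread sets and the CPU can block on.
#[derive(Default)]
struct ToyFence {
    signaled: Mutex<bool>,
    cond: Condvar,
}

impl ToyFence {
    /// Called by the worker when the submitted work is finished.
    fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Blocks the caller until `signal` has been called.
    fn wait(&self) {
        let mut done = self.signaled.lock().unwrap();
        while !*done {
            done = self.cond.wait(done).unwrap();
        }
    }
}

/// Submits a pretend copy, does CPU work in the meantime, then waits.
/// Returns (result of the CPU work, bytes "copied" by the worker).
fn submit_with_fence() -> (u32, u64) {
    let fence = Arc::new(ToyFence::default());
    let gpu_fence = Arc::clone(&fence);

    // "Submit": the worker runs the copy and signals the fence.
    let gpu = thread::spawn(move || {
        let copied: u64 = 1024;
        gpu_fence.signal();
        copied
    });

    // The CPU is not blocked yet and can do other work.
    let cpu_work: u32 = (1..=10).sum();

    // Block only at the point where the result is needed.
    fence.wait();
    (cpu_work, gpu.join().expect("worker thread panicked"))
}
```

The design point is where the wait happens: the CPU work runs between the submit and the wait, instead of the CPU stalling immediately after submitting.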

Step 5: Clean up

use vulkan_rust::vk;
use vk::*;

// Options A and B are alternatives: pick one, not both.

// Option A: Free the command buffer back to the pool.
unsafe {
    device.free_command_buffers(command_pool, &[command_buffer]);
};

// Option B: Reset for reuse instead of freeing (only if the pool
// was created with the RESET_COMMAND_BUFFER flag). Never reset a
// handle that has already been freed.
// unsafe {
//     device.reset_command_buffer(
//         command_buffer,
//         CommandBufferResetFlags::empty(),
//     )?;
// };

// When you're done with the pool entirely:
unsafe { device.destroy_command_pool(command_pool, None) };
// This implicitly frees all command buffers allocated from it.

Command buffer states

A command buffer is always in one of these states:

  allocate
    │
    v
 Initial ──begin──> Recording ──end──> Executable ──submit──> Pending
    ^                                    │    ^                  │
    │                                    │    │                  │
    └────────────── reset ───────────────┘    └─ (GPU finishes) ─┘

(A buffer recorded with ONE_TIME_SUBMIT does not return to Executable when the GPU finishes. It becomes Invalid and must be reset, which returns it to Initial, or freed.)

State	What you can do
Initial	Nothing useful. Call `begin_command_buffer` to start recording.
Recording	Record commands (`cmd_*` methods). Call `end_command_buffer` when done.
Executable	Submit to a queue. Or reset to record again.
Pending	The GPU is executing it. Do not touch it. Wait for completion.
Invalid	Cannot be submitted. Reset it to record again, or free it.

The most common mistake is trying to re-record or reset a command buffer while it is still pending (the GPU hasn’t finished yet). Validation layers will catch this.
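
The main path through the lifecycle can be written down as a small state machine. This is a simplified sketch in plain Rust with invented names (`State`, `Op`, `step`); it covers begin, end, submit, completion, and reset from Executable, and it ignores usage flags such as ONE_TIME_SUBMIT. It is a teaching model, not what the validation layers implement.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Initial,
    Recording,
    Executable,
    Pending,
}

#[derive(Debug, Clone, Copy)]
enum Op {
    Begin,
    End,
    Submit,
    GpuFinished,
    Reset,
}

/// Returns the next state, or an error message when the operation is
/// not allowed in the current state.
fn step(state: State, op: Op) -> Result<State, String> {
    use Op::*;
    use State::*;
    match (state, op) {
        (Initial, Begin) => Ok(Recording),
        (Recording, End) => Ok(Executable),
        (Executable, Submit) => Ok(Pending),
        (Pending, GpuFinished) => Ok(Executable),
        (Executable, Reset) => Ok(Initial),
        // Everything else is rejected, including Reset or Begin
        // while the buffer is still Pending.
        (s, o) => Err(format!("{o:?} is not allowed in state {s:?}")),
    }
}
```

Feeding it `Begin`, `End`, `Submit` from `State::Initial` ends in `State::Pending`, and a `Reset` at that point returns an error, which is the mistake described above.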

Common patterns

One-shot command buffer for transfers

Many operations (uploading textures, transitioning image layouts) need a command buffer just once. The pattern:

use vulkan_rust::vk;
use vk::*;

unsafe fn one_shot_submit(
    device: &Device,
    pool: CommandPool,
    queue: Queue,
    record: impl FnOnce(CommandBuffer),
) -> VkResult<()> {
    // Allocate
    let alloc_info = CommandBufferAllocateInfo::builder()
        .command_pool(pool)
        .level(CommandBufferLevel::PRIMARY)
        .command_buffer_count(1);
    let cmd = unsafe { device.allocate_command_buffers(&alloc_info)? }[0];

    // Record
    let begin = CommandBufferBeginInfo::builder()
        .flags(CommandBufferUsageFlags::ONE_TIME_SUBMIT);
    unsafe { device.begin_command_buffer(cmd, &begin)? };
    record(cmd);
    unsafe { device.end_command_buffer(cmd)? };

    // Submit and wait
    // Keep the array in a local so it outlives the builder.
    let cmds = [cmd];
    let submit = SubmitInfo::builder()
        .command_buffers(&cmds);
    unsafe {
        device.queue_submit(queue, &[*submit], Fence::null())?;
        device.queue_wait_idle(queue)?;
    };

    // Free
    unsafe { device.free_command_buffers(pool, &[cmd]) };
    Ok(())
}

This is the pattern used for staging buffer uploads in the Memory Management chapter.

Per-frame command buffers

For rendering, you typically have one command buffer per frame in flight:

Frame 0: [record on CPU] ──submit──> [execute on GPU]
Frame 1: [record on CPU] ──submit──> [execute on GPU]
          ↑                              ↑
          recording while               executing the
          GPU runs the                  commands we
          previous frame                just submitted

Each frame waits for its fence before re-recording. See Synchronization for how fences and semaphores coordinate this.
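
The slot rotation behind this pattern can be sketched as follows. This is a toy simulation in plain Rust, with no Vulkan calls: `FRAMES_IN_FLIGHT`, `FrameSlot`, and `render` are names chosen here, a `bool` stands in for the fence, and the simulated GPU finishes instantly when waited on.

```rust
const FRAMES_IN_FLIGHT: usize = 2;

/// Toy per-frame slot: a command list plus a flag standing in for a fence.
#[derive(Default)]
struct FrameSlot {
    commands: Vec<String>,
    gpu_busy: bool, // true between submit and (simulated) completion
}

/// Simulates `frame_count` frames and returns the slot index each used.
fn render(frame_count: usize) -> Vec<usize> {
    let mut slots: [FrameSlot; FRAMES_IN_FLIGHT] = Default::default();
    let mut used = Vec::new();

    for frame in 0..frame_count {
        let index = frame % FRAMES_IN_FLIGHT;
        let slot = &mut slots[index];

        // "Wait for the fence" of this slot before touching its buffer.
        // In this simulation the GPU finishes as soon as we wait.
        if slot.gpu_busy {
            slot.gpu_busy = false;
        }

        // The buffer is no longer pending, so re-recording is safe.
        slot.commands.clear();
        slot.commands.push(format!("draw frame {frame}"));

        // "Submit": the slot stays busy until its fence is waited on again.
        slot.gpu_busy = true;
        used.push(index);
    }
    used
}
```

With two frames in flight the slots alternate, so while slot 0 is still busy on the GPU the CPU records into slot 1, and it only waits when it comes back around to slot 0.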

Formal reference

Command pool creation flags

Flag	Meaning
`TRANSIENT`	Hint: command buffers from this pool are short-lived. Lets the driver optimize allocation.
`RESET_COMMAND_BUFFER`	Allows individual command buffers to be reset. Without this, you can only reset the entire pool.
`PROTECTED`	Command buffers allocated from this pool can operate on protected resources.

Command buffer begin flags

Flag	Meaning
`ONE_TIME_SUBMIT`	This buffer will be submitted once, then reset or freed. Enables driver optimizations.
`RENDER_PASS_CONTINUE`	Secondary command buffer: this will be entirely inside a render pass.
`SIMULTANEOUS_USE`	This buffer can be resubmitted to a queue of the same family while it is still pending, and a secondary buffer can be recorded into multiple primary command buffers.

Recording methods on Device

All recording methods follow the pattern device.cmd_*(command_buffer, ...). The device dispatches to the correct function pointer; the command_buffer identifies which buffer to record into. Examples:

Method	Purpose
`cmd_bind_pipeline(cb, bind_point, pipeline)`	Set the active pipeline
`cmd_draw(cb, vertices, instances, first_vert, first_inst)`	Draw without an index buffer
`cmd_copy_buffer(cb, src, dst, &[regions])`	Copy between buffers
`cmd_begin_render_pass(cb, &begin_info, contents)`	Start a render pass
`cmd_end_render_pass(cb)`	End the current render pass

The full list has ~150 cmd_* methods covering every Vulkan command.

Destruction rules

Wait for the GPU before freeing. A command buffer in the Pending state must not be freed or reset. Use a fence or device_wait_idle.
Destroying a pool frees all its buffers. You do not need to free command buffers individually before destroying their pool.
Pools are not thread-safe. If two threads record command buffers from the same pool, you must synchronize externally. The typical solution: one pool per thread.
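
The one-pool-per-thread rule can be illustrated with scoped threads. This is a sketch in plain Rust rather than vulkan_rust code: `record_in_parallel` is a hypothetical helper, and each thread's private `Vec<String>` stands in for a command buffer allocated from that thread's own pool.

```rust
use std::thread;

/// Records `draws_per_thread` toy commands on each of `threads` workers.
/// Every worker owns its buffer exclusively, so no locking is needed
/// while recording.
fn record_in_parallel(threads: usize, draws_per_thread: usize) -> Vec<Vec<String>> {
    thread::scope(|scope| {
        // Spawn all workers first so they record concurrently.
        let workers: Vec<_> = (0..threads)
            .map(|t| {
                scope.spawn(move || {
                    let mut buffer = Vec::with_capacity(draws_per_thread);
                    for d in 0..draws_per_thread {
                        buffer.push(format!("thread {t}: draw {d}"));
                    }
                    buffer
                })
            })
            .collect();

        // Gather the finished buffers on one thread, in a fixed order,
        // ready for a single submission.
        workers
            .into_iter()
            .map(|w| w.join().expect("recording thread panicked"))
            .collect()
    })
}
```

For example, `record_in_parallel(4, 3)` yields four buffers of three commands each, ordered by thread index regardless of which thread finished first.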

SubmitInfo structure

SubmitInfo connects command buffers to synchronization primitives:

SubmitInfo {
    wait_semaphores    + wait_dst_stage_mask   ← "wait for these before starting"
    command_buffers                             ← "execute these"
    signal_semaphores                           ← "signal these when done"
}

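The wait_dst_stage_mask lists, for each wait semaphore, the pipeline stages that must wait for it; the submission as a whole is not held back. This lets the GPU run earlier stages (for example, vertex processing) while a later stage (for example, color attachment output) is still waiting on the semaphore.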
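
As a rough illustration of that rule, the sketch below models a handful of graphics stages as a linear sequence. It is a deliberate simplification with invented names (`Stage`, `ALL_STAGES`, `stages_not_blocked`): the real pipeline has more stages and the mask is a bit set, but the principle that stages logically earlier than the wait stage are not blocked is the same.

```rust
/// A few graphics pipeline stages in logical order. The derived
/// ordering follows declaration order, earliest stage first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stage {
    VertexInput,
    VertexShader,
    FragmentShader,
    ColorAttachmentOutput,
}

const ALL_STAGES: [Stage; 4] = [
    Stage::VertexInput,
    Stage::VertexShader,
    Stage::FragmentShader,
    Stage::ColorAttachmentOutput,
];

/// Stages that may run before the semaphore is signaled: everything
/// logically earlier than the stage named in the wait mask.
fn stages_not_blocked(wait_dst_stage: Stage) -> Vec<Stage> {
    ALL_STAGES
        .iter()
        .copied()
        .filter(|stage| *stage < wait_dst_stage)
        .collect()
}
```

Waiting at `Stage::ColorAttachmentOutput` leaves the three earlier stages free to start, whereas waiting at `Stage::VertexInput` leaves nothing to run early in this model.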

Key takeaways

Commands are recorded, not executed. Recording is cheap CPU work; execution happens asynchronously on the GPU.
Command pools amortize allocation cost. One pool per queue family, typically one pool per thread.
Command buffers have states: Initial → Recording → Executable → Pending. Never touch a Pending buffer.
Use ONE_TIME_SUBMIT for throw-away work (uploads, transitions). Use per-frame buffers with fences for rendering.
The SubmitInfo struct is where command buffers meet synchronization. That connection is the topic of the next chapter.
