As GPU-driven rendering pipelines take shape, the traditional way of dealing with descriptor heaps and views, which is managed mainly on the CPU side, starts to break down. We want the GPU to do as much of the work as possible, meaning compute shaders should be able to decide what is visible, which LOD to use, and what to draw. To make that possible, we give the GPU one massive descriptor heap in which every texture or resource needed during the lifetime of the application is available and addressable by index. A bindless heap also gives developers a cleaner interface: resources are looked up by index instead of juggling descriptor heaps, views, ranges, and root signatures for each frame or draw call.
I want to focus on why bindless is a better way to manage descriptors in general, so I will not go into much detail on the GPU-driven reasons.

Requires Shader Model 6.6+
The Traditional Way to Bind
It is easier to see why bindless is better in most cases once we compare it to the traditional way of dealing with descriptors and heaps.
The Data Flow
It may be easier to understand how texture data reaches the GPU by working from the end of the pipeline back to the beginning. This part of the blog answers the question, "How do I get my texture image for Direct3D to use?"
Some intermediate steps are skipped; I am only going over the process at a high level!
HLSL
Let us start from the .hlsl side, where we say that our shader needs an albedo map. At a minimum, we need a Texture2D for the texture and a SamplerState, which dictates how the GPU should read the texture data (filtering, wrap, LOD).
Texture2D albedo : register(t0);
SamplerState samp : register(s0);
Root Signature
The Root Signature is where we define what types of data the shaders will expect. There is no memory backing this data; it is like a .h file for the GPU. We define Root Parameter types for the Root Signature depending on how we want the GPU to read the data. Keep in mind that the Root Signature itself is limited to 64 DWORDs.
- Root Constants: Inlined 32 bit value with no heap needed. Fastest.
- Root Descriptor: GPU virtual address inlined, no heap indirection. Fast.
- Descriptor Table: A slice into a shader visible heap.
The parameter and range types map to HLSL register types as follows:
- D3D12_ROOT_PARAMETER_TYPE_CBV -> b
- D3D12_ROOT_PARAMETER_TYPE_SRV -> t
- D3D12_ROOT_PARAMETER_TYPE_UAV -> u
- D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER -> s
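To make the 64 DWORD budget concrete, here is a minimal sketch that adds up the cost of a root signature layout. It relies on the documented costs (1 DWORD per 32-bit root constant, 2 DWORDs per root descriptor, 1 DWORD per descriptor table). The types RootParamKind and RootParam and the helper functions are hypothetical stand-ins written for this example; they are not part of the D3D12 API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the three root parameter kinds (not D3D12 types).
enum class RootParamKind { Constants, Descriptor, Table };

struct RootParam {
    RootParamKind kind;
    std::uint32_t num32BitValues; // only meaningful for Constants
};

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

// Cost in DWORDs: 1 per 32-bit root constant, 2 per root descriptor
// (a 64-bit GPU virtual address), 1 per descriptor table.
inline std::uint32_t RootParamCost(const RootParam& p) {
    switch (p.kind) {
        case RootParamKind::Constants:  return p.num32BitValues;
        case RootParamKind::Descriptor: return 2;
        case RootParamKind::Table:      return 1;
    }
    assert(false && "unknown root parameter kind");
    return 0;
}

// Sums the cost of every parameter in the layout.
inline std::uint32_t RootSignatureCost(const std::vector<RootParam>& params) {
    std::uint32_t total = 0;
    for (const RootParam& p : params) total += RootParamCost(p);
    return total;
}

// True when the layout stays within the 64 DWORD limit.
inline bool FitsInRootSignature(const std::vector<RootParam>& params) {
    return RootSignatureCost(params) <= kMaxRootSignatureDwords;
}
```

A layout with four root constants, one root descriptor, and one table would cost 4 + 2 + 1 = 7 DWORDs, which leaves plenty of room under the limit.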
Descriptor Heaps
Since a descriptor table in the Root Signature points into a descriptor heap, our application needs a shader-visible heap for the descriptors to live in. A descriptor is just GPU metadata that describes how a particular set of bytes should be interpreted; in Direct3D 12, those bytes belong to an ID3D12Resource.
D3D12_DESCRIPTOR_HEAP_DESC desc = {
    .Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
    .NumDescriptors = 1000,
    .Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE,
    .NodeMask = 0,
};
ID3D12DescriptorHeap* srvHeap;
device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(&srvHeap));
Descriptors
Let us assume we already have an ID3D12Resource, created with device->CreateCommittedResource(...), which we will call textureResource.
We also need a description of the shader resource view.
ID3D12Resource* textureResource;
D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {
    .Format = DXGI_FORMAT_R8G8B8A8_UNORM,
    .ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D,
    .Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING,
    .Texture2D = { .MipLevels = 1 },
};
Before we call CreateShaderResourceView, we need a free slot, that is, the next available index into the heap. Usually we would wrap the descriptor and heap mechanism in some sort of data structure, but here we will simply assume the first slot is free.
Then we finally create the descriptor in the heap created earlier.
// Previous code...
D3D12_CPU_DESCRIPTOR_HANDLE handle =
    srvHeap->GetCPUDescriptorHandleForHeapStart();
device->CreateShaderResourceView(textureResource, &srvDesc, handle);
Binding
For completeness' sake, let us finish the data flow with the command list. In order, you have to do the following through cmdList->...:
- Bind the heap (per frame)
SetDescriptorHeaps(1, &srvHeap);
- Set the Root Signature parameter (per draw)
SetGraphicsRootDescriptorTable(0, gpuHandle);
The gpuHandle (obtained from srvHeap->GetGPUDescriptorHandleForHeapStart()) is what the shader uses, while the cpuHandle is for CPU-side operations such as creating views. Both come from similar APIs and advance by the same increment, so make sure both point at the same slot.
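Keeping the CPU and GPU handles on the same slot comes down to the same arithmetic on both: heap start plus slot index times the descriptor increment. The sketch below shows that arithmetic in isolation. CpuHandle, GpuHandle, and HeapInfo are stand-in types invented for this example so it compiles without the D3D12 headers. In a real renderer the three values would come from GetCPUDescriptorHandleForHeapStart(), GetGPUDescriptorHandleForHeapStart(), and device->GetDescriptorHandleIncrementSize(...).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins that mirror the shape of D3D12_CPU_DESCRIPTOR_HANDLE (SIZE_T ptr)
// and D3D12_GPU_DESCRIPTOR_HANDLE (UINT64 ptr); these are not the real types.
struct CpuHandle { std::size_t ptr; };
struct GpuHandle { std::uint64_t ptr; };

// Hypothetical description of one shader-visible heap.
struct HeapInfo {
    CpuHandle cpuStart;      // would come from GetCPUDescriptorHandleForHeapStart()
    GpuHandle gpuStart;      // would come from GetGPUDescriptorHandleForHeapStart()
    std::uint32_t increment; // would come from GetDescriptorHandleIncrementSize()
    std::uint32_t capacity;  // NumDescriptors used when the heap was created
};

// CPU-side handle for a slot: used when creating or copying descriptors.
inline CpuHandle CpuHandleAt(const HeapInfo& heap, std::uint32_t index) {
    assert(index < heap.capacity);
    return { heap.cpuStart.ptr + static_cast<std::size_t>(index) * heap.increment };
}

// GPU-side handle for the same slot: used when binding a descriptor table.
inline GpuHandle GpuHandleAt(const HeapInfo& heap, std::uint32_t index) {
    assert(index < heap.capacity);
    return { heap.gpuStart.ptr + static_cast<std::uint64_t>(index) * heap.increment };
}
```

The increment is device-dependent, so it should always be queried and never hard-coded. Deriving both handles from one index is what keeps the view you created and the table you bind in sync.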
Now, the shader knows exactly where the texture is using the descriptor heap and resource.
The Problem
Notice that if we change the texture, we have to make another call to SetGraphicsRootDescriptorTable. With many materials and textures, this becomes hard to manage. In addition, for modern GPU-driven rendering with ExecuteIndirect, the indirect commands cannot change descriptor table bindings, so SetGraphicsRootDescriptorTable cannot be used between the generated draws. This is where the idea of bindless descriptors comes in.
Going Bindless
The idea of bindless descriptors is to have one massive descriptor heap in which every texture, constant buffer, and UAV is stored.
Creating the Heap
It is the same function call as before, but now we declare a much bigger NumDescriptors.
ID3D12DescriptorHeap* bindlessHeap;
D3D12_DESCRIPTOR_HEAP_DESC desc = {
    .Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
    .NumDescriptors = 1000000, // one million
    .Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE,
};
device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(&bindlessHeap));
Allocations
You now have to manage allocations within the heap yourself. In other words, there must be a function, such as allocate(), that returns the next free slot index in the bindless heap.
A free list that tracks released slots is therefore necessary so that slots can be reused instead of wasted. You would also need a free() that returns a slot to the free list, so that the next allocate() hands that slot out again and its old descriptor gets overwritten.
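A minimal sketch of such an allocator, assuming slots are plain 32-bit indices and that nothing here touches the GPU. The class name DescriptorIndexAllocator and its members are hypothetical. release() plays the role of the free() described above; it is renamed only to avoid confusion with the C library function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side index allocator for a bindless heap.
class DescriptorIndexAllocator {
public:
    static constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

    explicit DescriptorIndexAllocator(std::uint32_t capacity)
        : capacity_(capacity) {}

    // Returns a free slot index, or kInvalid when the heap is full.
    // Released slots are reused before any never-used slot is handed out.
    std::uint32_t allocate() {
        if (!freeList_.empty()) {
            std::uint32_t index = freeList_.back();
            freeList_.pop_back();
            return index;
        }
        if (next_ < capacity_) return next_++;
        return kInvalid;
    }

    // Returns a slot to the free list so a later allocate() can reuse it.
    // This sketch does not detect double releases.
    void release(std::uint32_t index) {
        assert(index < next_ && "index was never handed out");
        freeList_.push_back(index);
    }

private:
    std::uint32_t capacity_;
    std::uint32_t next_ = 0;              // first slot that has never been used
    std::vector<std::uint32_t> freeList_; // released slots awaiting reuse
};
```

One caveat this sketch ignores: a slot should only be released once the GPU has finished every frame that might still read it, so a real renderer would likely defer release() by the number of frames in flight.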
An Easier Root Signature
We still need a root parameter for our Root Signature, but it is now much simpler. Keep in mind that Num32BitValues is the number of 32-bit values this root parameter holds; here it is 1, for the single index we will read later in the .hlsl shader.
D3D12_ROOT_PARAMETER1 param = {
    .ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS,
    .Constants = {.ShaderRegister = 0, .RegisterSpace = 0, .Num32BitValues = 1},
    .ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL,
};
The key part is the flag D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED, which is aptly named and lets us reference the heap inside the .hlsl with the keyword ResourceDescriptorHeap.
// Previous code...
D3D12_VERSIONED_ROOT_SIGNATURE_DESC rootDesc = {
    .Version = D3D_ROOT_SIGNATURE_VERSION_1_2,
    .Desc_1_2 = {
        .NumParameters = 1,
        .pParameters = &param,
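        // The shader also reads SamplerDescriptorHeap, which needs the sampler flag.
        .Flags = D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED |
                 D3D12_ROOT_SIGNATURE_FLAG_SAMPLER_HEAP_DIRECTLY_INDEXED,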
    },
};
Per Draw
Afterwards, all we have to do is tell the command list to SetDescriptorHeaps with the new bindless heap and then SetGraphicsRootSignature with the root signature, in that order, since a root signature that uses the directly indexed flags expects the heaps to be set first.
Then, per draw, we just call SetGraphicsRoot32BitConstants and continue with whatever draw function the application needs. Note that you have to keep track of the slot index returned by your bindless allocator so you know where the shader resource view actually lives in the giant heap.
uint32_t albedoSrvIndex = 3; // Hope you had this handy
// Args: root parameter index (0), number of 32-bit values (1), source data, destination offset (0)
cmdList->SetGraphicsRoot32BitConstants(0, 1, &albedoSrvIndex, 0);
cmdList->DrawIndexedInstanced(...);
So yes, we are still setting root constants for each draw, but that is much cheaper than setting all kinds of descriptor tables. Here we are just pushing a few bytes so the GPU can find our data.
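If a draw needs more than one index, the same root parameter can carry a small struct instead of a single uint32_t. The sketch below shows one way to derive the 32-bit value count from the struct so the CPU side and the root parameter stay in agreement. The normalIndex and materialBufferIndex fields and the Num32BitValues helper are hypothetical additions for illustration; only albedoIndex appears in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical per-draw payload; the extra fields are illustrative only.
struct DrawConstants {
    std::uint32_t albedoIndex;
    std::uint32_t normalIndex;
    std::uint32_t materialBufferIndex;
};

// Number of 32-bit values a payload type occupies as root constants.
template <typename T>
constexpr std::uint32_t Num32BitValues() {
    static_assert(std::is_trivially_copyable<T>::value,
                  "root constants are copied as raw bytes");
    static_assert(sizeof(T) % sizeof(std::uint32_t) == 0,
                  "root constants are written in whole 32-bit values");
    return static_cast<std::uint32_t>(sizeof(T) / sizeof(std::uint32_t));
}

static_assert(Num32BitValues<DrawConstants>() == 3,
              "three indices occupy three DWORDs");

// Usage with the real API would look like this (not compiled here):
//   DrawConstants constants = { albedoSlot, normalSlot, materialSlot };
//   cmdList->SetGraphicsRoot32BitConstants(
//       0, Num32BitValues<DrawConstants>(), &constants, 0);
```

The root parameter's Num32BitValues would need the same count, and the HLSL struct bound at b0 would need matching fields in the same order. Each extra index costs one DWORD of the 64 DWORD root signature budget.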
HLSL Changes for Bindless
Our .hlsl changes slightly with this new method. Before, in the traditional method, we declared the texture and sampler up front:
// @: Traditional way...
Texture2D albedo : register(t0);
SamplerState samp : register(s0);
We actually get rid of the t0 and s0 registers in bindless and really only need b0, since the root constants bound there carry the index used to reference the data.
// @: New, shiny method!
struct DrawConstants {
    uint albedoIndex;
};
ConstantBuffer<DrawConstants> rc : register(b0);
Then notice that we now use ResourceDescriptorHeap to index into the bindless heap (index 3 in our example).
// @: New, shiny method!
// Previous code...
float4 PSMain(PSInput input) : SV_Target
{
    Texture2D albedo = ResourceDescriptorHeap[rc.albedoIndex];
    SamplerState samp = SamplerDescriptorHeap[0];
    return albedo.Sample(samp, input.uv);
}
How does it know the index is 3? Recall that we pushed 3 into root parameter 0 (the first parameter), which maps to register(b0), and our shader reads the rc constant buffer from b0. It is therefore not a constant buffer in the traditional sense, where we give the shader per-frame data such as an MVP matrix or lights; it just happens to be the fastest way to push the bytes the shader needs to index the bindless heap.
// @: Remember this from earlier?
cmdList->SetGraphicsRoot32BitConstants(0, 1, &albedoSrvIndex, 0);
Recap
Just to make sure everything makes sense, here is a summary. In the traditional binding method, we declare ranges in which SRV data lives on our heap and feed them into the root signature. We also have to ensure that the .hlsl registers (t0 or whatever is needed) match what the root signature declares. Then we set the table per draw, and for another draw, set another table range.
In bindless, we just have a giant heap, and we push the index of where the SRV lives as root constants instead of tables so that the .hlsl can find the descriptors it needs.
Managing resources with bindless is a lot more convenient and simpler, but there are some tradeoffs. The main one is that it is harder to debug: a wrong index can read garbage data from the heap and proceed as if nothing is wrong. There is also some waste, since we have to allocate the descriptor heap up front with as many descriptors as we will need for the lifetime of the application. Shader validation is also an issue, because an index can refer to anything (such as a Texture2D or a StructuredBuffer), so a type mismatch may only show up at runtime.
Bindless descriptors are still a good way to manage resources in any renderer despite the tradeoffs, and they are certainly easier to work with!
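One way to soften the debugging problem is to keep a CPU-side record of what kind of view was written into each slot and check indices against it in debug builds before pushing them as root constants. This is only a sketch of that idea: SlotKind and SlotRegistry are hypothetical names, and nothing like them is provided by D3D12.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical debug-only bookkeeping; none of these names come from D3D12.
enum class SlotKind : std::uint8_t { Empty, Texture2D, StructuredBuffer };

class SlotRegistry {
public:
    explicit SlotRegistry(std::uint32_t capacity)
        : kinds_(capacity, SlotKind::Empty) {}

    // Call alongside CreateShaderResourceView so the tag mirrors the heap slot.
    void record(std::uint32_t index, SlotKind kind) {
        assert(index < kinds_.size());
        kinds_[index] = kind;
    }

    // True only if the slot exists and holds the kind the shader will expect.
    bool matches(std::uint32_t index, SlotKind expected) const {
        return index < kinds_.size() && kinds_[index] == expected;
    }

private:
    std::vector<SlotKind> kinds_; // one tag per heap slot
};
```

A check such as assert(registry.matches(albedoSrvIndex, SlotKind::Texture2D)) before the draw would catch stale or mistyped indices on the CPU, although it cannot catch indices computed on the GPU.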