User Guide
A high-level overview of the concepts and functionality in Tephra.
This user guide aims to introduce the concepts in Tephra in a succinct and practical way. After reading it, you should be knowledgeable about how the API should be used and how to design effective, high-performance engines and applications around it. Most concepts will have example code to help illustrate common usage, with symbols linking to their respective documentation.
Paragraphs marked like this offer insights into the relevant implementation details of the library and justifications for its design. You don't need to read these when first learning the API, but being aware of the inner workings of any tool is necessary to master it.
Prior knowledge of computer graphics is assumed, but experience with low-level graphics APIs, such as Vulkan and Dx12, should not be required to understand this documentation.
However, this guide is still incomplete and may not cover all subjects in sufficient depth. Please reach out or submit an issue if you have a suggestion for how it could be improved.
Introduction
Tephra is a C++ library that provides a higher level abstraction layer around the Vulkan graphics API. It aims to bring some of the convenience, safety and low barrier of entry from older APIs without sacrificing much of the performance and control that Vulkan brings, all under a modern C++17 design. It is not a renderer, but a graphics interface akin to OpenGL or Dx11.
Vulkan is a low-level graphics and compute API developed by Khronos. Their aim was to bring an API that doesn't rely on complex drivers translating high-level functionality into the actual commands that need to be sent to the device. Such translation, for example in the OpenGL and Dx11 APIs, required the driver to track the used resources, make guesses about the future to insert synchronization and compile pipelines on the fly. An interface that allows the user to directly push those commands in a cross-platform and cross-vendor way is a major boon for bringing better performance and control. However, much of the same functionality that the driver used to do now needs to be implemented by the user. A simple demo that renders a single triangle on the screen takes more than a thousand lines of code and is hard to extend and maintain.
Having the same convenience as the old APIs along with the advantages of the new is an unreachable goal, but Tephra tries to get as close to it as possible. It implements automatic synchronization and resource tracking much like the drivers used to do, but only for the high-level commands where it is needed the most. Low-level commands, like binds, draws and dispatches, enjoy very low overhead and the possibility of multi-threaded recording. Tephra asks for more information from the user, and earlier, than OpenGL does, but it is still a lot less verbose than Vulkan. A similar demo could be written in Tephra in around 100 lines.
Besides this user guide, Tephra also has extensive API documentation generated with Doxygen, which can be browsed here. Every symbol mentioned in this user guide links to its documentation there. The generated documentation is also fully searchable - notice the search icon in the top right corner of this page.
Setup and build
Tephra is designed to be used as a statically linked library. Differences in build configuration or in the minor/major library version between its interface and the compiled source may break binary compatibility, so it is recommended to always build the library as part of your solution. Tephra also accepts several preprocessor defines that toggle debug information, see the Debugging and validation section below. These are only used in the library's source files and do not affect binary compatibility.
On Windows, the easiest way to build Tephra is to install the Vulkan SDK. The Visual Studio projects use the Vulkan headers and associated libraries through the VULKAN_SDK environment variable set up by the SDK. On other platforms, Tephra can be built with CMake, with Vulkan as a dependency. The minimum supported version of the Vulkan interface can be queried with tp::
Folder structure
- /build - Project files used for building the library, tests, examples and documentation.
- /documentation/dox - Documentation source files.
- /documentation/html - HTML output of this documentation.
- /examples - Example projects and demos showcasing the use of the library.
- /include - Include file directory of Tephra and the third party libraries used.
- /include/tephra - The core Tephra interface.
- /include/tephra/tools - Generic classes used to simplify the interface.
- /include/tephra/utils - Optional Tephra utilities that build upon the base interface.
- /include/vma/ - Vulkan Memory Allocator include directory.
- /include/interface_glue.hpp - A user editable file for easier integration of the library.
- /src - The source code files for Tephra and its third party libraries.
- /tests - Testing suite for the library.
Additional resources
While familiarity with the Vulkan API should not be required, a broad understanding of how it works may help give some context to the rest of this user guide. Its ecosystem is very broad, and a large part of it applies to users of Tephra as well. Here is a brief list of relevant resources and material for further reading:
- Vulkan in 30 minutes - A nice introduction to the concepts in the API.
- VulkanGuide - One of the better tutorials that will guide you through the use of the Vulkan API in more detail.
- Vulkan specification - The main reference for the functionality of the Vulkan API.
- Vulkan hardware database - User reported list of the capabilities of every Vulkan compatible device. Very useful for figuring out which features, extensions and formats are commonly supported.
- The "Awesome Vulkan" repo - A comprehensive list of anything else you might ever need about Vulkan.
General concepts
This section introduces concepts and design choices that apply to the library as a whole. While they are important to understand and deserve an early mention, feel free to skip ahead to the Initialization section for a quick start and return to this one later.
Interface tools
Tephra provides several tools that help form its interface and allow easier integration into existing codebases. The /include/interface_glue.hpp file provides the means for customizing this interface.
Many objects have a lifetime that must be managed by the user through ownership semantics. By default, the owning pointer type used by the interface is std::unique_ptr, as defined in the interface_glue.hpp file, but if needed, it can be changed to std::shared_ptr or any custom owning pointer implementation. All ownable objects inherit from tp::
All arrays in the interface are represented by lightweight, non-owning array views. They cannot take ownership of a temporary, so tp::ArrayView<int> view = {1, 2, 3}; won't compile. Temporary arrays can still be passed directly as function arguments, for example tp::someFunction({1, 2, 3});. These array views may also be constructed from existing arrays and objects with the tp::view and tp::viewOne functions. interface_glue.hpp may be a good place to provide additional overloads of those functions for custom collections of contiguous elements. See /include/tephra/tools/array.hpp for the existing overloads covering C style arrays, std::vector and std::array.
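For illustration, here is a small sketch of passing arrays through the interface using the tp::view and tp::viewOne helpers that appear throughout this guide. The receiving function is a placeholder, not part of Tephra:

// Placeholder function that consumes an array view of integers
void consumeValues(tp::ArrayView<int> values);

void example() {
    // Wrap existing contiguous containers - they must outlive the views
    std::vector<int> fromVector = { 1, 2, 3 };
    std::array<int, 3> fromArray = { 4, 5, 6 };
    consumeValues(tp::view(fromVector));
    consumeValues(tp::view(fromArray));

    // A view of a single element
    int single = 7;
    consumeValues(tp::viewOne(single));
}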
Another useful tool is the tp::
Debugging and validation
Drivers implementing the Vulkan API aren't required to check that it is being used correctly by the user. Incorrect usage more often than not results in unspecified behavior and can lead to anything from driver crashes to working perfectly fine on your machine, the latter being the more insidious outcome. As such, validation layers are shipped as part of the Vulkan SDK and can be enabled during development. This allows correct usage to be validated during development without impacting performance when the layers aren't enabled.
Tephra validation works similarly. By default, the library doesn't check for correct usage; checks are only performed when it is built with the TEPHRA_
Tephra validation is far from complete. User errors or bugs in the library may silently manifest as incorrect usage of the Vulkan API, so it is recommended to also enable Vulkan validation during development. To allow Tephra to consume the resulting validation messages and direct them to the debug report handler, the tp::
The library also needs to be given a way to report validation and other kinds of messages. The tp::
Most Tephra functions that create an object also accept a debug name parameter. The name can be later used to better identify the objects in validation messages. If the tp::
It is recommended to name your objects and label command ranges extensively. It will be invaluable when debugging your application.
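Most creation functions shown throughout this guide accept the debug name as their last parameter, and pass-recording commands accept a label in the same way. For example (the setup variables here are just placeholders):

// The last argument of most creation calls is the debug name
std::unique_ptr<tp::Buffer> vertexBuffer = device->allocateBuffer(
    vertexBufferSetup, tp::MemoryPreference::Device, "Terrain vertex buffer");
tp::Job shadowJob = jobResourcePool->createJob({}, "Shadow map job");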
Object lifetime and hierarchy
There are two kinds of types in the base library: The first are pure data structures like tp::
Below is the parent-child hierarchy of Tephra's objects. Some of these objects follow special rules, as indicated by the symbols next to their names in [] and explained below.
- tp::Application [FL]
Symbol legend:
- [F]: All children created from this object must be destroyed by the time the object itself is destroyed.
- [L]: All children created from this object must only be used locally, in the same context as its parent. For example, a tp::Buffer created by a particular device must only be used in jobs of that device. Similarly, job-local objects may only be used inside their parent job.
- [E]: The object's lifetime must be extended during job recording. When the object is used inside a command recorded to a tp::Job, the object must not be destroyed until the job is either enqueued or destroyed. The use of its children also counts: when using the tp::ImageView of a tp::Image, the parent image must stay alive.
- [N]: The object is not owned by the user; its lifetime is always managed by the library.
Vulkan requires most of its handles to stay alive during the entire time the GPU is using them, which, in a renderer setting, means keeping them around for several frames. Tephra handles this by extending the lifetime of all objects that hold such Vulkan handles. When an object gets destroyed, its handles get stored in a per-device container with information about the jobs that have been enqueued so far. During some device method calls, like tp::
This is done efficiently through the use of globally incrementing job IDs that are used as values for Vulkan's timeline semaphores. When a handle is to be destroyed, a value T gets assigned to it, which is the ID of the last enqueued job. To test whether it can be freed at a later time, the value T is compared to the state of every device queue. If the last signalled timeline semaphore value for that queue is greater than T, or if the queue has finished executing every job previously submitted to it, the handle is guaranteed not to be used and can be safely destroyed.
This method avoids tracking how the handles are actually used, but comes with the downside that the lifetime is extended regardless of whether the object has actually been used in recent jobs or not. For most handles this does not matter, but it may delay the release of potentially large amounts of memory held by buffers and images. An alternative solution for resources may be implemented in the future.
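The following standalone sketch illustrates the idea behind this check. It is not Tephra's actual implementation and all the names are made up:

#include <cstdint>
#include <vector>

// Progress of a single device queue, as observed through its timeline semaphore
struct QueueProgress {
    uint64_t lastReachedJobId; // last job ID whose timeline value has been signalled
    bool idle;                 // the queue has finished executing everything submitted to it
};

// A Vulkan handle waiting to be destroyed, tagged with the value T described above
struct PendingHandle {
    void* vulkanHandle;
    uint64_t lastEnqueuedJobId;
};

// The handle can be destroyed once no queue can still be executing a job that might use it
bool canDestroy(const PendingHandle& handle, const std::vector<QueueProgress>& queues) {
    for (const QueueProgress& queue : queues) {
        bool queueIsPastHandle = queue.idle || queue.lastReachedJobId > handle.lastEnqueuedJobId;
        if (!queueIsPastHandle)
            return false;
    }
    return true;
}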
Thread safety
Whenever possible, Tephra offers thread safety by virtue of immutability. Many objects cannot be changed after they are created and so pose no harm being used from different threads at the same time. The default rule is that any object may be accessed by multiple threads as long as all of those accesses are read-only, such as through const methods or by being passed as a const reference. Recording a command like tp::
For the sake of convenience, the methods of tp::
Generally, pool objects like tp::
Examples of allowed multithreaded usage:
- Allocating objects from the same tp::Device.
- Recording commands that operate on the same object to different tp::Job instances, as long as the jobs were created from different tp::JobResourcePool instances.
- Allocating tp::DescriptorSet objects that refer to the same resource from different tp::DescriptorPool instances.
- Mapping and writing to disjoint regions of the same tp::Buffer.
- Recording commands to distinct tp::CommandList objects that will execute within the same tp::Job, as long as the lists are being recorded with different tp::CommandPool instances.
- Compiling pipelines on the same tp::Device and using the same tp::PipelineCache.
- Destroying an object created from a pool while that pool is in use by another thread.
Examples of incorrect multithreaded usage:
- Destroying an object while another thread is still using it.
- Recording commands to the same tp::Job or tp::CommandList.
- Recording commands to different tp::Job instances that were created from the same tp::JobResourcePool.
- Recording commands to different tp::CommandList instances that were created using the same tp::CommandPool.
- Allocating tp::DescriptorSet objects from the same tp::DescriptorPool.
- Enqueuing different tp::Job instances to the same tp::DeviceQueue.
- Enqueuing a tp::Job while recording commands to another one that was created from the same tp::JobResourcePool.
Vulkan interoperation
While this area is still in progress, the library intends to provide a high degree of interoperability with base Vulkan, so that bleeding-edge extensions and third party libraries can be used more comfortably. For providing additional extension-specific information to Vulkan, many functions and setup structures accept a void* pointer that will be appended as the pNext pointer to the relevant Vulkan call. Tephra enums that are marked as Vulkan-compatible can also accept Vulkan values for the corresponding enums that are added by extensions. The extensions offered in tp::
Some Tephra objects can be created from existing Vulkan handles. First, a handle tp::
Most objects also expose access to the internal Vulkan handles with methods like tp::
Initialization
Application
The first stepping stone is to create a tp::
The first is tp::
Two essential debugging parameters follow. tp::
The debug handler is an interface that will be used by the library when a validation message or an error occurs. For simplicity, there is a standard implementation of it in Tephra's utilities that outputs to a C++ stream, filtered by the chosen message severities and types. See Standard report handler. If all validation and messaging is disabled, the debug handler is unused.
Next, we can define any requested extensions. These can be either one of the predefined tp::
To finally create the application object, call the static method tp::
#include <tephra/tephra.hpp>
#include <tephra/utils/standard_report_handler.hpp>

#include <iostream>
#include <memory>
#include <vector>

int main() {
    bool debugMode = true; // Turn off for release

    auto debugHandler = tp::utils::StandardReportHandler(std::cerr,
        tp::DebugMessageSeverity::Warning | tp::DebugMessageSeverity::Error);

    // Request surface extension, so we can output to a window
    std::vector<const char*> appExtensions = { tp::ApplicationExtension::KHR_Surface };

    auto appSetup = tp::ApplicationSetup(
        tp::ApplicationIdentifier("Tephra user guide"),
        tp::VulkanValidationSetup(debugMode),
        &debugHandler,
        tp::view(appExtensions));

    std::unique_ptr<tp::Application> app;
    try {
        app = tp::Application::createApplication(appSetup);
    } catch (const tp::RuntimeError&) {
        // Not supported
        return 1;
    }
}
Choosing a device
The main purpose of the tp::
std::vector<const char*> deviceExtensions = { tp::DeviceExtension::KHR_Swapchain };

const tp::PhysicalDevice* chosenDevice = nullptr;
for (const tp::PhysicalDevice& candidateDevice : application->getPhysicalDevices()) {
    // Choose a discrete GPU that supports swapchains, geometry shaders and 32-bit depth buffers
    if (candidateDevice.type != tp::DeviceType::DiscreteGPU) {
        continue;
    }

    bool hasAllExtensions = true;
    for (const char* ext : deviceExtensions) {
        if (!candidateDevice.isExtensionAvailable(ext))
            hasAllExtensions = false;
    }
    if (!hasAllExtensions) {
        continue;
    }

    if (!candidateDevice.vkQueryFeatures<VkPhysicalDeviceFeatures>().geometryShader) {
        continue;
    }

    auto depthCaps = candidateDevice.queryFormatCapabilities(tp::Format::DEPTH32_D32_SFLOAT);
    if (!depthCaps.usageMask.contains(tp::FormatUsage::DepthStencilAttachment)) {
        continue;
    }

    chosenDevice = &candidateDevice;
    break;
}

if (chosenDevice == nullptr) {
    // No physical device supported
    return;
}
Creating a device
Creating a tp::
The second parameter, after the physical device pointer, is a list of device queues you wish to use with the device. tp::
- tp::QueueType::Transfer for transfer-only operations, like copying data from one resource to another.
- tp::QueueType::Compute for compute workloads executing compute shaders. It also supports transfer operations.
- tp::QueueType::Graphics for graphics workloads executing the graphics pipeline. It also supports compute and transfer operations. The graphics queue is the most powerful and a useful default, but may not be supported on some compute-only accelerator cards that do not support rendering.
A tp::
After you have selected the queues, you must select the extensions. These are similar to application extensions, but are drawn from tp::
Besides extensions, some functionality of the device needs to be enabled by the use of features. There is VkPhysicalDeviceFeatures, as well as other feature structs provided by Vulkan, that contain a boolean value for each feature that can be enabled. You've already seen in the previous example how device support for a feature can be queried with vkQueryFeatures. To enable a supported feature, set its value to true in a tp::VkFeatureMap, which is then passed to the device setup:
tp::VkFeatureMap features;
features.get<VkPhysicalDeviceFeatures>().geometryShader = true;
Next up is an optional configuration of Tephra's internal memory allocator, VMA, which is used to satisfy all memory needs of resources you can request through the library. Because it is more efficient in Vulkan to allocate larger blocks of memory, but it's difficult to find the right block size for every application, it is available as a configurable parameter, set to 256 MB by default.
Finally, we can create the device using the prepared setup structure, but instead of using a static method, a device is created through a tp::
// Create one main queue and two transfer queues for this example
tp::DeviceQueue mainQueue = tp::DeviceQueue(tp::QueueType::Graphics);
tp::DeviceQueue copyQueues[] = {
    tp::DeviceQueue(tp::QueueType::Transfer, 0),
    tp::DeviceQueue(tp::QueueType::Transfer, 1)
};
tp::DeviceQueue allQueues[] = { mainQueue, copyQueues[0], copyQueues[1] };

// We have already prepared the supported physical device, extensions and features
auto deviceSetup = tp::DeviceSetup(
    chosenDevice,
    tp::view(allQueues),
    tp::view(deviceExtensions),
    &features);

std::unique_ptr<tp::Device> device = application->createDevice(deviceSetup);
Tephra's queues do not map one-to-one to Vulkan queues. A Vulkan physical device can expose any number of queues, which themselves don't necessarily map to the actual hardware queues. Tephra therefore lets the user create as many queues of any supported type as they like. If more queues are requested than are available, they get assigned to Vulkan queues in a round-robin fashion. The details about the mapping of any particular tp::
Resources
Most meaningful operations that can be done on devices read and write data from objects in memory called "resources". In Tephra, a resource is either a tp::
Commands generally don't operate on resource objects directly, but instead through resource views - tp::BufferView and tp::ImageView. For example, to operate on a range of size bytes starting at some offset of a tp::Buffer, you create a view with buffer->getView(offset, size); and then bind the view instead. For convenience, resource views have a very similar interface to the resources themselves, and a resource is also implicitly convertible to a default view of its entire range. The idea is that you can pass these views around in your code whenever you don't need the ownership semantics of passing the actual resource.
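As a small sketch, assuming a buffer created earlier and a placeholder function that consumes a view:

// Placeholder function that accepts a view rather than the owning resource
void bindUniforms(const tp::BufferView& region);

void example(tp::Buffer& buffer) {
    // A view of 256 bytes starting at offset 1024 of the buffer
    tp::BufferView uniformRegion = buffer.getView(1024, 256);
    bindUniforms(uniformRegion);

    // A resource is implicitly convertible to a view of its entire range
    bindUniforms(buffer);
}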
Buffers
tp::
Buffers can be created through tp::
The second parameter describes the types of memory that the buffer may be allocated from. Different devices may have different types of memory available, and as such it is important to be able to run your application efficiently on any device. Tephra exposes device memory locations along three different aspects. First, memory can be device-local. Every kind of memory that may be allocated through the library is accessible by the device, but only device-local locations can be used at peak performance. With discrete GPUs, this means the memory is present in VRAM, as opposed to CPU RAM for non-device-local memory. Second, only host-visible memory locations are directly accessible by the host (the CPU). Host-visible memory can also be cached, which helps with host read performance.
These aspects are combined into 5 distinct tp::
Because the availability of memory locations may differ on different platforms, Tephra offers an additional layer of abstraction here. The second parameter of tp::
- tp::MemoryPreference::Device guarantees that only device-local memory will be allocated, otherwise a memory allocation error is thrown. This preference should be used when the resource does not need to be directly accessible by the host, but fast access by the device is needed.
- tp::MemoryPreference::Host can be used for resources that should live in host memory. It is meant for large data that is being read by the device infrequently and shouldn't be wasting the potentially limited device-local, host-visible memory. This is also the best preference for staging buffers used to copy data to device-local memory.
- tp::MemoryPreference::UploadStream should be used for priority resources that are written to by the host and need to be read by the device with low latency. If device locality is required, the resulting memory location of the allocation should be checked for a potential fallback, to be used as a staging buffer instead.
- tp::MemoryPreference::ReadbackStream is to be used for priority resources that are written to by the device and need to be read by the host with low latency.
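As a quick illustration of picking a preference, using the same allocateBuffer call as the examples below (the sizes and names are arbitrary):

// A vertex buffer that only the device needs to access at full speed
auto vertexBufferSetup = tp::BufferSetup(64 * 1024, tp::BufferUsage::VertexBuffer);
auto vertexBuffer = device->allocateBuffer(
    vertexBufferSetup, tp::MemoryPreference::Device, "Vertex buffer");

// A staging buffer that the host fills once and the device copies from
auto stagingBufferSetup = tp::BufferSetup(
    64 * 1024, tp::BufferUsage::HostMapped | tp::BufferUsage::ImageTransfer);
auto stagingBuffer = device->allocateBuffer(
    stagingBufferSetup, tp::MemoryPreference::Host, "Staging buffer");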
Buffers that were allocated from a location that is visible by the host - so all of them besides tp::
template <typename T>
void copyDataToBuffer(tp::BufferView& buffer, const std::vector<T>& data) {
    tp::HostMappedMemory memory = buffer.mapForHostAccess(tp::MemoryAccess::WriteOnly);
    std::copy(data.begin(), data.end(), memory.getPtr<T>());
}
Note that the device executes work asynchronously from the host and low-level graphics APIs like Vulkan do not abstract away that fact. Any data you write to a buffer on the CPU that is then further accessed by the GPU must not be overwritten until all of those accesses finish. For uploading temporary or frequently changing data, Tephra offers a safer and more convenient method described in Job-local resources.
Regular buffer views can be created through the getView method shown earlier. Texel buffer views involve the creation of internal Vulkan objects (hence createTexelView rather than getTexelView), but they are still cheap to copy once created.
// Rounds the value to the nearest larger multiple of m. template <typename T> constexpr T roundUpToMultiple(T v, T m) { return ((v + m - 1) / m) * m; } // Container for vertex and index data struct Mesh { const std::vector<std::byte>* vertexData; const std::vector<std::byte>* indexData; }; // Holds vertex and index data for multiple meshes all in a single buffer struct VertexIndexBuffer { std::unique_ptr<tp::Buffer> buffer; std::vector<tp::BufferView> meshVertices; std::vector<tp::BufferView> meshIndices; VertexIndexBuffer(tp::Device* device, const std::vector<Mesh>& meshes, const char* name) { // For now put the buffer into host-visible memory so we can map and write to it directly. // Later on you will see how to use staging buffers to upload data to resources in // device-only memory. const tp::MemoryPreference& memory = tp::MemoryPreference::UploadStream; // We can put both vertex and index data in one buffer tp::BufferUsageMask usage = tp::BufferUsage::HostMapped | tp::BufferUsage::VertexBuffer | tp::BufferUsage::IndexBuffer; std::size_t alignment = tp::Buffer::getRequiredViewAlignment(device, usage); std::size_t bufferSize = 0; // Suballocate all data from one buffer, ensuring correct alignment for (const Mesh& mesh : meshes) { bufferSize = roundUpToMultiple(bufferSize + mesh.vertexData->size(), alignment); bufferSize = roundUpToMultiple(bufferSize + mesh.indexData->size(), alignment); } // Create the buffer now that we know the full size buffer = device->allocateBuffer({ bufferSize, usage }, memory, name); // Store the views to all sections and write their data std::size_t offset = 0; for (const Mesh& mesh : meshes) { meshVertices.push_back(buffer->getView(offset, mesh.vertexData->size())); copyDataToBuffer(meshVertices.back(), *mesh.vertexData); offset = roundUpToMultiple(offset + mesh.vertexData->size(), alignment); meshIndices.push_back(buffer->getView(offset, mesh.indexData->size())); copyDataToBuffer(meshIndices.back(), *mesh.indexData); offset = roundUpToMultiple(offset + mesh.indexData->size(), alignment); } } };
Images
tp::
The tp::
You can also specify the format of the image and its extent in texels across all three dimensions. The values of the dimensions beyond the image's type must be set to 1. Images can also be created with mipmaps, with each level having half the extent of the previous one, rounded up. The number of mip levels and the number of array layers are given next. Note that for 3D images, the number of array layers must be set to 1. 2D images can also have a multisampling level set above x1 for antialiasing purposes.
Image views can be created with a different format than that of the parent image. To allow that, all the required formats must be listed in the compatibleFormats field. The formats must be in the same tp::
To create a tp::
The easiest way to specify a subresource range is to call getWholeRange (in which case baseMipLevel and baseArrayLevel will always be 0). This range can then be reduced with methods such as pickLayers and pickMipLevels, or their single-element variants pickLayer and pickMipLevel used in the examples below. A subresource range with only one mip level, but potentially multiple array layers, is defined as tp::
struct Cubemap { std::unique_ptr<tp::Image> image; tp::ImageView cubemapView; std::array<tp::ImageView, 6> sliceViews; Cubemap(tp::Device* device, uint32_t faceSize, tp::Format format, const char* name) { tp::ImageUsageMask usage = tp::ImageUsage::SampledImage | tp::ImageUsage::TransferDst; auto extent = tp::Extent3D(faceSize, faceSize, 1); // Include mipmap chain uint32_t mips = static_cast<uint32_t>(std::ceil(std::log2(faceSize))); // 6 faces to a cubemap uint32_t arrayLayerCount = 6; // Create a cubemap-compatible 2D image array auto setup = tp::ImageSetup(tp::ImageType::Image2DCubeCompatible, usage, format, extent, mips, arrayLayerCount); image = device->allocateImage(setup, name); // Default image view will consider this image as a 2D image array, // so create a cubemap view: auto cubemapViewSetup = tp::ImageViewSetup( tp::ImageViewType::ViewCube, image->getWholeRange()); cubemapView = image->createView(cubemapViewSetup); // Also create views for each slice: for (uint32_t i = 0; i < arrayLayerCount; i++) { auto sliceViewSetup = tp::ImageViewSetup( tp::ImageViewType::View2D, image->getWholeRange().pickLayer(i)); sliceViews[i] = image->createView(sliceViewSetup); } } };
Resource views in Tephra are intended to be easy and relatively cheap to create, even image and texel buffer views. This does not directly fit Vulkan's model of resource views, which have handles that must be explicitly created and destroyed. To facilitate this, all required view handles are created and owned by their parent resources. They are cached and reused, so requesting the same view twice doesn't create any duplicate Vulkan handles. These handles only get destroyed when the parent resource gets destroyed. There is currently no way to clean them up earlier, but this shouldn't be an issue for most use cases that only create at most a couple dozen views per image or texel buffer views for buffers.
Submitting work
Jobs
To have the device perform any kind of work, a tp::
To create jobs, a tp::
auto jobPoolSetup = tp::JobResourcePoolSetup(mainQueue);
std::unique_ptr<tp::JobResourcePool> mainJobPool = device->createJobResourcePool(
    jobPoolSetup, "Main job pool");
A single job pool should be used repeatedly for many jobs. It works best for similar tasks, like rendering a scene every frame, where each frame's jobs use a similar amount of resources that can be efficiently reused from frame to frame. By default, the allocated memory is only released when the pool gets destroyed, but anything that has been unused for a certain amount of time can be released manually by calling tp::
The pool can be used, through tp::waitJobSemaphores
parameter of tp::
Enqueued jobs don't actually start executing until they have been submitted. To submit all the jobs that have been enqueued to a particular queue so far, call tp::
// Create and record the job
tp::Job job = mainJobPool->createJob({}, "Example job");
recordSomeCommands(job);

// Enqueue the job to finalize the recording
tp::JobSemaphore semaphore = device->enqueueJob(mainQueue, std::move(job));

// Finally submit it for execution and, for demonstration purposes, immediately wait for it to be
// done on the device
device->submitQueuedJobs(mainQueue);
device->waitForJobSemaphores({ semaphore });
tp::
The reason timeline semaphores are used instead of the usual binary semaphores is that the latter are limited to one signal - one wait pair, meaning you would have to specify ahead of time how many times you will wait on any given semaphore, which would be very inconvenient. A single device-wide counter, instead of separate counters for each queue, makes it simple to determine the set of enqueued or submitted jobs that an object could have been used in, simplifying the extension of lifetimes of Vulkan handles, as described by the implementation note in the Object lifetime and hierarchy section.
Job command recording
A tp::Job exposes two kinds of functions. The ones prefixed with cmd are the command functions. They record work into the job in the order that they are called, which is then the same order that they will be executed in. The other kind of functions serve to create job-local resources, to be discussed in a later section.
The command recording design in Tephra has two levels. The commands recorded into a tp::
A single pass represents a scope of commands that, from the job's point of view, consumes data in a set of resources as input to write some data to another set of resources as output. A render pass defines a set of attachments, aka "render targets", that its draw commands can render to. For example, rendering a single shadow map will likely be represented as a single render pass with many draw commands inside for drawing all the visible geometry. A tonemapping post-processing pass could also be a render pass with a single draw command for rendering a full-screen quad (or triangle). A compute pass similarly needs to be provided a list of resources that the compute dispatches inside will be writing to. Resources that are only used as input don't need to be listed explicitly in render and compute passes, so long as they were exported beforehand with tp::
The calls to tp::
Sometimes the scalability of deferred recording of multiple command lists is overkill if all you want to do is a single draw for your post-processing. For that there is another, more convenient way of recording commands to a pass: Inline callbacks. Instead of providing an array of command lists, you can pass a function that will be called by Tephra to record the commands just-in-time during tp::
As mentioned above, command functions in tp::
Job command recording is somewhat more interesting. Job commands, along with all their data, get recorded into an internal command buffer. This buffer is composed of 4kB blocks of memory that are allocated as needed. Each command can store arbitrary amounts of data (either stored inline if small enough, otherwise using a separate allocation). These allocations get reused for subsequent jobs created from the same pool.
This command buffer gets replayed during tp::
Compute Passes
Compute passes are the simpler of the two passes that can be executed inside a tp::
The second parameter for tp::
Alternatively, you can pass a tp::
tp::
Then there are tp::
There can be multiple dispatches inside a single compute pass, but beware that any execution and memory dependencies between the dispatches need to be synchronized manually, such as when a later dispatch reads data written by the previous one. This can be handled by calling tp::
If manual synchronization seems daunting, you can always split the dispatches into separate compute passes, which will then get handled automatically, as long as the accesses in each pass are defined properly.
// Divides v by d and rounds up to the nearest integer template <typename T> constexpr T divideRoundUp(T v, T d) { return (v + d - 1) / d; } // Records commands to perform a separable gaussian blur with a compute shader class SeparableBlur { public: SeparableBlur(tp::Device* device) { // Shader pipeline initialization here, see later examples } // Blur the given image in-place using the provided temporary image. Records to the job inline. void doBlur(tp::Job& job, const tp::ImageView& inOutImage, const tp::ImageView& tempImage) { // Get the size of the image, assume the temporary image is compatible tp::Extent3D extent = inOutImage.getExtent(); // Horizontal pass samples inOutImage and writes to tempImage tp::ImageComputeAccess horizontalPassAccesses[] = { { inOutImage, inOutImage.getWholeRange(), tp::ComputeAccess::ComputeShaderSampledRead }, { tempImage, tempImage.getWholeRange(), tp::ComputeAccess::ComputeShaderStorageWrite } }; tp::DescriptorSetView horizontalPassResources = job.allocateLocalDescriptorSet( &blurPassDescriptorLayout, { inOutImage, tempImage }); job.cmdExecuteComputePass( tp::ComputePassSetup({}, tp::view(horizontalPassAccesses)), [=](tp::ComputeList& computeList) { computeList.cmdBindComputePipeline(blurPassPipeline); // Bind inOutImage to input slot, tempImage to output slot computeList.cmdBindDescriptorSets(blurPassPipelineLayout, { horizontalPassResources }); // Set the horizontalPass shader push constant to true computeList.cmdPushConstants(blurPassPipelineLayout, tp::ShaderStage::Compute, true); // In a horizontal pass, each workgroup will blur 256x1 pixels computeList.cmdDispatch(divideRoundUp(extent.width, ShaderWorkgroupSize), extent.height); }, "Blur horizontal pass"); // Vertical pass samples tempImage and writes back to inOutImage tp::ImageComputeAccess verticalPassAccesses[] = { { tempImage, tempImage.getWholeRange(), tp::ComputeAccess::ComputeShaderSampledRead }, { inOutImage, inOutImage.getWholeRange(), tp::ComputeAccess::ComputeShaderStorageWrite } }; tp::DescriptorSetView verticalPassResources = job.allocateLocalDescriptorSet( &blurPassDescriptorLayout, { tempImage, inOutImage }); job.cmdExecuteComputePass( tp::ComputePassSetup({}, tp::view(verticalPassAccesses)), [=](tp::ComputeList& computeList) { computeList.cmdBindComputePipeline(blurPassPipeline); // Bind inOutImage to input slot, tempImage to output slot computeList.cmdBindDescriptorSets(blurPassPipelineLayout, { verticalPassResources }); // Set the horizontalPass shader push constant to false computeList.cmdPushConstants(blurPassPipelineLayout, tp::ShaderStage::Compute, false); // In a vertical pass, each workgroup will blur 1x256 pixels computeList.cmdDispatch(extent.width, divideRoundUp(extent.height, ShaderWorkgroupSize)); }, "Blur vertical pass"); } private: // The workgroup size of the shader static constexpr uint32_t ShaderWorkgroupSize = 256; tp::PipelineLayout blurPassPipelineLayout; tp::DescriptorSetLayout blurPassDescriptorLayout; tp::Pipeline blurPassPipeline; };
Render Passes
Render passes are the graphics counterpart of compute passes. As mentioned before, a render pass is a collection of consecutive rendering commands that share the same set of attachments, aka render targets. You can draw thousands of objects and millions of triangles within a single render pass, for example when drawing the main camera's view to a color and depth buffer, or even just one or two triangles for a full-screen effect.
To record a render pass, use the tp::
The tp::AttachmentLoadOp specifies what should happen with the attachment's contents at the start of the render pass: they can be loaded, cleared, or discarded with tp::AttachmentLoadOp::DontCare. Discarding or clearing are likely going to be the fastest options, but should only be used if you know you won't need the existing contents. Similarly, the tp::AttachmentStoreOp specifies what happens with the contents at the end of the render pass, where tp::AttachmentStoreOp::DontCare can be useful, for example, for a depth buffer that is used just for the depth tests and its contents aren't needed afterwards. If you selected the tp::
tp::
The attachment setup structures are also the place to request multisampled attachments to be resolved. The resolveImage parameter can be set to another image with identical parameters, just without multisampling, to resolve the multisampled attachment to it at the end of the render pass. tp::
The tp::viewMask
parameter to a non-zero value.
Recording commands is similar to compute passes and was described in the previous section. The main difference is that we are using tp::
// Showcase of a simple render pass with a multisampled color and depth buffer with resolve class RenderPassExample { public: explicit RenderPassExample(tp::MultisampleLevel multisampleLevel) : multisampleLevel(multisampleLevel) { // Assume we're always dealing with multisampling in this example assert(multisampleLevel != tp::MultisampleLevel::x1); } // Prepares a pipeline for use in this render pass void setupPipeline(tp::GraphicsPipelineSetup& setup) { // Set pipeline attachment formats and our multisample level setup.setDepthStencilAttachment(depthFormat); setup.setColorAttachments({ colorFormat }); setup.setMultisampling(multisampleLevel); // We could also set other pipeline settings here that will be common to the render pass, // like blending modes or multi-view rendering } // Adds the render pass to the job and allocates resources for it void setupPass(tp::Job& job, const tp::ImageView& resolvedImage) { assert(resolvedImage.getFormat() == colorFormat); // Create the extra attachments as job-local images, see the next chapter for details auto imageSetup = tp::ImageSetup(tp::ImageType::Image2D, tp::ImageUsage::ColorAttachment, colorFormat, resolvedImage.getExtent(), 1, 1, multisampleLevel); tp::ImageView colorImage = job.allocateLocalImage(imageSetup, "Multisampled color"); imageSetup.usage = tp::ImageUsage::DepthStencilAttachment; imageSetup.format = depthFormat; tp::ImageView depthImage = job.allocateLocalImage(imageSetup, "Multisampled depth"); // Let's clear the images as part of the render pass tp::ClearValue clearColor = tp::ClearValue::ColorFloat(0.0f, 0.0f, 0.0f, 0.0f); tp::ClearValue clearDepth = tp::ClearValue::DepthStencil(1.0f, 0); // We clear the depth and color images, but we don't need the data after the render pass auto depthAttachment = tp::DepthStencilAttachment(depthImage, false, tp::AttachmentLoadOp::Clear, tp::AttachmentStoreOp::DontCare, clearDepth); // We resolve the color attachment auto colorAttachment = tp::ColorAttachment(colorImage, tp::AttachmentLoadOp::Clear, tp::AttachmentStoreOp::DontCare, clearColor, resolvedImage, tp::ResolveMode::Average); // Record the render pass, no additional non-attachment accesses to declare auto renderPassSetup = tp::RenderPassSetup( depthAttachment, tp::viewOne(colorAttachment), {}, {}); // Record to a list this time, rather than using an inline callback job.cmdExecuteRenderPass(renderPassSetup, { tp::viewOne(renderList) }); // We'll need a command pool for that commandPool = job.createCommandPool(); } // Draws objects to the prepared renderList after setupPass gets called and the job is enqueued, // but before it is submitted void drawObjects(const std::vector<Object>& objects, tp::Viewport viewport, tp::Rect2D scissor) { renderList.beginRecording(commandPool); renderList.cmdSetViewport({ viewport }); renderList.cmdSetScissor({ scissor }); for (const Object& object : objects) { // Object's draw method here is responsible for binding pipelines compatible with the // render pass (ones that called setupPipeline) object.Draw(); } renderList.endRecording(); } private: static const tp::Format depthFormat = tp::Format::DEPTH32_D32_SFLOAT; static const tp::Format colorFormat = tp::Format::COL32_B8G8R8A8_UNORM; tp::MultisampleLevel multisampleLevel; tp::CommandPool* commandPool; tp::RenderList renderList; };
Job-local resources
While the Tephra device provides means to allocate persistent resources that can be used at any time until they are destroyed, a tp::
Job-local buffers, images and descriptor sets can only be used within the scope of the job they were allocated from. They cannot be used in commands of other jobs, and job-local buffers and images cannot be exported to other queues. They also internally don't get created until the job gets enqueued. The visible consequence of this is that persistent descriptor sets can only be created out of job-local resources after the parent job has been enqueued. Job-local descriptor sets exist to circumvent this problem, as their creation is also deferred to when the job gets enqueued.
Pre-initialized buffers, on the other hand, are created the moment they are allocated from the job and are primarily meant to be used for conveniently uploading data to the device. They can serve either as temporary staging buffers with data that just gets copied over to an image, or for other kinds of data that is only useful in this job, such as shader constants. The lifetime of pre-initialized buffers still ends when the job finishes executing and their memory cannot be safely accessed after the job has been submitted. For that reason they are not suitable for any readback of data to the host, where persistent buffers must be used. See also Growable ring buffer.
Otherwise, tp::
The way job-local and pre-initialized resources get allocated and aliased can be controlled through the tp::
// Records commands to the given job to upload data to the first mip level of the image and
// generate the rest of the mip chain.
void uploadTex(tp::Job& job, const tp::ImageView& image, const std::vector<std::byte>& data) {
    // Allocate a temporary staging buffer for the job.
    auto stagingBufferSetup = tp::BufferSetup(
        data.size(), tp::BufferUsage::HostMapped | tp::BufferUsage::ImageTransfer);
    tp::BufferView stagingBuffer = job.allocatePreinitializedBuffer(
        stagingBufferSetup, tp::MemoryPreference::Host);

    {
        // Copy the data to the staging buffer. Can also be done later, at any point until the job
        // gets submitted.
        tp::HostMappedMemory memory = stagingBuffer.mapForHostAccess(tp::MemoryAccess::WriteOnly);
        memcpy(memory.getPtr<std::byte>(), data.data(), data.size());
    }

    // Record a command to copy the data to the first mip level of the image.
    tp::ImageSubresourceRange imageRange = image.getWholeRange();
    auto copyRegion = tp::BufferImageCopyRegion(
        0, imageRange.pickMipLevel(0), tp::Offset3D(0, 0, 0), image.getExtent());
    job.cmdCopyBufferToImage(stagingBuffer, image, { copyRegion });

    // Build mipmap chain by blitting to each mip level from the last.
    for (int targetMip = 1; targetMip < imageRange.mipLevelCount; targetMip++) {
        int sourceMip = targetMip - 1;
        auto blitRegion = tp::ImageBlitRegion(
            imageRange.pickMipLevel(sourceMip), { 0, 0, 0 }, image.getExtent(sourceMip),
            imageRange.pickMipLevel(targetMip), { 0, 0, 0 }, image.getExtent(targetMip));
        job.cmdBlitImage(image, image, { blitRegion });
    }

    // Export it for reading it as a texture
    job.cmdExportResource(image, tp::ReadAccess::FragmentShaderSampled);
}
The job-local resource implementation needs to do the following to optimize for memory usage and performance:
- Recycling: The backing resources should be reused in subsequent jobs created from the same pool. Creating Vulkan resources is potentially expensive and recycling allows to have zero such allocations on stable periodic workloads.
- Suballocation: Multiple compatible requested resources should be served by a single backing resource to further reduce overhead.
- Aliasing: If multiple compatible requested resources aren't being used at the same time, they can be assigned to the same region of a backing resource. This can reduce memory usage over a naive approach, potentially at the cost of additional synchronization. Tephra aliases on the resource level, rather than on the memory level.
Job-local buffers and images implement all three. The suballocation of images works over layers. If you request two identical job-local images, then Tephra will create a single VkImage resource with two layers, if possible, and the images cannot be aliased together into just one layer. The tp::
Each command recorded into a job that operates on a job-local resource marks that resource with the command's index. Each such resource of a job then keeps the minimum and maximum indices of the commands they were used in, which defines the usage range of the resource. Export operations are special and leave the maximum index unbound.
Requested resources are first sorted into "backing groups" by compatibility. Each group has a list of backing resources that are used to fulfill the requests. Those requests are allocated from the backing resources with respect to their usage range. The algorithm for this is contained in the AliasingSuballocator class. Since it is a greedy algorithm, the list of requested resources is first sorted by size in descending order, so that the large resources are allocated first and don't have large allocations "stolen" from them by small resources. The algorithm then assigns each resource, one by one, to the leftmost available space that it fits in. Anything left over will prompt the creation of a new backing resource. Recycling works trivially, since jobs allocated from the same pool can never overlap.
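A toy version of that greedy assignment could look like the following. It is only meant to illustrate the idea, not the real AliasingSuballocator:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Request {
    uint64_t size;
    int firstUse, lastUse; // usage range in command indices
    uint64_t offset;       // assigned offset within the backing resource
};

struct Placed { uint64_t offset, size; int firstUse, lastUse; };

// Greedily place requests, largest first, at the leftmost offset where they don't overlap
// in memory with any placed request whose usage range also overlaps in time.
void assignOffsets(std::vector<Request>& requests) {
    std::sort(requests.begin(), requests.end(),
        [](const Request& a, const Request& b) { return a.size > b.size; });

    std::vector<Placed> placed;
    for (Request& req : requests) {
        uint64_t offset = 0;
        bool moved = true;
        while (moved) {
            moved = false;
            for (const Placed& p : placed) {
                bool overlapsInTime = req.firstUse <= p.lastUse && p.firstUse <= req.lastUse;
                bool overlapsInMemory = offset < p.offset + p.size && p.offset < offset + req.size;
                if (overlapsInTime && overlapsInMemory) {
                    offset = p.offset + p.size; // move past this allocation and retry
                    moved = true;
                }
            }
        }
        req.offset = offset;
        placed.push_back({ offset, req.size, req.firstUse, req.lastUse });
    }
}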
Pre-initialized resources work somewhat differently, since their lifetime starts at the moment of the tp::
The pre-initialized buffers can only be released for recycling once the job finishes executing on the device. This is why we are using ring buffers, but care must be taken to allow recording multiple jobs from the same pool at the same time and enqueuing them in any order. In that case we can't rely on the buffers becoming available in allocation order. To resolve this, each job claims exclusive access to its tp::
Synchronization
Within the scope of a job, Tephra synchronizes accesses fully automatically, like in OpenGL. Beyond it, however, it needs some help from the user. We've already discussed tp::
Another such responsibility on the side of the user is to describe the accesses of resources that happen inside command lists, which the library does not analyze for performance reasons. However, even this is not as daunting as it may seem. It was mentioned that render and compute passes need a list of resources that they will access and how. The library also has a more convenient way that declares read-only accesses for all future passes - job export operations.
tp::
tp::ImageView texture = job.allocateLocalImage(setup);

// Render something to the texture and bind it for use in future shaders.
// The implementation of these functions isn't important for now.
renderToTexture(job, texture);
bindTexture(binding, texture);

// The above is all you would need to do in traditional APIs, but Tephra also requires an export
// operation to expose the results of the rendering to a set of read accesses, in this case to the
// binding:
job.cmdExportResource(texture, binding.getReadAccessMask());

// Now we can do some other renders that use the above texture through the binding, or any other
// binding of the same type.
renderToScreen(job);

// Let's say later on we want to copy from this texture, which is an access we haven't exported to.
// For job commands like this one, that is still legal, but it invalidates the above export.
job.cmdCopyImage(texture, someOtherTexture, regions);

// We must re-export now to allow accessing the texture through the binding again
job.cmdExportResource(texture, binding.getReadAccessMask());
Exports are also needed when you want to access the contents of a resource from a different queue than the one that has last written to it, even with all the proper semaphore synchronization. A cross-queue export just takes an extra parameter, specifying the queue type that the contents should be exported to. After that, the resource and its data can be accessed from any queue of that type, as long as it was also properly synchronized with a semaphore. Invalidating the export, such as by accessing the resource through a different access type, will make the resource's contents inaccessible to other queues again. A cross-queue export with an empty access mask is therefore enough to just transfer ownership to another queue.
Note that an export is needed even if the queue types of the reader and the writer queue match. A situation where an export isn't required is the case where the current contents of the resource aren't important and can be discarded. The export may be omitted then, but semaphore synchronization is still necessary.
The last use of an export operation is for readback of data to the CPU. To be able to read back the contents of a buffer in host-visible memory, you must export it with the tp::
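A sketch of the readback pattern might look as follows. The host read access value and the ReadOnly mapping mode used here are assumptions made for illustration and should be checked against the API documentation:

// Record the work that writes readbackBuffer, then export it for host reads
job.cmdExportResource(readbackBuffer, tp::ReadAccess::Host); // access name assumed

tp::JobSemaphore semaphore = device->enqueueJob(mainQueue, std::move(job));
device->submitQueuedJobs(mainQueue);

// Wait until the device is done before touching the memory on the CPU
device->waitForJobSemaphores({ semaphore });

tp::HostMappedMemory memory = readbackBuffer.mapForHostAccess(tp::MemoryAccess::ReadOnly);
const float* results = memory.getPtr<float>();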
As a potential optimization for tp::
These paragraphs contain a brief explanation of how automatic synchronization is handled in Tephra. Knowledge of Vulkan synchronization concepts is assumed. If you aren't familiar with Vulkan semaphores and barriers, feel free to skip this section.
Let's first consider the synchronization needed within the scope of a single tp::
To do this, we track certain state for each resource in what is called an access map. It stores, for any subresource range, information about the last accesses made to it, as well as what barriers were already used to synchronize them, if any. The latter allows us to re-use existing barriers efficiently. To process the next command, we first find any previous accesses that intersect the command's accesses to the resource and extend the barrier list with the proper synchronization between them. Afterwards, we update the access map with the new accesses to sync any future commands.
There are usually multiple ways two accesses can be synchronized in the presence of other commands and barriers in between. Tephra tries to minimize the number of pipeline barriers and otherwise inserts any new barriers as late as possible in the command buffer. It does not attempt to reorder the commands, as that is best left in the hands of the user. After we know what barriers to insert and where, we iterate over the job's commands again, but this time we translate them to Vulkan commands into the job's primary command buffer while inserting the appropriate barriers. This is also when inline callbacks of compute and render passes get invoked.
The access maps are kept persistent within each queue, so that we can also naturally ensure correct synchronization against accesses of previous jobs in the same queue. When it comes to accessing resources from other queues, we need the appropriate export operation to be able to properly communicate the correct image layouts and potentially issue special queue family ownership transfer barriers. This is done through simple message passing between queues. Each one has its own access maps with local state, but on export it can broadcast a part of that state of a particular resource range to all queues of the chosen queue type. The queues consume these broadcasts at the start of every submit, updating their own access map.
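To make this more concrete, here is a heavily simplified sketch of an access map entry and the barrier decision. It is illustrative only and does not reflect Tephra's actual data structures:

#include <cstdint>

// Simplified record of the last access made to a subresource range
struct AccessEntry {
    uint64_t rangeBegin, rangeEnd; // the subresource range this entry covers
    uint32_t stageMask;            // pipeline stages of the last access
    uint32_t accessMask;           // memory access types of the last access
    bool wasWrite;                 // whether the last access wrote to the range
    int barrierIndex;              // index of a barrier already synchronizing it, or -1 if none
};

// Only read-after-read needs no synchronization; everything else needs a barrier,
// or can potentially reuse an existing one that already covers the dependency.
bool needsNewBarrier(const AccessEntry& previous, bool nextIsWrite) {
    if (!previous.wasWrite && !nextIsWrite)
        return false;
    return previous.barrierIndex == -1;
}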
Resource descriptors
Descriptor set layouts
Descriptors in Vulkan, and by extension in Tephra, facilitate the binding of resources for use in shaders. Rather than binding resources one at a time, you create and bind entire sets at once. Multiple descriptor sets can be bound at the same time, which is why resource bindings are identified by both their descriptor set number and the binding index inside that set. All resource bindings declared in a shader must also be defined by a tp::
To create a tp::DescriptorSetLayout, you provide a list of tp::DescriptorBinding structures describing the bindings in the set. For example, a GLSL binding declared as layout (set = 0, binding = 1) uniform sampler2D tex; can be represented in Tephra as tp::DescriptorBinding(1, tp::DescriptorType::CombinedImageSampler, tp::ShaderStage::Fragment) when creating a layout for set 0. The order of descriptor bindings passed to tp::
A tp::
Descriptor sets
With descriptor set layouts in hand, we can look at how to actually allocate and bind tp::
Descriptor sets are allocated from a tp::
A tp::Descriptor must be provided for each descriptor in the layout, with array bindings contributing as many descriptors as the arraySize of the corresponding binding. These descriptors are then tightly packed in the list.
For example, if we have a tp::
tp::DescriptorSetLayout layout = device->createDescriptorSetLayout({
    tp::DescriptorBinding(0, tp::DescriptorType::UniformBuffer, tp::ShaderStage::Vertex),
    tp::DescriptorBinding(1, tp::DescriptorType::UniformBuffer, tp::ShaderStage::Fragment),
    tp::DescriptorBinding(2, tp::DescriptorType::Sampler, tp::ShaderStage::Fragment),
    tp::DescriptorBinding(3, tp::DescriptorType::SampledImage, tp::ShaderStage::Fragment, 4),
});
Then we can allocate a tp::
std::vector<tp::Descriptor> descriptors;
descriptors.push_back(vertexConstants);
descriptors.push_back(fragmentConstants);
descriptors.push_back(linearSampler);
for (int i = 0; i < 4; i++) {
    descriptors.push_back(textures[i]);
}

auto descSetSetup = tp::DescriptorSetSetup(tp::view(descriptors));
tp::DescriptorSet descriptorSet;
descriptorPool->allocateDescriptorSets(&layout, { descSetSetup }, { &descriptorSet });
Descriptors cannot reference job-local resources before the job has been enqueued - this is because internally, the resources don't actually exist until then. This is a fairly common use case, however. For that reason, you can also create job-local descriptor sets with tp::
By default, all descriptors provided to tp::
If you're familiar with Vulkan descriptor sets, you might notice that unlike those, Tephra's descriptor sets are immutable. Mutating descriptor sets in Vulkan involves waiting until they are no longer in use by the device, something we generally want to avoid. The common solution then is to just allocate a new set now and recycle the old one later. Tephra just embraces this pattern. When mutability might be convenient, there is the tp::
The descriptor set allocator separates the sets by their layout. This makes all the descriptor sets allocated from each pool have the same size and simplifies the allocation algorithm. Job resource pools internally have the same descriptor pool for serving job-local descriptor set allocations. Their allocation just gets deferred until the job is enqueued.
Binding descriptor sets
tp::
Dynamic offsets are used for tp::
Calling tp::tp::
, the contents of the descriptor sets simply get copied according to the offset.
The consequence of this is that frequently changing descriptor set layouts should be assigned to a higher set number than ones that are shared among many pipeline layouts. For example, it makes sense to put all the "global" bindings that can be used by many different pipelines, such as shadow maps, to set number 0, while various material-dependent bindings ought to be in higher set numbers. That way, changing "material" descriptor set layout won't disturb the "global" descriptor set layout.
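As an illustration of this convention, a draw loop might keep a shared set bound at set number 0 and swap only the material set at set number 1. This assumes tp::RenderList exposes cmdBindDescriptorSets the same way tp::ComputeList does in the blur example above; the sets and draw helpers are placeholders:

// Set 0: global bindings shared by all pipelines, set 1: material-specific bindings
renderList.cmdBindDescriptorSets(pipelineLayout, { globalSet, stoneMaterialSet });
drawObjectsWithMaterial(renderList, stoneObjects);

// Only the material set changes; the global set layout stays compatible and undisturbed
renderList.cmdBindDescriptorSets(pipelineLayout, { globalSet, woodMaterialSet });
drawObjectsWithMaterial(renderList, woodObjects);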
Pipelines
Shaders
Tephra, just like Vulkan, consumes shaders in SPIR-V. This language, rather than being human readable, serves as an easy-to-parse intermediate representation in a binary format. It is the user's responsibility to compile shaders from other shader languages, like GLSL or HLSL, to SPIR-V using external tools. The Vulkan SDK contains Khronos glslangValidator and Microsoft DXC binaries for compiling GLSL / HLSL to SPIR-V, respectively. If you are already familiar with these languages, note that their use for Vulkan carries some differences. Consult these pages for GLSL and HLSL for details about how they translate to SPIR-V.
tp::
Compute pipelines
To compile a compute tp::
The constructor of tp::
To compile compute pipelines, call tp::
A compute pipeline can be bound to the current state of a compute list with tp::
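Mirroring the graphics pipeline example further below, compiling a compute pipeline might look roughly like this. The ComputePipelineSetup type and compileComputePipelines method names are assumed by analogy with the graphics path and should be verified against the API reference:

// Read the compiled SPIR-V compute shader from disk
std::vector<uint32_t> computeShaderCode = loadShader("csExample.spv");
tp::ShaderModule computeShader = device->createShaderModule(tp::view(computeShaderCode));

// Assumed setup type and compile call, by analogy with compileGraphicsPipelines
auto pipelineSetup = tp::ComputePipelineSetup(
    &pipelineLayout, tp::ShaderStageSetup(&computeShader, "main"));

tp::Pipeline computePipeline;
device->compileComputePipelines({ &pipelineSetup }, nullptr, { &computePipeline });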
Graphics pipelines
Graphics tp::
In the constructor, tp::
Graphics pipelines can only be bound and used within render passes that have attachment image formats matching those declared as part of the pipeline setup. You can do so with the tp::
Another commonly changed state is an array of tp::
There are many other configurable states affecting all parts of the graphics pipeline. One of the more important ones is setDepthTest. Setting enable to true enables depth operations with the given depth test comparison operator, and enableWrite optionally enables depth writes as well. If depth (or stencil) operations are enabled, the pipeline setup must also describe a valid depth attachment.
By default, face culling is also disabled. That can be changed by calling tp::
Attachment blending also needs to be explicitly enabled with either tp::
Some state can also be declared as dynamic through tp::
// Read compiled SPIR-V shaders from disk
std::vector<uint32_t> vertexShaderCode = loadShader("vsExample.spv");
std::vector<uint32_t> fragmentShaderCode = loadShader("fsExample.spv");

tp::ShaderModule vertexShader = device->createShaderModule(tp::view(vertexShaderCode));
tp::ShaderModule fragmentShader = device->createShaderModule(tp::view(fragmentShaderCode));

auto vertexShaderSetup = tp::ShaderStageSetup(&vertexShader, "main");
auto fragmentShaderSetup = tp::ShaderStageSetup(&fragmentShader, "main");

// Use an already prepared pipeline layout
auto pipelineSetup = tp::GraphicsPipelineSetup(
    &pipelineLayout, vertexShaderSetup, fragmentShaderSetup);

// Set the formats of the attachments that will be used
pipelineSetup.setDepthStencilAttachment(tp::Format::DEPTH32_D32_SFLOAT);
pipelineSetup.setColorAttachments({ tp::Format::COL32_B8G8R8A8_UNORM });

// Back face culling
pipelineSetup.setCullMode(tp::CullModeFlag::BackFace);

// Depth test without writing
pipelineSetup.setDepthTest(true, tp::CompareOp::LessOrEqual, false);

// Use alpha blending, disable writing to the alpha channel
auto alphaBlendState = tp::AttachmentBlendState(
    tp::BlendState(tp::BlendFactor::SrcAlpha, tp::BlendFactor::OneMinusSrcAlpha), // colorBlend
    tp::BlendState::NoBlend(), // alphaBlend
    tp::ColorComponent::Red | tp::ColorComponent::Green | tp::ColorComponent::Blue // writeMask
);
pipelineSetup.setBlending(true, alphaBlendState);

// Also create a version with multisampling
tp::GraphicsPipelineSetup msPipelineSetup = pipelineSetup;
msPipelineSetup.setMultisampling(tp::MultisampleLevel::x4);

device->compileGraphicsPipelines(
    { &pipelineSetup, &msPipelineSetup }, nullptr, { &pipeline, &msPipeline });
Swapchain
To actually display the contents of an image to the screen requires the use of a tp::
To create a tp::
The first parameter of the tp::
Before a VkSurfaceKHR handle can be used, you should check whether your device supports presenting to that particular surface. This can be done with tp::
Its second parameter, the tp::PresentMode, together with minImageCount, allows making trade-offs between display latency, tearing and stability. The imageUsage, imageFormat, imageExtent and imageArrayLayerCount parameters work similarly to the equivalent parameters of tp::. imageCompatibleFormatsKHR also allows specifying additional formats that views of those images can take; however, the use of this parameter additionally requires the tp::
// Manages drawing to a window using the Tephra swapchain
class Window {
public:
    Window(const tp::PhysicalDevice* physicalDevice, tp::Device* device)
        : physicalDevice(physicalDevice), device(device) {
        // VkSurfaceKHR gets created here in some platform-dependent way, or with a library
    }

    bool recreateSwapchain(uint32_t& width, uint32_t& height) {
        auto capabilities = physicalDevice->querySurfaceCapabilitiesKHR(surface);

        // Prefer the extent specified by the surface over what's provided
        if (capabilities.currentExtent.width != ~0u) {
            width = capabilities.currentExtent.width;
            height = capabilities.currentExtent.height;
        }

        // Prefer triple buffering
        uint32_t minImageCount = 3;
        if (capabilities.maxImageCount != 0 && capabilities.maxImageCount < minImageCount)
            minImageCount = capabilities.maxImageCount;

        // Prefer RelaxedFIFO if available, otherwise fall back to FIFO, which is always supported
        auto presentMode = tp::PresentMode::FIFO;
        for (tp::PresentMode m : capabilities.supportedPresentModes) {
            if (m == tp::PresentMode::RelaxedFIFO) {
                presentMode = tp::PresentMode::RelaxedFIFO;
                break;
            }
        }

        // Check if the swapchain supports the required format
        constexpr tp::Format imageFormat = tp::Format::COL32_B8G8R8A8_UNORM;
        bool supportsRequiredFormat = false;
        for (tp::Format format : capabilities.supportedFormatsSRGB) {
            if (format == imageFormat)
                supportsRequiredFormat = true;
        }
        if (!supportsRequiredFormat) {
            return false;
        }

        auto swapchainSetup = tp::SwapchainSetup(
            surface,
            presentMode,
            minImageCount,
            tp::ImageUsage::ColorAttachment,
            imageFormat,
            { width, height });

        // Reuse the old swapchain, if available
        swapchain = device->createSwapchainKHR(swapchainSetup, swapchain.get());
        return true;
    }

private:
    const tp::PhysicalDevice* physicalDevice;
    tp::Device* device;
    VkSurfaceKHR surface;
    std::unique_ptr<tp::Swapchain> swapchain;
    // List of past frames' semaphores that we will use to limit framerate
    std::deque<tp::JobSemaphore> frameSemaphores;
};
Once created, the swapchain will be in the tp::
tp::SwapchainStatus::Suboptimal and OutOfDate status changes happen due to the underlying window being resized. It might be more convenient to handle the resize events of your windowing system pre-emptively, so that these errors never happen.
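As a minimal sketch of that pre-emptive approach (assuming the GLFW windowing library and the Window class from the example above; neither is part of Tephra):

#include <GLFW/glfw3.h>

// Sketch only: recreate the swapchain as soon as the window is resized, instead of
// waiting for the Suboptimal / OutOfDate status to be reported on a later frame.
static void onFramebufferResized(GLFWwindow* glfwWindow, int newWidth, int newHeight) {
    auto* window = static_cast<Window*>(glfwGetWindowUserPointer(glfwWindow));
    uint32_t width = static_cast<uint32_t>(newWidth);
    uint32_t height = static_cast<uint32_t>(newHeight);
    window->recreateSwapchain(width, height);
}

// During initialization:
// glfwSetWindowUserPointer(glfwWindow, &window);
// glfwSetFramebufferSizeCallback(glfwWindow, onFramebufferResized);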
To receive an image to draw onto and later present, call tp::Swapchain::acquireNextImage. The returned tp::AcquiredImageInfo contains an acquire semaphore that must be passed to the waitExternalSemaphores parameter of tp::Device::enqueueJob for the first job that accesses the image. The last job writing to the image before the present operation must export it with tp::Job::cmdExportResource and signal the image's present semaphore through the signalExternalSemaphores parameter upon enqueue. Only once all jobs are enqueued and submitted in the correct order, you can finally call tp::Device::submitPresentImagesKHR, specifying the tp::Swapchain and the index of the acquired image to present.
Conceptually, tp::
bool Window::drawFrame() {
    // Limit the number of outstanding frames being rendered - this is better than relying on
    // acquire to block for us
    if (frameSemaphores.size() >= 2) {
        device->waitForJobSemaphores({ frameSemaphores.front() });
        frameSemaphores.pop_front();
    }

    if (swapchain->getStatus() != tp::SwapchainStatus::Optimal) {
        // Recreate an out of date or suboptimal swapchain
        uint32_t width = getWindowWidth();
        uint32_t height = getWindowHeight();
        recreateSwapchain(width, height);
    }

    // Acquire a swapchain image to draw the frame to
    tp::AcquiredImageInfo acquiredImage;
    try {
        acquiredImage = swapchain->acquireNextImage().value();
    } catch (const tp::OutOfDateError&) {
        // Try next frame
        return false;
    } catch (const tp::SurfaceLostError&) {
        // Recreate surface in a platform-dependent way and try next frame
        return false;
    }

    // Create a simple example job to draw the frame
    tp::Job renderJob = jobResourcePool->createJob();

    // We don't need the swapchain image's old contents. It's good practice to discard.
    renderJob.cmdDiscardContents(*acquiredImage.image);

    // The render code should go here. For this example, just clear to magenta.
    renderJob.cmdClearImage(
        *acquiredImage.image, tp::ClearValue::ColorFloat(1.0f, 0.0f, 1.0f, 1.0f));

    // Finally export for present
    renderJob.cmdExportResource(*acquiredImage.image, tp::ReadAccess::ImagePresentKHR);

    // Enqueue and submit the job, synchronizing it with the presentation engine's semaphores
    tp::JobSemaphore jobSemaphore = device->enqueueJob(
        tp::QueueType::Graphics,
        std::move(renderJob),
        {},
        { acquiredImage.acquireSemaphore },  // the wait semaphore
        { acquiredImage.presentSemaphore }); // the signal semaphore
    device->submitQueuedJobs(tp::QueueType::Graphics);

    // Keep the job's semaphore so we can wait on it later.
    frameSemaphores.push_back(jobSemaphore);

    // Present the image
    try {
        device->submitPresentImagesKHR(
            tp::QueueType::Graphics, { swapchain.get() }, { acquiredImage.imageIndex });
    } catch (const tp::OutOfDateError&) {
        // Let the swapchain be recreated next frame
        return false;
    }

    return true;
}
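To tie the pieces together, here is a minimal sketch of a render loop driving Window::drawFrame, again assuming GLFW for windowing, which is not part of Tephra:

#include <GLFW/glfw3.h>

// Sketch only: drive the frame loop until the window is closed.
void runMainLoop(GLFWwindow* glfwWindow, Window& window) {
    while (!glfwWindowShouldClose(glfwWindow)) {
        glfwPollEvents();
        // drawFrame returns false when the frame was skipped, for example because the
        // swapchain was out of date; it will be recreated on the next iteration.
        window.drawFrame();
    }
}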
Other functionality
Queries
Another useful piece of functionality exposed by Vulkan is queries, which provide a mechanism for retrieving various statistics and timings about the processing of submitted device commands. Since they are executed on the timeline of a device queue, the process of retrieving the information is asynchronous. In Tephra, queries are split based on the kind of information they record and how they are used.
tp::timestampPeriod. Timestamp queries can be created through tp::. A query written with the tp::PipelineStage::FragmentShader pipeline stage value, for example, should capture the timestamp at the point when all the previously submitted commands have finished executing all of their fragment shader invocations. The support for accurate timestamps of all pipeline stages is not universal, however.
tp::
Each query object, regardless of type, can be used repeatedly across multiple jobs (but only once in any job). This allows for easy monitoring of repeating workloads, such as frame rendering. The last available result can be retrieved by calling tp::
One caveat is that Tephra does not retrieve these results immediately after they are available, but instead needs to periodically check and update the existing values. This is done automatically at key points like during a call to tp::
The design of queries in Vulkan, of course, does not presume how queries will be used and spares no precautions. Queries of each type are created from their own query pools and are one-time use only, though recyclable once the result has been read back. Tephra manages that with internally synchronized pools that are shared across the device. Its query objects are only loosely associated with Vulkan queries. When a query write is recorded, such as through tp::
One more complication comes from the interaction between Vulkan queries and multiview. While rendering using multiview, each query may produce either one result for all views, or one result for each view, depending on the implementation. Tephra consolidates the results transparently, so that your query code does not need to change when you enable multiview.
Utilities
Tephra also provides some additional utilities built on top of the base API. They live in the tp::
Standard report handler
The tp:: can be constructed by providing the std::ostream that messages will be directed to, and optionally specifying the message severity and types to report. The last parameter can additionally cause the handler to try to trigger a breakpoint in any attached debugger when an error occurs.
The second way is to make your own implementation of tp::
Growable ring buffer
Tephra employs various allocation strategies when serving requests for pre-initialized buffers, in order to offer low memory usage and high performance when allocating temporary buffers for transferring data from the host to the device. Some of those strategies may be useful even outside of what tp::
As the name suggests, this implementation of ring buffers is resizable. A tp::
The above can still require a considerable amount of management to create the backing buffers and ensure no allocations get popped prematurely. There is also a more user-friendly wrapper around this class, tp::
Another use of the ring buffers can be for job-local data of a size that you don't know when the job is being recorded. It may be more convenient to write down shader constants at the same time as recording the actual draw calls to command lists. In that case a ring buffer may be used to allocate the constant data separately from a job, with a little bit of extra management. Similarly, you may have data that changes somewhat frequently, but can be used in the span of multiple jobs. A ring buffer may be useful there as well.
Mutable descriptor set
Tephra's descriptor sets are immutable and best suited for small sets of material-specific resources that are easily reusable. You may, however, want to additionally keep a set of "global" resources which can get bound and unbound at any time in a stateful fashion. tp::
Note also that any descriptors that weren't set by the time tp::
The mutable descriptor set can also be used to facilitate "bindless" setups, where all the resources used by all shaders in a frame get bound to the same descriptor set as large array bindings. In such cases it may be preferable to update parts of the existing descriptor set, rather than allocating a fresh one, through the tp::