A deep dive into how the Vortek Vulkan driver works

This series explores the inner workings of Vortek, motivated by its seemingly magical ability to enable DirectX gaming on Mali devices.

In the first part of this series, we will go through the high level architecture of Vortek, what challenges it is trying to overcome, and some specific implementation details around its IPC mechanism - the command buffer for Vulkan calls.

Part 1 (this one): https://leegao.github.io/winlator-internals/2025/06/01/Vortek1.html
Part 2: https://leegao.github.io/winlator-internals/2025/06/02/Vortek2.html

Disclaimers

This analysis is done with the Winlator 10.0 Final Hotfix APK and its Vortek libraries (libvulkan_vortek.so and libvortekrenderer.so) from https://github.com/brunodev85/winlator/releases/tag/v10.0.0

Note that everything here is inferred from binary reverse engineering of complex native libraries. As a result, this deep dive is by no means exhaustive. It’s only meant to illustrate the design/architecture and some specific implementation details for certain workarounds found within Vortek.

The decompiled Java code is taken unmodified from JADX. The C code is reinterpreted from Ghidra output by me, with help from an LLM assistant, into a more human-readable form. Reverse engineering artifacts are in https://github.com/leegao/vortek-deep-dive

1,000 Feet Overview (Goals/Motivations)

Vortek is a Winlator-specific (so far) Vulkan driver that aims to run DirectX games (via DXVK) on Android system drivers. In particular, it’s designed to solve a few major problems with using the system drivers:

Runtime Incompatibility (this post): Inability for system Vulkan drivers (shipped to run on the Bionic libc runtime) to work with games running within box64 (which runs on a custom glibc runtime)
Missing Extensions: Lack of crucial Vulkan features/extensions necessary to function on certain drivers (e.g. Mali)

Vortek addresses these problems by:

Adding an inter-process command buffer for Vulkan commands so that the game (the client) can continue to run within a glibc runtime while sending Vulkan commands via IPC to the Vulkan renderer (the server) which must run within Bionic
Adding support for specific unsupported features/extensions necessary for Android system Vulkan drivers to work, these include:
- WSI related extensions to render onto an x11 display
- BCn compressed texture formats used by DirectX games
- A workaround for the lack of gl_ClipDistance on Mali drivers
- Emulation of scaled texture formats on drivers that lack support for them

High Level Design Sketch

There are two native libraries of interest: libvortekrenderer.so - AKA the server, and libvulkan_vortek.so - AKA the client.

The basic idea for Vortek roughly follows these steps:

The game (the client) loads in libvulkan_vortek.so, which is designed as an interceptor of Vk commands (via the Vulkan ICD loader) and sends them to the remote renderer (the server) through a command buffer
The server generates and maintains the true VkObjects, with the client maintaining a shadow of them (e.g. object pointers that do not contain real data)
- This is because the client can only interact with VkObject through vulkan calls, which will be proxied to the renderer who keeps the real object. The shadows are just opaque tokens/handles that refer to the server-side address of the object.
When the game wants to execute a Vk command (say VkX), the call is intercepted and passed to vt_call_VkX instead, where Vortek will
- Synchronize the access to the to-server/from-server IPC ring buffers
- Wrap all VkObjects (raw Vulkan objects) into a VortekVkObject handle (that can represent the true underlying object in both client/server processes)
- Serialize the parameters of the call to VkX
- Writes the call id, the size of the payload, and all of the serialized parameters for the VkX call onto the to-server IPC ring buffer
- Wait for the response from the server on the from-server IPC ring buffer
- Deserialize the response and reconstruct it into its VkObject (typically a VkResult)
On the server (libvortekrenderer.so), a main loop polls the to-server IPC ring buffer for new commands. When it finds one, it will:
- Read the call id (for VkX) and the payload size
- Dispatch the call to its associated vt_handle_ function (e.g. vt_handle_VkX)
Within the dispatched handler function (e.g. vt_handle_VkX), Vortek will
- Deserialize the arguments (the reverse process from what the client does)
- Unwrap all wrapped VortekVkObject into their true VkObjects
- Perform additional modifications to the call/parameters if necessary (e.g. reporting that the device supports KHR_Surface or texture compression)
- Call the true underlying Vulkan function from the system library (e.g. VkX from the Mali libvulkan.so)
- Perform additional modifications to the result if necessary
- Wrap and serialize the final result
- Write the status (success/failure), size, and the serialized final result to the to-client IPC ring buffer
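To make the server half of this flow more concrete, here is a minimal sketch of the framing and dispatch step. It is not recovered from the binary: the header layout, the handler signature, and names such as vt_header, vt_handlers and vt_dispatch_one are assumptions for illustration, and it decodes from a flat byte array instead of a shared ring buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header: a call id followed by the payload size. */
typedef struct {
    int32_t call_id;
    int32_t payload_size;
} vt_header;

/* Hypothetical handler signature: payload in, status code out. */
typedef int (*vt_handler)(const uint8_t *payload, int32_t size);

#define VT_MAX_CALLS 4

static int handle_noop(const uint8_t *payload, int32_t size) {
    (void)payload;
    (void)size;
    return 0;
}

/* Sums the payload bytes so that the dispatch has an observable result. */
static int handle_sum(const uint8_t *payload, int32_t size) {
    int sum = 0;
    for (int32_t i = 0; i < size; i++) sum += payload[i];
    return sum;
}

/* Table indexed by call id; ids without a handler map to NULL. */
static const vt_handler vt_handlers[VT_MAX_CALLS] = { handle_noop, handle_sum, NULL, NULL };

/* Decode one framed message from a flat byte stream and dispatch it.
 * Returns the handler's result, or -1 for a malformed or unknown message.
 * On success, *consumed receives the number of bytes the message occupied. */
static int vt_dispatch_one(const uint8_t *stream, size_t len, size_t *consumed) {
    vt_header hdr;
    assert(consumed != NULL);
    if (len < sizeof(hdr)) return -1;
    memcpy(&hdr, stream, sizeof(hdr));
    if (hdr.payload_size < 0 || (size_t)hdr.payload_size > len - sizeof(hdr)) return -1;
    if (hdr.call_id < 0 || hdr.call_id >= VT_MAX_CALLS) return -1;
    if (vt_handlers[hdr.call_id] == NULL) return -1;
    *consumed = sizeof(hdr) + (size_t)hdr.payload_size;
    return vt_handlers[hdr.call_id](stream + sizeof(hdr), hdr.payload_size);
}
```

A table indexed by call id keeps the main loop small: it only needs to understand the 8-byte header, and everything specific to a Vulkan call lives in that call's handler.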


In this sense, Vortek is an inter-process command buffer for Vulkan calls designed so that clients running in glibc or other non-bionic runtimes can send Vulkan calls to a bionic server to execute and render. Additionally, the ability to modify both the parameters and the return values of Vulkan calls (and/or completely stubbing them out with alternative functions) allows Vortek to apply targeted patches to the underlying (often incomplete) system drivers to enable and even emulate crucial yet missing features/extensions needed for DirectX games.

Workarounds / Patches for Device Compatibility

Decoupled from the client-server IPC design are the specific workarounds that let Vortek sidestep various mobile GPU/Vulkan driver issues (e.g. on Mali devices), getting the device to a somewhat usable state despite known issues in its drivers.

At a high level, these include:

Add support for drivers that lack WSI extensions
Add support for drivers that lack placed memory extensions used by 32bit emulation (for x86-32 games) via emulation
Add support for drivers that lack BCn compressed texture formats used by DX games via emulation + JIT decompression of these compressed textures
Add support for drivers that lack gl_ClipDistance capability on Mali devices by removing all SPIR-V instructions associated with it. (This may however cause graphical glitches as proper clipping is no longer guaranteed)
Add support for drivers that lack scaled texture formats on some mobile GPUs by emulating them on the GPU via SPIR-V instruction patching.

Vortek Initialization

Server - Winlator + libvortekrenderer.so

When you start a game, Winlator loads an XServerDisplayActivity. If you select Vortek as your graphics driver, then the following code executes

case "vortek":
   this.envVars.put("GALLIUM_DRIVER", "zink");
   this.envVars.put("ZINK_CONTEXT_THREADED", "1");
   this.envVars.put("MESA_GL_VERSION_OVERRIDE", "3.3");
   this.envVars.put("WINEVKUSEPLACEDADDR", "1");
   this.envVars.put("VORTEK_SERVER_PATH", rootDir + "/tmp/.vortek/V0");
   if (this.dxwrapper.equals("dxvk")) {
       this.dxwrapperConfig.put("constantBufferRangeCheck", "1");
   }
   if (changed) {
       TarCompressorUtils.Type type = TarCompressorUtils.Type.ZSTD;
       TarCompressorUtils.extract(type, this, "graphics_driver/vortek-1.0.tzst", rootDir);
       TarCompressorUtils.extract(type, this, "graphics_driver/zink-22.2.5.tzst", rootDir);
       break;
   }

Sets the environment variables as such:

GALLIUM_DRIVER=zink
ZINK_CONTEXT_THREADED=1
MESA_GL_VERSION_OVERRIDE=3.3
WINEVKUSEPLACEDADDR=1 (requires emulation within vortek, allows for 32bit emulation)
VORTEK_SERVER_PATH=$rootfs/tmp/.vortek/V0 # UNIX socket for client/server exchange of ring buffers

It then extracts the graphics drivers for zink and vortek into the rootfs of the container.

if (this.graphicsDriver.equals("vortek")) {
   VortekRendererComponent.Options options2 = VortekRendererComponent.Options.fromKeyValueSet(this.graphicsDriverConfig);
   VortekRendererComponent vortekRendererComponent = new VortekRendererComponent(this.xServer, UnixSocketConfig.create(rootPath, "/tmp/.vortek/V0"), options2);
   this.environment.addComponent(vortekRendererComponent);
}

This initializes the renderer component as a VortekRendererComponent bound to this X server (Lorie).

public class VortekRendererComponent extends EnvironmentComponent implements ConnectionHandler, RequestHandler {
   public static final int VK_MAX_VERSION = GPUHelper.vkMakeVersion(1, 3, 128);
   ...

   static {
       System.loadLibrary("vortekrenderer");
   }

   public static class Options {
       public int vkMaxVersion = VortekRendererComponent.VK_MAX_VERSION;
       public int maxDeviceMemory = 4096;
       public String[] exposedDeviceExtensions = null;

       public static Options fromKeyValueSet(KeyValueSet config) {
           ...
       }
   }

   ...

   @Override // com.winlator.xconnector.ConnectionHandler
   public void handleConnectionShutdown(Client client) {
       if (client.getTag() != null) {
           long contextPtr = ((Long) client.getTag()).longValue();
           destroyVkContext(contextPtr);
       }
   }

   @Override // com.winlator.xconnector.ConnectionHandler
   public void handleNewConnection(Client client) {
       client.createIOStreams();
   }

   @Override // com.winlator.xconnector.RequestHandler
   public boolean handleRequest(Client client) throws IOException {
       XInputStream inputStream = client.getInputStream();
       if (inputStream.available() < 1) {
           return false;
       }
       if (inputStream.readByte() == 1) {
           long contextPtr = createVkContext(client.clientSocket.fd, this.options);
           if (contextPtr > 0) {
               client.setTag(Long.valueOf(contextPtr));
           } else {
               this.connector.killConnection(client);
           }
       }
       return true;
   }

   private native long createVkContext(int i, Options options);

   private native void destroyVkContext(long j);

}

Here, Winlator will call VortekRendererComponent.start, which will createAFUnixSocket at $rootfs/tmp/.vortek/V0 (create a unix socket at that path) and then start listening to it (using an epoll server) for events, dispatching them to various handlers:

If the client connects to this same path - calls handleNewConnection
If the client disconnects - calls handleConnectionShutdown
If the client sends data to the server on this path - calls handleRequest

The most interesting handler here is handleRequest, which:

Reads a single byte (the request code)
- The only request code we know how to handle for now is 1, which is the client telling the server that it’s ready to receive the ring buffers and begin sending Vk commands
Calls the native method createVkContext

createVkContext

Now, we need to hop over to the native side in libvortekrenderer.so (a bionic library). Using Ghidra with LLM-assisted decompilation gives us the following pseudo-code for the JNI function Java_com_winlator_xenvironment_components_VortekRendererComponent_createVkContext (VortekRendererComponent::createVkContext):

/**
* JNI entry point to create a Vulkan context for VortekRendererComponent.
* This function primarily initializes the Vulkan dynamic library wrapper if not already done,
* and then calls an internal Vortek function to proceed with context creation.
*/
jlong Java_com_winlator_xenvironment_components_VortekRendererComponent_createVkContext(
   JNIEnv* env, jobject thiz, jint fd, jobject options)
   // Ghidra: (undefined8 param_1, undefined8 param_2, undefined4 param_3, undefined8 param_4)
   // Matching the Java declaration createVkContext(int, Options):
   // param_1=env, param_2=thiz, param_3=client socket fd, param_4=options
{
   if (!g_vulkanWrapperInitialized) {
       void* libVulkan = dlopen("libvulkan.so", RTLD_NOW);
       if (libVulkan == NULL) {
           const char* error = dlerror();
           __android_log_print(ANDROID_LOG_ERROR, "System.out", "Unable to initialize vulkan wrapper: %s", error);
           return 0; // a 0 context pointer makes handleRequest kill the connection
       } else {
           PFN_vkCreateInstance_ptr = (PFN_vkCreateInstance)dlsym(libVulkan, "vkCreateInstance");
           PFN_vkEnumerateInstanceVersion_ptr = (PFN_vkEnumerateInstanceVersion)dlsym(libVulkan, "vkEnumerateInstanceVersion");
           PFN_vkEnumerateInstanceLayerProperties_ptr = (PFN_vkEnumerateInstanceLayerProperties)dlsym(libVulkan, "vkEnumerateInstanceLayerProperties");
           PFN_vkEnumerateInstanceExtensionProperties_ptr = (PFN_vkEnumerateInstanceExtensionProperties)dlsym(libVulkan, "vkEnumerateInstanceExtensionProperties");
           PFN_vkGetInstanceProcAddr_ptr = (PFN_vkGetInstanceProcAddr)dlsym(libVulkan, "vkGetInstanceProcAddr");
           PFN_vkGetDeviceProcAddr_ptr = (PFN_vkGetDeviceProcAddr)dlsym(libVulkan, "vkGetDeviceProcAddr");

           // Check if essential functions were loaded
           if (!PFN_vkCreateInstance_ptr || !PFN_vkGetInstanceProcAddr_ptr) {
                __android_log_print(ANDROID_LOG_ERROR, "System.out", "Failed to dlsym core Vulkan functions.");
                dlclose(libVulkan);
                return 0;
           }
           g_vulkanWrapperInitialized = true;
       }
   }

   // The context pointer is handed back to Java as an opaque long
   return (jlong)createVkContext(env, thiz, fd, options);
}

This function will (only once per process) load various vk functions from the system libvulkan.so and ensure certain important Vk functions are present.

Finally, it calls an internal createVkContext function

// --- Function: createVkContext ---
// Parameters:
//   env: JNI environment pointer
//   thiz: 'this' Java object (VortekRendererComponent instance)
//   clientFd: File descriptor for communication with client
//   options: Java object holding configuration (e.g., extensions, versions)
VkContextInternal* createVkContext(JNIEnv* env, jobject thiz, int clientFd, jobject options) {
   jobjectArray javaExposedExtensionsArray = (jobjectArray)getJavaFieldAsObject(env, options, "exposedDeviceExtensions", "[Ljava/lang/String;");

   VkContextInternal* context = (VkContextInternal*)calloc(1, sizeof(VkContextInternal));
   if (!context) {
       return NULL;
   }

   context->clientFd = clientFd;
   context->vkMaxVersion = getJavaFieldAsInt(env, options, "vkMaxVersion");
   context->maxDeviceMemory = getJavaFieldAsInt(env, options, "maxDeviceMemory");

   if (javaExposedExtensionsArray != NULL) {
       ...
       // Copy the exposedDeviceExtensions in java into context->exposedDeviceExtensionsList
   } else {
       context->exposedDeviceExtensionsList = NULL;
   }

   size_t serverRingMemSize = RingBuffer_getSHMemSize(0x400000); // 4MB + header
   int serverAshmemFd = ashmemCreateRegion("vt-server-ring", serverRingMemSize);

   size_t clientRingMemSize = RingBuffer_getSHMemSize(0x40000); // 256KB + header
   int clientAshmemFd = ashmemCreateRegion("vt-client-ring", clientRingMemSize);

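   // The 4MB "server" ring carries commands from the client to the server,
   // the 256KB "client" ring carries results from the server back to the client
   context->clientToServerRingBuffer = RingBuffer_create(serverAshmemFd, 0x400000);
   context->serverToClientRingBuffer = RingBuffer_create(clientAshmemFd, 0x40000);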

   // Error handling to close + free memory
   ...

   // Send ashmem FDs to the client process using Unix socket extensions
   struct msghdr msg;

   ...

   struct cmsghdr *cmptr = CMSG_FIRSTHDR(&msg);
   cmptr->cmsg_level = SOL_SOCKET;
   cmptr->cmsg_type = SCM_RIGHTS;
   cmptr->cmsg_len = CMSG_LEN(sizeof(int) * 2);
   int* fd_ptr = (int*)CMSG_DATA(cmptr);
   fd_ptr[0] = serverAshmemFd;
   fd_ptr[1] = clientAshmemFd;

   ssize_t bytesSent = sendmsg(clientFd, &msg, 0);

   close(serverAshmemFd);
   close(clientAshmemFd);

   // Error handling to close + free memory
   ...

   // Start the command handler thread
   if (pthread_create(&context->workerThreadId, NULL, vortek_renderer_thread_main_loop, context) != 0) {
       // Error handling to close + free memory
       ...
       return NULL;
   }

   // Store global references to Java objects needed by the worker thread
   context->javaRendererComponent_globalRef = env->NewGlobalRef(thiz);
   context->javaSurface_globalRef = env->NewGlobalRef(options); // param_4 is the Options object per the Java signature, so this field name is likely a misnomer

   return context;
}

This additionally does a few more things:

Initializes the context with configurations such as exposedDeviceExtensions, vkMaxVersion, maxDeviceMemory
Creates two ring buffers on ashmem (Android Shared Memory): one to-server ring named vt-server-ring in the memory map, and one to-client ring named vt-client-ring
- The ashmemCreateRegion call returns an FD that can be mmap-ed, and that can be passed to other processes via either unix sockets or Android binders
Sends these two ashmem fds to the client (using the unix domain socket fd)
Sets up a new thread to poll from the to-server ring buffer and process commands (FUN_001359e0, AKA vortek_renderer_thread_main_loop)
Saves the objects that should live on within the JVM as global references
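The msghdr setup is elided in the pseudo-code above, so here is a generic POSIX sketch of passing two file descriptors over a unix domain socket with SCM_RIGHTS. It is not the code from the binary: send_two_fds and recv_two_fds are hypothetical helpers, and the single dummy payload byte is my assumption (some payload is needed for the ancillary data to ride on).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly two ints, aligned for struct cmsghdr. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int) * 2)];
    struct cmsghdr align;
} two_fd_control;

/* Send two file descriptors over a connected unix domain socket. */
static int send_two_fds(int sock, int fd_a, int fd_b) {
    char payload = 0;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    two_fd_control control;
    struct msghdr msg;
    int fds[2] = { fd_a, fd_b };

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int) * 2);
    memcpy(CMSG_DATA(cm), fds, sizeof(fds));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive two file descriptors; the kernel installs fresh fd numbers in this process. */
static int recv_two_fds(int sock, int out_fds[2]) {
    char payload = 0;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    two_fd_control control;
    struct msghdr msg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0) return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS) return -1;
    if (cm->cmsg_len != CMSG_LEN(sizeof(int) * 2)) return -1;
    memcpy(out_fds, CMSG_DATA(cm), sizeof(int) * 2);
    return 0;
}
```

Because the receiver gets its own duplicates of the descriptors, the sender can close its copies right after sendmsg, which matches the close(serverAshmemFd) and close(clientAshmemFd) calls above.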

This then concludes the Winlator server setup (assuming that a client has attempted to connect to it via the /tmp/.vortek/V0 socket)

Client - libvulkan_vortek.so

libvulkan_vortek.so is a Vulkan Loader compatible ICD library:

{
    "ICD": {
        "api_version": "1.1.128",
        "library_path": "/data/data/com.winlator/files/rootfs/lib/libvulkan_vortek.so"
    },
    "file_format_version": "1.0.0"
}

This means that a Vulkan application using a compatible VkICDLoader can load libvulkan_vortek.so as if it were the Vulkan driver for the system. The loader will go through a standard entry point discovery process using the vk_icdGetInstanceProcAddr(NULL, const char* pName) interface from libvulkan_vortek.so to resolve the various vkCMD calls.

void* vk_icdGetInstanceProcAddr(void* instance, const char* pName) {
    // Call initialization function first
    int init_result = vortekInitOnce();
    
    // If initialization failed, return NULL
    if (init_result == 0) {
        return NULL;
    }
    
    // Search through the function table
    // Table has 0x12a (298) entries, each entry is 0x10 (16) bytes containing
    // exactly two pointers: name+0x0 and func_ptr+0x8
    for (int i = 0; i < 0x12a; i++) {
        // Compare the requested function name with table entry
        if (strcmp(pName, vkDispatchTable[i].name) == 0) {
            // Found match - return the function pointer
            return vkDispatchTable[i].func_ptr;
        }
    }
    return NULL;
}

Each time the game/application wants to call a method, e.g. vkCreateShaderModule, it’ll call vk_icdGetInstanceProcAddr(nullptr, "vkCreateShaderModule").

In particular, Vortek will resolve this to the function vt_call_vkCreateShaderModule, as will all other implemented instance/device functions (into vt_call_vkX functions).
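For reference, here is a small self-contained sketch of what such a name-to-function table can look like. The entry type, the two stub functions, and demo_lookup are all hypothetical stand-ins, and the table has two entries instead of the 298 found in the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*vt_void_fn)(void);

/* Two pointers per entry, matching the name + func_ptr layout described above. */
typedef struct {
    const char *name;
    vt_void_fn func_ptr;
} vt_dispatch_entry;

/* Stand-ins for the real vt_call_* functions. */
static void vt_call_vkCreateInstance(void) {}
static void vt_call_vkCreateShaderModule(void) {}

static const vt_dispatch_entry demo_table[] = {
    { "vkCreateInstance", vt_call_vkCreateInstance },
    { "vkCreateShaderModule", vt_call_vkCreateShaderModule },
};

/* Linear scan by name; returns NULL for anything the table does not implement. */
static vt_void_fn demo_lookup(const char *name) {
    if (name == NULL) return NULL;
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, demo_table[i].name) == 0) return demo_table[i].func_ptr;
    }
    return NULL;
}
```

Returning NULL for an unknown name is how a driver tells the loader that it does not implement that entry point.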

vortekInitOnce

After libvulkan_vortek.so is loaded, the first call to vk_icdGetInstanceProcAddr invokes vortekInitOnce to set up the Ring Buffer IPC mechanism.

static int serverFd = -1;
static void *serverRing = NULL;
static void *clientRing = NULL;

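// Bails out of vortekInitOnce when EXPR is true (i.e. when EXPR describes a failure).
// The function returns 0 on failure; the success return sits in the elided tail.
#define CHECK_NOT_TRUE(EXPR) \
    if ((EXPR)) { \
        perror("vortekInitOnce: " #EXPR " failed"); \
        close(serverFd); \
        serverFd = -1; \
        return 0; \
    }

int vortekInitOnce(void) {
    if (serverFd != -1) {
        // Already initialized
        return 1;
    }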

    const char *server_path_env = getenv("VORTEK_SERVER_PATH");
    CHECK_NOT_TRUE(server_path_env == NULL);

    int temp_sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    CHECK_NOT_TRUE(temp_sock_fd < 0);

    struct sockaddr_un server_addr;
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sun_family = AF_UNIX;

    strncpy(server_addr.sun_path, server_path_env, sizeof(server_addr.sun_path) - 1);
    server_addr.sun_path[sizeof(server_addr.sun_path) - 1] = '\0';

    int connect_rc;
    socklen_t addr_len = sizeof(server_addr);

    do {
        connect_rc = connect(temp_sock_fd, (struct sockaddr *)&server_addr, addr_len);
    } while (connect_rc == -1 && errno == EINTR);

    CHECK_NOT_TRUE(connect_rc < 0);

    // If connect succeeds, assign to global serverFd
    serverFd = temp_sock_fd;

    // Write a single byte (1) to the server
    unsigned char handshake_byte = 1;
    ssize_t bytes_written = write(serverFd, &handshake_byte, 1);
    CHECK_NOT_TRUE(bytes_written <= 0);

    // Prepare for recvmsg to receive file descriptors
    struct msghdr msg_header;
    ...

    ssize_t bytes_received = recvmsg(serverFd, &msg_header, 0);
    CHECK_NOT_TRUE(bytes_received <= 0);

    int received_fds[2] = {-1, -1};
    int fds_extracted_count = 0;

    // Extract fds from the received msg_header
    ...

    serverRing = RingBuffer_create(received_fds[0], 0x400000);
    clientRing = RingBuffer_create(received_fds[1], 0x40000);

    // Error handling and closing the received_fds which have been dupped within the RingBuffer_create functions
    ...
}

In essence, it reads the unix domain socket path (from the environment variable VORTEK_SERVER_PATH), connects to it, and writes a single ‘1’ byte to start the pairing process:

The server receives this pairing request and sends back a pair of ashmem file descriptors (received_fds[0] = serverRing / outbound buffer, received_fds[1] = clientRing / inbound buffer)
The client creates its own two ring buffers over the shared memory behind these two FDs (the RingBuffer_create function)
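The reason a pair of file descriptors is enough to pair the two processes is that mapping the same descriptor with MAP_SHARED yields views of the same memory. The sketch below shows that property in a single process. It makes no claim about what RingBuffer_create does internally: demo_create_region uses a temp file as a stand-in for ashmemCreateRegion, and both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stand-in for ashmemCreateRegion: an anonymous temp file sized to `size` bytes.
 * Returns an fd, or -1 on failure. The FILE handle is deliberately left open. */
static int demo_create_region(size_t size) {
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    int fd = fileno(f);
    return ftruncate(fd, (off_t)size) == 0 ? fd : -1;
}

/* Map the region read/write and shared, so every mapping of the fd sees the same bytes. */
static void *demo_map_region(int fd, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Mapping the descriptor twice gives two different addresses backed by the same pages, so a write through one view is visible through the other. In Vortek the two views live in different processes, but the principle is the same.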

Ring Buffer IPC design

The ring buffers are allocated as fixed-length ashmem (Android SHared MEMory) regions that are shared between the client (game) and server (winlator) processes. There are two ring buffers:

The Server Ring Buffer:
- Used by the Client (libvulkan_vortek.so) to send Vulkan commands to the Server (libvortekrenderer.so)
- The Client will issue sends on this ring buffer by writing serialized Vk commands onto this ring buffer
- The Server will wait for and recv the serialized Vk commands from this ring buffer in order to execute them on the real Vulkan driver
The Client Ring Buffer:
- Used by the Server (libvortekrenderer.so) to send VkResults and any serialized output parameter data back to the Client (libvulkan_vortek.so) after executing the requested Vulkan command
- The Server will issue sends on this ring buffer by writing serialized VkResults and return values onto this ring buffer
- The Client will wait for and recv the serialized VkResults and outputs from this ring buffer in order to send them back to the game/application

Note that in general we expect much more data to be written to the server than will be returned to the client. This is because Vk parameters are generally large, while return results (often just a single VkResult status code) are generally small. As a result, Vortek allocates 16x more capacity to the server ring (4MB) than to the client ring (256KB).

Writing to the Ring Buffer

byte RingBuffer_write(RingBuffer *ring_buffer, void *data, uint data_size)
{    
    // Check if buffer has enough total capacity
    if (ring_buffer->buffer_size < data_size) {
        __android_log_print(ANDROID_LOG_DEBUG, "System.out",
                           "ring: buffer overflow on write (%d/%d)\n", data_size, ring_buffer->buffer_size);
        return 0; // failure
    }
    
    // Calculate available space in ring buffer
    // available = buffer_size - (write_index - read_index) % buffer_size
    uint available_space = ring_buffer->buffer_size - 
                     ((ring_buffer->buffer_size - 1) & 
                      (ring_buffer->write_index - ring_buffer->read_index));
    

    // Wait for space to become available
    uint retry_count = 1;
    while (available_space < data_size) {
        // Yield CPU or sleep based on retry count
        if (retry_count++ < 400) {
            sched_yield();  // yield to other threads
        } else {
            usleep(100);    // sleep 100 microseconds
        }
        
        // Recalculate available space
        ...
    }
    
    // Perform the write operation starting at the ring_buffer->write_index
    // If the write extends beyond the end of the buffer, wrap around to the start
    ...
    
    return 1;
}

Writes will:

Fail if the size of the write is larger than the buffer itself, e.g. if you have a really big shader module
Block if the ring buffer does not have enough capacity for this request (it will wait for the downstream server to drain the ring buffer first)

Note that there are no explicit synchronization mechanisms here between the client and the server. The ring buffer is shared memory that is mapped into both processes: one thread reads from it and advances the read_index, while another thread writes to it and advances the write_index.
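The masked subtraction in RingBuffer_write is worth a closer look. Both ring sizes (0x400000 and 0x40000) are powers of two, so masking with buffer_size - 1 behaves like a modulo, and unsigned subtraction keeps the result correct even after the write index has wrapped past the read index. The helpers below are a hypothetical restatement of that arithmetic, not code from the binary.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes currently stored, following the masked subtraction shown above.
 * Only valid when size is a power of two. */
static uint32_t ring_used(uint32_t size, uint32_t write_index, uint32_t read_index) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (size - 1) & (write_index - read_index);
}

/* Bytes that can still be written, again following the formula shown above. */
static uint32_t ring_free(uint32_t size, uint32_t write_index, uint32_t read_index) {
    return size - ring_used(size, write_index, read_index);
}
```

For a 16-byte ring, ring_used(16, 1, 14) is 3 even though the write index is numerically smaller than the read index. One consequence of the formula as written in the pseudo-code is that equal indices always read as empty, so a completely full ring would be indistinguishable from an empty one. The binary may handle that case differently than this reconstruction suggests.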

Reading from the Ring Buffer

bool RingBuffer_read(RingBuffer *ring_buffer, void *output_buffer, uint read_size)
{    
    // Check if requested read size exceeds total buffer capacity
    if (ring_buffer->buffer_size < read_size) {
        return false;
    }
    
    // Bit 1 (0b10) of the ring buffer flags denotes whether this buffer is closed
    #define IS_CLOSED ((ring_buffer->flags & 0b10) != 0)

    uint retry_count = 1;
    
    while (!IS_CLOSED) {
        // Calculate available data in ring buffer
        // available = (write_index - read_index) % buffer_size
        uint available_data = (ring_buffer->buffer_size - 1) & 
                        (ring_buffer->write_index - ring_buffer->read_index);
        
        // Check if we have enough data to satisfy the read request
        if (read_size <= available_data) {
            // Perform the read operation
            uint end_index = ring_buffer->read_index + read_size;
            
            // Check if read wraps around the buffer, if so, copy up to the end
            // of the buffer and then wrap around to read the remaining data
            ...
            
            // Update read index with wrap-around
            ring_buffer->read_index = end_index % ring_buffer->buffer_size;
            
            break; // Successfully read data
        }
        
        // Not enough data available, wait for more
        if (retry_count++ < 400) {
            sched_yield();  // yield to other threads
        } else {
            usleep(100);    // sleep 100 microseconds
        }
    }
    return 1;
}

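The read mirrors the write: it fails if the requested size is larger than the buffer itself, and otherwise blocks (yielding at first, then sleeping in 100 microsecond steps) until enough data is available or the buffer is flagged as closed.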
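Both functions elide the actual copy, so here is a minimal sketch of the wrap-around step on a tiny 8-byte ring. It is an illustration under my own assumptions and not the recovered implementation: the demo_ring struct is hypothetical, the functions fail instead of blocking, one byte is always left unused so that a full ring never looks empty, and there are no memory barriers, which a real cross-process ring would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RING_SIZE 8u /* power of two, so masking works as a modulo */

typedef struct {
    uint32_t read_index;
    uint32_t write_index;
    uint8_t data[DEMO_RING_SIZE];
} demo_ring;

static uint32_t demo_used(const demo_ring *r) {
    return (DEMO_RING_SIZE - 1) & (r->write_index - r->read_index);
}

/* Non-blocking write: returns false instead of spinning when there is no room. */
static bool demo_try_write(demo_ring *r, const void *src, uint32_t n) {
    if (n >= DEMO_RING_SIZE - demo_used(r)) return false;
    uint32_t first = DEMO_RING_SIZE - r->write_index; /* bytes until the end of the array */
    if (first > n) first = n;
    memcpy(r->data + r->write_index, src, first);
    memcpy(r->data, (const uint8_t *)src + first, n - first); /* wrapped remainder */
    r->write_index = (r->write_index + n) & (DEMO_RING_SIZE - 1);
    return true;
}

/* Non-blocking read: returns false when fewer than n bytes are stored. */
static bool demo_try_read(demo_ring *r, void *dst, uint32_t n) {
    if (n > demo_used(r)) return false;
    uint32_t first = DEMO_RING_SIZE - r->read_index;
    if (first > n) first = n;
    memcpy(dst, r->data + r->read_index, first);
    memcpy((uint8_t *)dst + first, r->data, n - first); /* wrapped remainder */
    r->read_index = (r->read_index + n) & (DEMO_RING_SIZE - 1);
    return true;
}
```

Splitting every copy into a "until the end of the array" part and a "from the start" part means the caller never has to know whether a message straddles the end of the buffer.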

The Vortek Command Buffer

Client - libvulkan_vortek.so

vt_call_vk Calls Dispatch

Here’s a representative example of a vt_call_ function:

int vt_call_vkCreateShaderModule(VkDevice device, 
                                 VkShaderModuleCreateInfo* createInfo,
                                 VkAllocationCallbacks* allocator,
                                 VkShaderModule* shaderModule_out)
{
    // Thread safety - lock mutex for vortek IPCs (the ring buffers)
    pthread_mutex_lock(&vt_call_mutex);
    
    // VortekVkObjects are opaque handles of VkObjects that were generated from the server
    // In particular, they have a ->handle field that just maps to the ptr of the VkObject
    // within the server process.
    VortekVkObject* deviceObj = VkObject_fromHandle(device);
    
    // Calculate message size needed
    int serializedSize = (deviceObj != NULL) ? 13 : 5;  // Base size depending on device validity
    if (createInfo != NULL) {
        serializedSize += createInfo->codeSize + 20;  // Add shader code size + metadata
    }
    serializedSize += 1;  // Null terminator
    
    
    // Serialize all of the parameters to vkCreateShaderModule
    uint8_t* messageBuffer = (uint8_t*)vt_alloc(serializedSize);
    int offset;
    // Pack the deviceObj
    if (deviceObj == NULL) {
        messageBuffer[0] = 0;  // No device
        offset = 1; // sizeof(uint8_t) for has_device
    } else {
        messageBuffer[0] = 1;  // Has device
        *(uint64_t*)(messageBuffer + 1) = deviceObj->handle;
        offset = 9; // sizeof(uint8_t) for has_device + sizeof(void*) for handle
    }
    // Pack shader module creation info
    if (createInfo == NULL) {
        *(uint32_t*)(messageBuffer + offset) = 0;  // No create info
        // Null terminate the message
        messageBuffer[offset + 4] = 0;
    } else {
        size_t codeSize = createInfo->codeSize;
        void* shaderCode = createInfo->pCode;
        
        int infoSize = codeSize + 20;
        *(int*)(messageBuffer + offset) = infoSize;
        
        // Pack VkShaderModuleCreateInfo structure
        // +0x00: ->sType
        // +0x04: ->flags
        // +0x08: ->codeSize (int64)
        // pCode is packed into a Vortek bytestring
        // +0x10: ->pCode.size
        // +0x14: ->pCode
        *(uint32_t*)(messageBuffer + offset + 4) = createInfo->sType;
        *(uint32_t*)(messageBuffer + offset + 8) = createInfo->flags;
        *(size_t*)(messageBuffer + offset + 12) = codeSize;
        
        // Pack shader bytecode as a Vortek bytestring (size + bytes)
        *(int*)(messageBuffer + offset + 20) = (int)codeSize;
        memcpy(messageBuffer + offset + 24, shaderCode, codeSize);
        // Null terminate the message
        messageBuffer[offset + 24 + codeSize] = 0;
    }
    
    // Send request to server through the server ring buffer
    struct { int vk_command_opcode; int size; } request_header = { 0x9f, serializedSize };  // 0x9f = vkCreateShaderModule opcode
    // Send the request header (the command opcode and the size of the serialized data)
    if (!RingBuffer_write(serverRing, &request_header, sizeof(request_header))) goto error_cleanup;
    // Send the actual serialized request data
    if (!RingBuffer_write(serverRing, messageBuffer, serializedSize)) goto error_cleanup;
        
    // Recv the result from the server through the client ring buffer
    struct { int vk_result; int size; } vt_result;
    if (RingBuffer_read(clientRing, &vt_result, 8)) {
        int vk_result = vt_result.vk_result;
        int responseSize = vt_result.size;
        
        // The response will just be a single VortekVkObject*
        void* responseData = NULL;
        if (responseSize > 0) {
            responseData = (void*) vt_alloc(responseSize);
            if (!RingBuffer_read(clientRing, responseData, responseSize)) {
                goto error_cleanup;
            }
        }
        
        // Check if operation succeeded and unpack the output
        if (vk_result != VK_ERROR_UNKNOWN) {
            // Create a new VortekVkObject for the new shader module, this contains a single handle that shadows the actual ptr in the server
            VortekVkObject* new_shader_module_obj = VkObject_create(0xf /* VtShaderModule */, *(uint64_t*)responseData);  // 0xf = shader module type
            *shaderModule_out = VkObject_toHandle(new_shader_module_obj);
            
            pthread_mutex_unlock(&vt_call_mutex);
            
            vt_free_all();  // Clean up all allocated memory
            return vk_result;
        }
    }
    
error_cleanup:
    pthread_mutex_unlock(&vt_call_mutex);
    vt_free_all();  // Clean up all allocated memory
    
    return VK_ERROR_UNKNOWN;
}
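The packing in vt_call_vkCreateShaderModule implies a matching unpacking step on the server. The sketch below reads the same byte layout back with bounds checks. It is my own reconstruction from the client-side layout, not the real vt_handle_ function: demo_shader_request, take and demo_unpack_shader_request are hypothetical names, and it assumes both sides share the same byte order (they run on the same device).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the vkCreateShaderModule request. */
typedef struct {
    bool has_device;
    uint64_t device_handle;
    bool has_create_info;
    uint32_t sType;
    uint32_t flags;
    uint64_t code_size;
    const uint8_t *code; /* points into the message buffer, not copied */
} demo_shader_request;

/* Bounds-checked helper: copy n bytes out of the message and advance the cursor. */
static bool take(const uint8_t *msg, size_t len, size_t *pos, void *out, size_t n) {
    if (n > len - *pos) return false;
    memcpy(out, msg + *pos, n);
    *pos += n;
    return true;
}

static bool demo_unpack_shader_request(const uint8_t *msg, size_t len, demo_shader_request *req) {
    size_t pos = 0;
    uint8_t has_device = 0;
    uint32_t info_size = 0, code_len32 = 0;
    memset(req, 0, sizeof(*req));

    /* 1-byte presence flag, then an optional 8-byte server-side handle */
    if (!take(msg, len, &pos, &has_device, 1)) return false;
    req->has_device = has_device != 0;
    if (req->has_device && !take(msg, len, &pos, &req->device_handle, 8)) return false;

    /* 4-byte serialized struct size; 0 means createInfo was NULL */
    if (!take(msg, len, &pos, &info_size, 4)) return false;
    if (info_size == 0) return true;

    req->has_create_info = true;
    if (!take(msg, len, &pos, &req->sType, 4)) return false;
    if (!take(msg, len, &pos, &req->flags, 4)) return false;
    if (!take(msg, len, &pos, &req->code_size, 8)) return false;
    if (!take(msg, len, &pos, &code_len32, 4)) return false; /* bytestring length */
    if (code_len32 > len - pos) return false;
    req->code = msg + pos;
    return info_size == (uint64_t)code_len32 + 20;
}
```

Copying each field with memcpy avoids unaligned loads, which matters here because the 8-byte handle starts at offset 1 of the message.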

Step by step, the vt_call_vkCreateShaderModule will:

Lock the global Vortek IPC lock (so no other threads can enter another vt_call function)

    pthread_mutex_lock(&vt_call_mutex);

Resolve all client-side opaque VkObject* objects (e.g. a VtDevice object) into VortekVkObject* objects that contain a handle that can be looked-up by the server to find the REAL VkObject* (within the server process) created by the underlying Vulkan driver
- Note that all VkObjects within the client process (libvulkan_vortek.so and the game.exe) are opaque shadows of the real Vulkan object living in a different (server) process. As such, you can’t assume that they follow some specific struct layout and you can only manipulate them by sending them to the Server to transform (since only the server has the real underlying VkObjects)

    VortekVkObject* deviceObj = VkObject_fromHandle(device);

Calculate the number of bytes needed to fit this command/call on the ring buffer

    // Calculate message size needed
    int serializedSize = (deviceObj != NULL) ? 13 : 5;  // Base size depending on device validity
    if (createInfo != NULL) {
        serializedSize += createInfo->codeSize + 20;  // Add shader code size + metadata
    }
    serializedSize += 1;  // Null terminator
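
As a sanity check on the magic numbers, the same arithmetic can be restated as a standalone helper with each constant broken down. The function name and its boolean parameters are hypothetical (they stand in for the two NULL checks); the constants themselves come straight from the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper restating the request-size arithmetic. */
size_t vt_shader_module_request_size(bool has_device, bool has_create_info,
                                     size_t code_size)
{
    /* 1 presence byte, plus an 8-byte handle when the device is present */
    size_t size = has_device ? 1 + 8 : 1;

    /* 4-byte struct-size prefix, written even when createInfo is NULL */
    size += 4;

    if (has_create_info) {
        /* sType (4) + flags (4) + codeSize (8) + bytestring length (4) = 20 */
        size += 20 + code_size;
    }

    /* trailing 0 byte */
    return size + 1;
}
```

So the "13 : 5" base is just the device field (9 or 1 bytes) plus the 4-byte size prefix, and a 16-byte shader sent with a valid device needs 13 + 36 + 1 = 50 bytes.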

Serialize the command onto a (thread) local buffer first
- First, pack the deviceObj (which is nullable), this is represented as:
  - A ‘0’ byte if null, or
  - A ‘1’ byte if present along with an 8-byte pointer of the device object as it appears in the server (aka the handle)
- Next, pack the VkShaderModuleCreateInfo struct (which is also nullable), this is represented as:
  - A 32bit (4 bytes) integer for the serialized size of the struct (0 if null, otherwise codeSize + 20, which covers the fields below)
  - A 32bit unsigned integer for the VkShaderModuleCreateInfo.sType field
  - A 32bit unsigned integer for the VkShaderModuleCreateInfo.flags field
  - A 64bit integer for the VkShaderModuleCreateInfo.codeSize field
  - The packing of the VkShaderModuleCreateInfo.pCode field will be done as a Vortek bytestring:
    - A 32bit integer for the size of VkShaderModuleCreateInfo.pCode, this turns out to be a second redundant packing of VkShaderModuleCreateInfo.codeSize but as a 32bit integer
    - The actual bytes of the VkShaderModuleCreateInfo.pCode field
- Finally, terminate the buffer by writing a 0 byte

    // Serialize all of the parameters to vkCreateShaderModule
    uint8_t* messageBuffer = (uint8_t*)vt_alloc(serializedSize);
    int offset;
    // Pack the deviceObj
    if (deviceObj == NULL) {
        messageBuffer[0] = 0;  // No device
        offset = 1; // sizeof(uint8_t) for has_device
    } else {
        messageBuffer[0] = 1;  // Has device
        *(uint64_t*)(messageBuffer + 1) = deviceObj->handle;
        offset = 9; // sizeof(uint8_t) for has_device + sizeof(void*) for handle
    }
    // Pack shader module creation info
    if (createInfo == NULL) {
        *(uint32_t*)(messageBuffer + offset) = 0;  // No create info
        // Null terminate the message
        messageBuffer[offset + 4] = 0;
    } else {
        size_t codeSize = createInfo->codeSize;
        const void* shaderCode = createInfo->pCode;
        
        int infoSize = codeSize + 20;
        *(int*)(messageBuffer + offset) = infoSize;
        
        // Pack VkShaderModuleCreateInfo structure
        // +0x00: ->sType
        // +0x04: ->flags
        // +0x08: ->codeSize (int64)
        // pCode is packed into a Vortek bytestring
        // +0x10: ->pCode.size
        // +0x14: ->pCode
        *(uint32_t*)(messageBuffer + offset + 4) = createInfo->sType;
        *(uint32_t*)(messageBuffer + offset + 8) = createInfo->flags;
        *(size_t*)(messageBuffer + offset + 12) = codeSize;
        
        // Pack shader bytecode as a Vortek bytestring (size + bytes)
        *(int*)(messageBuffer + offset + 20) = (int)codeSize;
        memcpy(messageBuffer + offset + 24, shaderCode, codeSize);
        // Null terminate the message
        messageBuffer[offset + 24 + codeSize] = 0;
    }
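
One thing worth noting about the decompiled code is that it writes 32-bit and 64-bit values through casted pointers at odd offsets (1, 13, 21, ...), which works on ARM64 but is not strictly portable C. Here is a sketch of the same wire layout written with memcpy instead. ShaderInfoSketch, put, and pack_create_shader_module are hypothetical names, and the struct only models the fields that actually get packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packed fields of VkShaderModuleCreateInfo. */
typedef struct {
    uint32_t sType;
    uint32_t flags;
    size_t codeSize;
    const void *pCode;
} ShaderInfoSketch;

/* Append n bytes to buf at *off and advance *off. */
static void put(uint8_t *buf, size_t *off, const void *src, size_t n)
{
    memcpy(buf + *off, src, n);
    *off += n;
}

/* Packs the request in the layout described above and returns the number of
 * bytes written. The caller must supply at least 34 + codeSize bytes. */
size_t pack_create_shader_module(uint8_t *buf, const uint64_t *device_handle,
                                 const ShaderInfoSketch *info)
{
    assert(buf != NULL);
    size_t off = 0;

    uint8_t has_device = (device_handle != NULL);
    put(buf, &off, &has_device, 1);
    if (has_device) put(buf, &off, device_handle, 8);

    uint32_t info_size = info ? (uint32_t)(info->codeSize + 20) : 0;
    put(buf, &off, &info_size, 4);
    if (info != NULL) {
        uint64_t code_size64 = info->codeSize;           /* ->codeSize */
        uint32_t code_size32 = (uint32_t)info->codeSize; /* bytestring length */
        put(buf, &off, &info->sType, 4);
        put(buf, &off, &info->flags, 4);
        put(buf, &off, &code_size64, 8);
        put(buf, &off, &code_size32, 4);
        put(buf, &off, info->pCode, info->codeSize);
    }

    uint8_t terminator = 0;
    put(buf, &off, &terminator, 1);
    return off;
}
```

The byte stream it produces should match the layout listed above field for field; only the way the bytes get into the buffer differs.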

Send the Vulkan call to the server to actually execute on the underlying vulkan driver
- This is done by writing the sequence {VULKAN_COMMAND_OPCODE, SERIALIZED_PAYLOAD_SIZE, SERIALIZED_PAYLOAD} onto the server ring buffer (the server is always busy looping waiting for data on this ring buffer)
- In our case, the VULKAN_COMMAND_OPCODE for vt_call_vkCreateShaderModule is 0x9f (this is private to Vortek’s implementation)

    // Send request to server through the server ring buffer
    struct { int vk_command_opcode; int size; } request_header = { 0x9f, serializedSize };  // 0x9f = vkCreateShaderModule opcode
    // Send the request header (the command opcode and the size of the serialized data)
    if (!RingBuffer_write(serverRing, &request_header, sizeof(request_header))) goto error_cleanup;
    // Send the actual serialized request data
    if (!RingBuffer_write(serverRing, messageBuffer, serializedSize)) goto error_cleanup;
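
To illustrate the framing on its own, here is a toy, single-threaded byte ring with a header-then-payload write. It is only a sketch: the real ring buffers live in memory shared between two processes and need synchronization that is left out here, and RingSketch, ring_write, ring_read, and send_request are hypothetical names rather than Vortek's RingBuffer_* implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 256u /* arbitrary size chosen for the sketch */

typedef struct {
    uint8_t data[RING_CAPACITY];
    size_t head; /* total bytes read so far */
    size_t tail; /* total bytes written so far */
} RingSketch;

/* Copy n bytes into the ring, or fail without writing if they do not fit. */
bool ring_write(RingSketch *r, const void *src, size_t n)
{
    assert(r != NULL);
    if (n > RING_CAPACITY - (r->tail - r->head)) return false;
    const uint8_t *bytes = src;
    for (size_t i = 0; i < n; i++)
        r->data[(r->tail + i) % RING_CAPACITY] = bytes[i];
    r->tail += n;
    return true;
}

/* Copy n bytes out of the ring, or fail if fewer than n are available. */
bool ring_read(RingSketch *r, void *dst, size_t n)
{
    assert(r != NULL);
    if (n > r->tail - r->head) return false;
    uint8_t *bytes = dst;
    for (size_t i = 0; i < n; i++)
        bytes[i] = r->data[(r->head + i) % RING_CAPACITY];
    r->head += n;
    return true;
}

typedef struct { int32_t opcode; int32_t size; } RequestHeaderSketch;

/* Write {opcode, size} followed by the payload as one frame. */
bool send_request(RingSketch *r, int32_t opcode, const void *payload, int32_t size)
{
    assert(r != NULL && size >= 0);
    RequestHeaderSketch header = { opcode, size };
    /* Check the whole frame up front so a header is never written alone. */
    if (sizeof header + (size_t)size > RING_CAPACITY - (r->tail - r->head))
        return false;
    return ring_write(r, &header, sizeof header) &&
           ring_write(r, payload, (size_t)size);
}
```

The reader on the other side does the mirror image: read 8 bytes of header, then read exactly header.size bytes of payload. Because every frame carries its own length, the receiver never has to understand a payload in order to skip past it.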

Read the result and any additional output data from the underlying vulkan driver call from the server on the client ring buffer
- The server will send a response of the form {VK_RESULT, OUTPUT_PAYLOAD_SIZE, OUTPUT_PAYLOAD} on the client ring buffer
- VK_RESULT is the VkResult status of the underlying vulkan call (e.g. success, or an error indicator)
- Most of the time there is no additional output payload, but in some cases (e.g. for vkCreateShaderModule) the call also produces an output, which must be deserialized

    // Receive the result from the server through the client ring buffer
    struct { int vk_result; int size; } vt_result;
    if (RingBuffer_read(clientRing, &vt_result, 8)) {
        int vk_result = vt_result.vk_result;
        int responseSize = vt_result.size;
        
        // The response will just be a single VortekVkObject*
        void* responseData = NULL;
        if (responseSize > 0) {
            responseData = (void*) vt_alloc(responseSize);
            if (!RingBuffer_read(clientRing, responseData, responseSize)) {
                goto error_cleanup;
            }
        }
        
        ...
    }

(Optional) Decode/Unpack the output response from the ring buffer.
- In our case, we expect just a single output of the type VkObject* corresponding to the compiled shader module that the underlying vulkan driver built for us.
- Note that all opaque vulkan objects that we get back from the server must be wrapped into VortekVkObjects

        // Check if operation succeeded
        if (vk_result != VK_ERROR_UNKNOWN) {
            // Create a new VortekVkObject for the new shader module, this contains a single handle that shadows the actual ptr in the server
            VortekVkObject* new_shader_module_obj = VkObject_create(0xf /* VtShaderModule */, *(uint64_t*)responseData);  // 0xf = shader module type
            *shaderModule_out = VkObject_toHandle(new_shader_module_obj);
            
            pthread_mutex_unlock(&vt_call_mutex);
            
            vt_free_all();  // Clean up all allocated memory
            return vk_result;
        }

Clean up any thread-local temp buffers, and unlock the global IPC lock.
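
The vt_alloc / vt_free_all pair behaves like a per-thread scratch pool: allocate freely while handling one call, then drop everything at once. Their internals are not shown in this post, so the following is only a guess at the general technique (a thread-local bump allocator), with hypothetical names and an arbitrary fixed capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096u /* arbitrary capacity for the sketch */

/* One scratch arena per thread (assumption: the real pool is per-thread too). */
static _Thread_local _Alignas(8) uint8_t arena[ARENA_SIZE];
static _Thread_local size_t arena_used;

/* Hand out the next n bytes (rounded up to 8), or NULL if the arena is full. */
void *arena_alloc(size_t n)
{
    assert(arena_used <= ARENA_SIZE);
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned < n || aligned > ARENA_SIZE - arena_used) return NULL;
    void *p = arena + arena_used;
    arena_used += aligned;
    return p;
}

/* Release every allocation made on this thread in one step. */
void arena_free_all(void)
{
    arena_used = 0;
}
```

This style suits the vt_call_ pattern well: no individual buffer needs to outlive the call, so there is nothing to track and nothing to leak on the error paths, as long as every exit goes through the free-all step.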

With few exceptions, almost all vt_call_ functions will follow this pattern of packing parameters, sending a serialized command to the server, receiving a serialized response back, unpacking the response, and returning the VkResult and any output parameters back to the game client.

vt_call_vkCmd Calls

One specific exception to this rule is the family of Vulkan command buffer functions (not to be confused with the Vortek command buffer for Vulkan calls). Since these do not need to reach the Vulkan driver until the command buffer is submitted with vkQueueSubmit, Vortek can optimize these calls by buffering them on the client side and only triggering an IPC when the command buffer is submitted.

For example, here’s vt_call_vkCmdSetLineWidth:

void vt_call_vkCmdSetLineWidth(VkCommandBuffer commandBuffer, float lineWidth)
{
    // Get command buffer object and its associated buffer
    VortekVkObject* cmdBufferObj = VkObject_fromHandle(commandBuffer);
    CommandBufferData* bufferData = (CommandBufferData*)cmdBufferObj->data;
    
    // Get current buffer state
    int currentOffset = bufferData->currentOffset;
    void* buffer = bufferData->buffer;
    
    // Check if we need to reallocate buffer (command needs 0x15 bytes = 21 bytes)
    ...
    
    // Write command data at current offset
    char* commandPtr = (char*)buffer + currentOffset;
    
    // Write command header (packed as 64-bit value)
    // command ID (0xc4) + size (0xd = 13 bytes)
    *(uint32_t*)commandPtr = 0xc4;
    *(uint32_t*)(commandPtr + 4) = 0xd;
    
    // Write command buffer handle (1 byte for presence, 8 bytes for the handle)
    *(uint8_t*)(commandPtr + 8) = 1;
    *(VkCommandBuffer*)(commandPtr + 9) = cmdBufferObj->handle;
    
    // Write line width parameter
    *(float*)(commandPtr + 0x11) = lineWidth;
    
    // Update buffer offset for next command (advance by 21 bytes)
    bufferData->currentOffset += 0x15;
}

Notice also that VkCommandBuffer is one of the few exceptions to the opaque shadow reference rule - its VortekVkObject also carries a CommandBufferData on the client side because it needs to track the current local command buffer.
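
Because each recorded command uses the same {id, size, payload} framing as a regular request, a batch of them can be walked frame by frame without knowing what any individual command means. The sketch below records frames into a fixed-size buffer and then walks them. How the server actually consumes the submitted batch is not covered in this section, so treat the walking side as an assumption; RecordedCommands, record_command, and walk_commands are hypothetical names, and the real client buffer grows on demand rather than being fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size recording buffer. */
typedef struct {
    uint8_t bytes[256];
    size_t used;
} RecordedCommands;

/* Append one {id, size, payload} frame; returns 0 on success, -1 if it does not fit. */
int record_command(RecordedCommands *rec, uint32_t id,
                   const void *payload, uint32_t size)
{
    assert(rec != NULL);
    if ((size_t)size + 8 > sizeof rec->bytes - rec->used) return -1;
    memcpy(rec->bytes + rec->used, &id, 4);
    memcpy(rec->bytes + rec->used + 4, &size, 4);
    memcpy(rec->bytes + rec->used + 8, payload, size);
    rec->used += 8 + (size_t)size;
    return 0;
}

/* Walk the recorded frames; returns how many complete frames were found and
 * stores the id of the last one in *last_id. */
size_t walk_commands(const RecordedCommands *rec, uint32_t *last_id)
{
    assert(rec != NULL && last_id != NULL);
    size_t off = 0, count = 0;
    while (off + 8 <= rec->used) {
        uint32_t id, size;
        memcpy(&id, rec->bytes + off, 4);
        memcpy(&size, rec->bytes + off + 4, 4);
        if (size > rec->used - off - 8) break; /* truncated frame */
        *last_id = id;
        off += 8 + (size_t)size;
        count++;
    }
    return count;
}
```

Recording a vkCmdSetLineWidth-shaped command (id 0xc4, 13-byte payload) advances the buffer by 21 bytes, the same 0x15 step as in the decompiled function above.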

Server - libvortekrenderer.so

Main Loop

Unlike libvulkan_vortek.so, the Vortek renderer server does not have direct trigger points (like intercepted vulkan calls). Instead, it sets up a main loop in a dedicated thread to monitor and poll the server ring buffer for requests from the client.

int vortek_renderer_thread_main_loop(VortekThreadContext *ctx_ptr) {
    // Attach current thread to JVM and get JNIEnv instance
    JNIEnv *jni_env;
    (*(ctx_ptr->jvm))->AttachCurrentThread(ctx_ptr->jvm, (void**)&jni_env, NULL);
    ctx_ptr->current_jni_env = jni_env;

    // Get Method IDs for Java calls and store them in the context
    jclass renderer_component_class = (*jni_env)->GetObjectClass(jni_env, ctx_ptr->java_renderer_component_obj);
    ctx_ptr->getWindowWidth = (*jni_env)->GetMethodID(jni_env, renderer_component_class, "getWindowWidth", "...");
    ctx_ptr->getWindowHeight = (*jni_env)->GetMethodID(jni_env, renderer_component_class, "getWindowHeight", "...");
    ctx_ptr->getWindowHardwareBuffer = (*jni_env)->GetMethodID(jni_env, renderer_component_class, "getWindowHardwareBuffer", "...");
    ctx_ptr->updateWindowContent = (*jni_env)->GetMethodID(jni_env, renderer_component_class, "updateWindowContent", "...");
    
    // Main command processing loop
    // The loop condition depends on `ctx_ptr->loop_control_status` and the command received.
    while (ctx_ptr->loop_control_status > -1) {
        // Read command header (type and length) from the server-side ring buffer

        int command_id;
        uint command_size;
        if (!RingBuffer_read(ctx_ptr->server_ring_buffer, &command_id, sizeof(int))) break;
        if (!RingBuffer_read(ctx_ptr->server_ring_buffer, &command_size, sizeof(uint))) break;

        // If command_id is negative, it's a signal to stop processing
        if (command_id < 0) break;
        
        ctx_ptr->current_command_data_size = command_size;
        
        if (command_size > 0) {
            ctx_ptr->current_command_data_buffer = vt_alloc(command_size);
            if (ctx_ptr->current_command_data_buffer == NULL || 
                !RingBuffer_read(ctx_ptr->server_ring_buffer, ctx_ptr->current_command_data_buffer, command_size)) {
                // Handle allocation or read failure
                break; 
            }
        } else {
            ctx_ptr->current_command_data_buffer = NULL;
        }
        
        // Dispatch to the appropriate command handler, the getHandleRequestFunc will look for
        // the command within a dispatch_table for valid commands (between 100 and 337)
        CommandHandlerFuncPtr vt_handle_func = getHandleRequestFunc((ushort)command_id);
        if (vt_handle_func != NULL) {
            vt_handle_func((long)ctx_ptr); // Pass the context pointer
        }

        // Reset the per-thread memory pool for the allocations made for this command
        vt_free_all();
    }

    // Cleanup JNI resources:
    ...

    return 0;
}

In particular, Vortek will:

First set up the JNI environment and several Java method ids that may be used by the handlers within the vt_context object
Then start the main loop:
1. Read the command id for the next Vk call
2. Read the size of the serialized payload
3. Read the serialized payload without unpacking it (that is the responsibility of the vt_handlers) and store it as ctx->current_command_data_buffer
4. Look up the vt_handler_function in the dispatch table (e.g. vt_handle_vkCreateShaderModule) and dispatch it with the vt_context
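
The lookup in step 4 can be pictured as a plain array of function pointers indexed by command id. The sketch below assumes that shape; the 100 to 337 range and the 0x9f opcode come from the code above, while the table itself, ContextSketch, HandlerFn, and lookup_handler are hypothetical stand-ins for whatever getHandleRequestFunc does internally.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_COMMAND_ID 100 /* range taken from the comment in the main loop */
#define LAST_COMMAND_ID 337

typedef struct { int last_handled; } ContextSketch; /* hypothetical context */
typedef void (*HandlerFn)(ContextSketch *ctx);

/* Placeholder handler that only records which opcode it was invoked for. */
static void handle_create_shader_module(ContextSketch *ctx)
{
    assert(ctx != NULL);
    ctx->last_handled = 0x9f;
}

/* Slots without a designated initializer stay NULL (no handler). */
static HandlerFn dispatch_table[LAST_COMMAND_ID - FIRST_COMMAND_ID + 1] = {
    [0x9f - FIRST_COMMAND_ID] = handle_create_shader_module,
};

/* Return the handler for command_id, or NULL if it is out of range or unassigned. */
HandlerFn lookup_handler(int command_id)
{
    if (command_id < FIRST_COMMAND_ID || command_id > LAST_COMMAND_ID)
        return NULL;
    return dispatch_table[command_id - FIRST_COMMAND_ID];
}
```

The NULL return lines up with the `if (vt_handle_func != NULL)` guard in the main loop: unknown or unassigned opcodes are skipped rather than crashing the renderer thread.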

vt_handle_vkCall Handler

Here’s the mirror of vt_call_vkCreateShaderModule on the server side

vt_handle_vkCreateShaderModule (with interventions)

void vt_handle_vkCreateShaderModule(VtContext* context) {
    char* serialized_payload = context->current_command_data_buffer;
    VkDevice device_id_from_stream;
    long current_offset_in_stream;

    if (*serialized_payload == '\0') {
        current_offset_in_stream = 1;
        device_id_from_stream = context->current_device_id; // Set earlier by a different handler
    } else {
        device_id_from_stream = *(VkDevice*)(serialized_payload + 1);
        current_offset_in_stream = 9;
    }

    size_t shader_code_size = 0;
    void* pCode_buffer = NULL;

    // Read the header value (or a direct codeSize if the structure is simpler than inferred)
    int create_info_size = *(int*)(serialized_payload + current_offset_in_stream);

    if (create_info_size > 0) {
        // The actual VkShaderModuleCreateInfo-like fields start 4 bytes after the create_info_size
        char* create_info_payload = serialized_payload + current_offset_in_stream + 4 /* sizeof(create_info_size) */;
        // Layout:
        // +0x00: ->sType 
        // +0x04: ->flags
        // +0x08: ->codeSize (int64)
        // +0x10: ->pCode.size (int32 for the bytestring)
        // +0x14: ->pCode

        // Only unpack the ->codeSize and the ->pCode
        shader_code_size = (size_t)(*(uint64_t*)(create_info_payload + 0x08));
        int pCode_size = (*(int*)(create_info_payload + 0x10));
        if (pCode_size > 0) {
            pCode_buffer = vt_alloc(pCode_size);
            memcpy(pCode_buffer, create_info_payload + 0x14, pCode_size);
        }
    }

    // Convert device ID to a native Vulkan handle (details of VkObject_fromId are external)
    VkDevice native_device_handle = VkObject_fromId(device_id_from_stream);

    VkShaderModule created_shader_module_handle = VK_NULL_HANDLE;
    VkResult result_code = ShaderInspector_createModule(
        context->shader_inspector, // Corresponds to *(undefined8 *)(context + 0x98)
        pCode_buffer,              // The allocated and copied shader code
        shader_code_size,          // codeSize read from the stream
        &created_shader_module_handle
    );

    // Prepare and write the response to the ring buffer.
    // Write the vk_result as well as a response/output of size VkShaderModule 
    struct { int vk_result; int size; } vt_result = { 
        result_code, 
        result_code == VK_SUCCESS ? sizeof(VkShaderModule) : 0 
    };

    VkShaderModule* handle_storage_for_ringbuffer = NULL;
    if (result_code == VK_SUCCESS) {
        handle_storage_for_ringbuffer = (VkShaderModule*)vt_alloc(sizeof(VkShaderModule));
        if (handle_storage_for_ringbuffer) {
            *handle_storage_for_ringbuffer = created_shader_module_handle;
        } else {
            vt_result.vk_result = VK_ERROR_OUT_OF_HOST_MEMORY; // Or some other appropriate error
            vt_result.size = 0;
        }
    }

    bool ok = RingBuffer_write(
        context->client_ring_buffer,
        &vt_result,
        8 // Size of the header (result_code + payload_size)
    );

    if (ok && vt_result.size > 0 && handle_storage_for_ringbuffer != NULL) {
        RingBuffer_write(
            context->client_ring_buffer,
            handle_storage_for_ringbuffer,
            sizeof(VkShaderModule) // Write the actual handle of the returned VkShaderModule
        );
    }
}

This function takes in the serialized request payload, unpacks it, and executes it:

First, it grabs the serialized payload from ctx->current_command_data_buffer

    char* serialized_payload = context->current_command_data_buffer;

Next, it tries to unpack the VkDevice object from the payload, this can be either:
- NULL - in which case we will use a global VkDevice associated with this context, or
- A direct handle to the VkDevice* (remember that all shadows on the client are real objects, ptr and all, on the server)

    if (*serialized_payload == '\0') {
        current_offset_in_stream = 1;
        device_id_from_stream = context->current_device_id; // Set earlier by a different handler
    } else {
        device_id_from_stream = *(VkDevice*)(serialized_payload + 1);
        current_offset_in_stream = 9;
    }

Next, it tries to unpack the VkShaderModuleCreateInfo, however, since this handler does some specific interventions, it only extracts a subset of the required payload:
- The ->codeSize field at the +0x08 offset
- The ->pCode buffer at +0x10 (size) and +0x14 (data) offsets

    if (create_info_size > 0) {
        // The actual VkShaderModuleCreateInfo-like fields start 4 bytes after the create_info_size
        char* create_info_payload = serialized_payload + current_offset_in_stream + 4 /* sizeof(create_info_size) */;
        // Layout:
        // +0x00: ->sType 
        // +0x04: ->flags
        // +0x08: ->codeSize (int64)
        // +0x10: ->pCode.size (int32 for the bytestring)
        // +0x14: ->pCode

        // Only unpack the ->codeSize and the ->pCode
        shader_code_size = (size_t)(*(uint64_t*)(create_info_payload + 0x08));
        int pCode_size = (*(int*)(create_info_payload + 0x10));
        if (pCode_size > 0) {
            pCode_buffer = vt_alloc(pCode_size);
            memcpy(pCode_buffer, create_info_payload + 0x14, pCode_size);
        }
    }
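
The decompiled handler trusts the sizes it reads from the stream. For comparison, here is a sketch of what a bounds-checked version of the same unpacking could look like. This is not what Vortek does; ShaderPayloadView and parse_shader_create_info are hypothetical, and the function takes the create-info block that begins right after the 4-byte size prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read-only view over the unpacked fields. */
typedef struct {
    uint64_t code_size;   /* ->codeSize at +0x08 */
    uint32_t bytes_len;   /* ->pCode.size at +0x10 */
    const uint8_t *bytes; /* ->pCode at +0x14, pointing into the payload */
} ShaderPayloadView;

/* Returns false if any field would be read past the `len` available bytes,
 * or if the two redundant size fields disagree. */
bool parse_shader_create_info(const uint8_t *block, size_t len,
                              ShaderPayloadView *out)
{
    assert(block != NULL && out != NULL);
    if (len < 0x14) return false; /* fixed-size fields must fit */

    memcpy(&out->code_size, block + 0x08, 8);
    memcpy(&out->bytes_len, block + 0x10, 4);

    if (out->bytes_len > len - 0x14) return false;      /* bytecode truncated */
    if (out->code_size != out->bytes_len) return false; /* sizes should agree */

    out->bytes = block + 0x14;
    return true;
}
```

Since both ends of the ring buffer ship in the same APK, skipping these checks is a reasonable trade-off for Vortek, but the redundant size field does make this kind of validation cheap if it were ever needed.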

Next, rather than calling the underlying vkCreateShaderModule function directly, it calls ShaderInspector_createModule, which applies select interventions to enable Vortek shader patching (used to fix up certain Mali driver issues).

    // Convert device ID to a native Vulkan handle (details of VkObject_fromId are external)
    VkDevice native_device_handle = VkObject_fromId(device_id_from_stream);

    VkShaderModule created_shader_module_handle = VK_NULL_HANDLE;
    VkResult result_code = ShaderInspector_createModule(
        context->shader_inspector, // Corresponds to *(undefined8 *)(context + 0x98)
        pCode_buffer,              // The allocated and copied shader code
        shader_code_size,          // codeSize read from the stream
        &created_shader_module_handle
    );

Finally, it will write back to the client_ring_buffer with the VkResult, the output payload size (sizeof(VkShaderModule)), and the raw bytes of the VkShaderModule itself (which will serve as a shadow object on the client)

    // Prepare and write the response to the ring buffer.
    // Write the vk_result as well as a response/output of size VkShaderModule 
    struct { int vk_result; int size; } vt_result = { 
        result_code, 
        result_code == VK_SUCCESS ? sizeof(VkShaderModule) : 0 
    };

    VkShaderModule* handle_storage_for_ringbuffer = NULL;
    if (result_code == VK_SUCCESS) {
        handle_storage_for_ringbuffer = (VkShaderModule*)vt_alloc(sizeof(VkShaderModule));
        if (handle_storage_for_ringbuffer) {
            *handle_storage_for_ringbuffer = created_shader_module_handle;
        } else {
            vt_result.vk_result = VK_ERROR_OUT_OF_HOST_MEMORY; // Or some other appropriate error
            vt_result.size = 0;
        }
    }

    bool ok = RingBuffer_write(
        context->client_ring_buffer,
        &vt_result,
        8 // Size of the header (result_code + payload_size)
    );
    if (ok && vt_result.size > 0 && handle_storage_for_ringbuffer != NULL) {
        RingBuffer_write(
            context->client_ring_buffer,
            handle_storage_for_ringbuffer,
            sizeof(VkShaderModule) // Write the actual handle of the returned VkShaderModule
        );
    }

This should hopefully give you a good idea of how Vortek is able to use Bionic Vulkan drivers from within a glibc environment, using a command buffer of Vulkan commands to bridge the two environments.

In the next part of this series, we will look into some specific device/driver-specific fixes and interventions that Vortek does in order to improve compatibility for games running on dxvk on devices with poor Vk compatibility such as Mali devices.