Specifications

Guide to the Operation and Constraints of the Audio System

This section documents the effective runtime behavior of the system. It does not define API contracts; instead, it describes which parameters remain fixed, which may be negotiated, and in which layers relevant decisions are made.

PARAMETERS
EFFECTIVE-BEHAVIOR

Backend

Adapter (CoreAudio, WASAPI, ALSA)

Compatibility

Linux, Windows, and macOS. Mobile: Android and iOS

Render Mode (pull/push)

Hybrid model: the backend delivers push callbacks, while the internal graph is resolved via pull
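The hybrid model can be illustrated with a small sketch. The backend pushes a callback, and inside that callback audio is pulled through the graph starting from the output node. The protocol `PullNode`, the node types, and `backendCallback` are hypothetical names chosen for illustration, not the engine's API.

```swift
// Hypothetical sketch of pull-based graph resolution inside a push callback.
// `PullNode`, `SourceNode`, `GainNode` and `backendCallback` are illustrative names.
protocol PullNode: AnyObject {
    func pull(frameCount: Int) -> [Float]
}

final class SourceNode: PullNode {
    private var nextValue: Float = 0
    func pull(frameCount: Int) -> [Float] {
        // Produces a ramp (0, 1, 2, ...) so the output is easy to inspect.
        var out = [Float](repeating: 0, count: frameCount)
        for i in 0..<frameCount {
            out[i] = nextValue
            nextValue += 1
        }
        return out
    }
}

final class GainNode: PullNode {
    let input: PullNode
    let gain: Float
    init(input: PullNode, gain: Float) {
        self.input = input
        self.gain = gain
    }
    func pull(frameCount: Int) -> [Float] {
        // Pull upstream first, then process: data is produced only on demand.
        input.pull(frameCount: frameCount).map { $0 * gain }
    }
}

// The backend pushes this callback; the callback pulls from the output node.
func backendCallback(output: PullNode, frameCount: Int) -> [Float] {
    output.pull(frameCount: frameCount)
}
```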

Audio Routing

Render (output), Record (input), and Duplex. If the requested route is unavailable, the system falls back to render

Sample Rate

Fixed canonicalSampleRate of 44,100 Hz. The rate is not negotiated with the device, and no resampling occurs on the main path

Bit Depth

32-bit floating point (Float32). The backend stream and the internal engine representation use the same format

Sample Format

Float32 (Scalar = Float32). No sample-format conversion occurs at any stage of the pipeline

Channel Count (output)

Mono and stereo are explicitly supported; channel counts greater than 2 are not implemented in the engine

Channel Count (input)

In record and duplex modes, the effective input to the graph is mono and no multichannel deinterleaving is performed; AudioGraph supports mono input only

Buffer Layout

Internally planar (one AudioBus per channel). Output to the backend is interleaved; input from the backend is treated as a linear buffer

Fixed / Negotiated Parameters

Fixed: sample rate, Float32 format, and internal Quantum. Negotiated: device category (render/record/duplex) based on availability

Requested / Effective

A duplex request may degrade to render if capture is unavailable; input is delivered to the graph only if it is mono
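The fallback rules for routing can be expressed as a small negotiation function. This is a sketch under the rules stated in this table only: capture-dependent routes fall back to render when capture is unavailable, and input reaches the graph only when it is mono. All type and function names are hypothetical.

```swift
// Hypothetical names: `AudioRoute`, `DeviceCapabilities`, `NegotiatedRoute`
// and `negotiateRoute` are illustrative, not the engine's API.
enum AudioRoute { case render, record, duplex }

struct DeviceCapabilities {
    var canCapture: Bool
    var inputChannels: Int
}

struct NegotiatedRoute {
    let route: AudioRoute
    let deliversInput: Bool
}

func negotiateRoute(requested: AudioRoute, device: DeviceCapabilities) -> NegotiatedRoute {
    let route: AudioRoute
    switch requested {
    case .render:
        route = .render
    case .record, .duplex:
        // Capture-dependent routes fall back to render when capture is unavailable.
        route = device.canCapture ? requested : .render
    }
    // Input reaches the graph only when the route captures and the input is mono.
    let deliversInput = route != .render && device.inputChannels == 1
    return NegotiatedRoute(route: route, deliversInput: deliversInput)
}
```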

Numeric Representation

Audio samples are Float32; control and time values are Float64. DSP processing and audio buses operate in Float32

Conversions

No sample-format conversion. The only conversion is planar-to-interleaved layout on output, which delivers interleaved Float32 to the backend

Conversion Point

Interleaving is performed in HardwareIO; layout conversion does not occur within the audio graph
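The planar-to-interleaved step described in the two rows above can be sketched as follows. Each inner array stands in for one channel of an AudioBus; `interleave` is a hypothetical helper, not the HardwareIO API. Samples stay Float32 throughout, so only the layout changes.

```swift
// Sketch of planar-to-interleaved layout conversion (no sample-format change).
// `interleave` is a hypothetical helper; each inner array is one channel.
func interleave(_ planar: [[Float]], frameCount: Int) -> [Float] {
    let channels = planar.count
    var interleaved = [Float](repeating: 0, count: frameCount * channels)
    for channel in 0..<channels {
        precondition(planar[channel].count >= frameCount, "channel buffer too short")
        for frame in 0..<frameCount {
            // Frame-major order: L0 R0 L1 R1 ...
            interleaved[frame * channels + channel] = planar[channel][frame]
        }
    }
    return interleaved
}
```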

Effective Quantum (frames)

canonicalQuantum is .adaptive (256 frames) on real hardware and .balanced (512 frames) on the simulator. The quantum is fixed for the graph and depends on targetEnvironment
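A plausible reconstruction of this selection uses Swift's `targetEnvironment(simulator)` compile condition. The case names and frame counts come from this table; the enum name `QuantumSize` and the `renderQuantum` constant are assumptions.

```swift
// Hypothetical reconstruction: case names and frame counts are from this
// section; the enum name `QuantumSize` is assumed.
enum QuantumSize: Int {
    case adaptive = 256   // real hardware
    case balanced = 512   // simulator
}

#if targetEnvironment(simulator)
let canonicalQuantum: QuantumSize = .balanced
#else
let canonicalQuantum: QuantumSize = .adaptive
#endif

// Fixed for the lifetime of the graph; every internal render uses this many frames.
let renderQuantum = canonicalQuantum.rawValue
```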

Quantum Variability

The internal graph operates with a fixed quantum. The backend callback may deliver a different frameCount, which is regrouped into renderQuantum blocks through accumulation

Buffer Size (Backend)

The backend frameCount is filled by rendering successive renderQuantum blocks until all requested frames are written; a single callback may span several blocks or only part of one
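The accumulation described in the two rows above can be sketched as an adapter that always renders whole quanta and serves any backend frameCount from them, carrying leftover frames into the next callback. `QuantumAdapter` and `renderBlock` are hypothetical, and the buffer is mono for brevity.

```swift
// Hypothetical sketch: `QuantumAdapter` and `renderBlock` are illustrative
// names, not the engine's API. Mono buffers for brevity.
final class QuantumAdapter {
    let renderQuantum: Int
    private let renderBlock: (inout [Float]) -> Void
    private var block: [Float]
    private var readIndex: Int

    init(renderQuantum: Int, renderBlock: @escaping (inout [Float]) -> Void) {
        self.renderQuantum = renderQuantum
        self.renderBlock = renderBlock
        self.block = [Float](repeating: 0, count: renderQuantum)
        self.readIndex = renderQuantum // nothing buffered yet: forces a render
    }

    /// Fills `output` with exactly `output.count` frames, rendering as many
    /// fixed-size quanta as needed and keeping leftovers for the next call.
    func fill(_ output: inout [Float]) {
        var written = 0
        while written < output.count {
            if readIndex == renderQuantum {
                renderBlock(&block)
                readIndex = 0
            }
            let n = min(renderQuantum - readIndex, output.count - written)
            for i in 0..<n {
                output[written + i] = block[readIndex + i]
            }
            readIndex += n
            written += n
        }
    }
}
```

With renderQuantum = 256, a 100-frame callback renders one quantum and keeps 156 frames; a following 300-frame callback consumes those 156 frames and renders one more quantum for the remaining 144.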

Callback Periodicity

Defined by the backend. The internal engine processes 256 frames (~5.80 ms) or 512 frames (~11.61 ms) per quantum at 44,100 Hz
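The quantum periods quoted above follow from duration = frames / sampleRate. The helper `quantumDuration` below is hypothetical and only reproduces that arithmetic.

```swift
import Foundation

// Duration of one quantum in seconds: frames / sampleRate.
// `quantumDuration` is a hypothetical helper, not part of the engine.
func quantumDuration(frames: Int, sampleRate: Double) -> TimeInterval {
    TimeInterval(frames) / sampleRate
}

for frames in [256, 512] {
    let milliseconds = quantumDuration(frames: frames, sampleRate: 44_100) * 1_000
    print("\(frames) frames: " + String(format: "%.2f ms", milliseconds))
}
// Prints "256 frames: 5.80 ms" and "512 frames: 11.61 ms"
```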

Additional Buffering

An input ring buffer (_unprocessedSource) with ~4 s capacity, plus per-quantum _source and _rendering buffers. In record and duplex modes, input is accumulated
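At 44,100 Hz, ~4 s of mono input corresponds to 4 × 44,100 = 176,400 frames. The sketch below shows a minimal single-threaded FIFO ring buffer of that size. `InputRingBuffer` is hypothetical, dropping samples that do not fit is an assumed overflow policy, and a real-time implementation would additionally need lock-free synchronization between the backend thread and the render thread.

```swift
// Minimal single-threaded ring buffer for accumulating mono input.
// `InputRingBuffer` is hypothetical; the overflow policy (drop excess) is assumed.
struct InputRingBuffer {
    private var storage: [Float]
    private var readIndex = 0
    private var writeIndex = 0
    private(set) var count = 0

    init(capacity: Int) {
        storage = [Float](repeating: 0, count: capacity)
    }

    var capacity: Int { storage.count }

    /// Appends samples and returns how many fit; excess samples are dropped.
    @discardableResult
    mutating func write(_ samples: [Float]) -> Int {
        let n = min(samples.count, capacity - count)
        for i in 0..<n {
            storage[writeIndex] = samples[i]
            writeIndex = (writeIndex + 1) % capacity
        }
        count += n
        return n
    }

    /// Reads up to `frameCount` samples in FIFO order.
    mutating func read(frameCount: Int) -> [Float] {
        let n = min(frameCount, count)
        var out = [Float](repeating: 0, count: n)
        for i in 0..<n {
            out[i] = storage[readIndex]
            readIndex = (readIndex + 1) % capacity
        }
        count -= n
        return out
    }
}

// ~4 s of mono input at the canonical 44,100 Hz rate.
let unprocessed = InputRingBuffer(capacity: 4 * 44_100)
```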

Quantum Contractual Impact

The binaural panner requires frameCount to be a power of two; the current canonicalQuantum values (256 and 512) satisfy this requirement. Non-power-of-two values violate its preconditions
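The power-of-two requirement can be checked with the usual bit trick: a positive power of two has exactly one bit set, so `n & (n - 1)` is zero. Both helper names below are hypothetical and only mirror the precondition stated in this row.

```swift
// Hypothetical helpers mirroring the binaural panner's frameCount precondition.
func isPowerOfTwo(_ n: Int) -> Bool {
    // Exactly one bit set: clearing the lowest set bit leaves zero.
    n > 0 && (n & (n - 1)) == 0
}

func checkBinauralFrameCount(_ frameCount: Int) {
    precondition(isPowerOfTwo(frameCount),
                 "binaural panner requires a power-of-two frameCount, got \(frameCount)")
}

checkBinauralFrameCount(256)  // passes
checkBinauralFrameCount(512)  // passes
// checkBinauralFrameCount(441) would trap
```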

Primary Clock Source

A frame counter combined with canonicalSampleRate; the hardware layer updates lastCursor for each rendered block

Time Unit

Frames (Int), with TimeInterval values derived as frames / sampleRate. currentTime and currentSampleFrame are both derived from lastCursor
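The relationship between the frame cursor and time can be sketched as a small clock value. `RenderClock` and `advance(by:)` are hypothetical; `lastCursor`, `currentSampleFrame`, and `currentTime` reuse the names from this table, but their exact types and update site in the engine may differ.

```swift
import Foundation

// Hypothetical sketch: `RenderClock` and `advance(by:)` are illustrative.
// lastCursor, currentSampleFrame and currentTime mirror the names above.
struct RenderClock {
    let sampleRate: Double
    private(set) var lastCursor = 0

    init(sampleRate: Double) {
        self.sampleRate = sampleRate
    }

    // Called by the hardware layer once per rendered block.
    mutating func advance(by frames: Int) {
        lastCursor += frames
    }

    var currentSampleFrame: Int { lastCursor }
    var currentTime: TimeInterval { TimeInterval(lastCursor) / sampleRate }
}
```

After four 256-frame blocks at 44,100 Hz, currentSampleFrame is 1024 and currentTime is 1024 / 44,100 ≈ 0.0232 s.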

Effective Latency

Latency is accumulated per node; the overall effective latency also depends on the backend and the quantum

Channel Remixing

Mono↔stereo remixing is performed in the graph bus, not in the backend, when channel counts do not match under the speaker interpretation
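The engine's exact remixing coefficients are not documented here. As an assumption, the sketch below uses the Web Audio API "speakers" rules for mono and stereo: up-mixing copies the mono channel to both outputs, and down-mixing averages left and right. `remix` is a hypothetical helper operating on planar buses.

```swift
// Sketch of speaker-style mono/stereo remixing on planar buses.
// Coefficients follow the Web Audio "speakers" rules (an assumption);
// `remix` is a hypothetical helper, not the engine's API.
func remix(_ input: [[Float]], toChannels target: Int) -> [[Float]] {
    switch (input.count, target) {
    case let (n, t) where n == t:
        return input
    case (1, 2):
        // Up-mix: copy mono to both left and right.
        return [input[0], input[0]]
    case (2, 1):
        // Down-mix: average left and right.
        return [zip(input[0], input[1]).map { 0.5 * ($0 + $1) }]
    default:
        preconditionFailure("only mono and stereo buses are supported")
    }
}
```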

Format Conversion

No sample-format conversion is performed; only planar → interleaved layout conversion on output

Binaural System Specifications

This section describes the effective behavior of the binaural audio system as it operates within the engine and in integration with the general render pipeline. The mathematical model and the internal implementation of the nodes are not documented.

PARAMETERS
EFFECTIVE-BEHAVIOR

Coordinate System

Right-handed system, documented in the API.

Orientation / Axes

Positive X axis to the right, positive Y axis upward, positive Z axis backward; perceptual forward corresponds to −Z. Negative Z values represent positions “in front of” the listener.

Forward Vector Convention

forward = −Z per documentation; a normalized listener.forward is used, assuming an orthonormal basis derived from forward and up.

Up Vector Convention

Defined via listener.up and used, together with forward, to construct the right and up axes in computeAngles. The model assumes the vectors are normalized and orthogonal (or orthogonalized); this is an assumption of the model rather than an enforced correction step.

Units of Measurement

Implicitly meters, as inferred from parameters such as speedOfSound (m/s) and the inner/outer radii. No unit conversion is performed.

System Origin

World-centric system; source and listener positions are evaluated in the same coordinate space.

Listener Role

Acoustic reference point defining position, orientation, velocity, and global parameters (doppler, speed of sound).

Spatial Parameters

position, forward, up, and velocity; doppler and speedOfSound influence the computation of dopplerRate.

Sampling Moment

Sampled per render block (frameCount) during execution of process / computeAngles.

Source → Listener Relationship

The vector (source − listener) is computed; distance is its length, and azimuth and elevation are derived from the normalized vector. If the vector is zero, azimuth and elevation are set to 0.
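The conventions in this table (right-handed axes, forward = −Z, an orthonormal basis built from forward and up) can be combined into a sketch of the angle derivation. This is not the engine's computeAngles: `Vec3`, `SpatialAngles`, and `deriveAngles` are hypothetical, azimuth is reported as positive toward the listener's right by assumption, and forward and up are assumed not to be parallel.

```swift
import Foundation

// Hypothetical sketch; names and the azimuth sign convention are assumptions.
typealias Vec3 = SIMD3<Double>

func dot(_ a: Vec3, _ b: Vec3) -> Double { (a * b).sum() }

func cross(_ a: Vec3, _ b: Vec3) -> Vec3 {
    Vec3(a.y * b.z - a.z * b.y,
         a.z * b.x - a.x * b.z,
         a.x * b.y - a.y * b.x)
}

func normalized(_ v: Vec3) -> Vec3 {
    let length = dot(v, v).squareRoot()
    return length > 0 ? v / length : v
}

struct SpatialAngles {
    var azimuth: Double    // degrees
    var elevation: Double  // degrees
    var distance: Double   // same unit as positions (meters)
}

func deriveAngles(source: Vec3, listener: Vec3, forward: Vec3, up: Vec3) -> SpatialAngles {
    let toSource = source - listener
    let distance = dot(toSource, toSource).squareRoot()
    // Zero vector: azimuth and elevation are set to 0.
    guard distance > 0 else {
        return SpatialAngles(azimuth: 0, elevation: 0, distance: 0)
    }
    let direction = toSource / distance

    // Orthonormal listener basis; with forward = -Z and up = +Y, right = +X.
    let f = normalized(forward)
    let right = normalized(cross(f, up))
    let trueUp = cross(right, f)

    let x = dot(direction, right)   // positive: to the listener's right
    let y = dot(direction, trueUp)  // positive: above the listener
    let z = dot(direction, f)       // positive: in front of the listener

    let degrees = 180.0 / Double.pi
    let azimuth = atan2(x, z) * degrees
    let elevation = asin(min(1, max(-1, y))) * degrees
    return SpatialAngles(azimuth: azimuth, elevation: elevation, distance: distance)
}
```

With the default orientation (forward = −Z, up = +Y), a source at (0, 0, −2) yields azimuth 0° and distance 2, and a source at (3, 0, 0) yields azimuth 90°.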

Expected Value Ranges

Approximate suggested ranges (x/z: −1000…1000, y: −40…1000) with no explicit clamping; non-finite values trigger assertions.

Out-of-Range Behavior

Elevation is clamped to the range supported by the HRTF; out-of-range azimuth may produce silence. Clamping behavior depends on BinauralDatabase.

Binaural Processing

HRTF processing via FFT convolution, delay lines, and kernel crossfading, producing binaural stereo output.
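The crossfading strategy is not specified further in this section. As an illustration only, a linear crossfade over one block between the output produced with the previous HRTF kernel and the output produced with the new one could look like the sketch below; the linear ramp and the function name `crossfade` are assumptions.

```swift
// Illustrative linear crossfade over one block between the outputs of the
// previous and the new kernel. The ramp shape and name are assumptions.
func crossfade(old: [Float], new: [Float]) -> [Float] {
    precondition(old.count == new.count, "blocks must have the same length")
    let n = old.count
    guard n > 1 else { return new }
    return (0..<n).map { i in
        // t goes from 0 (all old) to 1 (all new) across the block.
        let t = Float(i) / Float(n - 1)
        return (1 - t) * old[i] + t * new[i]
    }
}
```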

Processing Domain

Frequency-domain convolution (ConvolutionFFT) combined with time-domain delay stages (DelayStage).

Expected Sample Rate

Uses canonicalSampleRate. BinauralDatabase supports 44,100 Hz and 48,000 Hz; unsupported sample rates trigger assertions during initialization.

Input / Output Requirements

Mono or stereo input; strictly stereo output. No support for multichannel binaural output.

Implicit Assumptions

frameCount must be a power of two and the binaural database must be loaded for real-time rendering; otherwise, the node may mute or abort.

Binaural ↔ Stream Relationship

Binaural processing produces a stereo AudioBus; the hardware layer may apply additional mixing depending on the final output layout.

Pre / Post Conversion

No sample-format conversion is performed; output is interleaved only at the final hardware stage, remaining in Float32.


Feel free to contact us with any concerns at [email protected]; our team will be happy to assist you in resolving any issues and improving your experience.
