Specifications
Guide to the Operation and Constraints of the Audio System
This section documents the effective runtime behavior of the system. It does not define API contracts; instead, it describes which parameters remain fixed, which may be negotiated, and in which layers relevant decisions are made.
Backend
Adapter (CoreAudio, WASAPI, ALSA)
Compatibility
Desktop: Linux, Windows, and macOS. Mobile: Android and iOS
Render Mode (pull/push)
Hybrid model: the backend delivers push callbacks, while the internal graph is resolved by pulling frames on demand
Audio Routing
Render (output), Record (input), and Duplex. If the requested route is unavailable, the system falls back to render
Sample Rate
Fixed canonicalSampleRate at 44,100 Hz. No dynamic negotiation with the device and no resampling on the main path
Bit Depth
32-bit floating point (Float32). Backend stream and internal engine representation match
Sample Format
Float32 (Scalar = Float32). No sample-format conversion occurs at any stage of the pipeline
Channel Count (output)
Explicit support for mono and stereo; channel counts greater than 2 are not implemented in the engine
Channel Count (input)
In record/duplex modes, the effective input to the graph is mono; no multichannel deinterleaving is performed. AudioGraph supports mono input only
Buffer Layout
Internally planar (one AudioBus per channel). Output to the backend is interleaved; input from the backend is treated as a linear buffer
Fixed / Negotiated Parameters
Fixed: sample rate, Float32 format, and internal Quantum. Negotiated: device category (render/record/duplex) based on availability
Requested / Effective
A duplex request may degrade to render if capture is unavailable; input that is not mono is not delivered to the graph
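The fallback from a requested to an effective route can be pictured with a minimal sketch like the one below; the AudioRoute enum and the captureIsAvailable flag are illustrative names, not the engine's API.

```swift
// Hypothetical sketch of the requested-vs-effective route fallback.
// `AudioRoute` and `captureIsAvailable` are illustrative, not the engine's API.
enum AudioRoute {
    case render   // output only
    case record   // input only
    case duplex   // input + output
}

func effectiveRoute(requested: AudioRoute, captureIsAvailable: Bool) -> AudioRoute {
    switch requested {
    case .duplex where !captureIsAvailable:
        // A duplex request degrades to render when capture is unavailable.
        return .render
    case .record where !captureIsAvailable:
        // A pure record request with no capture device also falls back to render.
        return .render
    default:
        return requested
    }
}

let granted = effectiveRoute(requested: .duplex, captureIsAvailable: false)   // .render
```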
Numeric Representation
Audio in Float32; control and time in Float64. DSP processing and audio buses operate in Float32
Conversions
No sample-format conversion; only a planar-to-interleaved layout conversion on output, delivering interleaved Float32 to the backend
Conversion Point
Interleaving is performed in HardwareIO; layout conversion does not occur within the audio graph
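As a rough illustration of what happens at the output boundary, the sketch below interleaves planar Float32 channels into a single interleaved buffer. The function name and buffer shapes are assumptions; no sample-format conversion is involved, only a layout change.

```swift
// Minimal sketch of the planar -> interleaved step at the hardware boundary.
// Float32 in, Float32 out; only the memory layout changes.
func interleave(planar: [[Float]], frameCount: Int) -> [Float] {
    let channelCount = planar.count
    var interleaved = [Float](repeating: 0, count: frameCount * channelCount)
    for frame in 0..<frameCount {
        for channel in 0..<channelCount {
            interleaved[frame * channelCount + channel] = planar[channel][frame]
        }
    }
    return interleaved
}

// Example: 4 frames of a stereo bus.
let left:  [Float] = [0.1, 0.2, 0.3, 0.4]
let right: [Float] = [1.0, 1.0, 1.0, 1.0]
let out = interleave(planar: [left, right], frameCount: 4)
// out == [0.1, 1.0, 0.2, 1.0, 0.3, 1.0, 0.4, 1.0]
```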
Effective Quantum (frames)
Uses canonicalQuantum = .adaptive (256) on real hardware and .balanced (512) on the simulator. Quantum is fixed for the graph and depends on targetEnvironment
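In Swift, this kind of environment-dependent selection is typically expressed with the targetEnvironment compile-time condition; the sketch below mirrors the spec's wording, but the CanonicalQuantum enum itself is illustrative, not the engine's declaration.

```swift
// Illustrative sketch of selecting a fixed quantum from the build target.
enum CanonicalQuantum: Int {
    case adaptive = 256   // real hardware
    case balanced = 512   // simulator
}

var canonicalQuantum: CanonicalQuantum {
    #if targetEnvironment(simulator)
    return .balanced
    #else
    return .adaptive
    #endif
}
```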
Quantum Variability
The internal graph operates with a fixed Quantum; the backend callback may deliver a different frameCount, which is regrouped to renderQuantum via accumulation
Buffer Size (Backend)
The backend frameCount is filled by iterating renderQuantum blocks until the remaining frames are consumed; a single callback may span multiple full blocks or a partial block
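A minimal sketch of this regrouping loop, assuming a hypothetical handleCallback entry point and a renderBlock closure standing in for the graph pull:

```swift
// Sketch of consuming a backend callback of arbitrary frameCount in fixed-size
// renderQuantum blocks. All names here are illustrative, not the engine's API.
let renderQuantum = 256

func handleCallback(frameCount: Int, renderBlock: (Int) -> Void) {
    var remaining = frameCount
    while remaining > 0 {
        // A callback may span several full blocks and possibly one partial block.
        let blockFrames = min(renderQuantum, remaining)
        renderBlock(blockFrames)
        remaining -= blockFrames
    }
}

// Example: a 1024-frame callback is consumed as four 256-frame blocks.
handleCallback(frameCount: 1024) { frames in
    // pull `frames` frames from the graph into the backend buffer
}
```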
Callback Periodicity
Defined by the backend; the internal engine operates at 256 frames (~5.80 ms) or 512 frames (~11.61 ms) at 44,100 Hz
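The quoted block durations follow directly from period = quantum / sampleRate, as the quick check below shows.

```swift
// Quick check of the block durations quoted above: period = quantum / sampleRate.
let sampleRate = 44_100.0
print(256.0 / sampleRate * 1_000)   // ≈ 5.80 ms per block on hardware
print(512.0 / sampleRate * 1_000)   // ≈ 11.61 ms per block on the simulator
```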
Additional Buffering
Input ring buffer (_unprocessedSource) with ~4 s capacity; _source and _rendering buffers per quantum. In record or duplex, input is accumulated
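A sizing sketch for the input ring buffer at the canonical rate; the constant names are assumptions and the engine's exact capacity may differ from the nominal ~4 s.

```swift
// Sizing sketch for the ~4 s input ring buffer at the canonical sample rate.
let canonicalSampleRate = 44_100
let ringBufferSeconds = 4
let ringBufferFrames = canonicalSampleRate * ringBufferSeconds      // 176,400 frames
let ringBufferBytes = ringBufferFrames * MemoryLayout<Float>.size   // ≈ 0.7 MB of mono Float32
```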
Quantum Contractual Impact
frameCount must be a power of two for the binaural panner; the current canonicalQuantum (256/512) satisfies this requirement. Non–power-of-two values violate preconditions
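A minimal sketch of the power-of-two precondition; the function name is hypothetical, but the bit trick is the standard check.

```swift
// Minimal sketch of the power-of-two precondition on frameCount.
// A non-power-of-two block size would trip this check before binaural rendering.
func assertValidQuantum(_ frameCount: Int) {
    precondition(frameCount > 0 && frameCount & (frameCount - 1) == 0,
                 "frameCount must be a power of two for the binaural panner")
}

assertValidQuantum(256)  // passes (hardware quantum)
assertValidQuantum(512)  // passes (simulator quantum)
```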
Primary Clock Source
Frame counter combined with canonicalSampleRate; hardware updates lastCursor per rendered block
Time Unit
Frames (Int) and TimeInterval derived as frames / sampleRate. currentTime and currentSampleFrame are derived from lastCursor
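A sketch of how the time values can be derived from the frame cursor, assuming a hypothetical TransportClock type whose property names echo the spec:

```swift
import Foundation

// Sketch of deriving time values from the frame cursor; the type is illustrative.
struct TransportClock {
    let canonicalSampleRate: Double = 44_100
    var lastCursor: Int = 0                       // advanced by the hardware per rendered block

    var currentSampleFrame: Int { lastCursor }
    var currentTime: TimeInterval {               // seconds = frames / sampleRate
        TimeInterval(lastCursor) / canonicalSampleRate
    }
}

var clock = TransportClock()
clock.lastCursor += 256                            // one hardware block rendered
print(clock.currentTime)                           // ≈ 0.0058 s
```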
Effective Latency
Latencies are defined per node; the effective latency depends on the backend and the quantum
Channel Remixing
Mono↔stereo remixing occurs when the channel count does not match the speaker interpretation; it happens in the graph bus, not in the backend
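A minimal sketch of such a remix on a planar bus; the duplicate-to-both-channels upmix and the 0.5-gain averaging downmix are common conventions assumed here, not documented engine coefficients.

```swift
// Illustrative mono <-> stereo remix on a planar bus.
func upmixMonoToStereo(_ mono: [Float]) -> [[Float]] {
    [mono, mono]                                   // copy the mono signal to L and R
}

func downmixStereoToMono(_ stereo: [[Float]]) -> [Float] {
    let left = stereo[0], right = stereo[1]
    return (0..<min(left.count, right.count)).map { 0.5 * (left[$0] + right[$0]) }
}
```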
Format Conversion
No sample-format conversion is performed; only planar → interleaved layout conversion on output
Binaural System Specifications
This section describes the effective behavior of the binaural audio system as it operates within the engine and in integration with the general render pipeline. The mathematical model and the internal implementation of the nodes are not documented.
Coordinate System
Right-handed system, documented in the API.
Orientation / Axes
Positive X axis to the right, positive Y axis upward, positive Z axis backward; perceptual forward corresponds to −Z. Negative Z values represent positions “in front of” the listener.
Forward Vector Convention
forward = −Z per documentation; a normalized listener.forward is used, assuming an orthonormal basis derived from forward and up.
Up Vector Convention
Defined via listener.up and used to construct the right and up axes in computeAngles; non-orthogonal vectors are assumed to be normalized or conceptually orthogonalized.
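Under the stated −Z-forward, right-handed convention, the basis can be built with the usual cross-product construction. The sketch below uses Apple's simd module for brevity; the exact orthogonalization performed by computeAngles is not documented, so treat this as an assumed equivalent.

```swift
import simd

// Sketch of deriving an orthonormal listener basis from forward and up.
func listenerBasis(forward: SIMD3<Float>, up: SIMD3<Float>)
    -> (right: SIMD3<Float>, up: SIMD3<Float>, forward: SIMD3<Float>) {
    let f = simd_normalize(forward)
    let r = simd_normalize(simd_cross(f, up))      // right = forward x up
    let u = simd_cross(r, f)                       // re-orthogonalized up
    return (r, u, f)
}

// Default orientation: forward = -Z, up = +Y  ->  right = +X.
let basis = listenerBasis(forward: SIMD3(0, 0, -1), up: SIMD3(0, 1, 0))
// basis.right == SIMD3(1, 0, 0)
```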
Units of Measurement
Implicitly meters; inferred from parameters such as speedOfSound (m/s) and internal/external radii. No explicit conversion is performed.
System Origin
World-centric system; source and listener positions are evaluated in the same coordinate space.
Listener Role
Acoustic reference point defining position, orientation, velocity, and global parameters (doppler, speed of sound).
Spatial Parameters
position, forward, up, and velocity; doppler and speedOfSound influence the computation of dopplerRate.
Sampling Moment
Sampled per render block (frameCount) during execution of process / computeAngles.
Source → Listener Relationship
The vector (source − listener) is computed and normalized, and azimuth, elevation, and distance are derived from it. If the vector is zero, azimuth and elevation are set to 0.
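A sketch of this derivation under the basis described above; the angle conventions (azimuth measured from forward toward right, elevation above the horizontal plane) are assumptions, not documented behavior.

```swift
import Foundation
import simd

// Sketch of the azimuth/elevation/distance derivation from (source - listener).
func computeAngles(source: SIMD3<Float>, listener: SIMD3<Float>,
                   right: SIMD3<Float>, up: SIMD3<Float>, forward: SIMD3<Float>)
    -> (azimuth: Float, elevation: Float, distance: Float) {
    let offset = source - listener
    let distance = simd_length(offset)
    guard distance > 0 else { return (0, 0, 0) }   // zero vector: azimuth = elevation = 0
    let d = offset / distance                      // normalized direction
    // Project the direction onto the listener basis.
    let x = simd_dot(d, right)
    let y = simd_dot(d, up)
    let z = simd_dot(d, forward)
    let azimuth = atan2(x, z)                      // radians, 0 = straight ahead
    let elevation = asin(max(-1, min(1, y)))       // radians, clamped for safety
    return (azimuth, elevation, distance)
}
```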
Expected Value Ranges
Approximate suggested ranges (x/z: −1000…1000, y: −40…1000) with no explicit clamping; non-finite values trigger assertions.
Out-of-Range Behavior
Elevation is clamped to the range supported by the HRTF; out-of-range azimuth may produce silence. Clamping behavior depends on BinauralDatabase.
Binaural Processing
HRTF processing via FFT convolution, delay lines, and kernel crossfading, producing binaural stereo output.
Processing Domain
Frequency-domain convolution (ConvolutionFFT) combined with time-domain delay stages (DelayStage).
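The kernel-crossfade step can be pictured as blending one block rendered with the outgoing HRTF kernel against the same block rendered with the incoming one; the linear ramp below is an assumed curve, not the engine's documented crossfade.

```swift
// Sketch of the kernel-crossfade step: blend the block rendered with the old
// kernel into the block rendered with the new kernel across the block length.
func crossfadeKernels(oldRender: [Float], newRender: [Float]) -> [Float] {
    let n = min(oldRender.count, newRender.count)
    return (0..<n).map { i in
        let t = Float(i) / Float(max(n - 1, 1))    // ramps 0 -> 1 across the block
        return (1 - t) * oldRender[i] + t * newRender[i]
    }
}
```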
Expected Sample Rate
Uses canonicalSampleRate. BinauralDatabase supports 44,100 Hz and 48,000 Hz; unsupported sample rates trigger assertions during initialization.
Input / Output Requirements
Mono or stereo input; strictly stereo output. No support for multichannel binaural output.
Implicit Assumptions
frameCount must be a power of two and the binaural database must be loaded for real-time rendering; otherwise, the node may mute or abort.
Binaural ↔ Stream Relationship
Binaural processing produces a stereo AudioBus; hardware may perform additional mixing depending on the final layout.
Pre / Post Conversion
No sample-format conversion is performed; output is interleaved only at the final hardware stage, remaining in Float32.
Feel free to contact us with any concerns at [email protected]; our team will be happy to assist you in resolving any issues and improving your experience.