Jay Voice Module Architecture

Overview

Jay is a state-machine-driven, multi-voice synthesis engine with pressure-based control and biological voice modeling. It is a specialized library for synthesizing realistic animal vocalizations (birds and mammals) through high-level biological abstractions rather than raw synthesis controls.

Public API - Core Types

Voice Struct (`voice::Voice`)

Purpose: Individual voice state machine for synthesis
State Machine: VoiceState enum (Idle, RampUp, Chaos, Stabilize, Exhaust)
Key Fields:
id: usize - Voice identifier
phase: f32 - Oscillator phase
freq: f32 - Fundamental frequency
amp: f32 - Amplitude
is_active: bool - Active flag
velocity: f32 - Note velocity
pressure: f32 - Breath pressure (maps to biological concepts)
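
To make the shape of this type concrete, here is a minimal sketch of the struct and its state enum. The field names and state names come from the list above. The `state` field, the linear transition order in `next`, the constructor defaults, and `advance_phase` are assumptions for illustration, not the crate's confirmed implementation.

```rust
/// States named in this document; the transition order is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoiceState {
    Idle,
    RampUp,
    Chaos,
    Stabilize,
    Exhaust,
}

impl VoiceState {
    /// Hypothetical linear progression: each state advances to the next,
    /// and Exhaust returns to Idle.
    pub fn next(self) -> VoiceState {
        match self {
            VoiceState::Idle => VoiceState::RampUp,
            VoiceState::RampUp => VoiceState::Chaos,
            VoiceState::Chaos => VoiceState::Stabilize,
            VoiceState::Stabilize => VoiceState::Exhaust,
            VoiceState::Exhaust => VoiceState::Idle,
        }
    }
}

#[derive(Debug, Clone)]
pub struct Voice {
    pub id: usize,
    pub phase: f32,    // normalized oscillator phase in [0, 1)
    pub freq: f32,     // fundamental frequency in Hz
    pub amp: f32,
    pub is_active: bool,
    pub velocity: f32,
    pub pressure: f32,
    pub state: VoiceState, // assumed field; the document does not say where the state lives
}

impl Voice {
    /// Assumed defaults: a silent, inactive voice in the Idle state.
    pub fn new(id: usize) -> Self {
        Voice {
            id,
            phase: 0.0,
            freq: 0.0,
            amp: 0.0,
            is_active: false,
            velocity: 0.0,
            pressure: 0.0,
            state: VoiceState::Idle,
        }
    }

    /// Advance the phase by one sample and wrap it back into [0, 1).
    pub fn advance_phase(&mut self, sample_rate: f32) {
        self.phase += self.freq / sample_rate;
        self.phase -= self.phase.floor();
    }
}
```

In the real engine the transitions are presumably driven by the pressure envelope rather than by a fixed cycle; the `next` method only illustrates the five-state layout.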

WaveType Enum

Extensive waveform support beyond basic synthesis:

Sine - Sine wave with an added 2nd harmonic for richness
Saw - Sawtooth
Triangle - Triangle wave
Square - 50% duty cycle square
Pulse - PWM square (variable duty cycle)
Blit - Band-limited impulse train
Noise, PinkNoise, BrownNoise - Various noise types
Chaos - Logistic map chaos oscillator
Wavetable - Custom wavetable support
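
Two of the less common entries above can be sketched in a few lines. This is an illustration under stated assumptions: the type name `LogisticChaos`, the function `pulse`, the seed handling, and the output scaling to [-1, 1] are hypothetical and not taken from the crate. The logistic map itself is the standard recurrence x[n+1] = r * x[n] * (1 - x[n]).

```rust
/// Hypothetical logistic-map oscillator. For r between roughly 3.57 and 4.0
/// the sequence is chaotic for most parameter values.
pub struct LogisticChaos {
    x: f32, // map state, kept in [0, 1]
    r: f32, // growth parameter, clamped to [0, 4] so the state stays bounded
}

impl LogisticChaos {
    pub fn new(seed: f32, r: f32) -> Self {
        LogisticChaos {
            x: seed.clamp(0.001, 0.999),
            r: r.clamp(0.0, 4.0),
        }
    }

    /// Iterate the map once and rescale the state from [0, 1] to [-1, 1].
    pub fn next_sample(&mut self) -> f32 {
        self.x = self.r * self.x * (1.0 - self.x);
        2.0 * self.x - 1.0
    }
}

/// Naive (non-band-limited) pulse wave: high for the first `duty` fraction
/// of the cycle, low for the rest. `phase` is normalized to [0, 1).
pub fn pulse(phase: f32, duty: f32) -> f32 {
    if phase < duty {
        1.0
    } else {
        -1.0
    }
}
```

A duty of 0.5 reproduces the Square entry; sweeping `duty` with an LFO gives the PWM behavior described for Pulse.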

Filtering

BiquadFilter struct: Per-voice biquad filter with IIR implementation
Hard bandpass filter: Built-in 80 Hz-8000 Hz filtering on all voices
DC blocker: First-order highpass at ~5 Hz
Per-voice filtering: Static or dynamic (time-envelope) filter configurations
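
The DC blocker is the simplest of these stages to illustrate. The sketch below uses the textbook discretized RC highpass, y[n] = a * (y[n-1] + x[n] - x[n-1]) with a = 1 / (1 + 2π * fc / fs). Whether Jay uses this exact form is an assumption; the type name `DcBlocker` is hypothetical.

```rust
use std::f32::consts::TAU;

/// Hypothetical first-order highpass used as a DC blocker.
pub struct DcBlocker {
    alpha: f32,    // feedback coefficient derived from cutoff and sample rate
    prev_in: f32,  // x[n-1]
    prev_out: f32, // y[n-1]
}

impl DcBlocker {
    pub fn new(cutoff_hz: f32, sample_rate: f32) -> Self {
        // alpha = RC / (RC + dt) with RC = 1 / (2 * pi * fc) and dt = 1 / fs
        let alpha = 1.0 / (1.0 + TAU * cutoff_hz / sample_rate);
        DcBlocker {
            alpha,
            prev_in: 0.0,
            prev_out: 0.0,
        }
    }

    /// Process one sample: constant offsets decay toward zero while
    /// content well above the cutoff passes almost unchanged.
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.alpha * (self.prev_out + input - self.prev_in);
        self.prev_in = input;
        self.prev_out = out;
        out
    }
}
```

With a 5 Hz cutoff at 44.1 kHz the coefficient is about 0.9993, so a constant offset fades with a time constant of roughly 32 ms.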

Effects & Modulation Parameters

Bit Crushing: bit_depth (1.0-16.0)
Sample Rate Reduction: sample_rate_reduction
Ring Modulation: ring_mod_source, ring_mod_amount (cross-voice modulation)
Frequency Modulation: fm_source, fm_amount
Duty Cycle Modulation: duty_cycle_lfo_* parameters
Vibrato: vibrato_rate_hz, vibrato_depth_cents (LFO-based pitch mod)
Tremolo: tremolo_rate_hz, tremolo_depth (LFO-based amplitude mod)
Nonlinear Distortion: Tanh, Softclip, Fold, Waveshaper
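
A few of these parameters map onto short, well-known formulas. The sketch below shows one plausible reading of `bit_depth`, `vibrato_rate_hz`, and `vibrato_depth_cents`; the function names are hypothetical and the crate's actual processing may differ. The cents conversion is the standard ratio 2^(cents / 1200).

```rust
use std::f32::consts::TAU;

/// Quantize a sample in [-1, 1] to a grid set by `bit_depth` (1.0 to 16.0).
/// Fractional depths are allowed because the parameter is a float.
pub fn bit_crush(sample: f32, bit_depth: f32) -> f32 {
    let bits = bit_depth.clamp(1.0, 16.0);
    let half_levels = 2.0_f32.powf(bits - 1.0);
    (sample.clamp(-1.0, 1.0) * half_levels).round() / half_levels
}

/// Convert a pitch offset in cents to a frequency ratio.
pub fn cents_to_ratio(cents: f32) -> f32 {
    2.0_f32.powf(cents / 1200.0)
}

/// Instantaneous frequency of a voice under sine-LFO vibrato at time `t_sec`.
pub fn vibrato_freq(base_hz: f32, vibrato_rate_hz: f32, vibrato_depth_cents: f32, t_sec: f32) -> f32 {
    let lfo = (TAU * vibrato_rate_hz * t_sec).sin();
    base_hz * cents_to_ratio(vibrato_depth_cents * lfo)
}

/// Tanh saturation with a drive gain, the simplest of the listed distortions.
pub fn tanh_distort(sample: f32, drive: f32) -> f32 {
    (sample * drive).tanh()
}
```

For example, a vibrato depth of 50 cents on a 440 Hz voice swings the pitch between roughly 427 Hz and 453 Hz.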

VoiceConfig (Configuration)

Frequency Envelope: Time-based frequency trajectory
Pressure Envelope: Dynamic pressure control (maps to biological models)
Channel Assignment: Stereo/mono routing (0-2 channels)
Jitter Envelope: Pitch instability for biological realism
Distortion/Clipping Envelopes: Time-varying nonlinearities
Filters: Static or dynamic (time-based) per-voice filters
Effects: Per-voice effects chain
Vibrato/Tremolo/Duty Cycle LFO Config: Modulation LFOs
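
The frequency, pressure, and jitter envelopes are all described as time-based trajectories. One common way to evaluate such an envelope is linear interpolation between (time, value) breakpoints, sketched below. The `EnvPoint` type, the millisecond time unit, and the choice of linear interpolation are assumptions; the document does not state how Jay interpolates between points.

```rust
/// Hypothetical envelope breakpoint: a value reached at a given time.
#[derive(Debug, Clone, Copy)]
pub struct EnvPoint {
    pub time_ms: f32,
    pub value: f32,
}

/// Evaluate an envelope at `t_ms` by linear interpolation.
/// Points must be sorted by time. Times before the first point or after
/// the last point hold the nearest endpoint value.
pub fn sample_envelope(points: &[EnvPoint], t_ms: f32) -> f32 {
    if points.is_empty() {
        return 0.0;
    }
    let first = points[0];
    let last = points[points.len() - 1];
    if t_ms <= first.time_ms {
        return first.value;
    }
    if t_ms >= last.time_ms {
        return last.value;
    }
    for pair in points.windows(2) {
        let (a, b) = (pair[0], pair[1]);
        if t_ms >= a.time_ms && t_ms <= b.time_ms {
            let span = b.time_ms - a.time_ms;
            if span <= 0.0 {
                // Two points at the same time: treat as an instant jump.
                return b.value;
            }
            let frac = (t_ms - a.time_ms) / span;
            return a.value + frac * (b.value - a.value);
        }
    }
    last.value
}
```

The same function could serve a pressure envelope (values in 0.0 to 1.0) or a frequency envelope (values in Hz), since only the interpretation of `value` changes.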

SynthEngine

8 concurrent voices by default
2-channel stereo output
44.1 kHz sample rate (configurable)
Pressure-based voice control
Global effects processor
Per-voice and global channel mixing

Animal Vocalization Focus - Biological Models

Bird Vocalizations (BirdVoice)

Maps to syrinx-based dual sound source model:

Airflow: Drive pressure and burstiness
Tension: Left/right bronchial pipe tension, flutter, instability
Resonance: Chamber size, wall flexibility, pulse rate
Turbulence: Noise levels and chaos
Gating: Syrinx valve configuration (left/right/both/alternating)
Gestures: Pitch slope, roughness, transients

Mammal Vocalizations (MammalVoice)

Maps to laryngeal model with vocal fold mechanics:

Airflow: Breath pressure, burstiness, stability
Vocal Folds: Mode (modal/falsetto/fry), tension, mass
False Folds: Engagement, rattle, coupling
Tract Shape: Vocal tract configuration and formants
Turbulence: Noise, aspiration, roar
Gestures: Pitch contour, vibrato, tremolo

Key Improvements Over Standalone Voice Crate

Standalone Voice Crate (plant_voice)

6 engine types (FM, SuperSaw, Pluck, Noise, Sample, Hybrid)
Simple Control struct: freq, energy, pressure, attack, brightness, vibrato, drift, gate, trigger
Lightweight ADSR envelope: Pressure influences decay/release
SVF filters: Simple state-variable filter post-processing
Stereo-aware: SuperSaw renders native stereo; the other engines duplicate their mono output to both channels
Use case: General-purpose synth engines

Jay Voice Advantages

Biological Abstraction Layer: Bird/mammal models with physiologically-inspired parameters instead of raw synthesis controls
Pressure-Based Foundation: Entire synthesis driven by pressure envelope (vs. simple ADSR in standalone)
Rich Oscillator Variety: 11 waveform types, including pink noise, brown noise, logistic chaos, and custom wavetables
Cross-Voice Modulation: Ring mod and FM between voices
Advanced Nonlinearity: 4 distinct distortion types (Tanh, Softclip, Fold, Waveshaper) vs. simple tanh saturation in standalone
Per-Voice Filtering: Biquad filters configurable per voice with dynamic envelopes
LFO-Based Modulation: Vibrato, tremolo, duty cycle LFO with independent rate/depth
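Built-in Hard Filtering: Fixed 80 Hz-8000 Hz bandpass on every voice removes DC offset and attenuates content above 8 kHz (aliasing products that fold back into the passband are not removed by this stage)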
Oversampling Path: 4x oversampling for distortion/clipping when needed
Performance Envelopes: 5-phase temporal control for complex biological gestures

Architecture Difference

Standalone Voice: Synth-centric, generic engines with simple control flow

Control struct → ADSR Envelope → Engine → SVF Filter → Saturation

Jay Voice: Biologically-inspired, pressure-driven synthesis

VoiceConfig (frequency, pressure envelopes) → Voice state machine →
Multi-oscillator generation → Per-voice filter → Distortion →
Hard bandpass filter → Master effects

Configuration Format

YAML/JSON configuration files
Pressure points define temporal dynamics
Frequency envelopes define pitch trajectory
Filter envelopes define time-varying EQ
Effect chains stack multiple processors

Real-time Control Methods

trigger_voice(voice_id, freq, velocity)
release_voice(voice_id)
set_voice_frequency(voice_id, freq)
set_voice_pressure(voice_id, pressure)
render_buffer(time_ms) for audio generation
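
To show how these calls might fit together, here is a toy stand-in engine that exposes the same method names. It is not the Jay implementation: `ToyEngine` and `ToyVoice` are hypothetical, every voice is a plain sine, `time_ms` is assumed to be the duration to render, and the return type (interleaved stereo `Vec<f32>`) is a guess. The voice count, channel count, and sample rate follow the SynthEngine defaults listed earlier.

```rust
use std::f32::consts::TAU;

const NUM_VOICES: usize = 8; // default voice count from this document
const CHANNELS: usize = 2;   // stereo output

#[derive(Clone, Copy, Default)]
struct ToyVoice {
    phase: f32,
    freq: f32,
    velocity: f32,
    pressure: f32,
    is_active: bool,
}

/// Toy stand-in that mirrors the control method names listed above.
pub struct ToyEngine {
    sample_rate: f32,
    voices: [ToyVoice; NUM_VOICES],
}

impl ToyEngine {
    pub fn new(sample_rate: f32) -> Self {
        ToyEngine { sample_rate, voices: [ToyVoice::default(); NUM_VOICES] }
    }

    pub fn trigger_voice(&mut self, voice_id: usize, freq: f32, velocity: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            *v = ToyVoice { phase: 0.0, freq, velocity, pressure: 1.0, is_active: true };
        }
    }

    pub fn release_voice(&mut self, voice_id: usize) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.is_active = false;
        }
    }

    pub fn set_voice_frequency(&mut self, voice_id: usize, freq: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.freq = freq;
        }
    }

    pub fn set_voice_pressure(&mut self, voice_id: usize, pressure: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.pressure = pressure.clamp(0.0, 1.0);
        }
    }

    /// Render `time_ms` milliseconds of interleaved stereo samples.
    pub fn render_buffer(&mut self, time_ms: f32) -> Vec<f32> {
        let frames = (time_ms / 1000.0 * self.sample_rate).round() as usize;
        let mut out = Vec::with_capacity(frames * CHANNELS);
        for _ in 0..frames {
            let mut mix = 0.0;
            for v in self.voices.iter_mut().filter(|v| v.is_active) {
                // Pressure scales amplitude here; the real engine drives far more with it.
                mix += (v.phase * TAU).sin() * v.velocity * v.pressure;
                v.phase = (v.phase + v.freq / self.sample_rate).fract();
            }
            let sample = mix / NUM_VOICES as f32; // headroom for all voices at once
            out.push(sample); // left
            out.push(sample); // right
        }
        out
    }
}
```

A typical call sequence under these assumptions would be `trigger_voice(0, 440.0, 1.0)`, then `set_voice_pressure(0, 0.6)`, then repeated `render_buffer(10.0)` calls, followed by `release_voice(0)`.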