Jay Voice Module Architecture
Overview
Jay is a state-machine-driven, multi-voice synthesis engine with pressure-based control and biological voice modeling. It's a specialized library designed for synthesizing realistic animal vocalizations (birds and mammals) with high-level biological abstractions.
Public API - Core Types
Voice Struct (voice::Voice)
- Purpose: Individual voice state machine for synthesis
- State Machine: VoiceState enum (Idle, RampUp, Chaos, Stabilize, Exhaust)
- Key Fields:
  - id: usize - Voice identifier
  - phase: f32 - Oscillator phase
  - freq: f32 - Fundamental frequency
  - amp: f32 - Amplitude
  - is_active: bool - Active flag
  - velocity: f32 - Note velocity
  - pressure: f32 - Breath pressure (maps to biological concepts)
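As a rough illustration of the types described above, the following sketch declares the state enum and the listed fields. The `state` field name, the `next` transition order, the `new` constructor, and `advance_phase` are assumptions made for this example; the actual crate may branch between states based on pressure rather than stepping linearly.

```rust
/// Lifecycle phases named in the document.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VoiceState {
    Idle,
    RampUp,
    Chaos,
    Stabilize,
    Exhaust,
}

impl VoiceState {
    /// Hypothetical linear progression through the five phases (assumption).
    pub fn next(self) -> VoiceState {
        match self {
            VoiceState::Idle => VoiceState::RampUp,
            VoiceState::RampUp => VoiceState::Chaos,
            VoiceState::Chaos => VoiceState::Stabilize,
            VoiceState::Stabilize => VoiceState::Exhaust,
            VoiceState::Exhaust => VoiceState::Idle,
        }
    }
}

/// Field names and types follow the Key Fields list; `state` is an assumed name.
#[derive(Debug, Clone)]
pub struct Voice {
    pub id: usize,
    pub state: VoiceState,
    pub phase: f32,
    pub freq: f32,
    pub amp: f32,
    pub is_active: bool,
    pub velocity: f32,
    pub pressure: f32,
}

impl Voice {
    /// Creates a silent, idle voice (hypothetical constructor).
    pub fn new(id: usize) -> Self {
        Voice {
            id,
            state: VoiceState::Idle,
            phase: 0.0,
            freq: 0.0,
            amp: 0.0,
            is_active: false,
            velocity: 0.0,
            pressure: 0.0,
        }
    }

    /// Advances the normalized phase by one sample and wraps it into [0, 1).
    pub fn advance_phase(&mut self, sample_rate: f32) {
        self.phase += self.freq / sample_rate;
        self.phase -= self.phase.floor();
    }
}
```

Keeping the phase normalized to [0, 1) lets every waveform read the same accumulator regardless of its shape.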
WaveType Enum
Extensive waveform support beyond basic synthesis:
- Sine - Sine wave with 2nd harmonic richness
- Saw - Sawtooth
- Triangle - Triangle wave
- Square - 50% duty cycle square
- Pulse - PWM square (variable duty cycle)
- Blit - Band-limited impulse train
- Noise, PinkNoise, BrownNoise - Various noise types
- Chaos - Logistic map chaos oscillator
- Wavetable - Custom wavetable support
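To make a few of these waveforms concrete, here is a minimal sketch of three of them driven by a normalized phase in [0, 1). The function names, the 0.2 second-harmonic mix, and the seed and `r` clamping are assumptions for illustration and are not taken from the crate.

```rust
use std::f32::consts::PI;

/// Naive (not band-limited) pulse: +1 for the first `duty` fraction of the cycle, -1 after.
pub fn pulse_sample(phase: f32, duty: f32) -> f32 {
    if phase < duty { 1.0 } else { -1.0 }
}

/// Sine with a small amount of second harmonic; the 0.2 mix level is an assumed value.
/// Dividing by 1.2 keeps the result within [-1, 1].
pub fn rich_sine_sample(phase: f32) -> f32 {
    let w = 2.0 * PI * phase;
    (w.sin() + 0.2 * (2.0 * w).sin()) / 1.2
}

/// Logistic-map oscillator: x <- r * x * (1 - x), rescaled from [0, 1] to [-1, 1].
pub struct LogisticChaos {
    x: f32,
    r: f32,
}

impl LogisticChaos {
    /// `r` near 3.6 to 4.0 gives chaotic output; both inputs are clamped to safe ranges.
    pub fn new(seed: f32, r: f32) -> Self {
        LogisticChaos {
            x: seed.clamp(0.01, 0.99),
            r: r.clamp(0.0, 4.0),
        }
    }

    /// Iterates the map once and returns a bipolar sample.
    pub fn next_sample(&mut self) -> f32 {
        self.x = self.r * self.x * (1.0 - self.x);
        2.0 * self.x - 1.0
    }
}
```

The logistic map is stateful rather than phase-driven, which is why it is modeled as a struct here while the periodic shapes are plain functions.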
Filtering
- BiquadFilter struct: Per-voice biquad filter with IIR implementation
- Hard bandpass filter: Built-in 80Hz-8000Hz filtering on all voices
- DC blocker: First-order highpass at ~5Hz
- Per-voice filtering: Static or dynamic (time-envelope) filter configurations
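The DC blocker mentioned above can be sketched as a standard one-pole, one-zero highpass. This is a common textbook form, not necessarily the crate's implementation; the `DcBlocker` name and the coefficient approximation `r = 1 - 2*pi*fc/fs` (reasonable when the cutoff is far below the sample rate) are assumptions.

```rust
/// First-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].
pub struct DcBlocker {
    r: f32,
    x1: f32, // previous input sample
    y1: f32, // previous output sample
}

impl DcBlocker {
    /// Builds a blocker with the pole placed for roughly `cutoff_hz` (for example 5 Hz).
    pub fn new(cutoff_hz: f32, sample_rate: f32) -> Self {
        let r = 1.0 - 2.0 * std::f32::consts::PI * cutoff_hz / sample_rate;
        DcBlocker { r, x1: 0.0, y1: 0.0 }
    }

    /// Processes one sample.
    pub fn process(&mut self, x: f32) -> f32 {
        let y = x - self.x1 + self.r * self.y1;
        self.x1 = x;
        self.y1 = y;
        y
    }
}
```

Feeding a constant input shows the behavior: the first output passes the step through, and the output then decays toward zero as the constant offset is removed.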
Effects & Modulation Parameters
- Bit Crushing: bit_depth (1.0-16.0)
- Sample Rate Reduction: sample_rate_reduction
- Ring Modulation: ring_mod_source, ring_mod_amount (cross-voice modulation)
- Frequency Modulation: fm_source, fm_amount
- Duty Cycle Modulation: duty_cycle_lfo_* parameters
- Vibrato: vibrato_rate_hz, vibrato_depth_cents (LFO-based pitch mod)
- Tremolo: tremolo_rate_hz, tremolo_depth (LFO-based amplitude mod)
- Nonlinear Distortion: Tanh, Softclip, Fold, Waveshaper
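A few of these effects reduce to short per-sample formulas. The sketch below shows one plausible reading of bit crushing, three of the distortion curves, and vibrato expressed in cents. All function names, the `drive` parameter, and the exact curve shapes are assumptions; the custom Waveshaper type is omitted because its transfer function is not described.

```rust
/// Quantizes a sample in [-1, 1] to roughly `bit_depth` bits (clamped to 1.0-16.0).
pub fn bit_crush(x: f32, bit_depth: f32) -> f32 {
    let levels = 2.0_f32.powf(bit_depth.clamp(1.0, 16.0) - 1.0);
    (x * levels).round() / levels
}

/// Three of the four distortion types named in the document.
#[derive(Debug, Clone, Copy)]
pub enum Distortion {
    Tanh,
    Softclip,
    Fold,
}

/// Applies a nonlinearity after scaling by `drive` (hypothetical parameter).
pub fn distort(x: f32, kind: Distortion, drive: f32) -> f32 {
    let v = x * drive;
    match kind {
        // Smooth saturation.
        Distortion::Tanh => v.tanh(),
        // Cubic soft clip: maps +/-1 to +/-1 with zero slope at the limits.
        Distortion::Softclip => {
            let c = v.clamp(-1.0, 1.0);
            1.5 * c - 0.5 * c * c * c
        }
        // Triangle wavefolder: reflects the signal back at +/-1.
        Distortion::Fold => {
            let t = (v + 1.0).rem_euclid(4.0);
            if t < 2.0 { t - 1.0 } else { 3.0 - t }
        }
    }
}

/// Instantaneous frequency under sine vibrato with depth given in cents.
/// 1200 cents is one octave, so the ratio is 2^(cents / 1200).
pub fn vibrato_freq(base_hz: f32, rate_hz: f32, depth_cents: f32, t_sec: f32) -> f32 {
    let lfo = (2.0 * std::f32::consts::PI * rate_hz * t_sec).sin();
    base_hz * 2.0_f32.powf(depth_cents * lfo / 1200.0)
}
```

Expressing vibrato depth in cents keeps the perceived pitch excursion the same at any base frequency, which a depth in Hz would not.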
VoiceConfig (Configuration)
- Frequency Envelope: Time-based frequency trajectory
- Pressure Envelope: Dynamic pressure control (maps to biological models)
- Channel Assignment: Stereo/mono routing (0-2 channels)
- Jitter Envelope: Pitch instability for biological realism
- Distortion/Clipping Envelopes: Time-varying nonlinearities
- Filters: Static or dynamic (time-based) per-voice filters
- Effects: Per-voice effects chain
- Vibrato/Tremolo/Duty Cycle LFO Config: Modulation LFOs
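The time-based envelopes above (frequency, pressure, jitter) can be pictured as breakpoint lists evaluated at the current time. The following sketch assumes linear interpolation between points and holding the end values outside the range; `EnvPoint` and `envelope_at` are hypothetical names, and the crate may use different curve shapes.

```rust
/// One breakpoint of a time-based envelope (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct EnvPoint {
    pub time_ms: f32,
    pub value: f32,
}

/// Evaluates a breakpoint envelope at `t_ms` by linear interpolation.
/// Points are assumed to be sorted by time; values are held before the
/// first point and after the last one. An empty envelope yields 0.0.
pub fn envelope_at(points: &[EnvPoint], t_ms: f32) -> f32 {
    let (first, last) = match (points.first(), points.last()) {
        (Some(f), Some(l)) => (*f, *l),
        _ => return 0.0,
    };
    if t_ms <= first.time_ms {
        return first.value;
    }
    for pair in points.windows(2) {
        let (a, b) = (pair[0], pair[1]);
        if t_ms <= b.time_ms {
            let span = b.time_ms - a.time_ms;
            if span <= 0.0 {
                return b.value; // coincident points: jump to the later value
            }
            let k = (t_ms - a.time_ms) / span;
            return a.value + k * (b.value - a.value);
        }
    }
    last.value
}
```

The same evaluator could serve a frequency envelope (values in Hz) or a pressure envelope (values in 0.0-1.0), since only the interpretation of `value` changes.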
SynthEngine
- 8 concurrent voices by default
- 2-channel stereo output
- 44.1 kHz sample rate (configurable)
- Pressure-based voice control
- Global effects processor
- Per-voice and global channel mixing
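As an illustration of per-voice channel mixing into a stereo frame, the sketch below sums voice samples into left and right according to a routing choice and applies a master gain. The `Route` enum, `mix_frame`, and the hard clamp at the output are assumptions for this example and do not reflect the crate's actual channel numbering or mixing law.

```rust
/// Hypothetical per-voice routing choice.
#[derive(Debug, Clone, Copy)]
pub enum Route {
    Left,
    Right,
    Both,
}

/// Sums one sample per voice into a stereo frame [left, right],
/// applies `master_gain`, and clamps the result to [-1, 1].
pub fn mix_frame(voices: &[(f32, Route)], master_gain: f32) -> [f32; 2] {
    let mut frame = [0.0_f32; 2];
    for &(sample, route) in voices {
        match route {
            Route::Left => frame[0] += sample,
            Route::Right => frame[1] += sample,
            Route::Both => {
                frame[0] += sample;
                frame[1] += sample;
            }
        }
    }
    [
        (frame[0] * master_gain).clamp(-1.0, 1.0),
        (frame[1] * master_gain).clamp(-1.0, 1.0),
    ]
}
```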
Animalian Orientation - Biological Models
Bird Vocalizations (BirdVoice)
Maps to syrinx-based dual sound source model:
- Airflow: Drive pressure and burstiness
- Tension: Left/right bronchial pipe tension, flutter, instability
- Resonance: Chamber size, wall flexibility, pulse rate
- Turbulence: Noise levels and chaos
- Gating: Syrinx valve configuration (left/right/both/alternating)
- Gestures: Pitch slope, roughness, transients
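The gating options above (left/right/both/alternating) can be sketched as a function that returns a gain for each of the two syrinx sources. The `SyrinxGate` enum, the `gate_gains` function, and the rule that alternation switches sides on every pulse are assumptions made for illustration.

```rust
/// The four gating configurations named in the document (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum SyrinxGate {
    Left,
    Right,
    Both,
    Alternating,
}

/// Returns (left_gain, right_gain) for the given pulse.
/// For `Alternating`, even pulses open the left side and odd pulses the right (assumed rule).
pub fn gate_gains(gate: SyrinxGate, pulse_index: u32) -> (f32, f32) {
    match gate {
        SyrinxGate::Left => (1.0, 0.0),
        SyrinxGate::Right => (0.0, 1.0),
        SyrinxGate::Both => (1.0, 1.0),
        SyrinxGate::Alternating => {
            if pulse_index % 2 == 0 {
                (1.0, 0.0)
            } else {
                (0.0, 1.0)
            }
        }
    }
}
```

Multiplying each source's output by its gain before summing gives the dual-source behavior: both sides sounding together, one side alone, or rapid switching between them.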
Mammal Vocalizations (MammalVoice)
Maps to laryngeal model with vocal fold mechanics:
- Airflow: Breath pressure, burstiness, stability
- Vocal Folds: Mode (modal/falsetto/fry), tension, mass
- False Folds: Engagement, rattle, coupling
- Tract Shape: Vocal tract configuration and formants
- Turbulence: Noise, aspiration, roar
- Gestures: Pitch contour, vibrato, tremolo
Key Improvements Over Standalone Voice Crate
Standalone Voice Crate (plant_voice)
- 6 engine types (FM, SuperSaw, Pluck, Noise, Sample, Hybrid)
- Simple Control struct: freq, energy, pressure, attack, brightness, vibrato, drift, gate, trigger
- Lightweight ADSR envelope: Pressure influences decay/release
- SVF filters: Simple state-variable filter post-processing
- Stereo-aware: SuperSaw is natively stereo; the other engines duplicate their mono output to both channels
- Use case: General-purpose synth engines
Jay Voice Advantages
- Biological Abstraction Layer: Bird/mammal models with physiologically-inspired parameters instead of raw synthesis controls
- Pressure-Based Foundation: Entire synthesis driven by pressure envelope (vs. simple ADSR in standalone)
- Rich Oscillator Variety: 11+ waveform types including pink noise, brown noise, logistic chaos, custom wavetables
- Cross-Voice Modulation: Ring mod and FM between voices
- Advanced Nonlinearity: 4 distinct nonlinear distortion types (Tanh, Softclip, Fold, Waveshaper) vs. simple tanh saturation
- Per-Voice Filtering: Biquad filters configurable per voice with dynamic envelopes
- LFO-Based Modulation: Vibrato, tremolo, duty cycle LFO with independent rate/depth
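- Built-in Hard Filtering: A fixed 80Hz-8000Hz bandpass on every voice removes DC offset and subsonic content and attenuates everything above 8 kHz, including aliasing products that land there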
- Oversampling Path: 4x oversampling for distortion/clipping when needed
- Performance Envelopes: 5-phase temporal control for complex biological gestures
Architecture Difference
Standalone Voice: Synth-centric, generic engines with simple control flow
Control struct → ADSR Envelope → Engine → SVF Filter → Saturation
Jay Voice: Biologically-inspired, pressure-driven synthesis
VoiceConfig (frequency, pressure envelopes) → Voice state machine →
Multi-oscillator generation → Per-voice filter → Distortion →
Hard bandpass filter → Master effects
Configuration Format
- YAML/JSON configuration files
- Pressure points define temporal dynamics
- Frequency envelopes define pitch trajectory
- Filter envelopes define time-varying EQ
- Effect chains stack multiple processors
Real-time Control Methods
- trigger_voice(voice_id, freq, velocity)
- release_voice(voice_id)
- set_voice_frequency(voice_id, freq)
- set_voice_pressure(voice_id, pressure)
- render_buffer(time_ms) for audio generation
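The control methods above can be exercised against a small stand-in engine. The sketch below is not the real SynthEngine: `MockEngine` and `MockVoice` are hypothetical, the argument types are guesses, and `render_buffer` is assumed to return interleaved stereo `f32` samples for the requested duration. Each active voice is rendered as a plain sine scaled by velocity and pressure.

```rust
/// Minimal per-voice state for the mock (hypothetical).
struct MockVoice {
    freq: f32,
    velocity: f32,
    pressure: f32,
    phase: f32,
    active: bool,
}

/// Stand-in engine exposing the method names listed in the document.
pub struct MockEngine {
    voices: Vec<MockVoice>,
    sample_rate: f32,
}

impl MockEngine {
    pub fn new(num_voices: usize, sample_rate: f32) -> Self {
        let voices = (0..num_voices)
            .map(|_| MockVoice { freq: 0.0, velocity: 0.0, pressure: 1.0, phase: 0.0, active: false })
            .collect();
        MockEngine { voices, sample_rate }
    }

    /// Starts a voice; out-of-range ids are ignored.
    pub fn trigger_voice(&mut self, voice_id: usize, freq: f32, velocity: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.freq = freq;
            v.velocity = velocity;
            v.phase = 0.0;
            v.active = true;
        }
    }

    pub fn release_voice(&mut self, voice_id: usize) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.active = false;
        }
    }

    pub fn set_voice_frequency(&mut self, voice_id: usize, freq: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.freq = freq;
        }
    }

    pub fn set_voice_pressure(&mut self, voice_id: usize, pressure: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.pressure = pressure.clamp(0.0, 1.0);
        }
    }

    /// Renders `time_ms` of audio as interleaved stereo (L, R, L, R, ...).
    pub fn render_buffer(&mut self, time_ms: f32) -> Vec<f32> {
        let sr = self.sample_rate;
        let frames = (time_ms * sr / 1000.0).round().max(0.0) as usize;
        let mut out = Vec::with_capacity(frames * 2);
        for _ in 0..frames {
            let mut s = 0.0_f32;
            for v in self.voices.iter_mut().filter(|v| v.active) {
                s += (std::f32::consts::TAU * v.phase).sin() * v.velocity * v.pressure;
                v.phase = (v.phase + v.freq / sr).fract();
            }
            out.push(s); // left
            out.push(s); // right
        }
        out
    }
}
```

A typical call sequence would be to trigger a voice, adjust its pressure or frequency between blocks, call render_buffer once per audio block, and release the voice when the gesture ends.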