Jay Voice Module Architecture

Overview

Jay is a state-machine-driven, multi-voice synthesis engine with pressure-based control and biological voice modeling. It is a specialized library for synthesizing realistic bird and mammal vocalizations, exposing high-level biological abstractions instead of raw synthesis controls.

Public API - Core Types

Voice Struct (voice::Voice)

  • Purpose: Individual voice state machine for synthesis
  • State Machine: VoiceState enum (Idle, RampUp, Chaos, Stabilize, Exhaust)
  • Key Fields:
      • id: usize - Voice identifier
      • phase: f32 - Oscillator phase
      • freq: f32 - Fundamental frequency
      • amp: f32 - Amplitude
      • is_active: bool - Active flag
      • velocity: f32 - Note velocity
      • pressure: f32 - Breath pressure (maps to biological concepts)
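
The fields and states listed above can be sketched as Rust types. This is a minimal illustration, not the crate's actual definition: the `state` field, the `new` and `trigger` helpers, and the linear transition order in `next` are assumptions made for the example.

```rust
/// Synthesis phases named in the text. The transition order below is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoiceState {
    Idle,
    RampUp,
    Chaos,
    Stabilize,
    Exhaust,
}

impl VoiceState {
    /// Assumed transition: step through the phases in listed order, then return to Idle.
    pub fn next(self) -> VoiceState {
        match self {
            VoiceState::Idle => VoiceState::RampUp,
            VoiceState::RampUp => VoiceState::Chaos,
            VoiceState::Chaos => VoiceState::Stabilize,
            VoiceState::Stabilize => VoiceState::Exhaust,
            VoiceState::Exhaust => VoiceState::Idle,
        }
    }
}

/// Fields mirror the list above; `state` is a hypothetical extra field.
#[derive(Debug, Clone)]
pub struct Voice {
    pub id: usize,
    pub phase: f32,
    pub freq: f32,
    pub amp: f32,
    pub is_active: bool,
    pub velocity: f32,
    pub pressure: f32,
    pub state: VoiceState,
}

impl Voice {
    /// Create an idle, silent voice.
    pub fn new(id: usize) -> Self {
        Voice {
            id,
            phase: 0.0,
            freq: 0.0,
            amp: 0.0,
            is_active: false,
            velocity: 0.0,
            pressure: 0.0,
            state: VoiceState::Idle,
        }
    }

    /// Hypothetical helper: start a note and enter the first active phase.
    pub fn trigger(&mut self, freq: f32, velocity: f32) {
        self.freq = freq;
        self.velocity = velocity.clamp(0.0, 1.0);
        self.phase = 0.0;
        self.is_active = true;
        self.state = VoiceState::RampUp;
    }
}
```

Keeping the phase in an enum makes each transition an explicit `match` arm, so adding a phase forces every transition site to be revisited.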

WaveType Enum

Extensive waveform support beyond basic synthesis:

  • Sine - Sine wave with added 2nd-harmonic content for richness
  • Saw - Sawtooth
  • Triangle - Triangle wave
  • Square - 50% duty cycle square
  • Pulse - PWM square (variable duty cycle)
  • Blit - Band-limited impulse train
  • Noise, PinkNoise, BrownNoise - Various noise types
  • Chaos - Logistic map chaos oscillator
  • Wavetable - Custom wavetable support
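
As an illustration of the Chaos entry, a logistic-map oscillator iterates x ← r·x·(1 − x) and rescales the result to the audio range. The sketch below is a generic version of that idea; the type name `LogisticOsc`, the seed clamping, and the choice of r are assumptions, not Jay's actual implementation.

```rust
/// Hypothetical logistic-map oscillator: x <- r * x * (1 - x).
pub struct LogisticOsc {
    x: f32,
    r: f32,
}

impl LogisticOsc {
    /// `seed` is kept strictly inside (0, 1); `r` is limited to [0, 4] so x stays in [0, 1].
    pub fn new(seed: f32, r: f32) -> Self {
        LogisticOsc {
            x: seed.clamp(0.001, 0.999),
            r: r.clamp(0.0, 4.0),
        }
    }

    /// Advance the map one step and rescale the state from [0, 1] to [-1, 1].
    pub fn next_sample(&mut self) -> f32 {
        self.x = self.r * self.x * (1.0 - self.x);
        2.0 * self.x - 1.0
    }
}
```

Values of r close to 4 (for example 3.9) give chaotic output, while lower values settle into fixed points or short cycles.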

Filtering

  • BiquadFilter struct: Per-voice biquad filter with IIR implementation
  • Hard bandpass filter: Built-in 80Hz-8000Hz filtering on all voices
  • DC blocker: First-order highpass at ~5Hz
  • Per-voice filtering: Static or dynamic (time-envelope) filter configurations
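
A first-order DC blocker of the kind listed above is commonly written as y[n] = x[n] − x[n−1] + R·y[n−1]. The sketch below uses that textbook form with R = exp(−2π·fc/fs). Whether Jay uses this exact form or coefficient formula is an assumption; the `DcBlocker` type is hypothetical.

```rust
use std::f32::consts::PI;

/// Hypothetical one-pole, one-zero DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].
pub struct DcBlocker {
    r: f32,
    prev_in: f32,
    prev_out: f32,
}

impl DcBlocker {
    /// The pole radius sets the approximate cutoff (about 5 Hz in the text).
    pub fn new(cutoff_hz: f32, sample_rate: f32) -> Self {
        let r = (-2.0 * PI * cutoff_hz / sample_rate).exp();
        DcBlocker { r, prev_in: 0.0, prev_out: 0.0 }
    }

    /// Process one sample; a constant (DC) input decays toward zero at the output.
    pub fn process(&mut self, x: f32) -> f32 {
        let y = x - self.prev_in + self.r * self.prev_out;
        self.prev_in = x;
        self.prev_out = y;
        y
    }
}
```

Feeding a constant value through `process` for one second at 44.1 kHz leaves the output effectively at zero, which is the behavior a DC blocker is meant to provide.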

Effects & Modulation Parameters

  • Bit Crushing: bit_depth (1.0-16.0)
  • Sample Rate Reduction: sample_rate_reduction
  • Ring Modulation: ring_mod_source, ring_mod_amount (cross-voice modulation)
  • Frequency Modulation: fm_source, fm_amount
  • Duty Cycle Modulation: duty_cycle_lfo_* parameters
  • Vibrato: vibrato_rate_hz, vibrato_depth_cents (LFO-based pitch mod)
  • Tremolo: tremolo_rate_hz, tremolo_depth (LFO-based amplitude mod)
  • Nonlinear Distortion: Tanh, Softclip, Fold, Waveshaper
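
Several of these parameters reduce to small per-sample formulas. The sketch below shows generic textbook versions of cents-based vibrato, bit crushing, and three distortion shapes. The function names, the sine LFO shape, the `drive` parameter, and the specific soft-clip and fold curves are assumptions chosen for illustration, not the crate's code.

```rust
use std::f32::consts::TAU;

/// Convert a pitch offset in cents to a frequency ratio (1200 cents = one octave).
pub fn cents_to_ratio(cents: f32) -> f32 {
    2.0_f32.powf(cents / 1200.0)
}

/// Instantaneous frequency under vibrato, assuming a sine LFO.
pub fn vibrato_freq(base_hz: f32, vibrato_rate_hz: f32, vibrato_depth_cents: f32, t_sec: f32) -> f32 {
    let lfo = (TAU * vibrato_rate_hz * t_sec).sin();
    base_hz * cents_to_ratio(vibrato_depth_cents * lfo)
}

/// Quantize a sample in [-1, 1] to roughly `bit_depth` bits (clamped to 1.0-16.0).
pub fn bit_crush(sample: f32, bit_depth: f32) -> f32 {
    let levels = 2.0_f32.powf(bit_depth.clamp(1.0, 16.0) - 1.0);
    (sample * levels).round() / levels
}

/// Tanh saturation with a hypothetical input drive.
pub fn tanh_sat(x: f32, drive: f32) -> f32 {
    (x * drive).tanh()
}

/// Cubic soft clip: input is limited to [-1, 1], output reaches exactly +/-1 at the limits.
pub fn soft_clip(x: f32) -> f32 {
    let x = x.clamp(-1.0, 1.0);
    1.5 * (x - x * x * x / 3.0)
}

/// Wavefolder: reflect values outside [-1, 1] back into range (triangle-shaped transfer).
pub fn fold(x: f32) -> f32 {
    let y = (x + 1.0).rem_euclid(4.0);
    (if y > 2.0 { 4.0 - y } else { y }) - 1.0
}
```

For example, `cents_to_ratio(1200.0)` is approximately 2.0 (one octave up), and `fold(1.5)` reflects the 0.5 of overshoot back to 0.5.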

VoiceConfig (Configuration)

  • Frequency Envelope: Time-based frequency trajectory
  • Pressure Envelope: Dynamic pressure control (maps to biological models)
  • Channel Assignment: Stereo/mono routing (0-2 channels)
  • Jitter Envelope: Pitch instability for biological realism
  • Distortion/Clipping Envelopes: Time-varying nonlinearities
  • Filters: Static or dynamic (time-based) per-voice filters
  • Effects: Per-voice effects chain
  • Vibrato/Tremolo/Duty Cycle LFO Config: Modulation LFOs
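
A time-based envelope such as the frequency or pressure envelope can be modeled as a list of (time, value) points that is sampled by interpolation. The sketch below assumes linear interpolation, millisecond timestamps, and points sorted by time; the `EnvPoint` type and `sample_envelope` function are hypothetical names, and the real VoiceConfig layout may differ.

```rust
/// Hypothetical envelope breakpoint: a value reached at a given time.
#[derive(Debug, Clone, Copy)]
pub struct EnvPoint {
    pub time_ms: f32,
    pub value: f32,
}

/// Sample an envelope at `t_ms` by linear interpolation between neighboring points.
/// Assumes `points` is sorted by `time_ms`; values are held before the first
/// and after the last point.
pub fn sample_envelope(points: &[EnvPoint], t_ms: f32) -> f32 {
    match points {
        [] => 0.0,
        [only] => only.value,
        _ => {
            let first = points[0];
            let last = points[points.len() - 1];
            if t_ms <= first.time_ms {
                return first.value;
            }
            if t_ms >= last.time_ms {
                return last.value;
            }
            for pair in points.windows(2) {
                let (a, b) = (pair[0], pair[1]);
                if t_ms >= a.time_ms && t_ms <= b.time_ms {
                    let span = b.time_ms - a.time_ms;
                    if span <= 0.0 {
                        return b.value;
                    }
                    let k = (t_ms - a.time_ms) / span;
                    return a.value + k * (b.value - a.value);
                }
            }
            last.value
        }
    }
}
```

The same function can serve frequency, pressure, jitter, and filter envelopes, since each is a scalar trajectory over time.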

SynthEngine

  • 8 concurrent voices by default
  • 2-channel stereo output
  • 44.1 kHz sample rate (configurable)
  • Pressure-based voice control
  • Global effects processor
  • Per-voice and global channel mixing
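
Per-voice and global mixing into a 2-channel output can be sketched as a per-voice pan followed by a master gain. The constant-power pan law, the pan range of [-1, 1], and the `pan_gains` and `mix_frame` names are assumptions made for illustration; the text does not specify how Jay routes voices to channels.

```rust
use std::f32::consts::FRAC_PI_4;

/// Constant-power (left, right) gains for a pan position in [-1, 1], left to right.
pub fn pan_gains(pan: f32) -> (f32, f32) {
    let theta = (pan.clamp(-1.0, 1.0) + 1.0) * FRAC_PI_4;
    (theta.cos(), theta.sin())
}

/// Mix (sample, pan) pairs into one stereo frame, then apply a global gain.
pub fn mix_frame(voices: &[(f32, f32)], master_gain: f32) -> (f32, f32) {
    let mut left = 0.0_f32;
    let mut right = 0.0_f32;
    for &(sample, pan) in voices {
        let (gain_l, gain_r) = pan_gains(pan);
        left += sample * gain_l;
        right += sample * gain_r;
    }
    (left * master_gain, right * master_gain)
}
```

With a constant-power law a centered voice gets a gain of about 0.707 on each channel, so perceived loudness stays roughly constant as a voice moves across the stereo field.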

Animalian Orientation - Biological Models

Bird Vocalizations (BirdVoice)

Maps to syrinx-based dual sound source model:

  • Airflow: Drive pressure and burstiness
  • Tension: Left/right bronchial pipe tension, flutter, instability
  • Resonance: Chamber size, wall flexibility, pulse rate
  • Turbulence: Noise levels and chaos
  • Gating: Syrinx valve configuration (left/right/both/alternating)
  • Gestures: Pitch slope, roughness, transients
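
The gating options listed above (left/right/both/alternating) can be illustrated with a small enum that decides which of the two sound sources contributes on a given pulse. The `SyrinxGate` type, the per-pulse alternation rule, and `gated_sum` are hypothetical; the text only names the four configurations.

```rust
/// Hypothetical syrinx valve configuration, following the four options in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyrinxGate {
    Left,
    Right,
    Both,
    Alternating,
}

impl SyrinxGate {
    /// Returns (left_open, right_open) for a pulse index.
    /// Assumption: Alternating opens the left side on even pulses, the right on odd ones.
    pub fn open_sides(self, pulse_index: usize) -> (bool, bool) {
        match self {
            SyrinxGate::Left => (true, false),
            SyrinxGate::Right => (false, true),
            SyrinxGate::Both => (true, true),
            SyrinxGate::Alternating => {
                if pulse_index % 2 == 0 {
                    (true, false)
                } else {
                    (false, true)
                }
            }
        }
    }
}

/// Sum the two source samples, muting whichever side the gate has closed.
pub fn gated_sum(gate: SyrinxGate, pulse_index: usize, left_src: f32, right_src: f32) -> f32 {
    let (left_open, right_open) = gate.open_sides(pulse_index);
    let left = if left_open { left_src } else { 0.0 };
    let right = if right_open { right_src } else { 0.0 };
    left + right
}
```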

Mammal Vocalizations (MammalVoice)

Maps to laryngeal model with vocal fold mechanics:

  • Airflow: Breath pressure, burstiness, stability
  • Vocal Folds: Mode (modal/falsetto/fry), tension, mass
  • False Folds: Engagement, rattle, coupling
  • Tract Shape: Vocal tract configuration and formants
  • Turbulence: Noise, aspiration, roar
  • Gestures: Pitch contour, vibrato, tremolo

Key Improvements Over Standalone Voice Crate

Standalone Voice Crate (plant_voice)

  • 6 engine types (FM, SuperSaw, Pluck, Noise, Sample, Hybrid)
  • Simple Control struct: freq, energy, pressure, attack, brightness, vibrato, drift, gate, trigger
  • Lightweight ADSR envelope: Pressure influences decay/release
  • SVF filters: Simple state-variable filter post-processing
  • Stereo-aware: SuperSaw has native stereo; the other engines duplicate their mono output to both channels
  • Use case: General-purpose synth engines

Jay Voice Advantages

  1. Biological Abstraction Layer: Bird/mammal models with physiologically-inspired parameters instead of raw synthesis controls
  2. Pressure-Based Foundation: Entire synthesis driven by pressure envelope (vs. simple ADSR in standalone)
  3. Rich Oscillator Variety: 11+ waveform types including pink noise, brown noise, logistic chaos, custom wavetables
  4. Cross-Voice Modulation: Ring mod and FM between voices
  5. Advanced Nonlinearity: 4 distinct distortion types (Tanh, Softclip, Fold, Waveshaper) vs. simple tanh saturation
  6. Per-Voice Filtering: Biquad filters configurable per voice with dynamic envelopes
  7. LFO-Based Modulation: Vibrato, tremolo, duty cycle LFO with independent rate/depth
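  8. Built-in Hard Filtering: A fixed 80Hz-8000Hz bandpass on every voice removes DC offset and attenuates out-of-band content (a post-filter cannot remove aliasing that has already folded into the passband)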
  9. Oversampling Path: 4x oversampling for distortion/clipping when needed
  10. Performance Envelopes: 5-phase temporal control for complex biological gestures

Architecture Difference

Standalone Voice: Synth-centric, generic engines with simple control flow

Control struct → ADSR Envelope → Engine → SVF Filter → Saturation

Jay Voice: Biologically-inspired, pressure-driven synthesis

VoiceConfig (frequency, pressure envelopes) → Voice state machine →
Multi-oscillator generation → Per-voice filter → Distortion →
Hard bandpass filter → Master effects
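
The post-oscillator part of the Jay signal path can be sketched as an ordered chain of per-sample stages. Everything here is illustrative: `VoiceChain`, `Stage`, and `sketch_chain` are hypothetical names, and the filter stages are identity placeholders that only show the ordering given above.

```rust
/// One processing stage: takes a sample, returns a sample.
pub type Stage = Box<dyn FnMut(f32) -> f32>;

/// Hypothetical ordered chain of named stages applied to each oscillator sample.
pub struct VoiceChain {
    stages: Vec<(&'static str, Stage)>,
}

impl VoiceChain {
    pub fn new() -> Self {
        VoiceChain { stages: Vec::new() }
    }

    /// Append a stage; stages run in the order they were pushed.
    pub fn push(mut self, name: &'static str, stage: Stage) -> Self {
        self.stages.push((name, stage));
        self
    }

    /// Run one sample through every stage in order.
    pub fn process(&mut self, input: f32) -> f32 {
        let mut sample = input;
        for (_, stage) in self.stages.iter_mut() {
            sample = stage(sample);
        }
        sample
    }

    /// Stage names, in processing order.
    pub fn names(&self) -> Vec<&'static str> {
        self.stages.iter().map(|(name, _)| *name).collect()
    }
}

/// Build a placeholder chain in the order the text gives.
pub fn sketch_chain() -> VoiceChain {
    VoiceChain::new()
        .push("per-voice filter", Box::new(|s: f32| s)) // placeholder: identity
        .push("distortion", Box::new(|s: f32| (2.0 * s).tanh()))
        .push("hard bandpass", Box::new(|s: f32| s)) // placeholder: identity
        .push("master effects", Box::new(|s: f32| s * 0.5)) // placeholder: fixed gain
}
```

Because stages are `FnMut`, each one can own filter state or LFO phase internally while the chain itself stays a plain ordered list.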

Configuration Format

  • YAML/JSON configuration files
  • Pressure points define temporal dynamics
  • Frequency envelopes define pitch trajectory
  • Filter envelopes define time-varying EQ
  • Effect chains stack multiple processors

Real-time Control Methods

  • trigger_voice(voice_id, freq, velocity)
  • release_voice(voice_id)
  • set_voice_frequency(voice_id, freq)
  • set_voice_pressure(voice_id, pressure)
  • render_buffer(time_ms) for audio generation
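
The control methods above suggest a trigger, adjust, render, release flow. The sketch below shows one way such an API could behave. `SketchEngine` is a hypothetical stand-in with a plain sine oscillator; the return type of `render_buffer`, the interleaved stereo layout, and the way velocity and pressure scale the output are all assumptions.

```rust
use std::f32::consts::TAU;

/// Minimal per-voice state for the sketch.
struct SketchVoice {
    freq: f32,
    phase: f32, // normalized phase in [0, 1)
    velocity: f32,
    pressure: f32,
    is_active: bool,
}

/// Hypothetical stand-in for the engine, exposing the method names from the text.
pub struct SketchEngine {
    voices: Vec<SketchVoice>,
    sample_rate: f32,
}

impl SketchEngine {
    pub fn new(num_voices: usize, sample_rate: f32) -> Self {
        let voices = (0..num_voices)
            .map(|_| SketchVoice { freq: 0.0, phase: 0.0, velocity: 0.0, pressure: 0.0, is_active: false })
            .collect();
        SketchEngine { voices, sample_rate }
    }

    /// Start a voice; out-of-range ids are ignored.
    pub fn trigger_voice(&mut self, voice_id: usize, freq: f32, velocity: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.freq = freq;
            v.velocity = velocity;
            v.pressure = 1.0; // assumed default pressure on trigger
            v.phase = 0.0;
            v.is_active = true;
        }
    }

    pub fn release_voice(&mut self, voice_id: usize) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.is_active = false;
        }
    }

    pub fn set_voice_frequency(&mut self, voice_id: usize, freq: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.freq = freq;
        }
    }

    pub fn set_voice_pressure(&mut self, voice_id: usize, pressure: f32) {
        if let Some(v) = self.voices.get_mut(voice_id) {
            v.pressure = pressure.clamp(0.0, 1.0);
        }
    }

    /// Render `time_ms` milliseconds as interleaved stereo (L, R, L, R, ...).
    pub fn render_buffer(&mut self, time_ms: f32) -> Vec<f32> {
        let frames = (time_ms / 1000.0 * self.sample_rate).round() as usize;
        let mut out = Vec::with_capacity(frames * 2);
        for _ in 0..frames {
            let mut sample = 0.0_f32;
            for v in self.voices.iter_mut().filter(|v| v.is_active) {
                sample += (v.phase * TAU).sin() * v.velocity * v.pressure;
                v.phase = (v.phase + v.freq / self.sample_rate).fract();
            }
            out.push(sample); // left
            out.push(sample); // right
        }
        out
    }
}
```

A typical call sequence under these assumptions is `trigger_voice(0, 440.0, 0.8)`, then repeated `render_buffer(10.0)` calls interleaved with `set_voice_pressure` updates, and finally `release_voice(0)`.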