Files
alknet/docs/research/references/iroh/iroh-live/06-moq-media-pipelines.md

13 KiB
Raw Blame History

moq-media: Media Pipelines

Overview

moq-media owns the media pipeline: broadcast management, codec orchestration, playout timing, adaptive bitrate, and audio backend. It has no dependency on iroh — it works with any transport that implements PacketSource and PacketSink. This makes it usable for recording pipelines, studio links, and camera dashboards without RTC.

Module Structure

moq-media/
├── lib.rs              — Re-exports and feature-gated modules
├── publish.rs          — LocalBroadcast, VideoPublisher, AudioPublisher
├── subscribe.rs         — RemoteBroadcast, VideoTrack, AudioTrack, MediaTracks
├── transport.rs         — PacketSource/PacketSink traits, MoqPacketSource, MoqPacketSink
├── net.rs               — NetworkSignals (RTT, loss rate, available bandwidth)
├── adaptive.rs          — Adaptive rendition switching algorithm
├── playout.rs           — PlaybackPolicy, SyncMode
├── chat.rs              — ChatPublisher, ChatSubscriber (MoQ track-based)
├── frame_channel.rs     — Single-frame channel (last-writer-wins for video)
├── sync.rs              — Shared playout clock (Sync) for A/V sync
├── stats.rs             — Metric, Label, NetStats, EncodeStats, RenderStats, etc.
├── pipeline.rs          — Pipeline orchestration
├── pipeline/            — VideoEncoderPipeline, AudioEncoderPipeline, VideoDecoderPipeline, etc.
├── audio_backend.rs     — AudioBackend trait and device enumeration
├── audio_backend/       — Platform-specific audio backends (cpal, etc.)
├── capture.rs           — Camera/screen capture integration
├── source_spec.rs       — VideoInput, PreEncodedTrack
├── test_util.rs         — Test utilities (feature-gated)
└── processing/          — Scale, color conversion, etc.

Publish Pipeline — LocalBroadcast

LocalBroadcast manages encoder pipelines and publishes a catalog that subscribers use to discover available renditions. It owns a BroadcastProducer (from moq-lite) and coordinates video and audio track lifecycles.

Construction

let broadcast = LocalBroadcast::new();
broadcast.video().set_source(camera, VideoCodec::H264, [VideoPreset::P720])?;
broadcast.audio().set(mic, AudioCodec::Opus, [AudioPreset::Hq])?;

// Or pre-encoded sources
broadcast.video().set(VideoInput::pre_encoded("video/h264-pi", config, factory))?;

Slot Handles

  • broadcast.video()VideoPublisher (borrows &self)
  • broadcast.audio()AudioPublisher (borrows &self)

Both use interior mutability. Calling set() tears down any existing pipeline and installs the new one.

Video Input Modes

pub enum VideoInput {
    Renditions(VideoRenditions),  // Raw source → multiple encoded renditions (simulcast)
    PreEncoded(Vec<PreEncodedTrack>), // Already-encoded tracks pass through
}

VideoRenditions holds a SharedVideoSource and a map of rendition names to encoder factories. Multiple renditions share the same source via watch::Receiver<Option<VideoFrame>>. Slow encoders never cause backpressure on the source — intermediate frames are silently skipped.

PreEncodedTrack is for hardware encoders that produce compressed output directly (e.g., rpicam-vid on Raspberry Pi). Each track carries a name, VideoConfig, and a factory closure that creates a fresh source per subscriber.

SharedVideoSource

Runs the capture source on a dedicated OS thread. Parks when no subscribers are connected (releasing camera/screen resources) and unparks when the first subscriber arrives. Uses AtomicU32 subscriber counting with proper memory ordering (AcqRel/Acquire).

Frames are distributed via watch::Sender<Option<VideoFrame>> — always contains the latest frame, so slow encoders never block the source.

Demand-Driven Track Startup

The broadcast's run loop (LocalBroadcast::run_dynamic) calls producer.requested_track().await to wait for subscriber demand. When a subscriber requests a specific rendition:

  1. The loop looks up the rendition in the current VideoInput or AudioRenditions
  2. It starts the corresponding encoder pipeline on a dedicated OS thread
  3. When all subscribers disconnect (tracked via track.unused().await), the pipeline is stopped

This means encoder threads only run when someone is actually consuming.

Catalog

LocalBroadcast maintains a catalog track (hang's built-in catalog mechanism) listing all available video and audio renditions with codec configuration, dimensions, and bitrate. Updated whenever video or audio is set/cleared.

Catalog format follows the hang::catalog::Catalog structure with Video and Audio entries, each containing a BTreeMap<String, Config> of rendition names to configurations.

Encoder Pipeline Architecture

All encoder pipelines run on dedicated OS threads (spawn_thread), not tokio tasks. Codec operations are CPU-intensive and sometimes block on hardware (VAAPI, V4L2), so running on tokio tasks would starve other async work.

Communication with the async runtime:

  • VideoEncoderPipeline: reads SharedVideoSource via watch::Receiver, writes encoded frames to MoqPacketSink
  • AudioEncoderPipeline: reads from AudioSource, writes to MoqPacketSink
  • PreEncodedVideoPipeline: reads from PreEncodedVideoSource, writes to MoqPacketSink

Chat

let chat_publisher = broadcast.enable_chat()?;
chat_publisher.send("Hello!")?;

// Subscriber side
if let Some(chat_sub) = remote_broadcast.chat() {
    let msg = chat_sub.recv().await;
}

Each chat message is a single MoQ group with one frame of UTF-8 text. The track name is "chat" with priority 10.

Subscribe Pipeline — RemoteBroadcast

RemoteBroadcast wraps a BroadcastConsumer and watches its catalog for available video and audio renditions. Created with a BroadcastConsumer and a PlaybackPolicy.

Construction

let broadcast = RemoteBroadcast::new("stream-name", consumer).await?;
// Or with explicit policy
let broadcast = RemoteBroadcast::with_playback_policy("stream", consumer, policy).await?;

On construction, spawns a catalog-watching task that publishes snapshots via Watchable<CatalogSnapshot>.

CatalogSnapshot

Point-in-time view of the broadcast's catalog. Derefs to hang::Catalog. Carries a sequence number for change detection.

let catalog = broadcast.catalog();
catalog.video_renditions()  // Iterator of rendition names sorted by width
catalog.audio_renditions()  // Iterator of audio rendition names
catalog.select_video_rendition(Quality::High)?  // Best match for quality
catalog.has_video()
catalog.has_audio()
catalog.has_chat()
catalog.user()  // User metadata from publisher

Rendition Selection

pub enum Quality { Highest, High, Mid, Low }

pub struct VideoTarget {
    pub max_pixels: Option<u32>,
    pub max_bitrate_kbps: Option<u32>,
    pub rendition: Option<String>,  // Pin to specific rendition
}

Quality::Highmax_pixels(1280*720), etc. If rendition is set, it takes priority.

VideoTrack

Represents a decoded video stream from a remote broadcast. The decoder runs on a dedicated OS thread.

Creation flow:

  1. Pick a rendition (via VideoTarget or explicit name)
  2. Create TrackConsumer from BroadcastConsumer, wrap in OrderedConsumer with PlaybackPolicy::max_latency
  3. Wrap in MoqPacketSource
  4. A forward_packets async task reads from MoqPacketSourcempsc channel
  5. Decoder thread reads mpsc → decoder → output via Sync playout clock (or FramePacer)
  6. Output channel: FrameReceiver<VideoFrame> (latest-frame wins, suitable for rendering)

Frame access:

  • track.try_recv() — Returns latest frame, draining older buffered frames (for game loops)
  • track.next_frame().await — Async wait for next frame
  • track.has_frame() — Check without consuming

Adaptive rendition switching:

track.enable_adaptation(broadcast, signals, config, decode_config)?;
track.disable_adaptation();
track.is_adaptive();
track.selected_rendition();
track.set_rendition_mode(RenditionMode::Fixed("video/h264-360p".into()));
track.set_rendition_mode(RenditionMode::Auto);
track.rendition_watcher();  // Direct<String> watcher for rendition changes

AudioTrack

Same pattern as VideoTrack but sends decoded samples to an AudioSink (typically cpal + sonora). The audio decoder thread runs a 10ms tick loop.

MediaTracks

Convenience struct combining RemoteBroadcast with optional VideoTrack and AudioTrack:

pub struct MediaTracks {
    pub broadcast: RemoteBroadcast,
    pub video: Option<VideoTrack>,
    pub audio: Option<AudioTrack>,
}

Lifecycle

Both VideoTrack and AudioTrack use drop-based cleanup. Dropping cancels the decoder thread (via CancellationToken) and the forward_packets task (via AbortOnDropHandle). The OrderedConsumer is dropped, signaling the transport that the track is no longer needed.

Transport Abstraction — PacketSource / PacketSink

The transport boundary between moq-media and the network:

pub trait PacketSource: Send + 'static {
    fn read(&mut self) -> impl Future<Output = Result<Option<MediaPacket>>> + Send;
}

pub trait PacketSink: Send + 'static {
    fn write(&mut self, packet: EncodedFrame) -> Result<()>;
    fn finish(&mut self) -> Result<()>;
}

MoqPacketSink wraps an OrderedProducer. When it receives an EncodedFrame with is_keyframe = true, it calls keyframe() on the producer to start a new MoQ group. This keyframe-to-group mapping is how subscribers can join at any group boundary.

MoqPacketSource wraps an OrderedConsumer and reads frames, converting them to MediaPacket.

PipeSink / PipeSource — In-memory pipe for local encode→decode without network (testing, local preview).

Adaptive Rendition Switching

The adaptation algorithm runs in a background task that monitors NetworkSignals and decides whether to switch to a different video rendition.

Algorithm

Renditions are ranked by pixel count (highest first). The algorithm maintains state across ticks:

pub enum Decision {
    Hold,                      // Stay on current rendition
    Downgrade(usize),          // Switch to lower at index
    Emergency,                 // Drop to lowest immediately
    StartProbe(usize),         // Try upgrading to index
}

Emergency (immediate): Loss rate ≥ 20% → drop to lowest rendition

Downgrade (sustained 500ms): Loss rate ≥ 10% OR available bandwidth < 85% of current rendition's bitrate

Upgrade probe (sustained 4s good conditions): Loss ≤ 2%, bandwidth ≥ 120% of next-higher rendition's bitrate → start 3-second probe on the higher rendition

Probe abort: Loss ≥ 5% or new congestion events during probe → abort, 8s cooldown

Post-downgrade cooldown: 4s after any downgrade before probes are allowed

Implementation

The adaptation task (adaptation_task_v2) creates new VideoDecoderPipelines that write to the same FrameSender via with_sender(). The frame channel stays the same while the underlying decoder pipeline gets swapped. When switching:

  1. Create a new decoder pipeline for the target rendition
  2. Drop the old pipeline handle
  3. Update selected_rendition Watchable

Playback and Sync

PlaybackPolicy

pub struct PlaybackPolicy {
    pub sync: SyncMode,           // Synced (shared clock) or Unmanaged (PTS pacing)
    pub max_latency: Duration,    // Default: 150ms — how much buffering before skipping forward
}

SyncMode

  • Synced (default): Shared playout clock (Sync). Video frames are gated by Sync::wait(pts), which blocks until reference + pts + latency arrives. Audio paces itself through its ring buffer (~80ms).
  • Unmanaged: No synchronization. FramePacer sleeps between frames based on PTS deltas, clamped to 2× frame period.

Sync

The Sync type records arrival offsets via received(pts) and blocks on wait(pts) until reference + pts + latency. This keeps audio and video aligned without cross-path gating or signaling. Ported from the moq/js implementation.

Stats

moq-media has a structured stats system for debug overlays:

  • NetStats — RTT, loss%, bandwidth, path type (written by iroh-live transport bridge)
  • EncodeStats — FPS, encode time, bitrate, codec, encoder, resolution, capture path
  • RenderStats — FPS, decode time, decoder, renderer, rendition
  • TimingStats — Audio buffer level, video/audio lag, A/V delta, video buffer depth
  • Timeline — Ring buffer of FrameMeta entries for timeline visualization

Each Metric has EMA smoothing, a history ring buffer, and optional color thresholds. Label provides atomic string values.

Codec Support

Feature-gated codec support:

Feature Codec Backend
h264 H.264 openh264 (software)
av1 AV1 rav1e encoder, rav1d decoder
opus Opus opus crate
vaapi VAAPI Linux hardware encode/decode
videotoolbox VideoToolbox macOS hardware
v4l2 V4L2 Raspberry Pi hardware
pcm Raw PCM No encoding