Geoffroy Couprie - @gcouprie - Integrating some Rust in VLC media player

RustConf 2016: Integrating some Rust in VLC media player

Who

Rust has been my main language for nearly a year now

We're now pushing some Rust into production: an SSH jail, a proxy, git subsystems.

VideoLAN

what I do at VideoLAN

VLC handles most audio, video and streaming formats

VLC media player vulnerabilities

handling multiple formats is dangerous. MP4 has a lot of flaws, but not the MKV demuxer

fuzzing

Format vulnerabilities

A good developer would avoid those issues. But how can we fight the urge to write vulnerable code?

We need a practical solution

If not for the "embeddable" requirement, I would write some Haskell.

Yay! Rust!

- Rust makes memory handling easier
- slices are really useful
- easy FFI

nom

what is nom?

I needed an easy way to write parsers. Parser combinators are simple: they're just functions, and they're easy to experiment with.

Started in July 2014 (in another repo): no impl Trait, no lifetime elision, closures?

Fighting lifetime issues was (and still is) a pain, and so were compilation times (compilation times can be hairy in Rust, as with the alt! bug).

Basic types

fn parser<I,O,E>(input: I) -> IResult<I, O, E>
pub enum IResult<Input,Output,CustomError=u32> {
  Done(Input,Output),
  Error(Err<Input,CustomError>),
  Incomplete(Needed)
}

EXPLAIN

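Done contains the remaining input and the parsed output, Error contains an error, and Incomplete means more input is needed (Needed gives the size when it is known).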
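To make the three cases concrete, here is a minimal self-contained sketch. It redefines a simplified IResult rather than using nom's (the error payload is just a u32 code, and Needed::Size is taken to mean the total number of bytes the parser wants), and be_u8 and tag are hand-written stand-ins named after nom's parsers, not nom's implementations.

```rust
/// Simplified stand-in for nom's Needed (not nom's actual definition).
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

/// Simplified stand-in for nom 1.x/2.x IResult: the error is reduced to a code.
#[derive(Debug, PartialEq)]
pub enum IResult<I, O, E = u32> {
    Done(I, O),         // remaining input, parsed output
    Error(E),           // the input cannot match
    Incomplete(Needed), // not enough data yet to decide
}

/// Reads a single byte.
pub fn be_u8(input: &[u8]) -> IResult<&[u8], u8> {
    match input.split_first() {
        Some((&b, rest)) => IResult::Done(rest, b),
        None => IResult::Incomplete(Needed::Size(1)),
    }
}

/// Recognizes a fixed byte prefix, such as a file signature.
pub fn tag<'a>(input: &'a [u8], expected: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    if input.len() < expected.len() {
        // Only ask for more data if what we have so far still matches.
        return if expected.starts_with(input) {
            IResult::Incomplete(Needed::Size(expected.len()))
        } else {
            IResult::Error(0)
        };
    }
    if input.starts_with(expected) {
        let (matched, rest) = input.split_at(expected.len());
        IResult::Done(rest, matched)
    } else {
        IResult::Error(0)
    }
}
```

Note how Incomplete lets a streaming caller distinguish "wrong data" from "come back with more bytes", which matters when reading from a media stream.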

MACROS!!!

named!(data,
  terminated!( alpha, digit )
);

composition of functions
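Because parsers are plain functions, composing them does not strictly need macros: a combinator like terminated! amounts to a function that takes two parsers and returns a new one. The sketch below illustrates that under simplifying assumptions: a cut-down result type without Incomplete, and hypothetical alpha, digit and terminated functions written for this example rather than taken from nom.

```rust
/// Cut-down result type for this sketch only (no Incomplete case).
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
    Error,
}

/// Consumes at least one byte matching `pred`.
fn take_while1(input: &[u8], pred: impl Fn(u8) -> bool) -> IResult<&[u8], &[u8]> {
    let n = input.iter().take_while(|&&b| pred(b)).count();
    if n == 0 {
        IResult::Error
    } else {
        IResult::Done(&input[n..], &input[..n])
    }
}

/// One or more ASCII letters.
pub fn alpha(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_while1(input, |b| b.is_ascii_alphabetic())
}

/// One or more ASCII digits.
pub fn digit(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_while1(input, |b| b.is_ascii_digit())
}

/// Runs `first`, then `second` on what is left, and keeps only `first`'s output.
pub fn terminated<'a, O1, O2>(
    first: impl Fn(&'a [u8]) -> IResult<&'a [u8], O1>,
    second: impl Fn(&'a [u8]) -> IResult<&'a [u8], O2>,
) -> impl Fn(&'a [u8]) -> IResult<&'a [u8], O1> {
    move |input| match first(input) {
        IResult::Done(rest, o1) => match second(rest) {
            IResult::Done(rest2, _) => IResult::Done(rest2, o1),
            IResult::Error => IResult::Error,
        },
        IResult::Error => IResult::Error,
    }
}
```

Here terminated(alpha, digit) builds a parser equivalent in spirit to the named!(data, ...) declaration above, but as a closure value; the macro form sidesteps these generic signatures, which were much harder to write in 2014 without impl Trait.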

take time to explain here

Macros are easy to write, but a bit annoying to debug when you don't get the right types. Some patterns are hard to express, like the permutation parser, but parser combinators themselves are easy to express.

Generated code

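use nom::{IResult, Needed, InputLength, alpha, digit};
use nom::IResult::*;

fn data<'a>(input: &'a [u8]) -> IResult<&'a [u8], &'a [u8], u32> {
  match alpha(input) {
    Done(i, o) => {
      match digit(i) {
        // keep alpha's output, discard digit's
        Done(i2, _) => Done(i2, o),
        // digit's size is counted from i: add what alpha consumed
        // so the caller gets a size counted from input
        Incomplete(Needed::Size(n)) =>
          Incomplete(Needed::Size(input.input_len() - i.input_len() + n)),
        e => e,
      }
    },
    e => e
  }
}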

Actual generated code

fn data<'a>(i: &'a [u8]) -> ::nom::IResult<&'a [u8], &[u8], u32> {
    match {
              {
                  use nom::InputLength;
                  match alpha(i) {
                      ::nom::IResult::Error(e) =>
                      ::nom::IResult::Error(e),
                      ::nom::IResult::Incomplete(::nom::Needed::Unknown)
                      =>
                      ::nom::IResult::Incomplete(::nom::Needed::Unknown),
                      ::nom::IResult::Incomplete(::nom::Needed::Size(i))
                      =>
                      ::nom::IResult::Incomplete(::nom::Needed::Size(0usize
                                                                         +
                                                                         i)),
                      ::nom::IResult::Done(i, o) => {
                          {
                              use nom::InputLength;
                              match digit(i) {
                                  ::nom::IResult::Error(e) =>
                                  ::nom::IResult::Error(e),
                                  ::nom::IResult::Incomplete(::nom::Needed::Unknown)
                                  =>
                                  ::nom::IResult::Incomplete(::nom::Needed::Unknown),
                                  ::nom::IResult::Incomplete(::nom::Needed::Size(i))
                                  =>
                                  ::nom::IResult::Incomplete(::nom::Needed::Size(0usize
                                                                                     +
                                                                                     ((i).input_len()
                                                                                          -
                                                                                          i.input_len())
                                                                                     +
                                                                                     i)),
                                  ::nom::IResult::Done(i, o) => {
                                      ::nom::IResult::Done(i, (o, o))
                                  }
                              }
                          }
                      }
                  }
              }
          } {
        ::nom::IResult::Error(a) => ::nom::IResult::Error(a),
        ::nom::IResult::Incomplete(i) => ::nom::IResult::Incomplete(i),
        ::nom::IResult::Done(remaining, (_, o)) => {
            ::nom::IResult::Done(remaining, o)
        }
    }
}

Features

Features (nom 2.0)

VLC media player

How VLC works

The pipeline

access -> demux (=> audio and video streams) -> decode -> filter -> (encode -> stream) | (video and audio out)

Code architecture

- libVLCCore: handles module loading, playlist, synchronization, the whole pipeline
- libVLC: a layer above libVLCCore for external applications
- vlc: a small executable calling libVLC

Lifetime of a VLC module

So we don't control anything from the module: we just take orders from VLC's core.
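One way to picture this inversion of control is a toy model where the core owns the loop and the module only supplies callbacks plus private state (the role p_sys plays). Everything here (Module, run_core, the fn-pointer fields) is hypothetical and much simpler than VLC's core; the return codes follow the convention the demux callback later in this talk uses: 1 for a packet handled, 0 for end of stream, -1 for an error.

```rust
/// Hypothetical view of a module from the core's side:
/// three callbacks and some module-private state S.
pub struct Module<S> {
    pub open: fn() -> Option<S>,
    pub demux: fn(&mut S) -> i32,
    pub close: fn(S),
}

/// Hypothetical core loop: it decides when each callback runs.
/// Calls demux until it stops returning 1, then always calls close.
/// Returns the number of packets handled and the final return code.
pub fn run_core<S>(m: &Module<S>) -> Option<(usize, i32)> {
    // A module that refuses the input never gets its demux called.
    let mut state = (m.open)()?;
    let mut packets = 0;
    let last = loop {
        match (m.demux)(&mut state) {
            1 => packets += 1,
            code => break code,
        }
    };
    (m.close)(state);
    Some((packets, last))
}
```

An open that returns None plays the part of a C Open returning an error, which lets the core move on to another candidate module.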

The plan

Reproducing structures

struct demux_t
{
    VLC_COMMON_MEMBERS

    module_t    *p_module;

    char        *psz_access;
    char        *psz_demux;
    char        *psz_location;
    char        *psz_file;

    union {
        stream_t *s;
        demux_t *p_next;
    };

    /* es output */
    es_out_t *out; /* our p_es_out */

    [...]

There's no silver bullet here: bindgen and other tools could not work on VLC headers. VLC uses a sort of object-like structure, with objects sharing some common members (via a macro). I took some inspiration from rust-openssl and others here: put the structure definitions and function imports in a separate file, then make safer wrappers above them, using Rust types.

check the bindgen claim

C structure to Rust

#[repr(C)]
pub struct vlc_object_t {
  pub psz_object_type: *const c_char,
  pub psz_header:      *mut c_char,
  pub i_flags:         c_int,
  pub b_force:         bool,
  pub p_libvlc:        *mut libvlc_int_t,
  pub p_parent:        *mut vlc_object_t,
}

#[repr(C)]
pub struct demux_t<T> {
  //VLC_COMMON_MEMBERS
  pub psz_object_type: *const c_char,
  pub psz_header:      *mut c_char,
  pub i_flags:         c_int,
  pub b_force:         bool,
  pub p_libvlc:        *mut libvlc_int_t,
  pub p_parent:        *mut vlc_object_t,

  pub p_module:        *mut module_t,

  pub psz_access:      *mut c_char,
  pub psz_demux:       *mut c_char,
  pub psz_location:    *mut c_char,
  pub psz_file:        *mut c_char,

  pub s:               *mut stream_t,
  pub out:             *mut es_out_t,
  pub pf_demux:        Option<extern "C" fn(*mut demux_t<T>) -> c_int>,
  pub pf_control:      Option<extern "C" fn(*mut demux_t<T>, c_int, *const va_list) -> c_int>,

  // 'info' nested struct. Can we do that with Rust FFI?
  pub i_update:        c_uint,
  pub i_title:         c_int,
  pub i_seekpoint:     c_int,

  //FIXME: p_sys contains a pointer to a module specific structure, make it generic?
  pub p_sys:           *mut T,

  pub p_input:         *mut input_thread_t,
}
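On the comment's question: Rust FFI can mirror a nested C struct by declaring it as its own #[repr(C)] type and embedding it by value. The sketch below assumes the nested struct holds exactly the three fields flattened above (i_update, i_title, i_seekpoint, in that order) and uses hypothetical before/after pointer fields around it. With these particular types the flattened and nested layouts coincide, which the const assertion checks at compile time (offset_of! needs Rust 1.77 or later).

```rust
use std::mem::offset_of;
use std::os::raw::{c_int, c_uint};

/// Mirror of the nested C struct
/// `struct { unsigned i_update; int i_title; int i_seekpoint; } info;`
#[repr(C)]
pub struct DemuxInfo {
    pub i_update: c_uint,
    pub i_title: c_int,
    pub i_seekpoint: c_int,
}

/// Hypothetical cut-down outer struct embedding the nested type by value.
#[repr(C)]
pub struct Nested {
    pub before: *mut u8,
    pub info: DemuxInfo,
    pub after: *mut u8,
}

/// The same fields flattened, as done in demux_t<T>.
#[repr(C)]
pub struct Flattened {
    pub before: *mut u8,
    pub i_update: c_uint,
    pub i_title: c_int,
    pub i_seekpoint: c_int,
    pub after: *mut u8,
}

// Compile-time check that the field after the nested struct lands
// at the same offset in both layouts.
const _: () = assert!(offset_of!(Nested, after) == offset_of!(Flattened, after));
```

They would stop coinciding if the inner struct had trailing padding (for example an int followed by a char), so the nested form is the safer mirror in general.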

Importing functions

use libc::{c_void, size_t, ssize_t};

#[link(name = "vlccore")]
extern {
  // stream_Read writes into buf, so it must be *mut, not *const
  pub fn stream_Read(stream: *mut stream_t, buf: *mut c_void, size: size_t) -> ssize_t;
}

Try to add a function, and you're missing a struct. Add the struct, and you need other structs, and so on.

Writing safer wrappers

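// The slice guarantees that the length passed to C matches the buffer.
// `stream` is still a raw pointer that we trust VLC to keep valid.
pub fn stream_Read(stream: *mut stream_t, buf: &mut [u8]) -> ssize_t {
  unsafe {
    ffi::stream_Read(stream, buf.as_mut_ptr() as *mut c_void, buf.len())
  }
}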
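The same idea can be pushed a bit further. The wrapper above still accepts a raw stream pointer, so strictly speaking it cannot guarantee safety; the sketch below marks that precondition with unsafe and also turns the ssize_t result into a Result. The read function is passed in as a parameter so the example can run without libvlccore: ReadFn, read_into and the fake_read test double are hypothetical names, and usize/isize are assumed to match size_t/ssize_t.

```rust
use std::ffi::c_void;

/// Shape of a C read function like stream_Read(stream_t *, void *, size_t) -> ssize_t.
/// Assumption: usize/isize match size_t/ssize_t on the target.
pub type ReadFn = unsafe extern "C" fn(stream: *mut c_void, buf: *mut c_void, len: usize) -> isize;

/// Slice-based wrapper: the length handed to C always matches the buffer, and a
/// negative return value becomes an Err instead of a size the caller might misuse.
///
/// # Safety
/// `stream` must be valid for `read`, and `read` must write at most `len` bytes.
pub unsafe fn read_into(read: ReadFn, stream: *mut c_void, buf: &mut [u8]) -> Result<usize, isize> {
    let n = unsafe { read(stream, buf.as_mut_ptr() as *mut c_void, buf.len()) };
    if n < 0 { Err(n) } else { Ok(n as usize) }
}

/// Test double standing in for VLC's stream_Read: copies up to `len` bytes
/// of a fixed FLV prefix and ignores the stream pointer.
pub unsafe extern "C" fn fake_read(_stream: *mut c_void, buf: *mut c_void, len: usize) -> isize {
    let src = b"FLV\x01";
    let n = len.min(src.len());
    unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), buf as *mut u8, n) };
    n as isize
}
```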

preparing a VLC module in C

static int  Open ( vlc_object_t * );
static void Close( vlc_object_t * );

vlc_module_begin ()
    set_description( N_("WAV demuxer") )
    set_category( CAT_INPUT )
    set_subcategory( SUBCAT_INPUT_DEMUX )
    set_capability( "demux", 142 )
    set_callbacks( Open, Close )
vlc_module_end ()

those are macros

expanded

int vlc_entry__3_0_0a (vlc_set_cb, void *);
int vlc_entry__3_0_0a (vlc_set_cb vlc_set, void *opaque) {
    module_t *module;
    module_config_t *config = ((void*)0);
    if (vlc_set (opaque, ((void*)0), VLC_MODULE_CREATE, &module))
        goto error;
    if (vlc_set (opaque, module, VLC_MODULE_NAME, ("modules/demux/wav.c")))
        goto error;
    if (vlc_set (opaque, module, VLC_MODULE_DESCRIPTION,
      (const char *)(N_("WAV demuxer"))))
        goto error;
    vlc_set (opaque, ((void*)0), VLC_CONFIG_CREATE, (0x06), &config);
    vlc_set (opaque, config, VLC_CONFIG_VALUE, (int64_t)(4));
    vlc_set (opaque, ((void*)0), VLC_CONFIG_CREATE, (0x07), &config);
    vlc_set (opaque, config, VLC_CONFIG_VALUE, (int64_t)(403));
    if (vlc_set (opaque, module, VLC_MODULE_CAPABILITY, (const char *)("demux")) ||
        vlc_set (opaque, module, VLC_MODULE_SCORE, (int)(142)))
        goto error;
    if (vlc_set (opaque, module, VLC_MODULE_CB_OPEN, Open) ||
        vlc_set (opaque, module, VLC_MODULE_CB_CLOSE, Close))
        goto error;
    (void) config;
    return 0;

  error:
    return -1;
}

Taking apart the module loading code and the APIs is where you spend the most time.

Writing the module registration

#[allow(non_snake_case)]
#[no_mangle]
pub unsafe extern fn vlc_entry__3_0_0a(
    vlc_set: unsafe extern fn(*mut c_void, *mut c_void, c_int, ...) -> c_int,
    opaque: *mut c_void) -> c_int {
  let mut module: *mut c_void = 0 as *mut c_void;
  // VLC writes the new module_t pointer through this out-parameter
  if vlc_set(opaque, 0 as *mut c_void, VLCModuleProperties::VLC_MODULE_CREATE as i32,
             &mut module as *mut *mut c_void) != 0 { return -1; }
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_NAME as i32,
             PLUGIN_NAME.as_ptr()) != 0 { return -1; }
  let desc = b"FLV demuxer written in Rust\0";
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_DESCRIPTION as i32,
             desc.as_ptr()) != 0 { return -1; }
  let capability = b"demux\0";
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_CAPABILITY as i32,
             capability.as_ptr()) != 0 { return -1; }
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_SCORE as i32, 999) != 0 {
    return -1;
  }
  let p_open: extern "C" fn(*mut demux_t<demux_sys_t>) -> c_int =
    transmute(open as extern "C" fn(_) -> c_int);
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_CB_OPEN as i32, p_open) != 0 {
    return -1;
  }
  let p_close: extern "C" fn(*mut demux_t<demux_sys_t>) = transmute(close as extern "C" fn(_));
  if vlc_set(opaque, module, VLCModuleProperties::VLC_MODULE_CB_CLOSE as i32, p_close) != 0 {
    return -1;
  }
  0
}

after some macro work

vlc_module!(vlc_entry__3_0_0a,
  set_name("inrustwetrust")
  set_description("FLV demuxer written in Rust")
  set_capability("demux", 999)
  set_callbacks(open, close)
);
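The vlc_module! internals are not shown here. As a rough sketch of the technique, a macro_rules! macro can match a sequence of set_xxx(args) entries and turn each one into a method call; a real implementation would instead emit the vlc_entry function with its vlc_set calls. ModuleProps and module_props! are hypothetical names used only for this illustration.

```rust
/// Hypothetical collected module properties (not VLC's module_t).
#[derive(Debug, Default)]
pub struct ModuleProps {
    pub name: Option<&'static str>,
    pub description: Option<&'static str>,
    pub capability: Option<(&'static str, i32)>,
    pub callbacks: Option<(fn() -> i32, fn())>,
}

impl ModuleProps {
    pub fn set_name(&mut self, n: &'static str) { self.name = Some(n); }
    pub fn set_description(&mut self, d: &'static str) { self.description = Some(d); }
    pub fn set_capability(&mut self, cap: &'static str, score: i32) {
        self.capability = Some((cap, score));
    }
    pub fn set_callbacks(&mut self, open: fn() -> i32, close: fn()) {
        self.callbacks = Some((open, close));
    }
}

/// Turns `set_xxx(args) set_yyy(args) ...` into one method call per entry.
macro_rules! module_props {
    ( $( $setter:ident ( $( $arg:expr ),* ) )* ) => {{
        let mut props = ModuleProps::default();
        $( props.$setter( $( $arg ),* ); )*
        props
    }};
}
```

An unknown setter name fails at compile time as a missing method, which gives the DSL some checking for free.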

Write a FLV parser

named!(pub header<Header>,
  chain!(
             tag!("FLV") ~
    version: be_u8       ~
    flags:   be_u8       ~
    offset:  be_u32      ,
    || {
      Header {
        version: version,
        audio:   flags & 4 == 4,
        video:   flags & 1 == 1,
        offset:  offset
      }
    }
  )
);

show the do_parse syntax that could replace it
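For comparison, here is the same header parsed by hand in plain Rust, following the layout the chain! parser encodes: the "FLV" signature, a version byte, a flags byte (0x04 for audio, 0x01 for video) and a big-endian u32 data offset, 9 bytes in total. This is an illustrative sketch, not the flavors crate's code. With nom 2's do_parse!, the parser would read roughly as tag!("FLV") >> version: be_u8 >> flags: be_u8 >> offset: be_u32 >> (Header { ... }).

```rust
/// FLV file header, with the same fields as the Header built by the nom parser.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u8,
    pub audio: bool,
    pub video: bool,
    pub offset: u32,
}

/// Hand-written header parser. Returns the remaining input and the header,
/// or None if fewer than 9 bytes are available or the signature is not "FLV".
pub fn header(input: &[u8]) -> Option<(&[u8], Header)> {
    if input.len() < 9 || !input.starts_with(b"FLV") {
        return None;
    }
    let flags = input[4];
    let header = Header {
        version: input[3],
        audio: (flags & 0x04) != 0,
        video: (flags & 0x01) != 0,
        // offset of the first tag from the start of the file
        offset: u32::from_be_bytes([input[5], input[6], input[7], input[8]]),
    };
    Some((&input[9..], header))
}
```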

Parse the header

extern "C" fn open(p_demux: *mut demux_t<demux_sys_t>) -> c_int {
  let p_demux = unsafe { &mut (*p_demux) };
  let sl = stream_Peek(p_demux.s, 9);

  if let nom::IResult::Done(_,h)  = flavors::parser::header(sl) {
    vlc_Log!(p_demux, LogType::Info, PLUGIN_NAME, "FOUND FLV FILE\n
      version: {}\nhas_audio: {}\n has_video: {}\noffset: {}\n",
    h.version, h.audio, h.video, h.offset);

    stream_Seek(p_demux.s, h.offset as uint64_t);

    p_demux.pf_demux   = Some(demux);
    p_demux.pf_control = Some(control);
    p_demux.p_sys = Box::into_raw(Box::new(demux_sys_t {
      i_pos: h.offset as usize,
      i_size: 0,
      video_initialized: false,
      video_es_format: unsafe { zeroed() },
      video_es_id: 0 as *mut c_void,
      audio_initialized: false,
      audio_es_format: unsafe { zeroed() },
      audio_es_id: 0 as *mut c_void,
    }));

    return 0;
  }

  -1
}

Proceed step by step: parse a bit, then advance, and log everything.

FLV packets

named!(pub tag_header<TagHeader>,
  chain!(
    tag_type: switch!(be_u8,
      8  => value!(TagType::Audio) |
      9  => value!(TagType::Video) |
      18 => value!(TagType::Script)
    )                                ~
    data_size:          be_u24       ~
    timestamp:          be_u24       ~
    timestamp_extended: be_u8        ~
    stream_id:          be_u24       ,
    || {
      TagHeader {
        tag_type:  tag_type,
        data_size: data_size,
        timestamp: ((timestamp_extended as u32) << 24) + timestamp,
        stream_id: stream_id,
      }
    }
  )
);

The tag header is preceded by a 4-byte big-endian uint indicating the previous tag's size.
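That is why the demux callback reads 15 bytes at a time: 4 bytes of previous tag size followed by the 11-byte tag header. A plain-Rust sketch of decoding those 15 bytes, with a hypothetical function name and the same tag types and timestamp rule as the nom parser above, might look like this:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TagType { Audio, Video, Script }

#[derive(Debug, PartialEq)]
pub struct TagHeader {
    pub previous_tag_size: u32,
    pub tag_type: TagType,
    pub data_size: u32,
    pub timestamp: u32,
    pub stream_id: u32,
}

/// Reads a 24-bit big-endian integer from the first 3 bytes.
fn be_u24(b: &[u8]) -> u32 {
    ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32
}

/// Decodes 4 bytes of previous tag size followed by the 11-byte tag header.
pub fn previous_size_and_tag_header(input: &[u8]) -> Option<TagHeader> {
    if input.len() < 15 {
        return None;
    }
    let previous_tag_size = u32::from_be_bytes([input[0], input[1], input[2], input[3]]);
    let h = &input[4..15];
    let tag_type = match h[0] {
        8 => TagType::Audio,
        9 => TagType::Video,
        18 => TagType::Script,
        _ => return None,
    };
    Some(TagHeader {
        previous_tag_size,
        tag_type,
        data_size: be_u24(&h[1..4]),
        // the extended byte holds the upper 8 bits of the 32-bit timestamp
        timestamp: ((h[7] as u32) << 24) | be_u24(&h[4..7]),
        stream_id: be_u24(&h[8..11]),
    })
}
```

Returning None for an unknown tag type mirrors the switch! above, which has no default branch.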

Now, begin parsing

extern "C" fn demux(p_demux: *mut demux_t<demux_sys_t>) -> c_int {
  let p_demux = unsafe { &mut (*p_demux) };
  let p_sys = unsafe { &mut (*p_demux.p_sys) };

  let mut header = [0u8; 15];
  let sz = stream_Read(p_demux.s, &mut header);
  if sz < 15 {
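    if sz == 4 {
      // only the final 4-byte PreviousTagSize is left: end of the file
      vlc_Log!(p_demux, LogType::Info, PLUGIN_NAME, "end of stream");
      return 0;
    } else {
      // sz is a signed ssize_t (negative on error): clamp it before slicing
      vlc_Log!(p_demux, LogType::Info, PLUGIN_NAME, "could not read header:\n{}",
      &header[..sz.max(0) as usize].to_hex(8));
      return -1;
    }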
  }

The to_hex method is invaluable: it shows a hexdump of the slice.
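nom's to_hex output format is not reproduced here. The sketch below is a hypothetical hexdump helper in the same spirit, printing chunk_size bytes per line with an offset column, the hex bytes and a printable-ASCII column; it panics if chunk_size is 0, like slice::chunks.

```rust
use std::fmt::Write;

/// Hypothetical hexdump helper: one line per `chunk_size` bytes.
pub fn hexdump(data: &[u8], chunk_size: usize) -> String {
    let mut out = String::new();
    for (i, chunk) in data.chunks(chunk_size).enumerate() {
        // offset of the first byte on this line
        write!(out, "{:08x}\t", i * chunk_size).unwrap();
        for b in chunk {
            write!(out, "{:02x} ", b).unwrap();
        }
        // pad a short last line so the ASCII column stays aligned
        for _ in chunk.len()..chunk_size {
            out.push_str("   ");
        }
        out.push('\t');
        for &b in chunk {
            out.push(if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' });
        }
        out.push('\n');
    }
    out
}
```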

Check the header (audio case)

if let Done(_remaining, header) = tag_header(&header[4..]) {
  match header.tag_type {
    TagType::Audio => {
      let mut a_header = [0u8; 1];
      if stream_Read(p_demux.s, &mut a_header) < 1 { return -1; }

      if let Done(_, audio_header) = audio_data_header(&a_header) {
        // <initialize demuxer with audio format info>

        let p_block: *mut block_t = stream_Block(p_demux.s, (header.data_size - 1) as size_t);
        if p_block.is_null() { return -1; }
        let p_block = unsafe { &mut(*p_block) };

        let out_ref = p_demux.out;
        unsafe {
          to_va_list!(move |v: rs_va_list::va_list| {
            let pf_control: fn(*mut c_void, c_int, rs_va_list::va_list) =
              transmute((*out_ref).pf_control);
            pf_control(out_ref as *mut c_void, ES_OUT_SET_PCR, v);
          }, p_block.i_pts);
        }
        es_out_Send(p_demux.out, p_sys.audio_es_id, p_block);
        return 1;
      } else { return -1; }

There's a hack for passing the va_list: https://github.com/GuillaumeGomez/va_list-rs

Sometimes you can't write Rust-y code; you need to adapt to the APIs.

Demo

Now, the build system

We have a VLC plugin that we can drop into the modules folder. Can we build it inside VLC's tree instead?

VLC relies heavily on the autotools, and builds modules correctly for the right platform automatically (with libtool, etc.).

First, check for cargo and Rust

AC_ARG_ENABLE(cargo,
    [AS_HELP_STRING([--enable-cargo],
      [Enable rust-based plugins that require cargo to build (default disabled)])])

AS_IF([test "${enable_cargo}" = "yes"], [
  AC_CHECK_PROG(CARGO, [cargo], [yes], [no])
  AC_CHECK_PROG(RUSTC, [rustc], [yes], [no])

  AS_IF([test "$CARGO" != "yes" -o "$RUSTC" != "yes"], [
    AC_MSG_FAILURE("Rust based plugins cannot be built $CARGO $RUSTC")
  ])
  VLC_ADD_PLUGIN([rust_plugin])
])

Declaring and building a Rust module

librust1_plugin_la_SOURCES = ""

librust_plugin.a: demux/rust/src/lib.rs
    cd $(srcdir)/demux/rust && CARGO_TARGET_DIR="$(abs_builddir)/.libs/" cargo build -v
    mv $(abs_builddir)/.libs/debug/$@ $@

rust_plugin.o: demux/rust/src/lib.rs
    cd $(srcdir)/demux/rust && CARGO_TARGET_DIR="$(abs_builddir)/.libs/" cargo rustc -- --emit obj
    mv $(abs_builddir)/.libs/debug/$@ $@

am_librust1_plugin_la_OBJECTS = rust_plugin.o
librust1_plugin_la_LIBADD = librust_plugin.a

demux_LTLIBRARIES += librust1_plugin.la

We generate an object file and hand it to the build system.

Summing up

Next steps

Greetings

More info

if you want some stickers...

Thanks!

Questions?