HTML5 Split Video Streaming

Recently I’ve wanted to look into how video streaming of local content can be done with NodeJS and HTML5’s video tag. I’ve only found one example to base my work on, here. That demo didn’t really cut it for me, though: for one, it fetched the file and split it in client-side JavaScript before appending it to the video, when ideally this would be done server side. It also had a bug where a split file wasn’t appended at a time offset, so it overwrote the original buffer.

I’ve created a demo that addresses these issues: a NodeJS server that scans a videos directory, then transcodes and splits the videos on the fly with ffmpeg. It also runs mse_webm_remuxer on the output to fix the format of these files in some magical way, making them compatible with HTML5’s MediaSource.

The demo below shows the client-side code; it can be found here. This is a client-side only demo of sorts, in that it mimics the NodeJS server statically, because the server (being a demo) is incredibly insecure.

So, how the whole process works:

  1. The client makes a request to video/ to see what videos are available.
  2. This populates a list in the client; the user then clicks a video.
  3. Upon clicking a video, the client makes a request for the video’s metadata, which returns data such as the duration, chunk size and information on the streams.
  4. The client then sets up the video (mostly setting the duration) and requests the first chunk.
  5. The server then takes the source video, transcodes the first 30 seconds to WebM, runs mse_webm_remuxer on it, and sends the URL of this new file to the client.
  6. The client then buffers this chunk at the correct location in the video.
  7. As playback reaches different points, new chunks are transcoded and buffered, allowing the user to jump around the video while only the necessary sections are transcoded on the fly.
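Steps 4–7 hinge on MediaSource’s timestampOffset, which is what lets a chunk land at its proper place on the timeline instead of overwriting the start of the buffer. A rough client-side sketch follows, assuming hypothetical `/video/demo/meta` and `/video/demo/chunk/<n>` endpoints of my own invention:

```javascript
// Pure helper: which chunk covers a given playback time.
function chunkForTime(t, chunkSeconds) {
  return Math.floor(t / chunkSeconds);
}

// Browser-only wiring, guarded so the helper above also runs outside a browser.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);

  mediaSource.addEventListener('sourceopen', async () => {
    const meta = await fetch('/video/demo/meta').then(r => r.json());
    mediaSource.duration = meta.duration; // step 4: full duration up front
    const sb = mediaSource.addSourceBuffer('video/webm; codecs="vp8, vorbis"');

    async function appendChunk(n) {
      const buf = await fetch(`/video/demo/chunk/${n}`).then(r => r.arrayBuffer());
      sb.timestampOffset = n * meta.chunkSize; // place the chunk at its offset
      await new Promise(resolve => {
        sb.addEventListener('updateend', resolve, { once: true });
        sb.appendBuffer(buf);
      });
    }

    const fetched = new Set([0]);
    await appendChunk(0); // step 6: buffer the first chunk

    video.addEventListener('timeupdate', () => {
      const next = chunkForTime(video.currentTime, meta.chunkSize) + 1;
      if (!fetched.has(next) && next * meta.chunkSize < meta.duration) {
        fetched.add(next);
        appendChunk(next); // step 7: transcode and buffer on demand
      }
    });
  });
}
```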

Video chunks? But why?

There are a few benefits to using video chunks, which is how YouTube does it. For YouTube, I suspect the primary advantage is bandwidth: by only buffering sections of the video, they control how much data is sent to the client in advance, which saves a heck of a lot of data when viewers quit before the whole thing would have buffered.

The reason for chunks in this demo, though, is timing. Transcoding a 320×240, 30-second length of video takes about 15 seconds, which means the user can start viewing fairly quickly rather than waiting for a full transcode operation to complete. By measuring how long a video chunk took to arrive, the client can work out when to request the next chunk to ensure smooth playback.
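That timing heuristic can be sketched as a pair of small helpers; the five-second safety margin is a guess of mine, not a value from the demo:

```javascript
// Measure how long a chunk takes to arrive (network time plus the
// server's on-the-fly transcode time).
async function timedFetch(url) {
  const start = Date.now();
  const buf = await fetch(url).then(r => r.arrayBuffer());
  return { buf, fetchSeconds: (Date.now() - start) / 1000 };
}

// Request the next chunk once the remaining buffered playback time drops
// below the last observed fetch time plus a safety margin.
function shouldRequestNext(currentTime, bufferedEnd, fetchSeconds, margin = 5) {
  return bufferedEnd - currentTime <= fetchSeconds + margin;
}
```

So a 30-second chunk that took 15 seconds to arrive would trigger the next request once roughly 20 seconds or less of buffered video remain ahead of the playhead.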

Source Code

The source code for this demo has been put up on my Github, here.