Overview

Most academic papers can benefit from a short associated video, either summarizing the main ideas or showcasing some qualitative results, and many machine learning and computer vision venues recommend attaching a short video to a paper submission.

Furthermore, many conferences, currently operating remotely, ask presenters to provide videos in advance as a replacement for the live in-person presentation. While this is unfortunate in a way, as nothing can really replace in-person sessions and networking, it does open the door to putting a little more thought and creativity into one’s talks: preparing more interesting visualizations, doing more professional editing, etc.

[Figure: animated inference architecture diagram from the PointFlow paper by Yang et al. (not mine!)] This great animation is a fantastic example of a crisp and descriptive visualization which can instantly convey the gist of a method.

However, there are many decisions which must be made in order to determine what makes for a good video. While I am still far from an expert in video editing myself (many of my own videos could benefit from the advice in this post), over time I learned a number of tips and tricks, and this post is meant to gather them in one place.

Hopefully the tips and references will be helpful for people in the process of putting together a paper video!

Tools

I found that using the Open Broadcaster Software (OBS) gives the most flexibility in terms of high-quality screen, audio, and video recording. It’s free, open-source, cross-platform, and very flexible (e.g., you can drag around your webcam view to reposition yourself on the video so there is minimal overlap with your slides).

In general, I would strongly advise against just using Zoom for this. Since Zoom is optimized for video conferences, it compresses the video and audio aggressively, producing serious artifacts and generally poor-looking results. The Zoom recording also takes a while to become available, while OBS produces it instantly once the recording stops. Furthermore, you can customize the output resolution, framerate, and format in OBS, which can’t easily be done with Zoom.

Something like OBS takes maybe 10 minutes to figure out the first time, and then it’s just as seamless, while also producing much higher quality outputs and being more flexible.

Screencast-o-Matic also seems like a nice tool for Mac screen recording, closed captioning, basic editing, etc., and at C$25/year it seems worth it. However, I have not tried it yet. Relying on Zoom for closed caption generation can be unreliable: sometimes it can take over 24h to get the caption file.

qrd.by is a nice free online QR code generator, if you want to add a link to your paper or project website to your video or slides. I noticed that many free QR code generators produce tracking URLs, which reduce privacy and can become stale once the redirect provider disappears, so I’d recommend QR code generators which directly encode the link you provide. You can also generate the code yourself with, e.g., qrencode.
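As a rough sketch of the do-it-yourself route, the snippet below calls the qrencode command-line tool to write a PNG that encodes a link directly, with no redirect in between. The URL and the output file name are placeholders, and the module size and error correction level are just assumed starting points.

```shell
URL="https://example.com/my-paper"   # placeholder link to your paper or project page
OUT=paper-qr.png                      # hypothetical output file name

if command -v qrencode >/dev/null 2>&1; then
  # -s sets the pixel size of each module, -l H picks the highest
  # error correction level, which tolerates more damage when scanning.
  qrencode -o "$OUT" -s 10 -l H "$URL"
else
  echo "qrencode not found; install it first" >&2
fi
```

It is worth scanning the result with a phone before putting it on a slide, to confirm it opens the exact link you passed in.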

ffmpeg

While I do my primary video editing in Adobe Premiere, nothing beats ffmpeg when it comes to quick and simple edits (e.g., trimming 5s off the end of a video) and re-encoding, e.g., to satisfy some conference’s file size limit.
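As one possible sketch of such a quick edit, the snippet below trims the last few seconds off a video without re-encoding it. The file names and the 5s cut are placeholders, and trimmed_duration is a small helper introduced here for illustration; the duration is read with ffprobe, which ships with ffmpeg.

```shell
INPUT=talk.mp4             # placeholder input file
OUTPUT=talk.trimmed.mp4    # placeholder output file
CUT_SECONDS=5              # how much to remove from the end

# Subtract the cut from the total duration (awk handles the decimals).
trimmed_duration() {
  awk -v total="$1" -v cut="$2" 'BEGIN { printf "%.3f\n", total - cut }'
}

if [ -f "$INPUT" ]; then
  # Ask ffprobe for the container duration in seconds.
  TOTAL=$(ffprobe -v error -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 "$INPUT")
  # -t limits the output duration; -c copy keeps the streams untouched.
  ffmpeg -i "$INPUT" -t "$(trimmed_duration "$TOTAL" "$CUT_SECONDS")" -c copy "$OUTPUT"
fi
```

Because the streams are copied rather than re-encoded, this runs almost instantly and does not lose any quality.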

Modern codecs such as H.265 are extremely efficient. For embedding a video on your website, it is definitely worth playing around with ffmpeg settings to find a decent set of parameters for which the video still looks fine to humans; often, you can achieve this while reducing the file size 5–10x, which can be the difference between a slow, non-responsive 50MB webpage and a comparatively nimble 5MB one!

This can make an even bigger difference when posting many small videos on the website, e.g., to show off an awesome new image-based rendering method.
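For the many-small-videos case, a loop along the following lines can re-encode a whole folder at once. This is only a sketch: the folder names, the web_name helper, and the CRF value are assumptions for illustration, and it uses H.264 (libx264) for broad browser support rather than H.265.

```shell
SRC_DIR=clips   # hypothetical folder with the raw result videos
DST_DIR=web     # hypothetical output folder
CRF=26          # assumed starting point; lower = better quality, bigger files

# Map e.g. clips/scene1.mp4 to web/scene1.web.mp4.
web_name() {
  printf '%s/%s.web.mp4\n' "$DST_DIR" "$(basename "${1%.*}")"
}

mkdir -p "$DST_DIR"
for f in "$SRC_DIR"/*.mp4; do
  [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matches
  # -an drops audio (assumes silent result clips); yuv420p and +faststart
  # help the file play in browsers and start before it is fully downloaded.
  ffmpeg -y -i "$f" -c:v libx264 -crf "$CRF" -preset slow -an \
         -pix_fmt yuv420p -movflags +faststart "$(web_name "$f")"
done
```

Try a couple of CRF values on one representative clip first, and compare the results side by side before converting everything.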

Some simple tricks:

  • Compress a video to a fixed size budget, e.g., for Papercept or CMT, using the “two pass” method from this wiki page:
      # Target video bitrate, chosen to fit the size budget.
      INPUT=YOUR_RAW_VIDEO.mp4; BITRATE=875k
      OUT="${INPUT%.mp4}.recoded-$BITRATE.mp4"
      # Pass 1 only gathers statistics, so skip the audio and discard the output.
      ffmpeg -y -i "$INPUT" -c:v libx264 -preset veryslow -b:v "$BITRATE" -pass 1 -an -f mp4 /dev/null && \
      ffmpeg -i "$INPUT" -c:v libx264 -preset veryslow -b:v "$BITRATE" -pass 2 "$OUT"
    

    Where you’d set the bitrate by dividing the max allowed file size in kilobits by the length of the video in seconds, leaving some room for the audio track. Note that for videos that are not very long (<3min) and file size limits >50MB, using -crf 18 instead of the two-pass process is probably also fine (for crf, higher means more compression and thus more artifacts):

    ffmpeg -i YOUR_RAW_VIDEO.mp4 -crf 18 YOUR_RAW_VIDEO.re-encoded.mp4
    
  • I noticed that lots of videos exported by macOS’s screen recorder, or even Adobe Premiere itself (!), are not compressed very efficiently. Often simply doing
    ffmpeg -i my_file.mp4 my_file.opt.mp4
    

    will significantly reduce a file’s size. Of course, always check the output to make sure it looks OK!
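The bitrate for the two-pass method above comes from a simple calculation: the size budget in kilobits divided by the duration in seconds, minus whatever the audio track needs. A minimal sketch follows; target_bitrate_kbps is a helper name made up for this example, and it treats 1MB as 8000 kilobits.

```shell
# Arguments: max size in MB, duration in seconds, audio bitrate in kbit/s.
target_bitrate_kbps() {
  size_mb=$1; duration_s=$2; audio_kbps=$3
  # 1 MB = 8000 kilobits; spread them over the duration, then reserve audio.
  echo $(( size_mb * 8000 / duration_s - audio_kbps ))
}

# Example: a 50 MB limit, a 5-minute video, 128 kbit/s audio.
BITRATE="$(target_bitrate_kbps 50 300 128)k"
echo "$BITRATE"   # prints 1205k
```

In practice it is safer to aim a few percent below the computed value, since the container adds some overhead and the encoder does not hit the target exactly.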

Hardware

A good microphone and camera combo can make a huge difference in the quality of academic videos. While not all videos need a face cam, audio narration is always a nice thing to have.

Audio: I recently purchased a FiFine T669 USB microphone kit, and I am currently in the process of evaluating it. So far, I am very happy with it, especially considering the C$95 price tag (microphone, boom arm, and pop filter). Stay tuned for more details! :)

Video: I bought a Logitech Brio webcam, after getting tired of my old 720p MacBook camera. While quite pricey, at C$250+tax in February 2021, it is still much cheaper and easier to set up than a dedicated DSLR, while still offering 1080p recording at 60FPS, 4K at 30FPS, and HDR. I’ve recorded a few videos with it so far, and the footage quality is very good.

Examples of Great Academic Videos

Great paper videos which serve as inspiration to the aspiring academic video editor! :)