Sliceless threading:
Example with 2 threads.
Start encoding frame #0. When it's half done, start encoding frame #1. Thread #1 now only has access to the top half of its reference frame, since the rest hasn't been encoded yet, so it has to restrict the motion search range. But that's probably ok (unless you use lots of threads on a small frame), since it's pretty rare to have such long vertical motion vectors.
After a little while, both threads have encoded one more row of macroblocks, so thread #1 still gets to use motion range = +/- 1/2 frame height.
Later yet, thread #0 finishes frame #0 and moves on to frame #2. Thread #0 now gets motion restrictions, and thread #1 is unrestricted.
If at any time some of the threads are encoding non-referenced B-frames, that relaxes the motion vector restrictions all around.

Implementation steps:
(1) Allow the threads to work on different frames. They already use separate encoding contexts, but some data is still shared, and that needs to change. This involves changes all over x264, so I can't really list them, but hopefully there won't be too much beyond what RDRC needed.
(2) Make the deblocking filter and motion interpolation work on horizontal strips instead of whole frames.
(3) Synchronize the threads. At the beginning of a row, each thread checks the progress of its reference frames. If they're late, it sleeps. When a thread finishes a row, it wakes up any others that were waiting on it.
(4) Modify ratecontrol. Ratecontrol will get the exact results of a frame later than it does now. So we have to do something like assume the in-progress frames will achieve their target size, update the VBV accordingly, then update again when the actual results are available.

That much is enough to get it working for our encoder. But to allow it to replace slices for other purposes, we also need to:
(5) Replace the current scenecut decision with something that runs during B-adapt rather than after encoding a frame.
(6) Move the ME prepass + B-adapt to its own thread.
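
The row-level synchronization in step (3) can be sketched roughly as follows. This is only an illustration in Python, not x264 code: the names FrameProgress, encode_frame, run_two_threads and mv_range_rows are hypothetical, and it assumes a fixed vertical search range measured in macroblock rows, so that a thread sleeps until its reference frame is that many rows ahead of the row it is about to encode.

```python
import threading


class FrameProgress:
    """Hypothetical helper: counts the fully encoded macroblock rows of one frame."""

    def __init__(self, total_rows):
        self.total_rows = total_rows
        self.rows_done = 0
        self._cond = threading.Condition()

    def finish_row(self):
        # Called by the encoding thread after each row; wakes any waiting threads.
        with self._cond:
            self.rows_done += 1
            self._cond.notify_all()

    def wait_for_rows(self, needed):
        # Sleep until at least `needed` rows exist (capped at the frame height).
        needed = min(needed, self.total_rows)
        with self._cond:
            self._cond.wait_for(lambda: self.rows_done >= needed)
            return self.rows_done


def encode_frame(frame, ref, mv_range_rows, log):
    """Encode `frame` row by row; `ref` is its reference frame, or None."""
    for row in range(frame.total_rows):
        if ref is not None:
            # Assumption: the search may reach mv_range_rows below the current
            # row, so that many extra reference rows must be finished first.
            available = ref.wait_for_rows(row + 1 + mv_range_rows)
            log.append((row, available))
        # (real motion search and encoding of this row would happen here)
        frame.finish_row()


def run_two_threads(total_rows=8, mv_range_rows=4):
    """Thread #0 encodes frame #0, thread #1 encodes frame #1 against it."""
    frame0 = FrameProgress(total_rows)
    frame1 = FrameProgress(total_rows)
    log = []
    t1 = threading.Thread(target=encode_frame,
                          args=(frame1, frame0, mv_range_rows, log))
    t0 = threading.Thread(target=encode_frame,
                          args=(frame0, None, mv_range_rows, log))
    t1.start()  # started first on purpose: it must sleep until frame #0 advances
    t0.start()
    t0.join()
    t1.join()
    return frame0, frame1, log
```

Calling run_two_threads(total_rows=8, mv_range_rows=4) mirrors the 2-thread example above: thread #1 cannot start its first row until frame #0 is past the halfway point, and from then on it stays at most half a frame behind. A real encoder would track progress per reference frame and would skip the wait for rows whose references are already complete.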