H.264 Switching Pictures

The purpose of switching pictures is to allow a server to serve pre-encoded video at dynamically varying bitrate to various clients, or to allow a server encoding in real time to serve many clients, each with a varying bitrate quota. Switching pictures achieve this by allowing different versions of a frame, predicted from different versions of their reference frame(s), to decode to exactly the same pixels.

You might think, "doesn't that need lossless compression, which would make it ridiculously high bitrate?" And the answer is: no.

An inter-predicted block in a normal P-frame is decoded as follows:
* compute the motion-compensated prediction.
* dequantize and IDCT the residual.
* add the residual to the prediction.

An inter-predicted block in an SP-frame is decoded as follows:
* compute the motion-compensated prediction.
* forward DCT and quantize the prediction.
* add the quantized residual to the quantized prediction.
* dequantize and IDCT the sum.

If you tried to make two normal P-frames from different reference frames decode to the same pixels, one might have a coefficient predicted to be 0.1 while the other had 0.2 (in units of the quantizer step), and there's no way (short of lossless) to compensate for that difference. But in SP-frames, the quantized prediction can only take integer values, so any difference in prediction can be exactly compensated by subtracting that difference from the quantized residual. You can also code the same block in an SI-frame, just replacing inter prediction with intra prediction, and the decoded result is still the same.

Example use case (I think this would be typical if anyone actually used switching pictures):

Assume two bitrates, 1mbit and 2mbit. Assume 250 frame GOPs. Assume every 10th frame is a switching picture. You encode the intervening 9 frames as normal P/B-frames, and make two versions of each: the 1mbit version and the 2mbit version.
You encode 6 versions of each switching picture:
* 1mbit SP predicted from 1mbit reference frames (1mbit->1mbit)
* 1mbit SP predicted from 2mbit reference frames (2mbit->1mbit)
* 2mbit SP predicted from 1mbit reference frames (1mbit->2mbit)
* 2mbit SP predicted from 2mbit reference frames (2mbit->2mbit)
* 1mbit SI
* 2mbit SI

(I say "1mbit" and "2mbit" according to which versions of the normal frames will be sent after this switching frame. This does not imply that the different "1mbit" versions of a switching frame are all the same size.)

Now for what data gets sent to a client:
* A stable high bitrate client will get the 2mbit version of all the normal frames, the 2mbit->2mbit SP version of all the switching frames, and will see all the compression benefits of a 250 frame GOP.
* A stable low bitrate client will get the 1mbit version of all the normal frames, the 1mbit->1mbit SP version of all the switching frames, and will see all the compression benefits of a 250 frame GOP.
* A client whose bitrate fluctuates in the middle of a GOP (say, decreasing from 2mbit to 1mbit) will get the 2mbit version of the beginning of the GOP, then a 2mbit->1mbit SP-frame, then the 1mbit version of the rest of the GOP (unless their bitrate changes again).
* A client who joins the stream in the middle of a GOP gets an SI version of the next switching picture, and thereafter becomes one of the above cases.
* A client who sees some data corruption or dropped packets can also ask for an SI-frame instead of waiting for the next GOP to resync.

Thus clients see all the seeking and error resilience benefits of a 10 frame GOP.

All of the above could sort of be emulated with normal I- and P-frames. But then you'd either have to accept that the different versions of a switching frame don't exactly match, so any client that switches would see compounding encoder-decoder desync until the next GOP.
Or alternatively you could encode a different version of the whole GOP for each path through it, which (a) grows exponentially with the GOP length and number of switches, and (b) doesn't work with multicast.

The cost of switching frames is that they throw away some information from the prediction, so SP-frames are lower quality than P-frames of the same QP. Also, Extended Profile (which is the only profile to allow switching frames) doesn't allow CABAC or the 8x8 transform, so you get worse compression than Main/High Profile would (but that's just an artifact of the standard; it wouldn't have to be that way).
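
To make the P-frame versus SP-frame decoding steps described earlier concrete, here is a toy sketch for a single transform coefficient. It is only an illustration under simplifying assumptions: the DCT is skipped (we work directly on one coefficient), the quantizer is plain round-to-nearest with one step size, and all function names are made up for this example. Real H.264 uses integer transforms, different rounding, and separate QP values for the prediction and the residual.

```python
import math


def quantize(coef: int, step: int) -> int:
    """Toy uniform quantizer: nearest integer level (ties round up)."""
    return math.floor(coef / step + 0.5)


def dequantize(level: int, step: int) -> int:
    """Inverse of the toy quantizer: scale the level back up."""
    return level * step


def encode_p_residual(target_coef: int, pred_coef: int, step: int) -> int:
    """Normal P-frame: quantize the difference between target and prediction."""
    return quantize(target_coef - pred_coef, step)


def decode_p(pred_coef: int, residual_level: int, step: int) -> int:
    """Normal P-frame: prediction plus dequantized residual."""
    return pred_coef + dequantize(residual_level, step)


def encode_sp_residual(target_level: int, pred_coef: int, step: int) -> int:
    """SP-frame: residual is the gap between integer quantization levels."""
    return target_level - quantize(pred_coef, step)


def decode_sp(pred_coef: int, residual_level: int, step: int) -> int:
    """SP-frame: quantize the prediction, add the residual level, dequantize."""
    return dequantize(quantize(pred_coef, step) + residual_level, step)


if __name__ == "__main__":
    step = 8
    pred_a, pred_b = 37, 46  # same coefficient predicted from two different references
    target_level = 5         # desired reconstruction is 5 * 8 = 40

    # Normal P-frames: each version lands near 40, but not on the same value.
    p_a = decode_p(pred_a, encode_p_residual(40, pred_a, step), step)
    p_b = decode_p(pred_b, encode_p_residual(40, pred_b, step), step)
    print("P :", p_a, p_b)

    # SP-frames: the residual absorbs the integer difference exactly.
    sp_a = decode_sp(pred_a, encode_sp_residual(target_level, pred_a, step), step)
    sp_b = decode_sp(pred_b, encode_sp_residual(target_level, pred_b, step), step)
    print("SP:", sp_a, sp_b)
```

In this toy model the two P-frame reconstructions differ because the leftover fraction of the prediction survives into the output, while the two SP-frame reconstructions match because the prediction has already been snapped to an integer level before the residual is added.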