H.264 Switching Pictures:

The purpose of switching pictures is to allow a server to serve pre-encoded
video at dynamically varying bitrate to various clients, or to allow a server
encoding in realtime to serve many clients each with varying bitrate quotas.
Switching pictures achieve this by allowing different versions of a frame,
predicted from different versions of its reference frame(s), to decode to
exactly the same pixels.

You might think, "doesn't that need lossless compression, which would make
it ridiculously high bitrate?" The answer is no.
An inter-predicted block in a normal P-frame is decoded as follows:
* compute the motion-compensated prediction.
* dequantize and idct the residual.
* add residual to prediction.
An inter-predicted block in an SP-frame is decoded as follows:
* compute the motion-compensated prediction.
* fdct and quantize the prediction.
* add the quantized residual to the quantized prediction.
* dequantize and idct the sum.
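
The two decode paths above can be sketched as a toy in Python. This is not
the real H.264 arithmetic: a 4-sample block and a 4-point Hadamard transform
stand in for the real block sizes and integer DCT, and all the function names
and the quantizer step are made up for illustration.

```python
# Toy 4-sample "block"; a 4-point Hadamard transform stands in for the DCT.

def transform(v):
    """4-point Hadamard transform; applying it twice scales the input by 4."""
    a, b, c, d = v
    return [a + b + c + d, a + b - c - d, a - b - c + d, a - b + c - d]

def inverse_transform(coefs):
    """Inverse of transform(), rounded to the nearest integer sample."""
    return [(x + 2) // 4 for x in transform(coefs)]

def quantize(coefs, step):
    """Map each coefficient to the integer level nearest to coef / step."""
    return [(c + step // 2) // step for c in coefs]

def dequantize(levels, step):
    return [q * step for q in levels]

def decode_p_block(prediction, residual_levels, step):
    # Normal P: dequantize and inverse-transform the residual, then add it
    # to the prediction in the pixel domain.
    residual = inverse_transform(dequantize(residual_levels, step))
    return [p + r for p, r in zip(prediction, residual)]

def decode_sp_block(prediction, residual_levels, step):
    # SP: transform and quantize the prediction first, add the residual in
    # the quantized domain, then dequantize and inverse-transform the sum.
    pred_levels = quantize(transform(prediction), step)
    summed = [p + r for p, r in zip(pred_levels, residual_levels)]
    return inverse_transform(dequantize(summed, step))

prediction = [10, 12, 14, 16]
no_residual = [0, 0, 0, 0]
p_out = decode_p_block(prediction, no_residual, step=8)
sp_out = decode_sp_block(prediction, no_residual, step=8)
print(p_out, sp_out)
```

With a zero residual the P path returns the prediction untouched, while the
SP path returns a quantized version of it.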

If you tried to make two normal P-frames from different reference frames
decode to the same pixels, one might have a coefficient predicted to be 0.1
while the other had 0.2, and there's no way (short of lossless) to compensate
for that difference. But in SP-frames, the prediction is quantized before the
residual is added, so it can only take integer values, and any difference
between two predictions can be exactly compensated by subtracting that
difference from the quantized residual. You can also code the same block in
an SI-frame, just replacing inter prediction with intra prediction, and the
decoded result is still the same.
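
Here is a toy Python sketch of that compensation, under the same kind of
simplifications as any toy: a 4-sample block, a Hadamard transform instead of
the real integer DCT, and invented sample values, names and quantizer step.
The encoder picks one target set of quantized coefficients, and for each
prediction codes the residual as target minus quantized prediction.

```python
def transform(v):
    """4-point Hadamard transform; applying it twice scales the input by 4."""
    a, b, c, d = v
    return [a + b + c + d, a + b - c - d, a - b - c + d, a - b + c - d]

def inverse_transform(coefs):
    return [(x + 2) // 4 for x in transform(coefs)]

def quantize(coefs, step):
    return [(c + step // 2) // step for c in coefs]

def sp_residual(target_levels, prediction, step):
    # Encoder side: whatever the quantized prediction is, code the integer
    # difference between it and the target levels.
    pred_levels = quantize(transform(prediction), step)
    return [t - p for t, p in zip(target_levels, pred_levels)]

def sp_decode(prediction, residual_levels, step):
    # Decoder side: quantized prediction + residual, then dequantize + idct.
    pred_levels = quantize(transform(prediction), step)
    levels = [p + r for p, r in zip(pred_levels, residual_levels)]
    return inverse_transform([q * step for q in levels])

STEP = 8
source = [20, 24, 30, 33]                       # hypothetical source samples
target_levels = quantize(transform(source), STEP)

predictions = {                                 # hypothetical predictions
    "SP from 1mbit refs": [18, 25, 28, 35],
    "SP from 2mbit refs": [20, 24, 29, 33],
    "SI (intra)":         [22, 22, 22, 22],
}
decoded = {
    name: sp_decode(pred, sp_residual(target_levels, pred, STEP), STEP)
    for name, pred in predictions.items()
}
for name, pixels in decoded.items():
    print(name, pixels)
```

All three versions decode to the same samples, because the decoder always
ends up dequantizing the same target levels. Only the size of the residual
differs between them.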

Example use case (I think this would be typical if anyone actually used
switching pictures):
Assume two bitrates, 1mbit and 2mbit. Assume 250 frame GOPs. Assume every 10th
frame is switching.
You encode the intervening 9 frames as normal P/B-frames, and make two
versions of each: the 1mbit version and the 2mbit version.
You encode 6 versions of each switching picture:
1mbit SP predicted from 1mbit reference frames,
1mbit SP predicted from 2mbit reference frames,
2mbit SP predicted from 1mbit reference frames,
2mbit SP predicted from 2mbit reference frames,
1mbit SI,
2mbit SI.
(I say "1mbit" and "2mbit" according to which versions of the normal frames
will be sent after this switching frame. This does not imply that the
different "1mbit" versions of a switching frame are all the same size.)
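
The list above generalizes to any number of bitrates: one SP version per
(source, destination) pair plus one SI version per destination. A small
Python sketch that enumerates them (the function name and the tuple layout
are just for illustration):

```python
def switching_picture_versions(bitrates):
    """List every version of one switching picture as (kind, source, dest).

    dest is the bitrate of the normal frames that follow the switching
    picture; source is the bitrate of the reference frames (None for SI).
    """
    sp = [("SP", src, dst) for dst in bitrates for src in bitrates]
    si = [("SI", None, dst) for dst in bitrates]
    return sp + si

versions = switching_picture_versions(["1mbit", "2mbit"])
for kind, src, dst in versions:
    if kind == "SP":
        print(f"{dst} SP predicted from {src} reference frames")
    else:
        print(f"{dst} SI")
```

For n bitrates that is n*n + n versions per switching picture, so 6 for the
two-bitrate example.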

Now for what data gets sent to a client:
A stable high bitrate client will get the 2mbit version of all the normal
frames, the 2mbit->2mbit SP version of all the switching frames, and will see
all the compression benefits of a 250 frame GOP.
A stable low bitrate client will get the 1mbit version of all the normal
frames, the 1mbit->1mbit SP version of all the switching frames, and will see
all the compression benefits of a 250 frame GOP.
A client whose bitrate fluctuates in the middle of a GOP (say, decreasing from
2mbit to 1mbit) will get the 2mbit version of the beginning of the GOP, then
a 2mbit->1mbit SP-frame, then the 1mbit version of the rest of the GOP (unless
their bitrate changes again).
A client who joins the stream in the middle of a GOP gets an SI version of the
next switching picture, and thereafter becomes one of the above cases.
A client who sees some data corruption or dropped packets can also ask for an
SI-frame instead of waiting for the next GOP to resync. Thus they see all the
seeking and error resilience benefits of a 10 frame GOP.
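
The cases above boil down to a small per-frame decision on the server. A
Python sketch, assuming frame 0 of each GOP is an I-frame and every 10th
frame after it is a switching picture. The function, its arguments and the
returned tuples are hypothetical, not any real server's API.

```python
SWITCH_INTERVAL = 10  # every 10th frame is a switching picture, as above

def pick_frame(frame_index, decoded_rate, wanted_rate, needs_resync=False):
    """Return (kind, source_rate, rate) for the next frame to send, or None.

    frame_index  -- position inside the GOP (0 is assumed to be the I-frame)
    decoded_rate -- version the client is currently decoding, or None if it
                    has no valid reference frames yet (it just joined)
    wanted_rate  -- version the client's bandwidth currently allows
    needs_resync -- client reported corruption or dropped packets
    """
    if frame_index == 0:
        return ("I", None, wanted_rate)
    if frame_index % SWITCH_INTERVAL != 0:
        if decoded_rate is None:
            return None  # nothing decodable until the next switching picture
        return ("P/B", None, decoded_rate)  # can't change rate mid-segment
    if decoded_rate is None or needs_resync:
        return ("SI", None, wanted_rate)
    return ("SP", decoded_rate, wanted_rate)

print(pick_frame(10, "2mbit", "2mbit"))   # stable high bitrate client
print(pick_frame(10, "2mbit", "1mbit"))   # bitrate dropped mid-GOP
print(pick_frame(20, None, "1mbit"))      # joined mid-GOP
```

After sending an SP or SI frame, the server would record the frame's rate as
the client's new decoded_rate.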

All of the above could sorta be emulated with normal I- and P-frames. But then
you'd have to either accept that the different versions of a switching frame
don't exactly match, so that any client that switches sees compounding
encoder-decoder desync until the next GOP, or encode a different version of
the whole GOP for each path through it, which (a) grows exponentially with
the GOP length and number of switches, and (b) doesn't work with multicast.
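
To put rough numbers on (a), here is a back-of-envelope count in Python for
the example above. It assumes a client may pick either bitrate independently
for each 10-frame segment of the 250-frame GOP; the function names are made
up for illustration.

```python
def paths_through_gop(num_bitrates, gop_length, switch_interval):
    """Distinct bitrate paths if every segment can use any bitrate."""
    segments = gop_length // switch_interval
    return num_bitrates ** segments

def versions_per_switching_picture(num_bitrates):
    """SP versions for every (source, dest) pair plus one SI per dest."""
    return num_bitrates * num_bitrates + num_bitrates

whole_gop_encodes = paths_through_gop(2, 250, 10)
per_switch_point = versions_per_switching_picture(2)
print(whole_gop_encodes, per_switch_point)
```

Under those assumptions the one-encode-per-path approach needs 2**25 whole-GOP
encodes, while switching pictures need a fixed 6 versions per switch point
plus 2 versions of each normal frame.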

The cost of switching frames is that they throw away some information from the
prediction, so SP-frames are lower quality than P-frames of the same QP. Also,
Extended Profile (which is the only profile to allow switching frames) doesn't
allow CABAC or the 8x8 transform, so you get worse compression than Main/High
Profile would (but that's just an artifact of the standard; it wouldn't have
to be that way).