I-frame and GOP are video codec terms. When a video is encoded for broadcast TV, YouTube, Blu-ray or any streaming service, it needs to be compressed. If you’re watching a video at 24 frames-per-second, you’re not really seeing 24 full pictures. Instead, you’re seeing a series of GOPs (Groups of Pictures), each composed of intra-coded frames (I-frames), predicted frames (P-frames) and bi-directionally predicted frames (B-frames). Depending on the codec and the encoder settings, a GOP can run to 15 frames or more.
I-frames are the complete images. Sure, an I-frame is still a compressed image (in the same way a JPEG is a compressed image), but it’s complete in the sense that every pixel is there and it can be decoded without looking at any other frame. Every GOP begins with an I-frame. Then come the P-frames and B-frames: P-frames reference earlier frames, and B-frames reference both earlier and later frames. To explain it simply, P-frames and B-frames are incomplete images that reference the I-frame and surrounding frames to “fill in the blanks”.
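To make the layout of a GOP concrete, here is a minimal sketch that prints the frame types of one GOP in display order. The function name `gop_pattern` and its parameters `gop_size` and `b_frames` are made up for this illustration and do not come from any particular encoder; real encoders decide frame types with far more nuance.

```python
def gop_pattern(gop_size: int = 12, b_frames: int = 2) -> str:
    """Return the frame types of one GOP in display order.

    Hypothetical helper for illustration only. The GOP starts with an
    I-frame, then repeats `b_frames` B-frames followed by one P-frame.
    """
    if gop_size < 1 or b_frames < 0:
        raise ValueError("gop_size must be >= 1 and b_frames must be >= 0")

    types = ["I"]  # every GOP begins with a complete frame
    for position in range(1, gop_size):
        # Every (b_frames + 1)-th frame after the I-frame is a P-frame;
        # the frames in between are B-frames.
        if position % (b_frames + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


# One second of 24 fps video split into two 12-frame GOPs.
one_second = gop_pattern(12, 2) * 2
print(one_second)
print("complete frames in this second:", one_second.count("I"))
```

With these assumed settings, only 2 of the 24 frames in that second are complete images. Note that the pattern ends on B-frames, which in a real stream would have to reference the I-frame of the next GOP.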
So, imagine a static shot of a car driving down the road. Since the camera is locked down on a tripod, nothing in the environment moves (save, for example, rustling leaves on the trees). The only things the codec really needs to worry about are the car and the leaves. Since the I-frame provides the reference, the codec doesn’t need to “re-create” the clouds in the sky or the building in the distance for every frame. None of that changes. Instead, P-frames and B-frames only need to capture what moves.
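The car example can be sketched as a toy in code. This is a deliberate simplification: real codecs work on blocks of pixels with motion vectors, not on individual pixels, and the helper names `encode_delta` and `decode_delta` are invented for this illustration. The idea it shows is the same, though: store only what changed relative to a reference frame.

```python
def encode_delta(reference, frame):
    """Keep only the pixels that differ from the reference frame."""
    return {
        (y, x): value
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if reference[y][x] != value
    }


def decode_delta(reference, delta):
    """Rebuild a full frame from the reference plus the stored changes."""
    frame = [row[:] for row in reference]  # copy so the reference is untouched
    for (y, x), value in delta.items():
        frame[y][x] = value
    return frame


# A tiny 8x4 "shot": 0 is static background, 9 is the car.
background = [[0] * 8 for _ in range(4)]

i_frame = [row[:] for row in background]
i_frame[2][1] = 9  # car at column 1

next_frame = [row[:] for row in background]
next_frame[2][2] = 9  # car has moved one pixel to the right

delta = encode_delta(i_frame, next_frame)
print("pixels stored for the second frame:", len(delta), "of", 8 * 4)
print("rebuilt correctly:", decode_delta(i_frame, delta) == next_frame)
```

Only the two pixels touched by the car (where it was and where it is now) need to be stored; the rest of the frame is taken from the reference.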
By creating mostly incomplete frames that reference only a few complete frames, the codec saves significant space, which makes video easier both to store and to deliver. That way, for a 1080p video, the player doesn’t need to load a full two-megapixel image for every frame of video. It also keeps file sizes small enough to be shared and streamed over the internet with ease.
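A quick back-of-the-envelope calculation shows the scale of the saving. The figures marked as assumptions below (8-bit RGB with no chroma subsampling, and an 8 Mbit/s compressed stream) are illustrative values chosen for this sketch, not measurements of any particular service or codec.

```python
WIDTH, HEIGHT = 1920, 1080          # 1080p
FPS = 24
BYTES_PER_PIXEL = 3                 # assumption: 8-bit RGB, no chroma subsampling
ASSUMED_STREAM_BITRATE = 8_000_000  # assumption: 8 Mbit/s, for illustration only

pixels_per_frame = WIDTH * HEIGHT
raw_bytes_per_frame = pixels_per_frame * BYTES_PER_PIXEL
raw_bits_per_second = raw_bytes_per_frame * FPS * 8
ratio = raw_bits_per_second / ASSUMED_STREAM_BITRATE

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{raw_bytes_per_frame:,} bytes per uncompressed frame")
print(f"{raw_bits_per_second / 1e9:.2f} Gbit/s uncompressed at {FPS} fps")
print(f"about {ratio:.0f}x smaller at the assumed stream bitrate")
```

Under these assumptions, sending every frame as a full uncompressed picture would need more than a gigabit per second, which is why streaming depends on GOP-style compression.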