Video RAM

Although the different types of video memory are described on the RAM page, what's discussed here is what video memory does.

Wait. Isn't this just memory for graphics? Well yes and no. What matters is what gets stored in video memory. And this is different depending on what kind of graphics processor you have hooked up to that memory.

How video memory works differs across hardware. We can divide the hardware into several categories: early consoles, the first GPUs, tile-based 2D, PC-style 2D, and modern 3D graphics chips.

Early consoles
The earliest consoles didn't have video RAM at all, requiring the programmer to "ride the beam" and generate the video themselves, in real time, for every frame. This was especially true on the Atari 2600, but most 1970s consoles used a similar technique.

The first GPUs
The first home system to include anything resembling a modern GPU was the Atari 400/800 line of computers in 1979. These used a two-chip video system that had its own instruction set and implemented an early form of "scanline DMA", which was used in later consoles for special effects. The most popular of these early chips, however, was Texas Instruments' TMS9918/9928 family. The 99x8 provided either tile-based or bit-mapped memory arrangements, and could address up to 16 kilobytes of VRAM, a huge amount at the time.

Tiles and layers
The TMS99x8 laid the groundwork for how tile-based rendering worked, and the NES and later revs of the 99x8 in the MSX and Sega consoles refined it. While the 99x8 required the CPU to load tiles into memory itself, the NES went one better and put what it called the "Picture Processing Unit" on its own memory bus, freeing the CPU to do other things and making larger backgrounds and smooth scrolling possible.

Under this paradigm, video memory has a very specific structure. One part of video memory, which on the NES is provided on the cartridge as the "CHR-ROM" or Character ROM, contains a set of images, called "tiles". A tile is a fixed-size image, usually 8x8. Another part of video memory contains references to tiles, called a "tile map."

Multiple tilemaps can exist. The tilemaps exist within a specific layer; the final image is produced by combining the layers in a specific order.

The GPU handles mundane details like scrolling; the CPU simply tells it the position offset for each layer, and that's what gets shown.
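As a rough sketch of what the hardware does for each pixel, here's the tile lookup in C. The structure names, the flat byte-per-pixel tile format, and the map size are illustrative, not any specific console's layout (real hardware packs tiles into bitplanes and does this lookup in silicon):

```c
#include <stdint.h>

#define TILE_W 8
#define TILE_H 8
#define MAP_W  32           /* tilemap width in tiles (the NES uses 32x30) */

/* Illustrative structures -- real hardware packs these into VRAM. */
typedef struct {
    const uint8_t *tiles;   /* tile pixel data, TILE_W*TILE_H bytes per tile */
    const uint8_t *map;     /* grid of tile indices, MAP_W tiles per row */
    int scroll_x, scroll_y; /* per-layer scroll offset set by the CPU */
} Layer;

/* Resolve the color of one screen pixel in one layer. */
uint8_t layer_pixel(const Layer *l, int sx, int sy)
{
    /* Horizontal wraparound, like real hardware; vertical wrap omitted. */
    int x = (sx + l->scroll_x) % (MAP_W * TILE_W);
    int y = sy + l->scroll_y;
    uint8_t tile = l->map[(y / TILE_H) * MAP_W + (x / TILE_W)];
    return l->tiles[tile * TILE_W * TILE_H + (y % TILE_H) * TILE_W + (x % TILE_W)];
}
```

The point is that the GPU repeats this lookup for every pixel of every layer, every frame; the CPU only ever writes the map entries and the scroll offsets.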

Later consoles like the SNES added new features, such as more colors, "Mode 7" support (hardware rotate/scale/zoom effects) and scanline DMA (used to generate "virtual layers").

Sprites
Of course, a game with just a tilemap would be boring, as objects would have to move in units of a whole tile. That's fine for games like Tetris, not so much for other genres. So added to this is a sprite rendering system. A sprite is a hardware-understood object that has a piece of video memory dedicated to it. This memory contains a set of references to tiles, just like a tilemap. The sprite rendering hardware often has an explicit limit on the size of this per-sprite tilemap. The reason the SNES had larger sprites than the NES was that the SNES's sprite tilemap sets could be larger.

Sprites exist within certain layers, just like tilemaps. So a sprite can appear in front of or behind tilemaps or even other sprites.

Sprite tilemaps are changed by the CPU to give the illusion of animation. Sprites can also be positioned in arbitrary locations.

Sprite rendering, on both the NES and other early consoles, had certain other limitations. Only a fixed number of sprite tiles could be shown per horizontal scanline of the screen (eight on the NES). If you exceed this limit, you see errors, like object flickering and so forth. Note that what matters is the number of sprite tiles, not the number of sprites. So a 2x2 sprite (a sprite composed of 4 tiles arranged in a square) counts as 2 sprite tiles in the horizontal direction.
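The per-scanline limit amounts to a simple counting rule, sketched below. The Sprite structure and the limit of eight are illustrative, loosely modeled on the NES:

```c
#include <stdint.h>

#define SCANLINE_LIMIT 8   /* NES: at most 8 sprite tiles per scanline */
#define TILE_PX 8          /* tiles are 8 pixels tall */

typedef struct { int x, y, w_tiles, h_tiles; } Sprite;

/* Count how many sprite tiles land on a given scanline. The hardware
   simply stops drawing (hence the flicker) once the limit is reached. */
int tiles_on_scanline(const Sprite *s, int n, int line)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        int top = s[i].y, bottom = s[i].y + s[i].h_tiles * TILE_PX;
        if (line >= top && line < bottom)
            count += s[i].w_tiles;  /* a 2x2 sprite adds 2, not 4 */
    }
    return count;
}
```

This is why games deliberately alternate which sprites get drawn each frame: rotating the victims of the cutoff turns a hard dropout into the familiar flicker.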

Framebuffers and blitting
Computers, much like consoles, had their share of real 2D graphics hardware. However, this graphics hardware grew out of the more general-purpose things computers did.

This kind of rendering still has video memory, but instead of using hardware-based tilemaps, the memory is much more generalized. There is a framebuffer, which is an image that represents what the user sees. And there's the rest of video memory (the framebuffer itself being just one part of it). Stored in that other part of video memory are images, much like tiles in tile-based rendering.

Unlike tile-based rendering, however, the CPU has more direct control over what gets seen. The graphics hardware exposed a very simple and generic function: blit, the fast block copy. It's a way of copying one section of video memory to another, typically to the framebuffer, so the user can see something. The CPU rendered stuff by issuing a number of blit commands to copy from images in video memory to the framebuffer.
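A blit is conceptually just a row-by-row memory copy. Here's a minimal sketch in C, with a made-up Surface type standing in for a region of video memory (8-bit pixels assumed for simplicity):

```c
#include <stdint.h>
#include <string.h>

/* A "surface" is just a rectangle of pixels somewhere in video memory. */
typedef struct { uint8_t *pixels; int w, h; } Surface;

/* Copy a w x h block from src to dst -- the entire job of a blitter. */
void blit(const Surface *src, int sx, int sy,
          Surface *dst, int dx, int dy, int w, int h)
{
    for (int row = 0; row < h; row++)
        memcpy(dst->pixels + (dy + row) * dst->w + dx,
               src->pixels + (sy + row) * src->w + sx,
               (size_t)w);
}
```

A dedicated blitter does exactly this loop in hardware, which is why it freed up so much CPU time: the CPU issues one command instead of copying thousands of bytes itself.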

The earliest systems using a framebuffer, such as the Apple II computer and the Missile Command system board, required the CPU to do all the blitting work. (This was called a "dumb" framebuffer.) Later systems added dedicated "blitter" coprocessors that greatly improved the number of pixels that could be copied in video memory per second. Blitter hardware appeared in Williams arcade games, in UNIX workstations, and the Amiga in the 1980s, but didn't make it to normal Intel PCs until the early 1990s, when the rise of Microsoft Windows all but required it.

PC-style 2D (EGA and VGA)
What PCs did have in the meantime was not nearly as versatile. EGA had brought full 16-color graphics to the PC in the mid-1980s, but its memory structure was, to put it mildly, weird. It used a planar memory setup, meaning that each of the four EGA color bits (red, green, blue and bright) was stored in a separate part of memory and accessed one at a time. This made writing slow, so the EGA also provided a set of rudimentary blitting functions to help out. Nevertheless, several games of the PC era made use of this mode. Just like blitter-based systems, there was no support for sprites, so collision detection and layering were up to the programmer.
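A sketch of why planar writes were slow: setting even a single pixel means touching all four bitplanes. The array layout here is illustrative and ignores the EGA's actual latch and mask registers, which existed precisely to soften this cost:

```c
#include <stdint.h>

#define PLANES    4      /* EGA bitplanes: blue, green, red, intensity */
#define ROW_BYTES 80     /* 640 pixels / 8 pixels per byte */
#define ROWS      350    /* 640x350 high-resolution mode */

/* Set one pixel to a 4-bit color in a planar layout: one bit of the
   color value lands in each of the four planes. */
void set_pixel_planar(uint8_t plane[PLANES][ROW_BYTES * ROWS],
                      int x, int y, uint8_t color)
{
    int byte = y * ROW_BYTES + x / 8;
    uint8_t mask = (uint8_t)(0x80 >> (x % 8));
    for (int p = 0; p < PLANES; p++) {
        if (color & (1 << p)) plane[p][byte] |= (uint8_t)mask;
        else                  plane[p][byte] &= (uint8_t)~mask;
    }
}
```

Four read-modify-write cycles per pixel, versus the single store a packed-pixel mode needs.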

What really changed things, though, was VGA's introduction of a 256-color, 320x200, packed-pixel graphics mode ("packed-pixel" meaning you could write color values directly to memory instead of having to separate them into planes first). There was no blitter support at all in this mode, which slowed things down some; however, the new mode was so much easier to use than 16-color that developers went for it right away. VGA also bettered EGA by having a full 18-bit palette available in every mode, a huge jump: EGA could do up to 6-bit color (64 colors), but only 16 at a time, and only in high-resolution mode; 200-line modes were stuck with the CGA palette. The huge-for-the-time palette had much richer colors and made special effects possible, like SNES-style fade in/out (done by reprogramming the color table in real time) and Doom's gamma-correction feature.
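By contrast, a packed-pixel write in VGA's 256-color mode is a single byte store. A sketch, with an ordinary buffer standing in for the video memory that actually sits at segment 0xA000:

```c
#include <stdint.h>

/* VGA mode 13h: 320x200, one byte per pixel, linearly addressed.
   Writing a pixel is one array store -- no plane juggling at all. */
static inline void set_pixel_13h(uint8_t *vram, int x, int y, uint8_t color)
{
    vram[y * 320 + x] = color;
}
```

The whole 320x200 frame fits in 64,000 bytes, which is why it mapped so neatly into a single 64 KB real-mode segment.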

The lack of a blitter meant that the path between the CPU and the VGA needed to be fast, especially for later games like Doom, and the move from 16-bit ISA video to 32-bit VLB and PCI video around 1994 made that possible.

Modern 3D Graphics
In the world of 3D graphics, things are somewhat different. Polygonal graphics don't have the kind of rigid layout that tile-based 2D rendering had. And it's not just a matter of managing the graphics, but how we see them. Thus video memory with 3D graphics is (conceptually, if not explicitly in hardware) split into two parts: texture memory, which manages the actual graphics, and the framebuffer, which manages how we see them.

The closest analogy so far is the difference between a movie set and the cameras that film the action. The texture memory is the set. All the people, places, and objects are handled there. The frame buffer is the camera. It merely sees and records what happens on the set so we can see it. Everything not shot by the camera is still part of the set, but the camera only needs to focus on the relevant parts.

It's just the same in 3D video games. If you are playing a first-person shooter, everything behind you is part of the level and is still in texture memory. But since you can't see it, it isn't in the framebuffer. Turn around, and it is, while what you were facing before is now no longer in the framebuffer.

This means that part of the video memory has to handle the texture memory, while the other handles the frame buffer.

A misconception that's dying down is that the amount of video RAM indicates a graphics card's performance. This started around the turn of the millennium, as the PC gaming world was transitioning to 32-bit color (cards offered either 16-bit color, 65,536 colors, or 32-bit color, technically 16,777,216 colors). RAM at the time was expensive, and 32-bit color took up twice the space. But as time went on, RAM became cheaper, and there were occasions where lower-performing graphics cards had more memory than faster ones. Needless to say, system requirements like "DirectX-compatible graphics card with X MB of memory" were soon dropped, because the "memory = performance" rule of thumb no longer held.
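The arithmetic behind the cost of that transition is simple: doubling the bits per pixel doubles the framebuffer size. A quick sketch:

```c
/* Bytes needed for one framebuffer at a given resolution and depth. */
unsigned long framebuffer_bytes(unsigned long w, unsigned long h,
                                unsigned long bits_per_pixel)
{
    return w * h * (bits_per_pixel / 8);
}
```

At 640x480, a 16-bit framebuffer is 600 KB while a 32-bit one is 1200 KB, which mattered a great deal when cards shipped with 8 or 16 MB total.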

3D performance relies more on the GPU design itself (more processing units = more things happening at once = better performance), how fast those units are clocked, and finally, how fast they can access memory. For all but the most demanding tasks, normal SDRAM is fast enough for 3D rendering, and the amount of RAM only helps if the application uses a lot of textures. 2D performance is pretty much as good as it's going to get, and has been since the Matrox G400 and the first GeForce chips came out in 1999; even a GPU that's thoroughly obsolete for 3D gaming, such as the Intel 945 or X3100, will still be able to handle 2D well all the way up to 1080p HDTV, and possibly beyond.