Motion compensation and prediction are performed at the macroblock layer.
The goal of motion compensation is to provide a good prediction for each macroblock. In macroblocks where prediction is applied, the DCT is applied to the prediction errors rather than to the image samples, and the smaller the prediction errors, the more effective the entropy coding. Good predictions therefore make it possible to combine a low bit rate with good quality.
In nearly still pictures it's quite easy to obtain very good predictions from the pixels at the same position in the reference picture, but in pictures with motion the movement has to be taken into account.
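As a small illustration of why movement matters, the following Python sketch computes the prediction error when the prediction is simply the block at the same position in the reference picture. The 4x4 pictures and the helper names `residual` and `colocated_block` are hypothetical and chosen only for this example.

```python
def residual(current, prediction):
    """Element-wise prediction error: current block minus its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


def colocated_block(reference, top, left, size):
    """Block of size x size samples at the same position in the reference."""
    return [row[left:left + size] for row in reference[top:top + size]]


# Tiny 4x4 "pictures": a bright 2x2 square that moves one pixel to the right.
reference = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 0, 0],
]

# Still picture: the current picture equals the reference, so the error is zero.
still_error = residual(reference, colocated_block(reference, 0, 0, 4))

# Moving picture: the same co-located prediction leaves large errors
# along the edges of the square that moved.
moving_error = residual(current, colocated_block(reference, 0, 0, 4))
```

In the still case every error sample is zero, while in the moving case the errors are as large as the object itself, which is what a motion vector is meant to avoid.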

Both the coder and the decoder have two frame-memories where they store the decoded pictures used as references.
Where do the predictions come from?
It depends on the picture type.

Predictions for a P_picture
If it's a frame picture they may come from the previous I or P frame.
If it's a field picture they may come from the two I or P fields coded most recently.
Predictions for a B_picture
They may come from the previous (in display order) I or P frame as well as from the next (in display order) I or P frame, and the predictions coming from the two directions may be interpolated.
See also the sequence layer.
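The interpolation between a forward and a backward prediction in a B-picture can be sketched as a sample-by-sample average. The function name `average_predictions` is hypothetical, and rounding halves upward is an assumption of this sketch rather than something stated in the text above.

```python
def average_predictions(forward, backward):
    """Average two prediction blocks sample by sample.

    Assumption: non-negative samples, with halves rounded up.
    """
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward, backward)]


# Hypothetical 2x2 predictions taken from the previous and the next reference.
forward = [[10, 20], [30, 40]]
backward = [[12, 21], [30, 45]]

interpolated = average_predictions(forward, backward)
```

Each sample of the interpolated prediction lies between the two directional predictions, which helps when an area changes gradually between the two references.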

What's a motion vector?
A motion vector is a two-dimensional pointer that tells the decoder how far left/right and up/down from the position of the current macroblock the prediction macroblock is located in the reference frame or field.
Motion vectors have half-pel resolution, which means that an interpolation process is needed to obtain the prediction (MPEG-1 also allows the simpler full-pel resolution, selected at the picture layer). Note that the same motion vector is applied to luminance and, after being scaled, to chrominance.
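A minimal Python sketch of what half-pel resolution implies: positions that fall between two samples are obtained by averaging the nearest stored samples. The names `half_pel_sample` and `scale_for_chroma_420` are hypothetical. The rounding, and the rule of halving the vector with truncation toward zero for 4:2:0 chrominance, are assumptions of this sketch and should be checked against the standard.

```python
def half_pel_sample(ref, y2, x2):
    """Sample ref at (y2/2, x2/2), with coordinates given in half-pel units.

    Integer positions return the stored sample; half positions average the
    2 or 4 nearest samples, rounding halves up. Coordinates must be
    non-negative and the neighbours must lie inside ref.
    """
    y, x = y2 // 2, x2 // 2
    dy, dx = y2 % 2, x2 % 2  # 1 when the coordinate falls between samples
    a = ref[y][x]
    b = ref[y][x + dx]
    c = ref[y + dy][x]
    d = ref[y + dy][x + dx]
    return (a + b + c + d + 2) // 4


def scale_for_chroma_420(mv):
    """Scale a luminance vector (dx, dy) for 4:2:0 chrominance.

    Assumption: each component is halved, truncating toward zero.
    """
    dx, dy = mv
    return (int(dx / 2), int(dy / 2))


ref = [[10, 20],
       [30, 40]]

on_sample = half_pel_sample(ref, 0, 0)   # exactly on a stored sample
between_h = half_pel_sample(ref, 0, 1)   # halfway between 10 and 20
between_v = half_pel_sample(ref, 1, 0)   # halfway between 10 and 30
centre = half_pel_sample(ref, 1, 1)      # centre of the four samples

chroma_mv = scale_for_chroma_420((5, -3))
```

With full-pel vectors only the first case occurs, which is why that mode is simpler.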

What's motion estimation?
Motion estimation is the process, performed by the coder, of finding the motion vector that points to the best prediction macroblock in a reference frame or field.
The quality of a candidate prediction macroblock is generally measured with a cost function, such as the absolute error or the mean squared error, and the candidate that minimizes it is chosen. The cost function is applied to the macroblock (or to a part of it); this technique is called block matching and is the most widely used in video coding. In general every possible prediction within a given range is evaluated, which is called full search. Unfortunately the computational complexity is proportional to the search area and can be quite heavy, while the search area has to be wide enough to cover every movement.
Note that the ability to perform good motion estimation is a key factor in the quality of a coder.
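The full-search block matching described above can be sketched in a few lines of Python. This is an illustrative version under simplifying assumptions: integer-pel displacements only, the sum of absolute differences as the absolute-error cost, and hypothetical names (`sad`, `full_search`, `make_picture`) and picture sizes.

```python
def sad(current, reference, top, left, dy, dx, size):
    """Sum of absolute differences between a block and a displaced candidate."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total


def full_search(current, reference, top, left, size, search_range):
    """Evaluate every integer displacement within +/- search_range.

    Returns (cost, dx, dy) for the candidate with the lowest cost.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(current, reference, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def make_picture(obj_left):
    """6x6 picture with a small 2x2 object whose left edge is at obj_left."""
    pic = [[0] * 6 for _ in range(6)]
    pic[2][obj_left], pic[2][obj_left + 1] = 5, 6
    pic[3][obj_left], pic[3][obj_left + 1] = 7, 8
    return pic


reference = make_picture(1)  # object at columns 1-2
current = make_picture(2)    # object has moved one pixel to the right

best = full_search(current, reference, top=2, left=2, size=2, search_range=2)
```

Here the best candidate has zero cost with a vector of one pixel to the left, that is, the prediction lies where the object was in the reference. The two nested loops visit up to (2 * search_range + 1) squared candidates per block, which is where the heavy computation comes from.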

The prediction menu for frame pictures.
Frame-based prediction.
A single motion vector for the whole macroblock. It's used when movement between the two fields is insignificant. It's the only possible choice for progressive images.
Field-based prediction.
Two different motion vectors: one for the samples belonging to the first field and one for those belonging to the second field. It's used when movement between the two fields is significant.
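To make the difference between the two choices concrete, the sketch below separates the interleaved lines of a frame block into its two fields, so that each field can be predicted with its own vector, and then interleaves them again. The function names are hypothetical, and taking the even lines as the first field is an assumption of this example.

```python
def split_fields(block):
    """Separate the interleaved lines of a frame block into its two fields.

    Assumption: even lines belong to the first field, odd lines to the second.
    """
    return block[0::2], block[1::2]


def merge_fields(first, second):
    """Re-interleave two field blocks into one frame block."""
    merged = []
    for first_line, second_line in zip(first, second):
        merged.extend([first_line, second_line])
    return merged


# Hypothetical 4-line block; each line is labelled with its line number.
frame_block = [[0, 0], [1, 1], [2, 2], [3, 3]]

first_field, second_field = split_fields(frame_block)
rebuilt = merge_fields(first_field, second_field)
```

For a 16x16 macroblock this gives two blocks of 8 lines each, one per field.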

The prediction menu for field pictures.
Field prediction.
A single motion vector for the whole macroblock. It's used when the whole area has a single movement.
16x8 prediction.
Two different motion vectors: one for the top half and one for the bottom half of the macroblock. It's used when the macroblock includes different objects with different movements and the macroblock area is too large to give a good prediction.
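A 16x8 prediction can be sketched as two independent fetches from the reference field, one per half of the macroblock. This is a simplified illustration: the vectors are (dx, dy) pairs in integer pels rather than half pels, and the names `fetch` and `predict_16x8` and the 32x32 reference are hypothetical.

```python
def fetch(reference, top, left, height, width, mv):
    """Copy a height x width block from the reference, displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [row[left + dx:left + dx + width]
            for row in reference[top + dy:top + dy + height]]


def predict_16x8(reference, top, left, mv_upper, mv_lower):
    """Build a 16x16 prediction from two 16x8 halves, each with its own vector."""
    upper = fetch(reference, top, left, 8, 16, mv_upper)
    lower = fetch(reference, top + 8, left, 8, 16, mv_lower)
    return upper + lower


# Hypothetical reference field in which every sample encodes its own position.
reference = [[y * 100 + x for x in range(32)] for y in range(32)]

prediction = predict_16x8(reference, top=8, left=8,
                          mv_upper=(1, 0), mv_lower=(-2, 3))
```

The upper half is taken one pixel to the right of the macroblock position, the lower half two pixels to the left and three lines down, so the two halves can follow different objects.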