The output of an MPEG video encoder is a video elementary stream (ES), and the output of an audio encoder is an audio elementary stream. Before being multiplexed, the video and audio elementary streams are packetized to form the video PES and the audio PES (Packetized Elementary Streams).
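Packetization can be pictured as cutting the elementary stream into chunks and prepending a PES header to each one. The following Python sketch shows the idea under simplifying assumptions: it writes an MPEG-2 style optional PES header with every flag cleared (no PTS/DTS, no optional fields, no stuffing), handles only audio/video stream_ids, and splits at a fixed, hypothetical chunk size. The function names `build_pes_packet` and `packetize` are illustrative, not part of any library.

```python
def build_pes_packet(stream_id: int, payload: bytes) -> bytes:
    """Wrap one chunk of elementary-stream data in a minimal PES packet.

    Assumption: MPEG-2 style optional PES header with all flags cleared,
    i.e. no PTS/DTS, no optional fields and no stuffing bytes.
    """
    # Audio stream_ids are 0xC0-0xDF, video stream_ids are 0xE0-0xEF.
    if not 0xC0 <= stream_id <= 0xEF:
        raise ValueError("this sketch only handles audio/video stream_ids")
    optional_header = bytes([
        0x80,  # '10' marker bits; scrambling/priority/alignment/copyright flags = 0
        0x00,  # PTS_DTS_flags = '00' and all other optional-field flags = 0
        0x00,  # PES_header_data_length = 0 (no optional fields, no stuffing)
    ])
    # PES_packet_length counts every byte that follows the length field itself.
    packet_length = len(optional_header) + len(payload)
    if packet_length > 0xFFFF:
        raise ValueError("payload too large for a 16-bit PES_packet_length")
    return (
        b"\x00\x00\x01"                      # packet start-code prefix
        + bytes([stream_id])                 # stream identifier
        + packet_length.to_bytes(2, "big")   # PES packet length
        + optional_header                    # optional PES header
        + payload                            # PES packet data bytes
    )


def packetize(stream_id: int, elementary_stream: bytes, max_payload: int = 2048) -> list:
    """Split an elementary stream into fixed-size chunks, one PES packet each.

    max_payload = 2048 is an arbitrary, hypothetical default.
    """
    return [
        build_pes_packet(stream_id, elementary_stream[i:i + max_payload])
        for i in range(0, len(elementary_stream), max_payload)
    ]
```

A real packetizer would usually align packets with access units (frames) and write a PTS into the optional header; the fixed-size split here only illustrates the wrapping step.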

PES packet structure
A PES packet has variable length. It consists of a header followed by the packet data:

Header
  packet start-code prefix (3 bytes, 0x000001)
  stream identifier (1 byte)
  PES packet length (2 bytes; the number of bytes that follow this field)
  optional PES header (variable length)
  stuffing bytes (0xFF) (variable length)
PES packet data bytes (payload)
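To make the layout above concrete, here is a minimal Python sketch that reads the fixed header fields and skips the optional PES header and stuffing bytes to reach the payload. It assumes the MPEG-2 layout of the optional PES header (a byte starting with the marker bits '10', a byte of flags, then PES_header_data_length, which counts the optional fields plus the stuffing bytes). That header is present for audio and video stream_ids; packets that omit it, such as padding streams, are rejected here. `PesPacket` and `parse_pes_packet` are hypothetical names, not a library API.

```python
from dataclasses import dataclass


@dataclass
class PesPacket:
    stream_id: int           # 0xC0-0xDF audio, 0xE0-0xEF video
    packet_length: int       # value of the 16-bit PES packet length field
    header_data_length: int  # bytes of optional fields plus stuffing bytes
    payload: bytes           # PES packet data bytes


def parse_pes_packet(data: bytes) -> PesPacket:
    """Parse one PES packet, assuming the MPEG-2 optional PES header layout."""
    if len(data) < 9:
        raise ValueError("too short for a PES header")
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing packet start-code prefix 0x000001")
    stream_id = data[3]
    if not 0xC0 <= stream_id <= 0xEF:
        raise ValueError("this sketch only handles audio/video stream_ids")
    packet_length = int.from_bytes(data[4:6], "big")
    # The optional PES header starts with the two marker bits '10'.
    if data[6] >> 6 != 0b10:
        raise ValueError("optional PES header marker bits are not '10'")
    # data[7] holds the PTS/DTS and other flags; this sketch does not decode them.
    header_data_length = data[8]
    payload_start = 9 + header_data_length  # skip optional fields and 0xFF stuffing
    # A length of 0 (allowed for video carried in a Transport Stream) means
    # "unbounded", so the sketch takes everything that was passed in.
    end = 6 + packet_length if packet_length else len(data)
    if end > len(data) or payload_start > end:
        raise ValueError("truncated or inconsistent PES packet")
    return PesPacket(stream_id, packet_length, header_data_length, data[payload_start:end])
```

For instance, a video packet carrying two stuffing bytes and a two-byte payload parses to stream_id 0xE0, packet length 7 and header data length 2, with the stuffing skipped.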

PES packets are the input to the Program Stream and Transport Stream multiplexers.