Temporal aliasing is a common problem with single-image algorithms, especially upscalers.
The frame-to-frame differences in prediction are too large in that example. QTGMC InputType=1/2/3 can only do so much, and there are usually side effects, including detail loss and possible ghosting. If you stack temporal filters to combat the temporal aliasing/inconsistencies, you start losing too much detail and blur everything to mush — see the sketch below.
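To make that trade-off concrete, here is a minimal VapourSynth sketch of the kind of stacking I mean. std.AverageFrames is a core function; TTempSmooth and the L-SMASH source filter are assumed installed, and the radii/weights are arbitrary illustrations, not recommendations:

```python
import vapoursynth as vs
core = vs.core

# Assumes the L-SMASH source plugin; filename is illustrative
clip = core.lsmas.LWLibavSource("upscaled.mkv")

# Pass 1: plain temporal average over a 5-frame window (no motion
# compensation). Suppresses shimmer, but moving fine detail already smears.
stab = core.std.AverageFrames(clip, weights=[1, 2, 3, 2, 1])

# Pass 2: stacking a second temporal smoother on top (assumes the
# TTempSmooth plugin). Kills more aliasing, but this is where the
# detail loss and ghosting start to compound.
stab = core.ttmpsm.TTempSmooth(stab, maxr=2)

stab.set_output()
```

Each added pass buys temporal stability at the cost of spatial detail; past one or two passes the result turns to mush.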
Another approach is to pre-scale with something temporally consistent. You can experiment with BasicVSR++ or TecoGAN somewhere in the chain, either before or after CodeFormer, with or without downscaling.
Here is an experiment with BasicVSR++ 4x => bicubic downscale /2 => GPEN 2x => QTGMC InputType=1, compared to lanczos3 at 1/2 speed. I like the finer hair details, which improve on BasicVSR++ alone. I don't like that the background textures are smoothed away, the eye specular reflections are far too enhanced, contrast and saturation are changed (in this example they were crudely matched back), and other small details are smoothed away compared to BasicVSR++ alone.
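For reference, a hedged VapourSynth sketch of roughly what that chain looks like. It assumes HolyWu's vs-basicvsrpp wrapper and havsfunc are installed; run_gpen() is a hypothetical stand-in, since GPEN was run as a separate per-frame pass rather than through a VS API:

```python
import vapoursynth as vs
import havsfunc as haf
from vsbasicvsrpp import BasicVSRPP
core = vs.core

src = core.lsmas.LWLibavSource("source.mkv")  # illustrative filename

# BasicVSR++ 4x (the wrapper expects RGBS input; 4x output assumed here)
rgb = core.resize.Bicubic(src, format=vs.RGBS, matrix_in_s="709")
up4 = BasicVSRPP(rgb)

# Bicubic downscale /2 => net 2x so far
down = core.resize.Bicubic(up4, up4.width // 2, up4.height // 2)

# GPEN 2x face restoration — hypothetical helper, not a real VS plugin call
# faces = run_gpen(down, scale=2)
faces = down  # placeholder so the script stays runnable without GPEN

# QTGMC InputType=1 as a progressive shimmer-repair pass (not deinterlacing)
yuv = core.resize.Bicubic(faces, format=vs.YUV420P16, matrix_s="709")
out = haf.QTGMC(yuv, Preset="Slower", InputType=1)
out.set_output()
```

The ordering matters: the temporally consistent model goes first so the face restorer works on stable frames, and QTGMC InputType=1 mops up whatever frame-to-frame variance the per-frame step reintroduces.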
Overall I'm not a fan of the "look" of CodeFormer, or of most "face" upscalers - I find they change the "facts" and actual details too much. In that CodeFormer example, the hair changes too much (e.g. the hair style changes, curls are straightened), the eyes change (they look almost transplanted), and the nose and lip shapes change too much.
My cutoff for acceptable tolerance is whether the image is "plausible" and similar in underlying structure to, say, a lanczos 4x upscale.
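One crude way to put a number on that "similar in underlying structure" test is to SSIM the fancy upscale against a plain lanczos 4x of the same frame. A minimal sketch, assuming the low-res frame and the upscaler output are saved as same-scale PNGs (filenames are illustrative, and the score is a rough indicator, not a calibrated cutoff):

```python
import cv2
from skimage.metrics import structural_similarity

lr = cv2.imread("frame_lowres.png")
fancy = cv2.imread("frame_fancy_upscale.png")  # assumed to be 4x the lr size

# Plain lanczos 4x reference of the same frame
# (INTER_LANCZOS4 is OpenCV's lanczos kernel)
ref = cv2.resize(lr, (lr.shape[1] * 4, lr.shape[0] * 4),
                 interpolation=cv2.INTER_LANCZOS4)

gray_ref = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
gray_fancy = cv2.cvtColor(fancy, cv2.COLOR_BGR2GRAY)

score = structural_similarity(gray_ref, gray_fancy)
print(f"SSIM vs lanczos 4x reference: {score:.3f}")
# A low score means the structure diverged - the "facts" changed,
# not just the sharpness.
```

It won't tell you whether a face looks "transplanted", but it flags the cases where an upscaler has rewritten the underlying geometry rather than refined it.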