Create video scenes from text prompts, images, frames, and references while keeping model choice and credit cost visible.