We present a technique for inferring a model of the spatio-temporal statistics of a collection of images of dynamic scenes seen from a moving camera. We use a time-variant linear dynamical system to jointly model the statistics of the video signal and the moving vantage point. We propose three approaches to inference: the first based on the plenoptic function, the second on interpolating linear dynamical models, and the third on approximating the scene as piecewise planar. For the last two approaches, we illustrate the potential of the proposed techniques with a number of experiments. The resulting algorithms could be useful for video editing in which the motion of the vantage point is controlled interactively, as well as for stabilized synthetic generation of video sequences.
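
To make the phrase "time-variant linear dynamical system" concrete, the following is a minimal sketch of the generic form x_{t+1} = A_t x_t, y_t = C_t x_t, with noise terms omitted. It is not the model inferred in this work: the helper names (`matvec`, `simulate_tv_lds`, `A_of_t`, `C_of_t`) and the toy two-dimensional matrices are assumptions chosen only for illustration, standing in for dynamics and an output map that change as the vantage point moves.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate_tv_lds(A_of_t, C_of_t, x0, steps):
    """Roll out x_{t+1} = A_t x_t, y_t = C_t x_t (noise omitted for clarity)."""
    x, outputs = list(x0), []
    for t in range(steps):
        outputs.append(matvec(C_of_t(t), x))  # observation at time t
        x = matvec(A_of_t(t), x)              # state update with time-varying A_t
    return outputs

# Hypothetical toy model (an assumption, not taken from the paper):
# a rotation whose angle drifts with t, so the dynamics change over time.
def A_of_t(t):
    th = 0.1 + 0.01 * t
    return [[math.cos(th), -math.sin(th)],
            [math.sin(th),  math.cos(th)]]

# Hypothetical output map that also varies with t.
def C_of_t(t):
    return [[1.0, 0.0],
            [0.0, 1.0 + 0.05 * t]]

frames = simulate_tv_lds(A_of_t, C_of_t, [1.0, 0.0], 5)
```

In a video setting the state would be a low-dimensional vector and each output a full image, so the matrices would be far larger and estimated from data; the sketch only shows how the time index enters both the state update and the observation.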