Recent research using eye-tracking typically relies on constrained visual contexts in particular goal-oriented contexts, viewing a small array of objects on a computer screen and performing some overt decision or identification. Eyetracking paradigms that use pictures as a measure of word or sentence comprehension are sometimes touted as ecologically invalid because pictures and explicit tasks are not always present during language comprehension. This study compared the comprehension of sentences with two different grammatical forms: the past progressive (e.g., was walking), which emphasizes the ongoing nature of actions, and the simple past (e.g., walked), which emphasizes the end-state of an action. The results showed that the distribution and timing of eye movements mirrors the underlying conceptual structure of this linguistic difference in the absence of any visual stimuli or task constraint: Fixations were shorter and saccades were more dispersed across the screen, as if thinking about more dynamic events when listening to the past progressive stories. Thus, eye movement data suggest that visual inputs or an explicit task are unnecessary to solicit analog representations of features such as movement, that could be a key perceptual component to grammatical comprehension.