Visual “Language”? Some Thoughts on the Naturalness of the Visual System in Films

The phrase “visual language” has been thrown around quite a lot, yet most of the time it serves as a catch-all term for a wide variety of ideas. Is it even really a “language”? Most would agree that the visual component of our cognitive system shares many features with the language system, but few would go as far as claiming that it should be regarded as a language in the strict sense. Visual cognition as a whole is too complicated and vague a field to investigate in its entirety, so here I will only attempt to tackle the tip of the iceberg: the visual aspect of films, which is itself a small part of the visual system.
First of all, the definition of language itself is much debated. One major perspective in modern linguistics considers it a “cognitive system which is part of any normal human being’s mental or psychological structure”, with a social dimension added to balance Chomsky’s strictly syntactic perspective. On this view a language has a grammar, and there exists a universal grammar that is biologically endowed in human beings. Beneath the surface structures of the language there are deep structures, which are generated from the innate universal grammar. So how similar is the “visual language” of films to this generative linguists’ idea of language? Let’s take a scene from The Tin Drum (Die Blechtrommel, 1979) as an example. (Suggestion: perhaps have the sound turned off when watching it.)

Here we see a very standard use of basic film language. The scene starts with a wide/establishing shot (1), showing the entire classroom, facing the teacher. Then it cuts to a reverse shot (2a-2b), facing the students. This is also a dolly shot, moving in until we see a medium shot of the protagonist. Next it cuts to a low-angled medium-wide shot (3) of the teacher, then back to the protagonist (4), then back to the teacher, this time in a medium shot (5). It cuts back to the protagonist again (6), followed by a moving shot (7a-7b) that goes from a medium shot of the teacher to a medium-wide two-shot containing both characters. Then comes a medium-wide shot (8) of the teacher. After this there is a quick-cutting back-and-forth sequence (9-12) of extreme close-ups of the teacher and the protagonist. This is followed by the camera backing out, to a high-angled medium close-up of the protagonist (13) and a moving low-angled medium-wide shot following the teacher (14). Finally, the scene ends with a close-up of the protagonist (15).

Structure-wise, this is a very classic example (although one might argue that it is more like a “paragraph” than a “sentence”). It starts by establishing the environment, then closes in on the main characters, and finally ends on the protagonist. This can be seen as a typical presentation of natural human experience, in which people usually take in the surroundings first and then focus in on the details. However, this is just the surface structure. What is the deep structure underlying it? And how “natural” is such a structure? Without proper experimental studies, we cannot know whether an innate language faculty lies behind it or whether it is just the result of cultural stimuli.
Certain rules can also be seen in this example. One is that although the editor is allowed to reuse items of the same composition/nature, he/she cannot juxtapose two similar items next to each other (in this case, the editor never puts two shots of the same character together). Another is the 180-degree rule, according to which the camera always has to stay on one side of the characters’ eye line. Of course there are many more if we look more carefully, but for the moment let’s stay focused on these two. There are also many socio-economic reasons behind the formation of such a system of rules, but here let’s stay with hypotheses in the realm of nature. One plausible reason for the first rule is that human eyes are like prime lenses: they cannot zoom in or out, nor can they change angle abruptly. One may then be curious about the reverse shots. One explanation is that in this case viewers take up the points of view of different characters, but that may not seem satisfactory to everyone. After all, reverse shots are rarely actual point-of-view shots; they are usually over-the-shoulder or other third-person shots. By comparison, the 180-degree rule is mostly considered settled, with its reason lying in how humans recognize two-dimensional space. (Essentially, the rule aims at keeping each character always looking in one direction on the screen.)
These are just some disjointed, unsystematic thoughts on mapping the visual system in films onto the theories of modern linguistics. There is much more to be investigated, and there are many more fields that one can draw from to form a more comprehensive view of the subject.

Andrew Radford, et al. Linguistics: An Introduction. 2nd ed. Cambridge, UK: Cambridge University Press, 2009.