A Photograph’s Worth in Words

How many words is a picture actually worth? This picture is worth 728 words according to word.camera. At least, that’s how many it generated. Word.camera is a website that translates photographs into descriptive text.

Photo uploaded to word.camera.

Visual communication is an important part of our culture, and photography is the dominant medium. Despite the incredible volume of images that we are inundated with on a daily basis, we remain a society that depends on the written and spoken word as our primary means of communication. Word.camera straddles the line between images and words, giving insight into both, and telling us a little about ourselves in the process.

Ross Goodwin, the creator of the site, is a graduate student at New York University, a technologist, data scientist, and photography enthusiast. He built word.camera to take photographs from a computer’s webcam or from a file the user uploads. The site first translates the picture into English words using an API by Clarifai, and those words are then expanded into sentences and paragraphs through ConceptNet. Goodwin calls the resulting piece a lexograph, “a text document generated from digital image data.”
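For readers curious how words about a picture can become sentences, here is a minimal sketch of the second half of that process. It is not Goodwin's code and it calls neither Clarifai nor ConceptNet: the tag list, the small hand-written knowledge table, the sentence templates, and the function name `lexograph` are all assumptions made for illustration, loosely modeled on the kind of relations ConceptNet records about a concept.

```python
# Hypothetical tags, standing in for the words an image-recognition
# service might return for a photograph. Not real API output.
TAGS = ["bird", "winter"]

# Hand-written stand-in for a knowledge base of concept relations.
# Each concept maps to (relation, target) pairs.
KNOWLEDGE = {
    "bird": [("IsA", "an animal"), ("CapableOf", "sing"), ("Desires", "flying")],
    "winter": [("DefinedAs", "a cold quarter year")],
}

# One sentence template per relation type (illustrative only).
TEMPLATES = {
    "IsA": "A {concept}, which evokes {target}.",
    "CapableOf": "For that reason, it may {target}.",
    "Desires": "Yet, it longs for {target}.",
    "DefinedAs": "The {concept} is defined as {target}.",
}


def lexograph(tags, knowledge=KNOWLEDGE, templates=TEMPLATES):
    """Turn a list of tags into text by filling templates with known relations."""
    sentences = []
    for tag in tags:
        # Tags with no entry in the knowledge table are skipped.
        for relation, target in knowledge.get(tag, []):
            template = templates.get(relation)
            if template is None:
                continue  # no template for this relation type
            sentences.append(template.format(concept=tag, target=target))
    return " ".join(sentences)


if __name__ == "__main__":
    print(lexograph(TAGS))
```

Nothing in the sketch looks at a photograph or understands one. It only looks up facts about each word and pours them into fixed sentence shapes, which is one plausible reason text made this way can read as both literal and strange.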

I uploaded this photograph of a tree, and these are the resulting 728 words.

The sentences are bizarre and disjointed:
Immediately, an environment and a frozen: the environment is the totality of surrounding conditions, and the frozen brings about a your car mighting slide.

Some of them are oddly literal:
…the winter is defined as a cold quarter year.

Yet amidst the strangeness of the sentence structure, I found a certain poetry in the phrases:

But a bird, which evokes an animal.
For that reason, it may sing.
Yet, it longs for flying.

It sounds nice, but it isn’t meaningful. The user is simply presented with a series of definitions of the subjects, rehashed and redefined in peculiar ways. By producing these phrases, word.camera can provide people with the kaleidoscope needed to bring into focus the fantastical ideas in their heads. As with a Rorschach test, we see in the text what we want to see.

At their most basic level, photographs document objects and events, but can also convey thoughts, emotions, and concepts. The technology behind word.camera cannot actually interpret the uploaded photograph—the words are generated through various algorithms—and only appears magical because we read into the language, just as we read into the photograph. When using this software, we are using a machine to describe images in the most literal of ways, and we can abstract those words to suit our own ideas about the photograph.

For now, the site is limited to English and the grammar is rudimentary. As the technology improves and its “eyes” become more precise, perhaps one day it will expand into other languages and colorfully describe our photographs in words that have no English translation.

Kat Kiernan is the Editor-in-Chief of Don’t Take Pictures.