Interactive Music Videos

In movies, sometimes there are sequences with no sound except music. This allows the music to fill up the auditory space, independently from the images. I really love these sequences when they are done well. It feels like a purer form of emotional communication. Just images and sound, without the intervening dialogue and plot.

Lately, I’ve been listening to the excellent Aberdeen City album The Freezing Atlantic. I decided that some of these songs could work as awesome tracks for an otherwise silent part of a film. Naturally, I ended up trying to figure out how we could apply this to game design.

Music in games is usually incidental. It comes after everything else. The design is created, and the music is designed to fit around it. But what if we tried to work the other way around? Is it possible to create the game version of a music video?

Let’s look at the problems with this idea.

First, games are interactive. This means that they require player attention for anything to happen. This attentive state often requires intense thought. Being busy thinking could destroy the player’s ability to fall into the slightly trance-like state that you get from the best music videos. You’re too busy playing to feel the music.

We can solve the player attentiveness problem by making the gameplay easy and imprecise. The musical section of the game should give up any difficult, competitive, strategic, or mentally challenging aspects. It should simply be play, in the most basic and pure form. Don’t think Starcraft. Think Jenova Chen’s Flow or Cloud. The sequence does not need to be tranquil or dreamy, though these could work for tranquil or dreamy songs. The sequence needs to require minimal higher-brain thought on the part of the player. We want a person to be able to just sink into the music, and play without their mind being consciously active at all. Conscious thought destroys emotion.

The second problem is synchronization. How do we synchronize dynamic and unpredictable game events to a music track, so the feel of what is happening on screen in the gameworld matches the feel of the music?

Mono-Emotional Song/Sequence

The first solution is to simply step past the problem by choosing a song with a consistent feeling throughout, and then creating a gameplay sequence that matches that feeling for its whole length. In this way, if we envision the gameplay as a strategic space, each node in the space has generally the same feeling, which is the same feeling as the song. There obviously can’t be a mismatch.

For multi-part songs that change feeling over time, this obviously isn’t going to work. We need to adapt our approach. The goal is to make sure the player can’t get to a node/situation in the strategic space that doesn’t match the feeling of the music.

Serially Poly-Emotional Song/Sequence through Prescripted Gameplay Shifts

The second solution is thus to set up a gameworld where some pre-scripted external event (an explosion and the arrival of enemy soldiers, a character in a hospital bed closing his eyes for the last time) changes the whole gameworld, opening up a whole new strategic space and pushing you into it, while simultaneously cutting off the old space completely.

For example, say the song is somewhat tense and edgy at first, and later explodes into full excitement. You and your teammates are fighters, lying in wait for an enemy attack. The attack begins with a large explosion that lands exactly on the cymbal crash at the start of the exciting part of the song.
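One way to make the explosion land on the cymbal crash is to drive scripted events from the music's playback position rather than from the game clock. The sketch below is only an illustration under that assumption. The names `Cue` and `CueTrack`, and the 62.5-second cue time in the usage note, are hypothetical and not taken from any particular engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Cue:
    """A scripted event pinned to a position in the song (hypothetical type)."""
    time_s: float                                   # song position in seconds
    name: str = field(compare=False)
    action: Callable[[], None] = field(compare=False)


class CueTrack:
    """Fires each cue once, when the music playback position passes it."""

    def __init__(self, cues: List[Cue]):
        self._cues = sorted(cues)   # ordered by time_s only
        self._next = 0              # index of the next cue that has not fired

    def update(self, playback_pos_s: float) -> List[str]:
        """Call once per frame with the music clock; returns names of cues fired."""
        fired = []
        while (self._next < len(self._cues)
               and self._cues[self._next].time_s <= playback_pos_s):
            cue = self._cues[self._next]
            cue.action()            # e.g. spawn the explosion, send in the enemy
            fired.append(cue.name)
            self._next += 1
        return fired
```

A game loop would build something like `CueTrack([Cue(62.5, "enemy_attack", start_attack)])`, where `start_attack` is whatever function begins the scripted sequence, and call `update` every frame with the position reported by the audio system. Because the cue follows the song and not the frame count, a slow frame delays the explosion by at most one frame and the delay does not accumulate.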

Note that in this scenario, the game needs to be simplified to the point where it can be played without thought by even inexperienced players, so they can achieve conscious disconnection and fall into the music. Do this by removing options. Put the player in a fixed position in a vehicle. Wound the player, and have them watch from a stretcher. Bind the player as a prisoner. Leave the player without a weapon or ammunition, their only duty being to cower in a hole and watch the carnage, or let them get pulled along, away from danger, by friendly soldiers.

Designing this type of sequence smacks heavily of mental simulation (my pretentiously academic name for the practice of evaluating a game design by mentally envisioning some part of it being played). These sequences also stand in severe danger of quickly breaking a player’s immersion by having unrealistic things going on. In my example, the player needs to be able to plausibly survive the enemy attack. If they see other soldiers dying at X distance from an explosion, they had better not end up surviving an explosion at less than that distance. D-Day landing sequences in World War II shooters tend to suffer from problems in this area.

The final issue is with the players themselves, who will often feel the need to walk into gunfire and die deliberately, just to see what will happen. One response is to simply ignore the problem. Players play far from perfectly, and we can’t always protect them from themselves. If they don’t want to enjoy the sequence as it was intended, maybe the music wasn’t that good to begin with. The alternative is coming up with a plausible reason why the player physically has no choice but to take actions which will lead them through the sequence in a coherent way. It’s hard to do this without seeming contrived.

Serially Poly-Emotional Song/Sequence through Player-Controlled Pacing

Now consider a third, mixed system for synchronizing the music with the gameplay. We let each emotional segment last for a length which is mostly controlled by the player. When the player hits some defined trigger condition, the music proceeds to its next feeling-segment. This means that a song needs to be written, recorded, and programmed in such a way that any segment of it can be extended or contracted gracefully. This actually implies a new type of music composition, and a close relationship between game designer, writer, and composer.

This will allow us to not force the player or gameworld towards certain actions as much, thus reducing the contrived feeling of the game. That contrived feeling is one of game design’s greatest enemies.
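The player-paced scheme can be sketched as a small sequencer: each feeling-segment loops until gameplay trips its trigger, and the song then moves on at the next loop boundary. This is only a rough sketch. The names `Segment` and `PacedSong` are hypothetical, and switching only at loop boundaries is my assumption about what a "graceful" extension or contraction might mean. A real score would need the composer to decide where the transitions can happen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str   # label for one emotional section of the song (hypothetical)
    bars: int   # length of one loop of this section, in bars


class PacedSong:
    """Loops the current segment until the player trips its trigger,
    then advances at the next loop boundary (an assumed transition rule)."""

    def __init__(self, segments: List[Segment]):
        self.segments = segments
        self.index = 0
        self.bar_in_segment = 0
        self.advance_requested = False

    @property
    def current(self) -> str:
        return self.segments[self.index].name

    def trigger(self) -> None:
        """Called by gameplay when the player reaches the trigger condition."""
        self.advance_requested = True

    def on_bar(self) -> str:
        """Called by the audio layer at the end of each bar.
        Returns the name of the segment to play for the next bar."""
        self.bar_in_segment += 1
        if self.bar_in_segment >= self.segments[self.index].bars:
            self.bar_in_segment = 0                  # loop boundary reached
            if self.advance_requested:
                self.advance_requested = False
                if self.index < len(self.segments) - 1:
                    self.index += 1                  # move to the next feeling
        return self.current
```

With `PacedSong([Segment("halls", 4), Segment("bedside", 2)])`, the "halls" loop repeats for as long as the player wanders. Calling `trigger()` when they open the door lets the current four-bar loop finish before "bedside" begins. The trade-off is a delay of up to one loop between the trigger and the change in the music, which is one reason the segments would have to be composed with this system in mind.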

This seems a lot like the classical context-sensitive music which we have seen in games since near the beginning of gaming. It is not. What I am describing is a single gameplay sequence which matches a single dynamic song, not a game with an endless music loop that roughly matches the ‘excitement level’ of the gameplay. It is not a game plus music. It is a holistically absorbed emotional experience.

An example. Consider a song that gets progressively more mournful. In this example, you enter the hospital. The song is slow. The nurse directs you to your father’s room. As long as you wander the halls, the music remains generally slow and tense. When you finally enter your father’s room and see him, his frail body barely clinging to life, the music takes on a deeper, more mournful quality. As he speaks to you, the music quiets so you can hear his strained whispers. Finally, he closes his eyes for the last time. The music and the whine of the heart monitor come together into one fading, final note.