Building a VR Rhythm Game in Unity


There are a couple of ways to structure a rhythm game, depending on what matters to the project. Some games use simple beat matching, where gameplay inputs only need to be in time with the beat. Other, more complicated methods use the audio spectrum of the music to represent events such as individual notes played by different instruments.

Using Manual Beat Matching

In rhythm games, beat matching means synchronising player input and gameplay feedback with the beat of an audio track. If you are trying to get a project up and running quickly, you can script your game logic to update in time with the beat. This requires you to manually enter metadata about the track (such as the tempo in beats per minute, the time signature and the beat interval) and then use a bit of maths to synchronise game functions with the track. This method works like a metronome, keeping a consistent beat without ever interacting directly with the audio.
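The metronome maths can be sketched in a few lines. This is a minimal, engine-agnostic Python sketch rather than Unity code; the function names, the `offset_seconds` parameter and the 120 BPM figure are illustrative assumptions. In a Unity project the same arithmetic would sit in a C# script.

```python
def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds for a given tempo."""
    return 60.0 / bpm


def beat_index(elapsed_seconds: float, bpm: float, offset_seconds: float = 0.0) -> int:
    """Zero-based index of the most recent beat.

    offset_seconds is the (assumed) position of the first beat in the track.
    Returns -1 while playback has not yet reached the first beat.
    """
    if elapsed_seconds < offset_seconds:
        return -1
    return int((elapsed_seconds - offset_seconds) / seconds_per_beat(bpm))


# At 120 BPM a beat lasts half a second, so one second into the
# track the beat with index 2 has just started.
interval = seconds_per_beat(120)
current = beat_index(1.0, 120)
```

Game logic can then trigger whenever `beat_index` changes between two updates.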

The main problem with this method is that over an extended period of play the track and the game logic will drift out of sync. Audio files (for example WAV files) are typically sampled 44,100 times a second (44.1 kHz), while Unity’s Update function runs only once per frame, which could be anywhere from 15 to 120 times a second (depending on resource usage). Using the FixedUpdate function reduces this (by default it runs on a fixed timestep of 0.02 seconds, or 50 times a second). However, other factors such as input lag, gaps in the audio in some files and an inconsistent tempo on some tracks (common with rock music) can also affect the audio sync.
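One way this drift can build up is shown below. This is a hedged simulation in Python, not Unity code, and it assumes a particular (hypothetical) implementation: a beat timer that is advanced once per frame and reset to zero when a beat fires. Times are in whole milliseconds, and the 30 ms frame time and 500 ms beat interval are made-up values.

```python
def beat_times(beat_interval_ms: int, frame_ms: int, frames: int, reset_to_zero: bool) -> list:
    """Simulate a frame-stepped beat timer and return when each beat fires.

    reset_to_zero=True discards the overshoot past the beat each time,
    so the error accumulates. reset_to_zero=False carries the overshoot
    into the next beat, so the error stays below one frame.
    """
    now = 0
    timer = 0
    fired = []
    for _ in range(frames):
        now += frame_ms
        timer += frame_ms
        if timer >= beat_interval_ms:
            fired.append(now)
            timer = 0 if reset_to_zero else timer - beat_interval_ms
    return fired


# Roughly ten seconds of play at 30 ms per frame, 120 BPM (500 ms per beat).
drifting = beat_times(500, 30, 333, reset_to_zero=True)
steady = beat_times(500, 30, 333, reset_to_zero=False)

# How late the final beat fires compared with its ideal position.
drift_ms = drifting[-1] - 500 * len(drifting)
steady_ms = steady[-1] - 500 * len(steady)
```

In this simulation the resetting timer loses 10 ms on every beat, while the version that carries the remainder forward never falls more than one frame behind. Neither version accounts for input lag or a tempo that changes during the track.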

Real-time versus Pre-processed Audio Analysis

A more accurate but more complicated method is to read and analyse the audio file itself. This method was first popularised in games such as Audiosurf and lets code interact directly with the sample data, the same concept that is used in spectrum analysers. There are two ways to do this: analysing the file in real time, or outputting the analysis to another file and reading it from there. There are benefits and drawbacks to both.


Analysing a file in real time is particularly useful because it means you can play any file at any time and get accurate sample information (like a spectrum analyser would). It is also useful when the length of the audio is dynamic, which is common in video games. The drawback is that you cannot read the sample data ahead of the current position in the track, so you won’t be able to predict when a function should be called (such as in time with an upcoming beat). It also means that if your code relies on real-time analysis there will be a small amount of lag between when the audio is played and when it is analysed and then used in your code. Unlike the manual beat matching process outlined above, this lag will stay consistent and won’t get worse over time.
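To make the idea concrete, here is a rough Python sketch of what a spectrum analyser does with the most recent window of samples. It uses a naive discrete Fourier transform for readability (a real project would use an FFT, or the engine's own spectrum call), and the function names, the 8 kHz sample rate and the 256-sample window are all assumptions chosen to keep the example small.

```python
import cmath
import math


def magnitude_spectrum(samples: list) -> list:
    """Naive DFT magnitudes for the first half of the frequency bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        total = 0j
        for t, s in enumerate(samples):
            total += s * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(total) / n)
    return spectrum


def dominant_frequency(samples: list, sample_rate: int) -> float:
    """Frequency in Hz of the strongest bin, ignoring the DC bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(samples)


def sine_window(frequency: float, sample_rate: int, size: int) -> list:
    """Test signal: one window of a pure sine tone."""
    return [math.sin(2 * math.pi * frequency * t / sample_rate) for t in range(size)]


# Stand-in for "the last 256 samples that were just played".
window = sine_window(1000, 8000, 256)
peak_hz = dominant_frequency(window, 8000)
```

The window only ever contains samples that have already been played, which is where the consistent lag comes from: a 1,024-sample window at 44.1 kHz, for example, covers about 23 ms of audio.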

Outputting the spectrum analysis to a file before it is needed is useful for a number of reasons. Firstly, having the complete data allows your code to look ahead from the current play position and set up events accordingly. This also means that timing-sensitive events can be called on time with no lag caused by processing. Where this method falls down is that the data must be paired with the correct track, leaving less flexibility in how an audio track is selected. It is also unsuitable for situations that use dynamic audio, because all of the sample data has a set length and timing.
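The look-ahead step might be sketched as follows. This Python example assumes the pre-processed file has already been loaded into a sorted list of event times in seconds; the list contents, the function name and the one-second look-ahead window are hypothetical.

```python
from bisect import bisect_right


def upcoming_events(event_times: list, playback_time: float, look_ahead: float) -> list:
    """Events later than playback_time and no later than playback_time + look_ahead.

    event_times must be sorted in ascending order.
    """
    start = bisect_right(event_times, playback_time)
    end = bisect_right(event_times, playback_time + look_ahead)
    return event_times[start:end]


# Hypothetical beat positions read from the analysis file.
beats = [0.5, 1.0, 1.5, 2.0, 2.5]

# One second into the track, find the beats due in the next second.
due_soon = upcoming_events(beats, 1.0, 1.0)
```

Because the events are known in advance, the game can set each one up early enough for it to land on the beat rather than reacting after the fact.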
