Generating Music with Markov Chains and Alda

generating music

A while ago I read a fantastic article by Alex Bainter about how he used markov chains to generate new versions of Aphex Twin’s track ‘aisatsana’. After reading it I also wanted to try my hand at generating music using markov chains, but mix it up by tring out alda.

‘aisatsana’ is very different to the rest of Aphex Twin’s 2012 released Syro album. It’s a calm, soothing piano piece that could easily place you into a meditative state after listening to it.

Alda is a text-based programming language for music composition. If you haven’t tried it before, you’ll get a feel for how it works in this post. If you want to learn about it through some much simpler examples, this quick start guide is a good place to start.

Generating music is made easier using the simple language that alda provides.

As an example, here are 4 x samples ‘phrases’ that are generated with markov chains (based off the aisatsana starting state), and played back with Alda. I’ve picked 4 x random phrases out of 32 that sounded similar to me, but were different in each case. A generated track will not necessarily consist of all similar sounding phrases, but might contain a number of these.

Markov Chains 101

On my journey, the first stop was to learn more about markov chains.

Markov chains are mathematical “stochastic” systems that change from one “state” (a situation or set of values) to another. In addition to this a Markov chain tells you the probabilitiy of transitioning from one state to another.

Using a honey bee worker as an example we might say: A honey bee has a bunch of different states:

  • At the hive
  • Leaving the hive
  • Collecting pollen
  • Make honey
  • Returning to hive
  • Cleaning hive
  • Defending hive

After observing honey bees for a while, you might model their behaviour using a markov chain like so:

  • When at the hive they have:
    • 50% chance to make honey
    • 40% chance to leave the hive
    • 10% chance to clean the hive
  • When leaving the hive they have:
    • 95% chance to collect pollen
    • 5% chance to defend the hive
  • When collecting pollen they have:
    • 85% chance of collecting pollen
    • 10% chance of returning to hive
    • 5% chance to defend the hive
  • etc…

The above illustrates what is needed to create a markov chain. A list of states (the “state space”), and the probabilities of transitioning between them.

To play around with Markov chains and simple string generation, I created a small codebase (nodejs / typescript). The app takes a list of ‘chat messages logs’ (really any line separated list of strings) as input. It then uses random selection to find any lines containing the ‘seed’ string.

With the seed string, it generates new and potentially unique ‘chat messages’ based on this input seed and the ‘state’ (which is the list of chat messages fed in).

Using a random function and initial filtering means that the generation probability is constrained to the size of the input and filtered list, but it still helped me understand some of the concepts.

Converting Aisatana MIDI to Alda Format

To start, the first thing I needed was a list of musical segments from the original track. These are what we refer to as ‘phrases’.

As Alex did in his implementation, I grabbed a MIDI version of Aisatsana. I then fed it into a MIDI to JSON converter, yielding a breakdown of the track into individual notes. Here is what the first two notes look like:

[
  {
    "name": "E3",
    "midi": 52,
    "time": 0,
    "velocity": 0.30708661417322836,
    "duration": 0.5882355
  },
  {
    "name": "G3",
    "midi": 55,
    "time": 0.5882355,
    "velocity": 0.31496062992125984,
    "duration": 0.5882355
  }
]

From there I wrote some javascript to take these notes in JSON format, parse the time values and order them into the 32 ‘phrases’ that aisatsana is made up of.

That is, there are 32 ‘phrases’, with each consisting of 32 ‘half-beats’ at 0.294117647058824 seconds per half beat. Totalling the 301 seconds.

const notes = [] // <-- MIDI to JSON notes here

// constants specific to the aisatsana track
const secPerHalfBeat = 0.294117647058824;
const phraseHalfBeats = 32;

// Array to store quantized phrases
let phrases = [];

notes.forEach(n => {
  const halfBeat = Math.round(n.time / secPerHalfBeat);
  const phraseIndex = Math.floor(halfBeat / phraseHalfBeats);
  const note = n.name.substring(0, 1).toLowerCase();
  const octave = n.name.substring(1, 2);
  const time = n.time;
  const duration = n.duration;

  // Store note in correct 'phrase'
  if (!phrases[phraseIndex]) {
  	phrases[phraseIndex] = [];
  }

  phrases[phraseIndex].push({ note: note, octave: octave, time: time, duration: duration });
});

It also gathers information such as the note symbol, octave, and duration for each note and stores it in a phrases array, which also happens to be ordered by phrase index.

Grouping by Chord

Next, the script runs through each phrase and groups the notes by time. If a note is played at the same timestamp, that means it is part of the same chord. To play correctly with alda, I need to know this, so a chords array is setup for each phrase.

phrases.forEach(phrase => {
  let chords = []
  const groupByTime = groupBy('time');
  phrase.chords = [];
  const chordGrouping = groupByTime(phrase);

  for (let [chordTimestamp, notes] of Object.entries(chordGrouping)) {
    phrase.chords.push(notes)
  }
});

Generating alda Compatible Strings

With chord grouping done, we can now convert the track into 32 phrases that alda will understand.

phrases.forEach(phrase => {
  let aldaStr = "piano: (tempo 51) (quant 90) ";
  phrase.chords.forEach(chord => {
    if (chord.length > 1) {
      // Alda plays notes together as a chord when separated by a '/'
      // character. Generate the alda string based on whether or not
      // it needs to have multiple notes in the chord, separating with
      // '/' if so.
      for (let [idx, note] of Object.entries(chord)) {
        if (idx == chord.length - 1) {
          aldaStr += `o${note.octave} ${note.note} ${note.duration}s `;
        } else {
          aldaStr += `o${note.octave} ${note.note} ${note.duration}s / `;
        }
        
      };
    } else {
      chord.forEach(note => {
        aldaStr += `o${note.octave} ${note.note} ${note.duration}s `;
      });
    }
  });
  // Output the phrase as an alda-compatible / playable string (you can
  // also copy this directly into alda's REPL to play it)
  console.log(aldaStr);
})

Here is the full script to convert the MIDI to alda phrase strings.

Generating Music with Markov Chains

There are different entry points that I could have used to create the markov chain initial state, but I went with feeding in the alda strings directly to see what patterns would emerge.

Here are the first 4 x phrases from aisatsana in alda-compatible format:

piano: (tempo 51) (quant 90) o3 e 0.5882355s o3 g 0.5882355s o3 c 0.5882354999999999s o4 c 7.6470615s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 2.3529420000000005s o4 e 4.705884000000001s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 7.058826s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 1.1764709999999994s o4 e 5.882354999999997s

If you like, you can drop those right into alda’s REPL to play them, or drop them into a text file and play them with:

alda play --file first-four-phrases.alda

The strings are quite ugly to look at, but it turns out that they can still be used to generate new and original phrases based off the aisatsana track phrases using markov chains.

Using the markov-chains npm package, I wrote a small nodejs app to generate new phrases. It takes the 32 x alda compatible phrase strings from the original MIDI track of ‘aisatsana’ as a list of states and walks the chain to create new phrases.

E.g.

const states = [
  // [ alda phrase strings here ],
  // [ alda phrase strings here ],
  // [ alda phrase strings here ]
  // etc...
]

const chain = new Chain(states);
 
// generate new phrase(s)
const newPhrases = chain.walk();

I threw together a small function that you can run directly to generate new phrases. Give it a try here. Hitting this URL in the browser will give you new phrases from the markov generation.

If you want a text version that you can drop right into the alda REPL or into a file for alda to play try this:

curl -s https://solitary-mountain-114.fly.dev/ | jq -r '.phrases[]'

(Just add the instrument type, temp, and quant values you would like to the beginning of each line). E.g.

piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 30.71) e 0.5882354999999961s o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 0.5882354999999961s o3 (volume 29.92) b 7.0588260000000105s / o3 (volume 30.71) d 7.0588260000000105s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) g 0.29411774999999807s o4 (volume 30.71) e 0.5882355000000032s / o3 (volume 30.71) e 0.5882355000000032s o4 (volume 30.71) d 0.5882355000000032s / o3 (volume 29.92) c 0.5882355000000032s o4 (volume 29.92) c 0.5882355000000032s / o3 (volume 31.50) e 0.5882355000000032s o3 (volume 29.92) b 7.058826000000003s / o3 (volume 30.71) d 7.058826000000003s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) e 0.29411774999999807s o4 (volume 59.84) g 0.29411774999999807s o4 (volume 60.63) a 0.29411774999999807s / o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 62.20) b 0.29411774999999807s o5 (volume 62.99) c 0.5882354999999961s / o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o5 (volume 65.35) e 0.5882354999999961s / o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 2.3529419999999845s o5 (volume 66.93) b 0.5882354999999961s / o3 (volume 29.92) b 1.7647064999999884s o5 (volume 60.63) g 8.823532499999999s o4 (volume 30.71) e 7.0588260000000105s / o2 (volume 30.71) c 7.0588260000000105s / o3 (volume 30.71) e 7.0588260000000105s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) e 0.29411774999999807s o4 (volume 59.84) g 0.29411774999999807s o4 (volume 60.63) a 0.29411774999999807s / o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 62.20) b 0.29411774999999807s o5 (volume 62.99) c 0.5882354999999961s / o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o5 (volume 65.35) e 0.5882354999999961s / o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 7.647061499999992s o5 (volume 66.93) b 0.5882354999999961s / o3 (volume 29.92) b 2.3529419999999988s o5 (volume 60.63) g 6.4705905s o4 (volume 15.75) e 4.7058839999999975s

I’ve uploaded the code here that does the markov chain generation using the initial alda phrase strings as input state.

Results and Alda Serverless

Generated Music

The results from generating music off the phrases from the original track are certainly fun and interesting to listen to. The new phrases play out in different ways to the original track, but still have the feeling of belonging to the same piece of music.

Going forward I’ll be definitely experiment further with markov chains and music generation using alda.

Experimenting with alda and Serverless

Something I got side-tracked on during this experiment was hosting the alda player in a serverless function. I got pretty far along using AWS Lambda Layers, but the road was bumpy. Alda requires some fairly chunky dependencies.

Even after managing to squeeze Java and the Alda binaries into lambda layers the audio playback engine was failing to start in a serverless function.

I managed to clear through a number of problems but eventually my patience wore down and I settled with writing my own serverless function to generate the strings to feed into alda directly.

My goal here was to generate unique phrases, output them to MIDI, and then convert them to Audio to be played almost instantenously. For now it’s easy enough to take the generated strings and drop them directly into the alda REPL or play them direct from file though.

It will be nice to see alda develop further and offer an online REPL – which would mean the engine itself would be light enough to perform the above too.