Generating Music with Markov Chains and Alda

A while ago I read a fantastic article by Alex Bainter about how he used Markov chains to generate new versions of Aphex Twin’s track ‘aisatsana’. After reading it, I wanted to try my hand at generating music with Markov chains too, but mix things up by trying out alda.

‘aisatsana’ is very different from the rest of Aphex Twin’s 2014 album Syro. It’s a calm, soothing piano piece that could easily lull you into a meditative state.

Alda is a text-based programming language for music composition. If you haven’t tried it before, you’ll get a feel for how it works in this post. If you’d like to learn it through some much simpler examples, the quick start guide is a good place to begin.

The simple language that alda provides makes generating music much easier.
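
To give a taste of the syntax, here’s a one-line example (not from this project, just an illustration): a piano playing an ascending C major scale in eighth notes, with > bumping up an octave:

piano: c8 d e f g a b > c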

As an example, here are four sample ‘phrases’ generated with Markov chains (based on the aisatsana starting state) and played back with Alda. I picked four random phrases out of 32 that sounded similar to me, but were different in each case. A generated track won’t necessarily consist entirely of similar-sounding phrases, but it might contain a number of them.

Markov Chains 101

On my journey, the first stop was to learn more about Markov chains.

Markov chains are “stochastic” mathematical systems that change from one “state” (a situation or set of values) to another. On top of that, a Markov chain tells you the probability of transitioning from one state to another.

Using a honey bee worker as an example, we might say a bee has a bunch of different states:

  • At the hive
  • Leaving the hive
  • Collecting pollen
  • Making honey
  • Returning to the hive
  • Cleaning the hive
  • Defending the hive

After observing honey bees for a while, you might model their behaviour using a Markov chain like so:

  • When at the hive they have:
    • 50% chance to make honey
    • 40% chance to leave the hive
    • 10% chance to clean the hive
  • When leaving the hive they have:
    • 95% chance to collect pollen
    • 5% chance to defend the hive
  • When collecting pollen they have:
    • 85% chance to keep collecting pollen
    • 10% chance to return to the hive
    • 5% chance to defend the hive
  • etc…

The above illustrates everything needed to create a Markov chain: a list of states (the “state space”) and the probabilities of transitioning between them.
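
To make that concrete, here’s a minimal JavaScript sketch of walking a Markov chain built from the bee example above. The transition numbers for the first three states come straight from the list; the rest are made up purely to close the loop:

// Transition probabilities: state -> { nextState: probability }
const transitions = {
  'at-hive':           { 'make-honey': 0.5, 'leaving-hive': 0.4, 'cleaning-hive': 0.1 },
  'leaving-hive':      { 'collecting-pollen': 0.95, 'defending-hive': 0.05 },
  'collecting-pollen': { 'collecting-pollen': 0.85, 'returning-to-hive': 0.1, 'defending-hive': 0.05 },
  // made-up transitions, just so every state leads somewhere
  'make-honey':        { 'at-hive': 1 },
  'cleaning-hive':     { 'at-hive': 1 },
  'defending-hive':    { 'returning-to-hive': 1 },
  'returning-to-hive': { 'at-hive': 1 },
};

// Sample the next state using the current state's transition probabilities
function nextState(current) {
  let r = Math.random();
  let picked;
  for (const [state, probability] of Object.entries(transitions[current])) {
    picked = state;
    if ((r -= probability) <= 0) break;
  }
  return picked;
}

// Walk the chain for a few steps
let state = 'at-hive';
for (let i = 0; i < 10; i++) {
  state = nextState(state);
  console.log(state);
}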

To play around with Markov chains and simple string generation, I created a small codebase (Node.js / TypeScript). The app takes a list of ‘chat message logs’ (really, any line-separated list of strings) as input. It then uses random selection to find any lines containing a ‘seed’ string.

From the seed string, it generates new and potentially unique ‘chat messages’ based on that seed and the ‘state’ (the list of chat messages fed in).

Using a random function and initial filtering means the possible outputs are constrained by the size of the input and the filtered list, but building it still helped me understand some of the concepts.
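
As a rough sketch of the idea (a simplification, not the actual implementation from that codebase), the generation looks something like this:

// Sketch: generate a new 'chat message' from a seed word by building
// a word-level Markov chain from the lines that contain the seed
function generateMessage(lines, seed, maxWords = 20) {
  // Filter the input down to the lines containing the seed string
  const matching = lines.filter(line => line.includes(seed));

  // Map each word to the words observed to follow it
  const chain = {};
  matching.forEach(line => {
    const words = line.trim().split(/\s+/);
    words.slice(0, -1).forEach((word, i) => {
      (chain[word] = chain[word] || []).push(words[i + 1]);
    });
  });

  // Walk the chain from the seed, picking each next word at random
  const output = [seed];
  let word = seed;
  while (chain[word] && output.length < maxWords) {
    word = chain[word][Math.floor(Math.random() * chain[word].length)];
    output.push(word);
  }
  return output.join(' ');
}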

Converting Aisatsana MIDI to Alda Format

The first thing I needed was a list of musical segments from the original track; these are what I’ll refer to as ‘phrases’.

As Alex did in his implementation, I grabbed a MIDI version of aisatsana. I then fed it into a MIDI-to-JSON converter, yielding a breakdown of the track into individual notes. Here is what the first two notes look like:

[
  {
    "name": "E3",
    "midi": 52,
    "time": 0,
    "velocity": 0.30708661417322836,
    "duration": 0.5882355
  },
  {
    "name": "G3",
    "midi": 55,
    "time": 0.5882355,
    "velocity": 0.31496062992125984,
    "duration": 0.5882355
  }
]

From there I wrote some JavaScript to take these notes in JSON format, parse the time values, and order them into the 32 ‘phrases’ that aisatsana is made up of.

That is, there are 32 ‘phrases’, each consisting of 32 ‘half-beats’ at 0.294117647058824 seconds per half-beat (32 × 32 × 0.294117647… ≈ 301 seconds, the full length of the track).

const notes = [] // <-- MIDI to JSON notes here

// constants specific to the aisatsana track
const secPerHalfBeat = 0.294117647058824;
const phraseHalfBeats = 32;

// Array to store quantized phrases
let phrases = [];

notes.forEach(n => {
  // Quantize the note's start time into half-beats, then work out
  // which 32-half-beat phrase the note belongs in
  const halfBeat = Math.round(n.time / secPerHalfBeat);
  const phraseIndex = Math.floor(halfBeat / phraseHalfBeats);

  // NB: assumes natural note names like "E3"; sharps/flats such as
  // "C#4" would need extra handling to map to alda's accidentals
  const note = n.name.substring(0, 1).toLowerCase();
  const octave = n.name.substring(1, 2);

  // Store the note in the correct 'phrase'
  if (!phrases[phraseIndex]) {
    phrases[phraseIndex] = [];
  }

  phrases[phraseIndex].push({ note: note, octave: octave, time: n.time, duration: n.duration });
});

The script also gathers the note symbol, octave, and duration for each note, and stores them in a phrases array ordered by phrase index.
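
For example, the first note from the JSON above (“E3” at time 0) lands in phrase 0 as:

{ note: 'e', octave: '3', time: 0, duration: 0.5882355 }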

Grouping by Chord

Next, the script runs through each phrase and groups its notes by time: notes played at the same timestamp are part of the same chord. Alda needs to know this to play the notes correctly, so a chords array is set up for each phrase.

// Helper: group an array of objects by the given key
const groupBy = key => array =>
  array.reduce((grouped, obj) => {
    (grouped[obj[key]] = grouped[obj[key]] || []).push(obj);
    return grouped;
  }, {});

phrases.forEach(phrase => {
  phrase.chords = [];
  const chordGrouping = groupBy('time')(phrase);

  // Notes sharing a timestamp form a single chord
  for (const notes of Object.values(chordGrouping)) {
    phrase.chords.push(notes);
  }
});

Generating alda-Compatible Strings

With chord grouping done, we can now convert the track into 32 phrases that alda will understand.

phrases.forEach(phrase => {
  let aldaStr = "piano: (tempo 51) (quant 90) ";
  phrase.chords.forEach(chord => {
    // Alda plays notes together as a chord when they are separated by
    // a '/' character, so join multi-note chords with ' / ' and leave
    // single notes as they are
    chord.forEach((note, idx) => {
      const separator = idx === chord.length - 1 ? " " : " / ";
      aldaStr += `o${note.octave} ${note.note} ${note.duration}s${separator}`;
    });
  });
  // Output the phrase as an alda-compatible / playable string (you can
  // also copy this directly into alda's REPL to play it)
  console.log(aldaStr);
});

Here is the full script to convert the MIDI to alda phrase strings.

Generating Music with Markov Chains

There are different entry points I could have used to create the Markov chain’s initial state, but I went with feeding in the alda strings directly to see what patterns would emerge.

Here are the first four phrases from aisatsana in alda-compatible format:

piano: (tempo 51) (quant 90) o3 e 0.5882355s o3 g 0.5882355s o3 c 0.5882354999999999s o4 c 7.6470615s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 2.3529420000000005s o4 e 4.705884000000001s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 7.058826s
piano: (tempo 51) (quant 90) o3 e 0.5882354999999997s o3 g 0.5882354999999997s o3 c 0.5882354999999997s o4 c 0.5882354999999997s o3 b 1.1764709999999994s o4 e 5.882354999999997s

If you like, you can drop those right into alda’s REPL to play them, or save them to a text file and play it with:

alda play --file first-four-phrases.alda

The strings are quite ugly to look at, but it turns out they can still be used with Markov chains to generate new, original phrases based on the aisatsana track’s phrases.

Using the markov-chains npm package, I wrote a small Node.js app to generate new phrases. It takes the 32 alda-compatible phrase strings from the original ‘aisatsana’ MIDI track as a list of states and walks the chain to create new phrases.

E.g.

import Chain from 'markov-chains';

const states = [
  // [ alda phrase strings here ],
  // [ alda phrase strings here ],
  // [ alda phrase strings here ]
  // etc...
];

const chain = new Chain(states);

// generate new phrase(s)
const newPhrases = chain.walk();
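
Since each state here is a full alda phrase string, the output of walk() can be written straight to a file, one phrase per line (a sketch based on the state structure above; the file name is just an example):

// Write the generated phrases out as a playable alda file
import { writeFileSync } from 'fs';
writeFileSync('generated-phrases.alda', newPhrases.join('\n'));

You can then play the result with the same alda play --file command shown earlier.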

I threw together a small function that you can run directly to generate new phrases. Give it a try here. Hitting this URL in the browser will give you fresh phrases from the Markov generation.

If you want a text version that you can drop right into the alda REPL or into a file for alda to play try this:

curl -s https://solitary-mountain-114.fly.dev/ | jq -r '.phrases[]'

(Just add the instrument type, tempo, and quant values you would like to the beginning of each line.) E.g.

piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 30.71) e 0.5882354999999961s o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 0.5882354999999961s o3 (volume 29.92) b 7.0588260000000105s / o3 (volume 30.71) d 7.0588260000000105s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) g 0.29411774999999807s o4 (volume 30.71) e 0.5882355000000032s / o3 (volume 30.71) e 0.5882355000000032s o4 (volume 30.71) d 0.5882355000000032s / o3 (volume 29.92) c 0.5882355000000032s o4 (volume 29.92) c 0.5882355000000032s / o3 (volume 31.50) e 0.5882355000000032s o3 (volume 29.92) b 7.058826000000003s / o3 (volume 30.71) d 7.058826000000003s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) e 0.29411774999999807s o4 (volume 59.84) g 0.29411774999999807s o4 (volume 60.63) a 0.29411774999999807s / o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 62.20) b 0.29411774999999807s o5 (volume 62.99) c 0.5882354999999961s / o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o5 (volume 65.35) e 0.5882354999999961s / o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 2.3529419999999845s o5 (volume 66.93) b 0.5882354999999961s / o3 (volume 29.92) b 1.7647064999999884s o5 (volume 60.63) g 8.823532499999999s o4 (volume 30.71) e 7.0588260000000105s / o2 (volume 30.71) c 7.0588260000000105s / o3 (volume 30.71) e 7.0588260000000105s
piano: (tempo 51) (quant 90) o4 (volume 30.71) e 0.29411774999999807s / o3 (volume 30.71) e 0.29411774999999807s o4 (volume 59.84) g 0.29411774999999807s o4 (volume 60.63) a 0.29411774999999807s / o4 (volume 30.71) e 0.5882354999999961s / o3 (volume 31.50) g 0.5882354999999961s o4 (volume 62.20) b 0.29411774999999807s o5 (volume 62.99) c 0.5882354999999961s / o4 (volume 30.71) d 0.5882354999999961s / o3 (volume 29.92) c 0.5882354999999961s o5 (volume 65.35) e 0.5882354999999961s / o4 (volume 29.92) c 0.5882354999999961s / o3 (volume 31.50) e 7.647061499999992s o5 (volume 66.93) b 0.5882354999999961s / o3 (volume 29.92) b 2.3529419999999988s o5 (volume 60.63) g 6.4705905s o4 (volume 15.75) e 4.7058839999999975s

I’ve uploaded the code that does the Markov chain generation (using the initial alda phrase strings as input state) here.

Results and Alda Serverless

Generated Music

The results of generating music from the original track’s phrases are certainly fun and interesting to listen to. The new phrases play out differently from the original track, yet still feel like they belong to the same piece of music.

Going forward, I’ll definitely be experimenting further with Markov chains and music generation using alda.

Experimenting with alda and Serverless

Something I got side-tracked on during this experiment was hosting the alda player in a serverless function. I got pretty far along using AWS Lambda Layers, but the road was bumpy: Alda requires some fairly chunky dependencies.

Even after managing to squeeze Java and the Alda binaries into Lambda layers, the audio playback engine failed to start in the serverless environment.

I cleared a number of hurdles, but eventually my patience wore thin and I settled on writing my own serverless function to generate the strings to feed into alda directly.

My goal here was to generate unique phrases, output them to MIDI, and then convert them to audio to be played back almost instantaneously. For now, though, it’s easy enough to take the generated strings and drop them directly into the alda REPL or play them from a file.
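
Alda itself can already handle the MIDI step from the command line. If I remember right, alda 2’s export command does it (double-check the flags with alda export --help):

alda export -f generated-phrases.alda -o generated-phrases.mid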

It will be nice to see alda develop further and offer an online REPL, which would mean the engine itself was light enough to perform all of the above too.

2 thoughts on “Generating Music with Markov Chains and Alda”

  1. Hi, this is really cool stuff, thanks for sharing it! This is exactly the kind of thing that I’ve envisioned that Alda could be used for, and it’s super exciting that people like you are out there doing it!

    On the topic of instantaneously playing the output of the generator in the browser: I think the separation that we have today between the client (`alda`, written in Go) and player (`alda-player`, written in Kotlin) is probably the right conceptual “seam” here between the server (which could be a serverless function) and the browser. At a high level, the `alda` client converts Alda source input into OSC messages that tell the player process what to do. The `alda-player` receives the OSC messages and follows the instructions, loading MIDI events into a sequencer and either playing the sequence or exporting it as a MIDI file.

    The way `alda` and `alda-player` work on the command line is that the client sends these OSC messages to the player over TCP. In a web server/client scenario, the browser could send a request with Alda source to the server, the server could run the Alda client and dump the OSC messages to bytes, then send those back to the client as a response. On the client side, we could have some JavaScript that parses the OSC bytes (using an OSC library) and does something similar to what `alda-player` is doing to interpret the instructions and play music, e.g. using the Web Audio API and/or a JS MIDI library.

    If nobody does this before me, I will eventually do it myself, because I’d love to have a live REPL on the Alda website! I’ve already started looking into compiling `alda` into WebAssembly so that we can do this entirely in the browser on a static webpage. I’m hoping that this will be the easy part. The part that will take a little more work is the browser-side library for receiving Alda OSC messages and correctly interpreting them to play music in the browser.

    If anyone reading this is interested in following this project (or maybe contributing!), see: https://github.com/alda-lang/alda/issues/392

    Thanks again for writing this awesome blog post, I love projects like this!

  2. Thanks for creating and working on Alda, Dave! I’ve had a lot of fun playing around with it.

    This is such a great comment. You’ve given all the insight necessary to understand the interaction between alda and alda-player in this single comment. Thanks! I really wish I had time to dedicate to looking into the GitHub issue and the conversion necessary, but I personally won’t be able to help out. I hope someone else stumbles on this post and can get stuck in.

    Cheers!
