The Quarantine Project

05/03/2020 10:00:00

While we're all enduring these HashTagUnprecendetedTimes I'm going to keep a living list here of my own personal "quarantine project".

Earlier in the year I took some time out, which was originally intended to be dedicaded to travel, conferences and writing book 3. Obviously the first two things in that list are somewhat less plausible in a global pandemic, so I've been up to a bunch of other stuff.

Here is a living list, and I'll update it as more things arrive, until such a time as I write about them all distinctly.

Events

Remote Code DOJO

I've been running weekly pair programming dojos that you can sign up to and attend here

Projects

.NET

CoreBoy - A Working .NET Core / C# 8 GameBoy emulator
RayCastingDemo - Built a RayCasting 3D "engine" in C#. GitHub source and a Glitch interactive demo of it.
Glitch .NET Core Templates .NET Core Templates over time.
.NET Core library templates for GitHub actions

JavaScript / TypeScript

TypeTetris - A TypeScript implementaton of Tetris in the browser.
Get Coding 2 Game Examples - I've built up interactive samples of the games from my second book Get Coding 2, here.
SNES Kart - A half baked attempt at cloning Super Mario Kart's rendering system in CSS.
Glitch TypeScript templates [https://glitch.com/@davidwhitney/java-script-and-type-script-templates]
Glitch JestRunner - An interactive Jest sandpit.
Typescript WebSockets Chat Demo - A quick demo of WebSockets / Glitch / TypeScript.

Hardware

InteractiveLights - Lent a helping hand with some C/C++ for Arduino based interactive LED pixels on wearable tech. Code here. Running demo WebApp here.

Work In Progress

BrowserParty - A proximity based WebRTC chat client.

Books

Book 3 is still coming 🖤

Running .NET Core apps on Glitch!

03/26/2020 16:00:00

Over the last couple of months I've been doing a lot of code for fun in Glitch.

Glitch is a collabartive web platform that aims to make web programming accessible and fun - complete with real-time editing and hot-reloading built in. It's a great platform for sketching out web apps, working with friends, or adapting samples other people share ("remixing"). It's a great product, and I love the ethos behind it - and like a lot of things on the web in 2020, it's commonly used for writing HTML and javascript, with default templates also available for Node + Express.js apps.

...but why not .NET Core?

I was in the middle of configuring some webpack jobs when I idley tweeted that it'd be great if the netcore team could support this as a deployment target. The glitch team shot across a few docs asking what an MVP would look like for netcore on glitch, and I idley, and mostly out of curiosity, typed dotnet into the Glitch command line prompt to see if the dotnet CLI just happened to be installed. And it was.

Armed with the wonderfully named glitchnomicon and the dotnet cli I created a fresh ANC (ASP.NET Core) MVC start project, and migrated the files one by one into a Glitch project.

With a little tweaking I've got the dotnet new project template running in Glitch, without any changes to the C# code at all.

Subtle changes:

No Bootstrap
Stripped out boilerplate layout
No jQuery
Removed "development" mode and "production" CSS filters from the views
Glitch executes the web app in development mode by default so you see detailed errors

I've published some projects on Glitch for you to remix and use as you like.

ASP.NET MVC Starter Project

ASP.NET Starter Project (just app.run)

I'll be adding more templates to the collection .NET Core Templates over time.

Glitch is awesome, and you should check it out.

Small thoughts on literate programming

02/28/2020 16:00:00

One of the most profound changes in my approach to software was understanding it to be literature. Functional literature, but literature regardless. Intent, characterisation, description, text, subtext, flow, rhythm, style, all effect software like they do prose.

It's a constrained form of communication, with grammar, and that's why we work in "programming languages". They are languages. With rules, idioms and quirks. These aren't analogies, it's what software is. It's storytelling. Constrained creative writing with purpose.

Basically, Donald Knuth was right, and called it a bajillion years ago - with the idea of literate programming. Today's languages are that thing. You will never be a great programmer unless you become an excellent communicator, and an excellent writer. The skillset is the same.

Critical thinking, expression of concept, reducing repititon, form for impact, signposting, intent and subtext. If you want to understand great software, understand great literature.

Communication skills are not optional

If you want to teach a junior programmer to be a better programmer, teach them to write. Language is our tool for organising our thoughts. It's powerful. It has meaning. It has power.

It's a gift, it's for everyone. 🖤

(I should plug that I have a talk on this - get in touch)

Hardware Hacking with C# and JavaScript

01/29/2020 09:20:00

Over the last couple of months I’ve had a little exposure to hardware hacking and wearables after lending some of my exceptionally rusty 1990s-era C to Jo’s excellent Christmas Jumper project.

The project was compiled for the AdaFruit Feather Huzzah a low powered, ESP8266 SOC, Arduino compatible, gumstick sized development board. Other than a handful of RaspberryPi’s here and there, I’d never done any hardware hacking before, so it had that wonderful sheen of new and exciting we all crave.

“It’s just C!” I figured, opening the Arduino IDE for the first time.

And it was, just C. And I forgot how much “just C” is a pain – especially with a threadbare IDE.

I like syntax checking. I like refactoring. I like nice things.

The Arduino IDE, while functional, was not a nice thing.

It only really has two buttons – compile, and “push”. And it’s S L O W.

Arduino IDE

That’s it.

I Need Better Tools for This

I went on a small mission to get better C tools that also worked with the Arduino, as my workflow had devolved into moving back and forth between Visual Studio for syntax highlighting, and Arduino IDE for verification.

I stumbled across Visual Micros “Arduino IDE for Visual Studio” which was mostly good, if occasionally flaky, and had a slightly awkward and broken debugging experience. Still – light-years ahead of what I was using. There’s also an open source and free VSCode extension which captures much of the same functionality (though sadly I was missing my ReSharper C++ refactorings).

We stumbled forwards with my loose and ageing grasp of C, Google and got the thing done.

But there had to be a better way. Something less archaic and painful.

Can I Run C#?

How about we just don’t do C?

I know, obvious.

I took to Google to work out what the current state of the art was for C# and IoT, remembering a bit of a fuss made a few years ago during the NETCORE initial prototypes of IoT compatibility.

Windows 10 IOT Core seems steeped in controversy and potential abandonment in the face of NETCOREs cross platform sensibilities, so I moved swiftly onwards.

Thinking back a decade, I remembered a fuss made about the .NET MicroFramework based on Windows CE, and that drove me to a comparatively new project .NET NanoFramework – a reimplementation and cut down version of the CLR designed for IOT devices.

I read the docs and went to flash the NanoFramework runtime onto my AdaFruit Feather Huzzah. I’d flashed this thing hundreds of times by now.

And every time, it failed to connect.

One Last Huzzah

As it transpired, the AdaFruit Feather Huzzah that was listed as supported (£19.82, Amazon), wasn’t the device I needed, I instead needed the beefier AdaFruit Feather Huzzah32 (£21.52, Amazon). Of course.

Turns out the Huzzah had a bigger sibling with more memory and more CPU based on the ESP32 chip. And that’s what nanoFramework targeted.

No problem, low cost, ordered a couple.

Flashing a Huzzah32 to Run C#

The documentation is a little bit dense and took longer than I’d like to fumble through, so I’ll try condensing it here. Pre-Requirement: Visual Studio 2019+, Any SKU, including the free community edition.

Add a NuGet package source to Visual Studio

    https://pkgs.dev.azure.com/nanoframework/feed/_packaging/sandbox/nuget/v3/index.json

Add a Visual Studio Extensions feed

    http://vsixgallery.com/feed/author/nanoframework/

Go to Tools -> Extensions and install the “nanoFramework” extension.
Install the USB driver for the Huzzah, the SiLabs CP2104 Driver
Plug in your Huzzah
Check the virtual COM port it has been assigned in Windows Device Manager under the “Ports (COM&LTP)” category.

Run the following commands from the Package Management Console

    dotnet tool install -g nanoFirmwareFlasher
    nanoff --target ESP32_WROOM_32 --serialport YOURCOMPORTHERE –update

That’s it, reset the board, you’re ready to go.

The entire process took less than 5 minutes, and if you’ve already used an ESP32, or the previous Huzzah, you’ll already have the USB drivers installed.

Hello World

You can now create a new nanoFramework project from your File -> New Project menu.

I used this program, though obviously you can write everything in your static void main if that’s your jam.

public class Program
{
      public const int DefaultDelay = 1000;

      public static void Main() => new Program().Run();
      public Program()
      {
            Console.WriteLine("Constructors are awesome.");
      }

      public void Run()
      {
            while (true)
            {
                  Console.WriteLine("ArduinYo!");
                  Thread.Sleep(DefaultDelay);
            }
      }
}

The brilliant, life affirming, beautiful thing about this, is you can just press F5, and within a second your code will be running on the hardware. No long compile and upload times. Everything just works like any other program, and it’s a revelation.

NanoFramework has C# bindings for much of the hardware you need to use, and extensive documentation for anything you need to write yourself using GPIO (General Purpose IO – the hardware connections on the board you can soldier other components to, or add Arduino shields to).

But It’s 2020 Isn’t Everything JavaScript Now?

Alright fine. If you want a slightly worse debugging experience, but slightly wider array of supported hardware, there’s another project, Espruino.

Somehow, while their API documentation is excellent, it’s a little obtuse to find information about running Espruino on the ESP32 – but they both work and are community supported.

The process of flashing is slightly more obtuse than in .NET land, but let’s skip to the good bit

Make sure you have a version of python installed and in your path
At a command prompt or terminal, using pip (which is installed with python), run the command
```
    pip install esptool
```
Download the latest binaries for the board from here or for subsequent versions, via the download page

You need all three files for your first flash.

From a command prompt, in the same directory as your downloaded files

    esptool.py --chip esp32 --port /dev/ttyUSB0 --baud 921600 --after hard_reset write_flash -z --flash_mode dio --flash_freq 40m --flash_size detect 0x1000 bootloader.bin 0x8000 partitions_espruino.bin 0x10000 espruino_esp32.bin

Install the “Espruino IDE” from the Google Chrome app store
Under Settings -> Communication set the Baud Rate to 115200

That’s it, you’re ready to go in JavaScript – just click connect and run the provided sample

Espruino IDE

What’s next?

Well, nanoFramework is nice, but I really wish I could just use NetStandard 2.0. Luckily for me, the Meadow project from Wilderness labs is just that - an implementation of NetStandard for their own ESP32 derived board. I’ve ordered a couple of them to see how the experience stacks up. Their ESP32 boards look identical to the Huzzah32s, with some extra memory and CPU horsepower, presumably to accommodate the weight of the framework.

They are twice the cost currently at £50 vs the £20 for the Huzzah32, but if they deliver on the flexibility of full .NET, I’d imagine it’ll be the best possible environment for this kind of development if you’re willing to use or learn C#.

In JavaScript land? Well, JavaScript is nice, but TypeScript is better! Espruino don’t directly support a lot of ES6+ features, or TypeScript, but with a little bit of magic, their command line tooling, and Babel, we can use modern JavaScript now, on those devices (I’ll leave this to a subsequent post).

C is wonderful, but “even” hobbyist programmers should have good toolchains for their work <3

Music Streaming and the Disappearing Records

12/15/2019 21:20:00

I'm part of the Napster generation.

That means I pretty much stole all the music I grew up listening to. Unrepentantly, unremorsefully, downloaded everything.

Napster might seem like a dim and distant memory in internet time, but it's easy to forget quite how game changing Napster was for both the music scene, and the industry.

For anyone who might be too young to remember, or perhaps not technically savvy enough at the time, Napster, while not the first way you could "borrow" music from the internet, was the first popular Peer-to-Peer file sharing network.

It was ridiculously easy - you installed the software, told it where your collection of MP3s (songs stored as files on your computer) were and everyone else did the same. You were faced with an empty search box and your imagination and it just worked.

You could search for anything you wanted, however esoteric, and for the most part, Napster delivered.

It'd hard to explain how much of a revelation this was in 1999 - a year after Google was founded, and a handful of years before search on the internet really worked well at all.

I was 15 in 1999, teaching myself C++ and Perl, and this seemed like magic. Everyone ripped the few CDs they could afford to buy themselves to their computers, and in return we got everything.

When you're 15, the complex relationship between rights owners, writers, musicians, ethics, points based royalty systems and the music business was all literally the furthest thing from your mind. You're hungry for culture, and music, for as long as it's been recorded, had always tracked generations and cultural shifts.

People forget just how hard it was to discover new music, especially non-mainstream music in the 90s, and earlier.

When home cassette recording was becoming popular in the early 80s, the record companies were scared. Home Taping Was Killing Music! They proclaimed - yet the same taping and trading scene, participated in by kids eager to hear new, increasingly niche and underground music, rebelled. That same taping scene enhanced the burgeoning New Wave of British Heavy Metal scene, internationalising it, making music accessible to everyone. It's no coincidence that bands like Metallica rose to prominence by both their own participation in the demo trading scene of the early 80s, later weaponising this grass roots movement into international success.

But the 90s were different. The taping scene was related to live bootlegs, and the onward march of technology proven cassettes to be unreliable - home taping didn't kill the industry - it's actually gave the industry room to innovate on quality and easily win.

And oh boy did the industry win. The 90s were probably the final hurrah for rockstars and massive pushed music outside of mainstream pop. They didn't realise it then of course, but somebody did. Someone who made his name in the taping scene.

Lars was right.

Which is an awkward admission for a self-admitted former music pirate.

In 2000, Metallica, et al v. Napster inc saw the first time ever a court case was brought against a Peer-to-Peer filesharing network. Lars Ulrich, Metallica's drummer, was exceptionally vocal in the media about the widespread damage he saw file sharing causing to the music industry.

"Napster hijacked our music without asking. They never sought our permission. Our catalogue of music simply became available as free downloads on the Napster system."

Lars was right. Technology has this wonderful and world changing ability to ask, "can we do this thing?" but it rarely stops to see if it should. Metallica won.

Collectively, we, the youth, demonised them. Greedy rich millionaires pulling up the ladder behind them. Metallica inadvertently fuelled the boom in DRM and various rights management technologies of the late 90s and early 2000s, but the effects of the Napster lawsuit are still felt today.

While they thought they were fighting for their creative agency, they really were fighting for control. What the Metallica suit did was push file sharing underground into a series of different sharing platforms, which were more difficult to regulate, harder to track, and more resilient. They ironically made file sharing more sophisticated.

Lars today understands that the fans, the youth of the day thought Metallica were fighting them, rather than the file-sharing organisations. All his fears did come to fruition though.

It's a sobering admission to be on the wrong side of the argument twenty years later.

But what if file sharing was used for good?

The file-sharing epidemic of Napsters launch wasn't the start of file-sharing. But actually the end destination of an entirely different scene, completely distinct from the tape trading of the 80s.

With its origin in 80s hacker culture, and continued survival on pre-World Wide Web protocols like usenet (a distributed message board system that predates web pages) and IRC (a decentralised chat protocol that was extended to support file transfers) - the digital music trading scene of the late 90s was part of the Warez scene - often just called "the scene" to people involved.

The scene is a closed community of ripping groups specialising in ripping (converting media to digital copies) and distributing copyrighted material, complete with its own rules and regulations about getting access to material - often before release. The scene doesn't really care too much about the material it distributes, though access to pre-release games, movies and music is absolutely a motivating factor. In many cases, scene release groups formed around specific types of content, cracking games, acquiring pre-release music, and distributing them all through private channels and FTP servers. The rise of Peer-to-Peer technology saw many previously difficult to obtain scene releases leaked out to the general public drawing the attention and ire of the recording industry.

This was exactly the kind of technologically advanced, weaponised, piracy that they had feared at the rise of cassette tape duplication - but this time it was viral, hard to stop, and terrifyingly, more advanced than any of the technology the recording industry was using at the time.

You can't put this kind of genie back in the bottle.

For the better part of a decade, the record industry fought a war of attrition with scene releases, the rise of Napster alternatives like AudioGalaxy, KaZaA, LimeWire and eDonkey (never say we’re bad at naming things in technology again…) and the dedication of an entire generation who believed they were in the moral right, fighting evil megacorporation’s trying to enforce archaic copyright law.

And the industry fought and fought.

In a ferociously blind moment, the music industry never critically assessed the value proposition of its products, and they never innovated on the formats. CD prices, especially in the late 90s and early 2000s were at a record high, and as the war against scene rippers and street-date breaking leaks intensified, the products that were being sold were subject to increasingly dubious and in some cases dangerous DRM (digital rights management) approaches in a futile attempt to prevent piracy.

The music industry really didn’t stand a chance – the file-sharing scene entrenched, worked on it’s technology and was brutally effective. BitTorrent became the tool of choice, and the “mass market piracy” calmed down back to smaller communities of enthusiasts around specific genres or niches.

Across the same time window, CD-Rs and home CD burning reached mass market acceptance and affordability. But for labels? The prices had never really come down. They were used to making a lot of money on CD sales, and giving big number advances to artists, but as they saw their profits shrink, they were struggling. The largest cost in the majority of business is always staff and humans - and the scene didn't have to compete with that.

In the UK, high street retail chains like HMV, Our Price, Music Zone and FOPP went into administration, were bought, and entered administration again – relying on cut price DVD sales to keep the doors open (a format that was still slightly impractical for the average user to pirate at the time).

But something more interesting was happening. People were using illegal music sources not just to steal music they knew they wanted, but to discover things they’d never heard of. There were reports that consumers of illegal downloads were actually… spending more money on music?!

While everyone was so caught up on the idea that people were just out to get things for free (which was certainly more than case with the consumption of other contemporary piracy like that of games) music and it’s particular place in the cultural ecosystem, with live performances, and merchandise, and youth identity, actually saw some uplift where bands that would have never gotten the attention of a label were suddenly independent darlings.

While the majors were losing, and the millions-of-unit pop albums of the time were performing poorly, the underground was thriving, much like that early tape trading scene. This phenomenon dovetailed with the rise of then nascent social media platforms like Livejournal and later, eponymously MySpace and the idea of the “MySpace Bands” – but what these bands really were was the grass roots marketing of local niche scenes into bigger audiences, powered by the leveling of technology, and the groundwork done, ironically, by software like Napster.

Did Napster accidentally “save music”?

A whole generation of people grew up stealing music and being totally ok with exploring music they would never listen to precisely because it didn’t cost anything. Sadly, you can’t feed your kids on the willingness of people to explore music free of cost.

There were breakout bands from the “MySpace scene” in the underground – the Arctic Monkeys, Bring Me The Horizon, Asking Alexandria – they made money. People noticed.

Pay What You Want

In October 2007 Radiohead released their seventh album “In Rainbows”, online, for any amount at DRM-free MP3s. They’d found themselves in a novel part of their career free from the encumbrance of a traditional record deal and buoyed by the high profile a previously successful career as a major label recording artist afforded.

In December of the same year, they released a series of expanded physical formats and the download was removed.

Reaction was mixed. While Radiohead didn’t invent the “pay what you want” business model, they were the largest artist (by several orders of magnitude) to adopt it and bring it into the mainstream. Trent Reznor of Nine Inch Nails was critical of it not going far enough (arguing the low-quality digital release was a promotional tool for more expensive deluxe editions) while scores of artists criticised the move as an exceptionally wealthy band devaluing the worth of music.

Trent Reznor would go on to produce the Saul Williams third album “The Inevitable Rise and Liberation of NiggyTardust!” – which Williams and Reznor released as high-quality audio audio files for “Pay What You Want or $5”. In the two months from its release, Tardust! Shifted around 30k paying copies, out of ~150k downloads, this compared favourably to Williams’ debut album, which had shifted 30k copies in the previous 3 years.

Reznor would later go on to release two of his own albums, Ghosts I-IV and The Slip, under similar schemes, and licensed under the Creative Commons license, complete with deluxe physical releases.

While it’s clear that the Pay What You Want model was prolific for these particular bands, much of the criticism centred around the model being entirely untenable for artists without the prior success of Radiohead or Reznor. The benefit of privilege of success under a previous regime.

The record industry didn’t react either in kind, or kindly. Prices remained at an all-time high. In this same time window, a somewhat blunt Reznor addressed crowds in Australia during a show to express his dissatisfaction with the value placed on music.

  “I woke up this morning and I forgot where I was for a minute.

  I remembered the last time I was here; I was doing a lot of complaining at the prices 
  of CDs down here. That story got picked up, and got carried all around the world, and
  now my record label all around the world hates me because I yelled at the and called
  them out for being greedy fucking assholes.

  I didn’t get a chance to check, has the price come down at all?

  You know what that means? Steal it. Steal away. Steal and steal and steal some more
  and give it to all your friends. 

  Because one way or another these motherfuckers are going to realise, they’re
  ripping people off and that’s not right.”

Curt, but the tide was certainly shifting against the high price of physical media at the end of the 00’s. Reznor re-started his own label around this time to release his own work.

A Model for The Rest of Us

The 2000s were not kind to Peer to Peer file sharing services. Apple and Amazon both had DRM powered music storefronts, along with also-rans, and the launch of the iPod in 2001 monopolised paid-for digital downloads, normalised DRM to consumers of music, and saved Apple as a company.

As these more closed ecosystems gave the record industry exactly what they were looking for. The ability to charge the same amount, while enjoying the comparative low cost of digital distribution. Peer to peer had been pushed underground by litigation returning to the warez scene subcultures from where it came, thanks to lobby groups and the rise of film piracy pushing for crackdowns on file-sharing, and especially popular mainstream BitTorrent sites like The Pirate Bay. Several high-profile lawsuits and prison sentences did well for scaring people away from “downloading” pirated music. The industry didn’t recover, but it did see hope.

Towards the end of the 2000s, streaming audio services and web-radio started their rise, along with the founding of companies like Spotify that offered a different model for music consumption. Not only was it a model that worked for the record companies because nobody ever really owned any of the music they were streaming, but it worked for people by passing the tolerance test of “seemingly more convenient than the thing it replaced”.

Tired of loading new songs onto your iPod? Spotify!

Don’t even have an iPod or MP3 player anymore because 4G and smartphones were now ubiquitous? Spotify!

Spotify was so convenient, and so useful, it steamrolled across everything that came before it. Its free mode was ad supported, and sure, the labels weren’t making as much money as they were making before, but it sure beat having some kids upload the albums you published to YouTube and benefiting from the ad revenue.

In Spotify, the labels found the same thing that the videogame industry found in Valves Steam platform – a form of DRM that regular consumers didn’t feel threatened by. That didn’t seem like it infringed on anything. That didn’t feel like it was a threat, or punitive. A far cry from the MPAA and the BPI pressuring ISPs to release information of their consumers so they could litigate against them.

If anything, Spotify is too good. It has competitors in 2019, but none of them are especially credible. Apple Music (which early Pay What You Want proponent Trent Reznor ended up working on for a time), Amazon, and briefly Microsoft all offered competitors – but Spotify and it’s reach, discovery algorithms and passive social features out-stepped the competition. It has a vice like grip on the streaming industry, much like Apple’s iTunes did on DRM’d digital sales previously.

The nature of music has also shifted significantly in the two decades of the mainstream internet. Lars was right.

We normalised the fact that music wasn’t worth anything, and the cracks are now showing around the whole ecosystem that supports music. Bands don’t break big anymore, music is diverse, interesting, challenging, infinitely broad, and infinitely shallow.

You like Mexican Hip Hop mixed with Deathcore? We got that. How about Indian Rap Metal? Christopher Lee singing power metal? Yes, that exists too.

Low cost and high-quality recording equipment have made the production of music accessible to an entire generation, at the same time as the global economic downturn saw the closure of music venues across the UK. Never has something so creatively healthy felt so continuously threatened by extinction.

Spotify exacerbate this problem with a steady stream of controversies regarding the allegedly low renumeration of artists streaming on their platform. You can’t really find any solid numbers on what Spotify pay artist other than the consensus that “it’s not enough”. Songwriters doubly so – the co-writer of the Bon Jovi song Livin’ on a Prayer, in 2018, for half a billion streams, received $6,000 in royalties from Spotify. Several high-profile artists have pulled catalogues from Spotify, only to later re-emerge (presumably because you cannot fight technological change, but also, because money).

It doesn’t take much to do the back of envelope maths with numbers like that, and they don’t look good. I don’t work in the music business, but I know a lot of people that do, and the stories are all consistent. Living as a touring musician in 2019 is a harder life than it’s ever been before.

No art without patronage.

When you’re young, you just want to have people hear your music, to play shows, to be a Rockstar. None of those things pay a mortgage.

Begging to play?

What have artists, bands even done to cope with this existence?

We’ve seen crowdfunding experiments – some successful, some not. Meet and greets, on-tour-lessons, signing sessions, expanded editions, hang-outs, VIP experiences, the works. There’s plenty of good in all those things, but it’s impossible to not identify all of these things for what they are – making artists work extra to somehow justify the value of their art.

Art has value. Value is not cost. These two things should not be accidentally conflated. We started off with “the work” as having value, which was slowly shifted to the live performance. The live performances value was slowly shifted to the merchandise, begetting the slow productisation of art. When we push art into the purely commercial it can’t help but be compromised.

The financial models behind streaming music are compromised. Technology has ascended to occupy the place the record labels once were, with Spotify and other infrastructure companies being the bodies that profit the most.

I run a software company for a living, I speak for, and advocate for technology because I care deeply about it, but there’s certainly something tragically wrong here, even if it’s the simple answer that Spotify subscriptions are too cheap.

What about the Underground?

I’ve only really hinted at my own personal tastes throughout this piece, music is subjective. But the underground, the small labels, I fear deeply for in this climate.

I fear for the small labels for the discovery capabilities – in niche genres, labels are important, grass roots shows are important.

I grew up discovering music from soundtracks, from the second-hand CD shops in Manchester where you could buy music journalists discarded promos for £3-5 a CD. I’d go down on Saturday mornings with friends and we’d buy 4-5 albums a week. We’d take chances. We’d buy things based on the descriptions and the sound-alikes and the artwork.

It was culture, and culture shifts. The internet has been nothing but incredible for music discovery and access, and has replicated and bettered the experiences I had digesting weird metal albums on Saturday afternoons over the last decade – but in doing so, it’s also completely conflated the concept of ownership with that of access.

It’s no shock to anyone that you don’t own the music you listen to on streaming services, but the more none-mainstream you get, the greater the risk you’re running to lose access to that music in its entirety.

We’ve seen how easy it is for records to disappear from Spotify at the behest of their owners, but what happens when the owners go bankrupt? When the labels go out of business?

What happens when nobody owned the thing they’ve been enjoying, and it vanishes?

The games industry has long been contending with a similar category of problem with the way it treats abandonware (games out of copyright where the authors and publishers no longer exist) and video game emulation.

Games stuck in licensing hell have routinely vanished or become unplayable. We shouldn’t let our own culture erode and disappear.

We’ve slowly killed ownership, we’re slowly killing our DIY scenes by closing live venues, and we’re exposing music that has been created in our scenes and undergrounds, across every genre that isn’t mainstream, but outsourcing ownership to privately owned organisations that hardly pay the creators of the art we covet. Culture and art should not be kept behind the gates of collectors, and inflated prices.

The music business is lucky, but not without its tragedies – the UMG media archive famously burnt down, losing original recordings and masters of some of the most important albums in history.

The British Library, thankfully, cares about this.

The “Sound and Moving Image” archive is vast and more varied than you might imagine – their stated aim is to collect a copy of each commercial release in the UK, of any and all genres. There’s no budget, and the rely on donations from labels and artists, along with private collections. The more esoteric “sounds” are digitised, but for most of the popular recordings, you’ll have to go in person to listen for free.

I fundamentally believe in the value of digital distribution and streaming platforms. They’ve opened the walls of music up, but as a community we need to be better at protecting our culture and music – because private organisations owe us nothing and are not always good actors.

Metallica were right about Napster. Let’s protect music, especially outside of the mainstream.

And go to a show!

Architecture for Everyone

12/12/2019 11:20:00

Someone said the dreaded architecture word and you’re Not-An-Architect?

Scared when someone challenges your understanding of computational complexity when all you’re trying to do is put a widget on a webpage? Never fear – it’s probably not nearly as sophisticated or as complex as you might think.

Architecture has a reputation for being unapproachable, gate-kept, and “hard computer science”. Most of the software architecture you run into, for average-to-web-scale web apps, is astonishingly similar. We’re going to cover the basics, some jargon, and some architectural patterns you’ll probably see everywhere in this brief architectural primer for not-architects and web programmers.

What even is a webserver?

Ok so let’s start with the basics. The “web”, or the “world wide web” – to use its hilariously antiquated full honorific, is just a load of computers connected to the internet. The web is a series of conventions that describe how “resources” (read: web pages) can be retrieved from these connected computers.

Long story short, “web servers”, implement the “HTTP protocol” – a series of commands you can send to remote computers – that let you say “hey, computer, send me that document”. If this sounds familiar, it’s because that’s how your web browsers work.

When you type www.my-awesome-website.com into your browser, the code running on your computer crafts a “http request” and sends it to the web server associated with the URL (read: the website address) you typed into the address bar.

So, the web server - the program running on the remote computer, connected to the internet, that’s listening for requests and returning data when it receives them. The fact this works at all is a small miracle and is built on top of DNS (the thing that turns my-awesome-website.com into an IP address), and a lot of networking, routing and switching. You probably don’t need to know too much about any of that in real terms unless you’re going deep.

There are tonne of general purpose web servers out there – but realistically, you’ll probably just see a mixture of Apache, NGINX and Microsoft IIS, along with some development stack specific web servers (Node.js serves itself, as can things like ASP.NET CORE for C#, and HTTP4K for Kotlin).

How does HTTP work? And is that architecture?

If you’ve done any web programming at all, you’ll likely be at least a little familiar with HTTP. It stands for “The Hyper Text Transfer Protocol”, and it’s what your browser talks when it talks to web servers. Let’s look at a simple raw HTTP “request message”:

GET http://www.davidwhitney.co.uk/ HTTP/1.1
Host: www.davidwhitney.co.uk
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64…
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

The basics of HTTP are easy to grasp – there’s a mandatory “request line” – that’s the first bit with a verb (one of GET, POST, PUT and HEAD most frequently), the URL (the web address) and the protocol version (HTTP/1.1). There’s then a bunch of optional request header fields – that’s all the other stuff – think of this as extra information you’re handing to the webserver about you. After your headers, there’s a blank line, and an optional body. That’s HTTP/1.1. We’re done here. The server will respond in similar form

HTTP/1.1 200 OK
Cache-Control: public,max-age=1
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Server: Kestrel
X-Powered-By: ASP.NET
Date: Wed, 11 Dec 2019 21:52:23 GMT
Content-Length: 8479


<!DOCTYPE html>
<html lang="en">...

The first line being a status code, followed by headers and a response body. That’s it. The web server, based on the content of a request, can send you anything it likes, and the software that’s making the request must be able to interpret the response. There’s a lot of nuance in asking for the right thing, and responding appropriately, but the basics are the same.

The web is an implementation of the design pattern REST – which stands for “Representational State Transfer”. You’ll hear people talk about REST a lot – it was originally defined by Roy Fielding in his PhD dissertation, but more importantly was a description of the way HTTP/1.0 worked at the time, and was documented at the same time Fielding was working on HTTP/1.1.

So the web is RESTful by default. REST describes the way HTTP works.

The short version? Uniquely addressable URIs (web addresses) that return a representation of some state held on a machine somewhere (the web pages, documents, images, et al). Depending on what the client asks for, the representation of that state could vary.

So that’s HTTP, and REST, and an architectural style all in one.

What does the architecture of a web application look like?

You can write good software following plenty of different architectural patterns, but most people stick to a handful of common patterns.

“The MVC App”

MVC – model view controller – is a simple design pattern that decouples the processing logic of an application and the presentation of it. MVC was really catapulted into the spotlight by the success of Ruby on Rails (though the pattern was a couple of decades older) and when most people say “MVC” they’re really describing “Rails-style” MVC apps where your code is organised into a few different directories

/controllers
/models
/views

Rails popularised the use of “convention over configuration” to wire all this stuff together, along with the idea of “routing” and sensible defaults. This was cloned by ASP.NET MVC almost wholesale, and pretty much every other MVC framework since.

As a broad generalisation, by default, if you have a URL that looks something like

http://www.mycoolsite.com/Home/Index

An MVC framework, using its “routes” – the rules that define where things are looked up - would try and find a “HomeController” file or module (depending on your programming language) inside the controllers directory. A function called “Index” would probably exist. That function would return a model – some data – that is rendered by a “view” – a HTML template from the views folder.

All the different frameworks do this slightly differently, but the core idea stays the same – features grouped together by controllers, with functions for returning pages of data and handling input from the web.

“The Single Page App with an API

SPAs are incredibly common, popularised by client-side web frameworks like Angular, React and Vue.js. The only real difference here is we’re taking our MVC app and shifting most of the work it does to the client side.

There are a couple of flavours here – there’s client side MVC, there’s MVVM (model-view-view-model), and there’s (FRP) functional reactive programming. The differences might seem quite subtle at first.

Angular is a client side MVC framework – following the “models, views and controllers” pattern – except now it’s running inside the users web browser.

React – an implementation of functional reactive programming – it’s a little more flexible but is more concerned with state change events in data – often using some event store like Redux for its data.

MVVM is equally common in single page apps where there’s two way bindings between something that provides data (the model) and the UI (which the view model serves).

Underneath all these client heavy JavaScript frameworks, is generally an API that looks nearly indistinguishable from “the MVC app”, but instead of returning pre-rendered pages, returns the data that the client “binds” it’s UI to.

“Static Sites Hosted on a CDN or other dumb server”

Perhaps the outlier of the set – there’s been a resurgence of static websites in the 20-teens. See, scaling websites for high traffic is hard when you keep running code on your computers.

We spent years building relatively complicated and poorly performing content management systems (like WordPress), that cost a lot of money and hardware to scale.

As a reaction, moving the rendering of content to a “development time” exercise has distinct cost and scalability benefits. If there’s no code running, it can’t crash!

So static site generators became increasingly popular – normally allowing you to use your normal front-end web dev stack, but then generating all the files using a build tool to bundle and distribute to dumb web servers or CDNs. See tools like – Gatsby, Hugo, Jekyll, Wyam.

“Something else”

There are other architypes that web apps follow – there’s a slowly rising trend in transpiled frameworks (Blazor for C# in WebAssembly, and Kotlin’s javascript compile targets) – but with the vast popularity of the dominant javascript frameworks of the day, they all try to play along nicely.

Why would I choose one over the other?

Tricky question. Honestly for the most part it’s a matter of taste, and they’re all perfectly appropriate ways to build web applications.

Server-rendered MVC apps are good for low-interactivity websites. Even though high fidelity frontend is a growing trend, there’s a huge category of websites that are just that – web sites, not web applications – and the complexity cost of a large toolchain is often not worth the investment.

Anything that requires high fidelity UX, almost by default now, is probably a React, Angular or Vue app. The programming models work well for responsive user experiences, and if you don’t use them, you’ll mostly end up reinventing them yourself.

Static sites? Great for blogs, marketing microsites, content management systems, anything where the actual content is the most valuable interaction. They scale well, basically cannot crash, and are cheap to run.

HTTP APIs, Rest, GraphQL, Backend-for-Frontends

You’re absolutely going to end up interacting with APIs, and while there’s a lot of terms that get thrown around to make this stuff sound complicated, but the core is simple. Most APIs you use, or build will be “REST-ish”.

You’ll be issuing the same kind of “HTTP requests” that your browsers do, mostly returning JSON responses (though sometimes XML). It’s safe to describe most of these APIs as JSON-RPC or XML-RPC.

Back at the turn of the millennium there was a push for standardisation of “SOAP” (simple object access protocol) APIs, and while that came with a lot of good stuff, people found the XML cumbersome to read and they diminished in popularity.

Ironically, lots of the stuff that was solved in SOAP (consistent message envelope formats, security considerations, schema verification) has subsequently had to be “re-solved” on top of JSON using emerging open-ish standards like Swagger (now OpenAPI) and JSON:API.

We’re good at re-inventing the things we already had on the web.

So, what makes a REST API a REST API, and not JSON-RPC?

I’m glad you didn’t ask.

REST at its core, is about modelling operations that can happen to resources over HTTP. There’s a great book by Jim Webber called Rest in Practice if you want a deep dive into why REST is a good architectural style (and it is, a lot of the modern naysaying about REST is relatively uninformed and not too dissimilar to the treatment SOAP had before it).

People really care about what is and isn’t REST, and you’ll possibly upset people who really care about REST, by describing JSON-RPC as REST. JSON-RPC is “level 0” of the Richardson Maturity Model – a model that describes the qualities of a REST design. Don’t worry too much about it, because you can build RESTish, sane, decent JSON-RPC by doing a few things.

First, you need to use HTTP VERBs correctly, GET for fetching (and never with side effects), POST for “doing operations”, PUT for “creating stuff where the state is controlled by the client”. After that, make sure you organise your APIs into logical “resources” – your core domain concepts “customer”, “product”, “catalogue” etc.

Finally, use correct HTTP response codes for interactions with your API.

You might not be using “hypermedia as the engine of application state”, but you’ll probably do well enough that nobody will come for your blood.

You’ll also get a lot of the benefits of a fully RESTful API by doing just enough – resources will be navigable over HTTP, your documents will be cachable, your API will work in most common tools. Use a swagger or OpenAPI library to generate a schema and you’re pretty much doing what most people are doing.

But I read on hackernews that REST sux and GraphQL is the way to go?

Yeah, we all read that post too.

GraphQL is confusingly, a Query Language, a standard for HTTP APIs and a Schema tool all at once. With the proliferation of client-side-heavy web apps, GraphQL has gained popularity by effectively pushing the definition of what data should be returned to the client, into the client code itself.

It’s not the first time these kinds of “query from the front end” style approaches have been suggested, and likely won’t be the last. What sets GraphQL apart a little from previous approaches (notably Microsofts’ OData) is the idea that Types and Queries are implemented with Resolver code on the server side, rather than just mapping directly to some SQL storage.

This is useful for a couple of reasons – it means that GraphQL can be a single API over a bunch of disparate APIs in your domain, it solves the “over fetching” problem that’s quite common in REST APIs by allowing the client to specify a subset of the data they’re trying to return, and it also acts as an anti-corruption layer of sorts, preventing unbounded access to underlying storage.

GraphQL is also designed to be the single point of connection that your web or mobile app talks to, which is really useful for optimising performance – simply, it’s quicker for one API over the wire to call downstream APIs with lower latency, than your mobile app calling (at high latency) all the internal APIs itself.

GraphQL really is just a smart and effective way to schema your APIs, and provide a BFF – that’s backend for frontend, not a best friend forever – that’s quick to change.

BFF? What on earth is a BFF?

Imagine this problem – you’re working for MEGACORP where there are a hundred teams, or squads (you don’t remember, they rename the nomenclature every other week) – each responsible for a set of microservices.

You’re a web programmer trying to just get some work done, and a new feature has just launched. You read the docs.

The docs describe how you have to orchestrate calls between several APIs, all requiring OAuth tokens, and claims, and eventually, you’ll have your shiny new feature.

So you write the API calls, and you realise that the time it takes to keep sending data to and from the client, let alone the security risks of having to check that all the data is safe for transit, slows you down to a halt. This is why you need a best friend forever.

Sorry, a backend for front-end.

A BFF is an API that serves one, and specifically only one application. It translates an internal domain (MEGACORPS BUSINESS), into the internal language of the application it serves. It takes care of things like authentication, rate limiting, stuff you don’t want to do more than once. It reduces needless roundtrips to the server, and it translates data to be more suitable for its target application.

Think of it as an API, just for your app, that you control.

And tools like GraphQL, and OData are excellent for BFFs. GraphQL gels especially well with modern JavaScript driven front ends, with excellent tools like Apollo and Apollo-Server that help optimise these calls by batching requests.

It’s also pretty front-end-dev friendly – queries and schemas strongly resemble json, and it keeps your stack “javascript all the way down” without being beholden to some distant backend team.

Other things you might see and why

So now we understand our web servers, web apps, and our APIs, there’s surely more to modern web programming than that? Here are the things you’ll probably run into the most often.

Load Balancing

If you’re lucky enough to have traffic to your site, but unlucky enough to not be using a Platform-as-a-Service provider (more on that later), you’re going to run into a load balancer at some point. Don’t panic. Load balancers talk an archaic language, are often operated by grumpy sysops, or are just running copies of NGINX.

All a load balancer does, is accept HTTP requests for your application (or from it), pick a server that isn’t very busy, and forward the request.

You can make Load balancers do all sorts of insane things that you probably shouldn’t use load balancers for. People will still try.

You might see load balancers load balancing a particularly “hot path” in your software onto a dedicated pool of hardware to try keep it safe or isolate it from failure. You might also see load balancers used to take care of SSL certificates for you – this is called SSL Termination.

Distributed caching

If one computer can store some data in memory, then lots of computers can store… well, a lot more data!

Distributed caching was pioneered by “Memcached” – originally written to scale the blogging platform Livejournal in 2003. At the time, Memcached helped Livejournal share cached copies of all the latest entries, across a relatively small number of servers, vastly reducing database server load on the same hardware.

Memory caches are used to store the result of something that is “heavy” to calculate, takes time, or just needs to be consistent across all the different computers running your server software. In exchange for a little bit of network latency, it makes the total amount of memory available to your application the sum of all the memory available across all your servers.

Distributed caching is also really useful for preventing “cache stampedes” – when a non-distributed cache fails, and cached data would be recalculated by all clients, but by sharing their memory, the odds of a full cache failure is reduced significantly, and even when it happens, only some data will be re-calculated.

Distributed caches are everywhere, and all the major hosting providers tend to support memcached or redis compatible (read: you can use memcached client libraries to access them) managed caches.

Understanding how a distributed cache works is remarkably simple – when an item is added, the key (the thing you use to retrieve that item) that is generated includes the address or name of the computer that’s storing that data in the cache. Generating keys on any of the computers that are part of the distributed cache cluster will result in the same key.

This means that when the client libraries that interact with the cache are used, they understand which computer they must call to retrieve the data.

Breaking up large pools of shared memory like this is smart, because it makes looking things up exceptionally fast – no one computer needs to scan huge amounts of memory to retrieve an item.

Content Delivery Networks (CDNs)

CDNs are web servers run by other people, all over the world. You upload your data to them, and they will replicate your data across all of their “edges” (a silly term that just means “to all the servers all over the world that they run”) so that when someone requests your content, the DNS response will return a server that’s close to them, and the time it takes them to fetch that content will be much quicker.

The mechanics of operating a CDN are vastly more complicated than using one – but they’re a great choice if you have a lot of static assets (images!) or especially big files (videos! large binaries!). They’re also super useful to reduce the overall load on your servers.

Offloading to a CDN is one of the easiest ways you can get extra performance for a very minimal cost.

Let’s talk about design patterns! That’s real architecture

“Design patterns are just bug fixes for your programming languages”

People will talk about design patterns as if they’re some holy grail – but all a design pattern is, is the answer to a problem that people solve so often, there’s an accepted way to solve it. If our languages, tools or frameworks were better, they would probably do the job for us (and in fact, newer language features and tools often obsolete design patterns over time).

Let’s do a quick run through of some very common ones:

MVC – “Split up your data model, UI code, and business logic, so they don’t get confused”
ORM – “Object-Relational mapping” – Use a mapping library and configured rules, to manage the storage of your in-memory objects, into relational storage. Don’t muddle the objects and where you save them together”.
Active Record – “All your objects should be able to save themselves, because these are just web forms, who cares if they’re tied to the database!”
Repository – “All your data access is in this class – interact with it to load things.”
Decorator – “Add or wrap ‘decorators’ around an object, class or function to add common behaviour like caching, or logging without changing the original implementation.”
Dependency Injection – “If your class or function depends on something, it’s the responsibility of the caller (often the framework you’re using) to provide that dependency”
Factory – “Put all the code you need to create one of these, in one place, and one place only”
Adapter – “Use an adapter to bridge the gap between things that wouldn’t otherwise work together – translating internal data representations to external ones. Like converting a twitter response into YourSocialMediaDataStructure”
Command – “Each discrete action or request, is implemented in a single place”
Strategy – “Define multiple ways of doing something that can be swapped in and out”
Singleton – “There’s only one of these in my entire application”.

That’s a non-exhaustive list of some of the pattern jargon you’ll hear. There’s nothing special about design patterns, they’re just the 1990s version of an accepted and popular stackoverflow answer.

Microservice architectures

Microservice architectures are just the “third wave” of Service Oriented Design.

Where did they come from?

In the mid-90s, “COM+” (Component Services) and SOAP were popular because they reduced the risk of deploying things, by splitting them into small components – and providing a standard and relatively simple way to talk across process boundaries. This led to the popularisation of “3-tier” and later “n-tier” architecture for distributed systems.

N-Tier really was a shorthand for “split up the data-tier, the business-logic-tier and the presentation-tier”. This worked for some people – but suffered problems because horizontal slices through a system often require changing every “tier” to finish a full change. This ripple effect is bad for reliability.

Product vendors then got involved, and SOAP became complicated and unfashionable, which pushed people towards the “second wave” – Guerrilla SOA. Similar design, just without the high ceremony, more fully vertical slices, and less vendor middleware.

This led to the proliferation of smaller, more nimble services, around the same time as Netflix were promoting hystrix – their platform for latency and fault tolerant systems.

The third wave of SOA, branded as Microservice architectures (by James Lewis and Martin Fowler) – is very popular, but perhaps not very well understood.

What Microservices are supposed to be: Small, independently useful, independently versionable, independently shippable services that execute a specific domain function or operation.

What Microservices often are: Brittle, co-dependent, myopic services that act as data access objects over HTTP that often fail in a domino like fashion.

Good microservice design follows a few simple rules

Be role/operation based, not data centric
Always own your data store
Communicate on external interfaces or messages
What changes together, and is co-dependent, is actually the same thing
All services are fault tolerant and survive the outages of their dependencies

Microservices that don’t exhibit those qualities are likely just secret distributed monoliths. That’s ok, loads of people operate distributed monoliths at scale, but you’ll feel the pain at some point.

Hexagonal Architectures

Now this sounds like some “Real Architecture TM”!

Hexagonal architectures, also known as “the ports and adapters” pattern – as defined by Alistair Cockburn, is one of the better pieces of “real application architecture” advice.

Put simply – have all your logic, business rules, domain specific stuff – exist in a form that isn’t tied to your frameworks, your dependencies, your data storage, your message busses, your repositories, or your UI.

All that “outside stuff”, is “adapted” to your internal model, and injected in when required.

What does that really look like? All your logic is in files, modules or classes that are free from framework code, glue, or external data access.

Why? Well it means you can test everything in isolation, without your web framework or some broken API getting in the way. Keeping your logic clear of all these external concerns is safe way to design applications.

There’s a second, quite popular approach described as “Twelve Factor Apps” – which mostly shares these same design goals, with a few more prescriptive rules thrown on top.

Scaling

Scaling is hard if you try do it yourself, so absolutely don’t try do it yourself.

Use vendor provided, cloud abstractions like Google App Engine, Azure Web Apps or AWS Lambda with autoscaling support enabled if you can possibly avoid it.

Consider putting your APIs on a serverless stack. The further up the abstraction you get, the easier scaling is going to be.

Conventional wisdom says that “scaling out is the only cost-effective thing”, but plenty of successful companies managed to scale up with a handful of large machines or VMs. Scaling out gives you other benefits (often geo-distribution related, or cross availability zone resilience) but don’t feel bad if the only leaver you have is the one labelled “more power”.

Architectural patterns for distributed systems

Building distributed systems is harder than building just one app. Nobody really talks about that much, but it is. It’s much easier for something to fail when you split everything up into little pieces, but you’re less likely to go completely down if you get it right.

There are a couple of things that are always useful.

Circuit Breakers everywhere

Circuit breaking is a useful distributed system pattern where you model out-going connections as if they’re an electrical circuit. By measuring the success of calls over any given circuit, if calls start failing, you “blow the fuse”, queuing outbound requests rather than sending requests you know will fail.

After a little while, you let a single request flow through the circuit (the “half open” state), and if it succeeds, you “close” the circuit again and let all the queued requests flow through.

Circuit breakers are a phenomenal way to make sure you don’t fail when you know you might, and they also protect the service that is struggling from being pummelled into oblivion by your calls.

You’ll be even more thankful for your circuit breakers when you realise you own the API you’re calling.

Idempotency and Retries

The complimentary design pattern for all your circuit breakers – you need to make sure that you wrap all outbound connections in a retry policy, and a back-off.

What does this mean? You should design your calls to be non-destructive if you double submit them (idempotency), and that if you have calls that are configured to retry on errors, that perhaps you back off a little (if not exponentially) when repeated failures occur – at the very least to give the thing you’re calling time to recover.

Bulkheads

Bulkheads are inspired by physical bulkheads in submarines. When part of a submarines hull is compromised, the bulkheads shut, preventing the rest of the sub from flooding. It’s a pretty cool analogy, and it’s all about isolation.

Reserved resources, capacity, or physical hardware can be protected for pieces of your software, so that an outage in one part of your system doesn’t ripple down to another.

You can set maximum concurrency limits for certain calls in multithreaded systems, make judicious use of timeouts (better to timeout, than lock up and fall over), and even reserve hardware or capacity for important business functions (like checkout, or payment).

Event driven architectures with replay / message logs

Event / message-based architectures are frequently resilient, because by design the inbound calls made to them are not synchronous. By using events that are buffered in queues, your system can support outage, scaling up and scaling down, and rolling upgrades without any special consideration. It’s normal mode of operation is “read from a queue”, and this doesn’t change in exceptional circumstances.

When combined with the competing consumers pattern – where multiple processors race to consume messages from a queue – it’s easy to scale out for good performance with queue and event driven architectures.

Do I need Kubernetes for that?

No. You probably don’t have the same kind of scaling problems as Google do.

With the popularisation of docker and containers, there’s a lot of hype gone into things that provide “almost platform like” abstractions over Infrastructure-as-a-Service. These are all very expensive and hard work.

If you can in any way manage it, use the closest thing to a pure-managed platform as you possibly can. Azure App Services, Google App Engine and AWS Lambda will be several orders of magnitude more productive for you as a programmer. They’ll be easier to operate in production, and more explicable and supported.

Kubernetes (often irritatingly abbreviated to k8s, along with it’s wonderful ecosystem of esoterically named additions like helm, and flux) requires a full time ops team to operate, and even in “managed vendor mode” on EKS/AKS/GKS the learning curve is far steeper than the alternatives.

Heroku? App Services? App Engine? Those are things you’ll be able to set up, yourself, for production, in minutes to only a few hours.

You’ll see pressure to push towards “Cloud neutral” solutions using Kubernetes in various places – but it’s snake oil. Being cloud neutral simply means you pay the cost of a cloud migration (maintaining abstractions, isolating your way from useful vendor specific features) perpetually, rather than in the (exceptionally unlikely) scenario that you’re switching cloud vendor.

The responsible use of technology includes using the thing most suited to the problem and scale you have.

Sensible Recommendations

Always do the simplest thing that can possibly work. Architecture has a cost, just like every abstraction. You need to be sure you’re getting a benefit before you dive into to some complex architectural patterns.

Most often, the best architectures are the simplest and most amenable to change.

Creating Spark Dataframes without a SparkSession for tests

03/26/2019 12:58:01

Back to the scheduled .NET content after this brief diversion into... Java.

I'm currently helping a team put some tests around a Spark application, and one of the big bugbears is testing raw data transformations and functions that'll run inside the spark cluster, on the outside of it. It turns out the most of the core Spark types all hang off a SparkSession and can't really be manually constructed - something a quick StackOverflow query appears to confirm. People just can't seem to create Spark Dataframes outside of a spark session.

Except you can.

All a Spark Dataframe really is, is a schema and a collection of Rows - so with a little bit of digging, you realise that if you can only create a row and a schema, everything'll be alright. So you did, and you discover no public constructors and no obvious ways to create the test data you need.

Unless you apply a little bit of reflection magic, and then you can create a schema with some data rows trivially

Copy pasta until your hearts content. Test that Spark code, it's not going to test itself.

Building .NET Apps in VSCode (Not .NetCore)

11/17/2016 10:09:50

With all the fanfare around .NET Core and VS Code, you might have been lead to believe that you can't build your boring old .NET apps inside of VS Code, but that's not the case.

You can build your plain old .NET solutions (PONS? Boring old .NET projects? BONPS? God knows) by shelling out using the external tasks feature of the editor (http://code.visualstudio.com/docs/editor/tasks).

First, make sure you have a couple of plugins available

C# for Visual Studio Code (powered by OmniSharp)
MSBuild Tools (for syntax highlighting)

Now, "Open Folder" on the root of your repository and press CTRL+SHIFT+B.

VS Code will complain that it can't build your program, and open up a file it generates .vs\tasks.json in the editor. It'll be configured to use msbuild, but won't work unless MSBuild is in your path, with a trivial edit to correct the path, you'll be building straight away:


{
// See https://go.microsoft.com/fwlink/?LinkId=733558
// for the documentation about the tasks.json format
"version": "0.1.0",
"command": "C:\\Program Files (x86)\\MSBuild\\14.0\\Bin\\msbuild.exe",
"args": [
"/property:GenerateFullPaths=true"
],
"taskSelector": "/t:",
"showOutput": "silent",
"tasks": [
{
"taskName": "build",
"showOutput": "silent",
"problemMatcher": "$msCompile"
}
]
}

CTRL+SHIFT+B will now build your code by invoking MSBuild.

Get Coding! - I wrote a book.

06/28/2016 13:58:46

Through late 2015 and the start of 2016 I was working on a “secret project” that I could only allude to in conjunction with Walker Books UK and Young Rewired State – to write a book to get kids coding. As a hirer, I’ve seen first hand the difficulty in getting a diverse range of people through the door into technology roles, and I thoroughly believe that the best way we solve the problem is from the ground up - changing the way we teach computer science.

We live in a world where the resources to teach programming are widely available if there’s an appetite for it, and the barrier to entry is lower than ever, so in some ways, this is my contribution to the movement.

Get Coding is a beautifully illustrated (courtesy of Duncan Beedie) book that takes a “broad not deep” approach to teaching HTML5, JavaScript and CSS to kids from ages 8 and up. It was edited by Daisy Jellicoe at Walker Books UK, and without her attention to detail and enthusiasm it wouldn’t have come out half as well as it did. It’s quite long, at 209 pages, and comes with a story that children can follow along to.

Learn how to write code and then build your own website, app and game using HTML, CSS and JavaScript in this essential guide to coding for kids from expert organization Young Rewired State. Over 6 fun missions learn the basic concepts of coding or computer programming and help Professor Bairstone and Dr Day keep the Monk Diamond safe from dangerous jewel thieves. In bite-size chunks learn important real-life coding skills and become a technology star of the future. Young Rewired State is a global community that aims to get kids coding and turn them into the technology stars of the future.

The book is available on Amazon and in major high street bookstores.

I’ve been thrilled by the response from friends, family and the community, and it made the sometimes quite stressful endeavour thoroughly worthwhile. After launch, Get Coding! has ended up number 1 in a handful of Amazon categories, was in the top 2000 books on Amazon.co.uk for about a week, and has received a series of wonderfully positive reviews, not to mention being recently featured in The Guardians Best New Children's Books Guide for Summer 2016.

I’ll leave you with a few photos I’ve been sent or collected and hope that perhaps you’ll buy the book.

Why code reviews are important

04/07/2016 14:40:41

Making sure that more than one set of eyes has seen all the code that we produce is an important part of software development - it makes sure that we catch bugs, keep our code readable, and share patterns and practices across the teams.

Code review should answer the questions

Are there any logical errors?
Are the requirements implemented?
Are all the acceptance criteria of the user story met?
Do our unit tests and automation tests around this feature pass? Are we missing any?
Does the code match our house style?

The interesting thing about the effectiveness of code review is that it isn't just hearsay, it was measured effectively in the seminal book "CODE Complete":

.. software testing alone has limited effectiveness -- the average defect detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 and 60 percent. Case studies of review results have been impressive:

In a software-maintenance organization, 55 percent of one-line maintenance changes were in error before code reviews were introduced. After reviews were introduced, only 2 percent of the changes were in error. When all changes were considered, 95 percent were correct the first time after reviews were introduced. Before reviews were introduced, under 20 percent were correct the first time.

In a group of 11 programs developed by the same group of people, the first 5 were developed without reviews. The remaining 6 were developed with reviews. After all the programs were released to production, the first 5 had an average of 4.5 errors per 100 lines of code. The 6 that had been inspected had an average of only 0.82 errors per 100. Reviews cut the errors by over 80 percent.

The Aetna Insurance Company found 82 percent of the errors in a program by using inspections and was able to decrease its development resources by 20 percent.

IBM's 500,000 line Orbit project used 11 levels of inspections. It was delivered early and had only about 1 percent of the errors that would normally be expected.

A study of an organization at AT&T with more than 200 people reported a 14 percent increase in productivity and a 90 percent decrease in defects after the organization introduced reviews.

Jet Propulsion Laboratories estimates that it saves about $25,000 per inspection by finding and fixing defects at an early stage.

We do code reviews because they help us make our code better, and measurably save a lot of money - the cost of fixing software issues only multiplies once they're in production.

Code review check-list

Not sure how to code review? Here's a check-list to get you started, derived from many excellent existing check-lists.

General

Does the code work?
Does it perform its intended function?
Is all the code easily understood?
Does it conform to house style, standard language idioms?
Is there any duplicate code?
Is the code as modular as possible?
Can any global variables be replaced?
Is there any commented out code?
Can any of the code be replaced with library functions?
Can any logging or debugging code be removed?
Has the "Boy scout rule" been followed? Is the code now better than before the change?

Security

Are all data inputs checked (for the correct type, length, format, and range) and encoded?
Where third-party utilities are used, are returning errors being caught?
Are output values checked and encoded?
Are invalid parameter values handled?

Documentation

Is any unusual behaviour or edge-case handling described?
Is there any redundant auto-documentation that can be removed?

Testing

Is the code unit tested?
Do the tests actually test that the code is performing the intended functionality?
Could duplication in test code be reduced with builder / setup methods or libraries?

Practical Tips

Don't try to be a human compiler

The first and most important thing to remember when you're doing a code review is that you're not meant to be a human compiler. Ensure you're reviewing the functionality, tests and readability of the code rather than painstakingly inspecting syntax. Syntax and house style are important, but style issues are a much better fit for automated tooling than humans. Don't waste time bickering over style and formatting.

Reviewers are born equal

Our teams are built around mutual trust and respect, and a natural extension of that is that anybody can code review. You may on occasion find yourself working on some code that is someone else's area of particular interest or expertise - but better solicit their advice while you're working than wait for a review. Code reviews aren't limited to your technical lead, and likewise, a lead should have their code reviewed all the same.

It's just code, you're not marrying it

Don't be precious about your code as a submitter, and as a reviewer be honest and open. It's just code, make sure it's the best it can be. A code review is an opportunity to make your code the best code it can be, using the expertise of your colleagues.

That's fine!

Sometimes the code is just fine - don't be the person that nitpicks for change without any quantifiable benefit.

Tooling

Code reviews should be performed on change-sets via either a branch comparison URL, or ideally, a pull request.

Stash has replaced our legacy tool (FishEye) for code reviews using Git, and as projects migrate from SVN they are expected to migrate to using branch comparison or pull requests in Stash.

Pair programming and code reviews

As one of the founding practices of eXtreme Programming (XP), pair-programming can be seen as "extreme code review"

"code reviews are considered a beneficial practice; taken to the extreme, code can be reviewed continuously, i.e. the practice of pair programming."

Pair programming exhibits exactly the same qualities of a great code review, with the additional benefits of immediacy - it's impossible to ignore a pair critiquing a design in real time, poor choices barely survive, and work doesn't end up blocked in a code review queue somewhere.

Pairing is usually preferably to code review for regular work, and if code is produced as part of a pair it can be considered "code reviewed by default". Unfortunately due to reasons of availability, location or specialisation, you may need to rely on a more tradition code review with some of the code you write.

Realistic expectations

Like everything a code review is not a silver bullet - and while they've proven to protect us against a large quantity of visibly obvious bugs, code reviews aren't great at spotting performance bugs or subtle threading and concurrency issues.

You'll need to rely on existing instrumentation and profilers for these types of metrics - "mentally executing" and finding these kinds of bugs is unlikely, if not impossible, just beware to not believe your code to be free of error by virtue of the fact that it's been subjected to a code review or pairing session. Ironically, the kinds of bugs that'll slip through a code review or pairing session will by definition be these tricky and hard to detect edge cases.

« Previous Entries