Technology Blog

Music Streaming and the Disappearing Records

12/15/2019 21:20:00

I'm part of the Napster generation.

That means I pretty much stole all the music I grew up listening to. Unrepentantly, unremorsefully, downloaded everything.

Napster might seem like a dim and distant memory in internet time, but it's easy to forget quite how game changing Napster was for both the music scene, and the industry.

For anyone who might be too young to remember, or perhaps not technically savvy enough at the time, Napster, while not the first way you could "borrow" music from the internet, was the first popular Peer-to-Peer file sharing network.

It was ridiculously easy - you installed the software, told it where your collection of MP3s (songs stored as files on your computer) were and everyone else did the same. You were faced with an empty search box and your imagination and it just worked.

You could search for anything you wanted, however esoteric, and for the most part, Napster delivered.

It's hard to explain how much of a revelation this was in 1999 - a year after Google was founded, and a handful of years before search on the internet really worked well at all.

I was 15 in 1999, teaching myself C++ and Perl, and this seemed like magic. Everyone ripped the few CDs they could afford to buy themselves to their computers, and in return we got everything.

When you're 15, the complex relationship between rights owners, writers, musicians, ethics, points-based royalty systems and the music business is the furthest thing from your mind. You're hungry for culture, and music, for as long as it's been recorded, has always tracked generations and cultural shifts.

People forget just how hard it was to discover new music, especially non-mainstream music in the 90s, and earlier.

When home cassette recording was becoming popular in the early 80s, the record companies were scared. Home Taping Was Killing Music, they proclaimed - yet a taping and trading scene, driven by kids eager to hear new, increasingly niche and underground music, rebelled. That same taping scene fuelled the burgeoning New Wave of British Heavy Metal, internationalising it and making the music accessible to everyone. It's no coincidence that bands like Metallica rose to prominence through their own participation in the demo trading scene of the early 80s, later weaponising that grass-roots movement into international success.

But the 90s were different. The taping scene was mostly about live bootlegs, and the onward march of technology proved cassettes to be unreliable - home taping didn't kill the industry. It actually gave the industry room to innovate on quality and easily win.

And oh boy did the industry win. The 90s were probably the final hurrah for rockstars and heavily pushed music outside of mainstream pop. They didn't realise it then of course, but somebody did. Someone who made his name in the taping scene.

Lars was right.

Which is an awkward admission for a self-admitted former music pirate.

In 2000, Metallica, et al. v. Napster, Inc. became the first court case ever brought against a Peer-to-Peer file-sharing network. Lars Ulrich, Metallica's drummer, was exceptionally vocal in the media about the widespread damage he saw file sharing causing to the music industry.

"Napster hijacked our music without asking. They never sought our permission. Our catalogue of music simply became available as free downloads on the Napster system."

Lars was right. Technology has this wonderful and world changing ability to ask, "can we do this thing?" but it rarely stops to see if it should. Metallica won.

Collectively, we, the youth, demonised them. Greedy rich millionaires pulling up the ladder behind them. Metallica inadvertently fuelled the boom in DRM and various rights management technologies of the late 90s and early 2000s, but the effects of the Napster lawsuit are still felt today.

While they thought they were fighting for their creative agency, they really were fighting for control. What the Metallica suit did was push file sharing underground into a series of different sharing platforms, which were more difficult to regulate, harder to track, and more resilient. They ironically made file sharing more sophisticated.

Lars today understands that the fans, the youth of the day, thought Metallica were fighting them rather than the file-sharing organisations. All his fears did come to fruition, though.

It's a sobering admission to be on the wrong side of the argument twenty years later.

The Long Tail of File Sharing

But what if file sharing was used for good?

The file-sharing epidemic around Napster's launch wasn't the start of file sharing, but the end destination of an entirely different scene, completely distinct from the tape trading of the 80s.

With its origin in 80s hacker culture, and its continued survival on pre-World Wide Web protocols like Usenet (a distributed message board system that predates web pages) and IRC (a decentralised chat protocol that was extended to support file transfers), the digital music trading scene of the late 90s was part of the warez scene - often just called "the scene" by the people involved.

The scene is a closed community of release groups specialising in ripping (converting media to digital copies) and distributing copyrighted material, complete with its own rules and regulations about getting access to material - often before release. The scene doesn't really care too much about the material it distributes, though access to pre-release games, movies and music is absolutely a motivating factor. In many cases, scene release groups formed around specific types of content - cracking games, acquiring pre-release music - and distributed it all through private channels and FTP servers. The rise of Peer-to-Peer technology saw many previously difficult-to-obtain scene releases leak out to the general public, drawing the attention and ire of the recording industry.

This was exactly the kind of technologically advanced, weaponised piracy that the record companies had feared at the rise of cassette tape duplication - but this time it was viral, hard to stop and, terrifyingly, more advanced than any of the technology the recording industry was using at the time.

You can't put this kind of genie back in the bottle.

For the better part of a decade, the record industry fought a war of attrition with scene releases, the rise of Napster alternatives like AudioGalaxy, KaZaA, LimeWire and eDonkey (never say we’re bad at naming things in technology again…) and the dedication of an entire generation who believed they were in the moral right, fighting evil megacorporations trying to enforce archaic copyright law.

And the industry fought and fought.

Ferociously blind, the music industry never critically assessed the value proposition of its products, and it never innovated on the formats. CD prices, especially in the late 90s and early 2000s, were at a record high, and as the war against scene rippers and street-date-breaking leaks intensified, the products being sold were subject to increasingly dubious and in some cases dangerous DRM (digital rights management) schemes in a futile attempt to prevent piracy.

The music industry really didn’t stand a chance – the file-sharing scene entrenched, worked on its technology and was brutally effective. BitTorrent became the tool of choice, and the “mass market piracy” calmed down back to smaller communities of enthusiasts around specific genres or niches.

Across the same time window, CD-Rs and home CD burning reached mass market acceptance and affordability. But for labels? The prices had never really come down. They were used to making a lot of money on CD sales, and giving big-number advances to artists, but as they saw their profits shrink, they were struggling. The largest cost in the majority of businesses is always staff - and the scene didn't have that cost to compete with.

In the UK, high street retail chains like HMV, Our Price, Music Zone and FOPP went into administration, were bought, and entered administration again – relying on cut price DVD sales to keep the doors open (a format that was still slightly impractical for the average user to pirate at the time).

But something more interesting was happening. People were using illegal music sources not just to steal music they knew they wanted, but to discover things they’d never heard of. There were reports that consumers of illegal downloads were actually… spending more money on music?!

While everyone was so caught up on the idea that people were just out to get things for free (which was certainly more the case with other contemporary piracy, like that of games), music and its particular place in the cultural ecosystem – with live performances, and merchandise, and youth identity – actually saw some uplift, as bands that would never have gotten the attention of a label suddenly became independent darlings.

While the majors were losing, and the millions-of-unit pop albums of the time were performing poorly, the underground was thriving, much like that early tape trading scene. This phenomenon dovetailed with the rise of then-nascent social media platforms like LiveJournal and, later, MySpace, and the idea of the “MySpace Bands” – but what these bands really represented was the grass-roots marketing of local niche scenes to bigger audiences, powered by the levelling of technology and the groundwork laid, ironically, by software like Napster.

Did Napster accidentally “save music”?

A whole generation of people grew up stealing music and being totally OK with exploring music they would never otherwise have listened to, precisely because it didn’t cost anything. Sadly, you can’t feed your kids on people’s willingness to explore music free of cost.

There were breakout bands from the “MySpace scene” in the underground – the Arctic Monkeys, Bring Me The Horizon, Asking Alexandria – they made money. People noticed.

Pay What You Want

In October 2007 Radiohead released their seventh album “In Rainbows”, online, for any amount, as DRM-free MP3s. They’d found themselves at a novel point in their career, free from the encumbrance of a traditional record deal and buoyed by the high profile that a previously successful career as a major label recording artist afforded.

In December of the same year, they released a series of expanded physical formats and the download was removed.

Reaction was mixed. While Radiohead didn’t invent the “pay what you want” business model, they were the largest artist (by several orders of magnitude) to adopt it and bring it into the mainstream. Trent Reznor of Nine Inch Nails was critical of it not going far enough (arguing the low-quality digital release was a promotional tool for more expensive deluxe editions) while scores of artists criticised the move as an exceptionally wealthy band devaluing the worth of music.

Trent Reznor would go on to produce Saul Williams’ third album “The Inevitable Rise and Liberation of NiggyTardust!” – which Williams and Reznor released as high-quality audio files for “Pay What You Want or $5”. In the two months after its release, Tardust! shifted around 30k paying copies out of ~150k downloads; this compared favourably to Williams’ debut album, which had shifted 30k copies in the previous 3 years.

Reznor would later go on to release two of his own albums, Ghosts I-IV and The Slip, under similar schemes, and licensed under the Creative Commons license, complete with deluxe physical releases.

While it’s clear that the Pay What You Want model worked for these particular artists, much of the criticism centred on the model being entirely untenable for artists without the prior success of Radiohead or Reznor – the benefit of the privilege of success under a previous regime.

The record industry didn’t react either in kind, or kindly. Prices remained at an all-time high. In this same time window, a somewhat blunt Reznor addressed crowds in Australia during a show to express his dissatisfaction with the value placed on music.

  “I woke up this morning and I forgot where I was for a minute.

  I remembered the last time I was here; I was doing a lot of complaining at the prices 
  of CDs down here. That story got picked up, and got carried all around the world, and
  now my record label all around the world hates me because I yelled at them and called
  them out for being greedy fucking assholes.

  I didn’t get a chance to check, has the price come down at all?

  You know what that means? Steal it. Steal away. Steal and steal and steal some more
  and give it to all your friends. 

  Because one way or another these motherfuckers are going to realise, they’re
  ripping people off and that’s not right.” 

Curt, but the tide was certainly shifting against the high price of physical media at the end of the 00s. Reznor restarted his own label around this time to release his own work.

A Model for The Rest of Us

The 2000s were not kind to peer-to-peer file-sharing services. Apple and Amazon both opened digital music storefronts, along with the also-rans, and the iPod (launched in 2001) and its iTunes Store monopolised paid-for digital downloads, normalised DRM to consumers of music, and saved Apple as a company.

These more closed ecosystems gave the record industry exactly what it was looking for: the ability to charge the same amount while enjoying the comparatively low cost of digital distribution. Peer-to-peer had been pushed back underground by litigation, returning to the warez-scene subcultures from where it came, thanks to lobby groups and the rise of film piracy prompting crackdowns on file sharing – especially on popular mainstream BitTorrent sites like The Pirate Bay. Several high-profile lawsuits and prison sentences went a long way towards scaring people away from “downloading” pirated music. The industry didn’t recover, but it did see hope.

Towards the end of the 2000s, streaming audio services and web-radio started their rise, along with the founding of companies like Spotify that offered a different model for music consumption. Not only was it a model that worked for the record companies because nobody ever really owned any of the music they were streaming, but it worked for people by passing the tolerance test of “seemingly more convenient than the thing it replaced”.

Tired of loading new songs onto your iPod? Spotify!

Don’t even have an iPod or MP3 player anymore because 4G and smartphones were now ubiquitous? Spotify!

Spotify was so convenient, and so useful, it steamrolled across everything that came before it. Its free mode was ad supported, and sure, the labels weren’t making as much money as they were making before, but it sure beat having some kids upload the albums you published to YouTube and benefiting from the ad revenue.

In Spotify, the labels found the same thing that the videogame industry found in Valve’s Steam platform – a form of DRM that regular consumers didn’t feel threatened by. That didn’t seem like it infringed on anything. That didn’t feel like a threat, or punitive. A far cry from the MPAA and the BPI pressuring ISPs to release information about their customers so they could litigate against them.

If anything, Spotify is too good. It has competitors in 2019, but none of them are especially credible. Apple Music (which early Pay What You Want proponent Trent Reznor ended up working on for a time), Amazon, and briefly Microsoft all offered competitors – but Spotify, with its reach, discovery algorithms and passive social features, out-stepped the competition. It has a vice-like grip on the streaming industry, much like Apple’s iTunes previously had on DRM’d digital sales.

The nature of music has also shifted significantly in the two decades of the mainstream internet. Lars was right.

We normalised the fact that music wasn’t worth anything, and the cracks are now showing around the whole ecosystem that supports music. Bands don’t break big anymore, music is diverse, interesting, challenging, infinitely broad, and infinitely shallow.

You like Mexican Hip Hop mixed with Deathcore? We got that. How about Indian Rap Metal? Christopher Lee singing power metal? Yes, that exists too.

Low-cost, high-quality recording equipment has made the production of music accessible to an entire generation, at the same time as the global economic downturn saw the closure of music venues across the UK. Never has something so creatively healthy felt so continuously threatened with extinction.

Spotify exacerbates this problem with a steady stream of controversies regarding the allegedly low remuneration of artists streaming on its platform. You can’t really find any solid numbers on what Spotify pays artists, other than the consensus that “it’s not enough”. Songwriters doubly so – in 2018, the co-writer of the Bon Jovi song Livin’ on a Prayer received $6,000 in royalties from Spotify for half a billion streams. Several high-profile artists have pulled their catalogues from Spotify, only to later re-emerge (presumably because you cannot fight technological change, but also, because money).

It doesn’t take much to do the back of envelope maths with numbers like that, and they don’t look good. I don’t work in the music business, but I know a lot of people that do, and the stories are all consistent. Living as a touring musician in 2019 is a harder life than it’s ever been before.

No art without patronage.

When you’re young, you just want to have people hear your music, to play shows, to be a Rockstar. None of those things pay a mortgage.

Begging to play?

What have artists, bands even done to cope with this existence?

We’ve seen crowdfunding experiments – some successful, some not. Meet and greets, on-tour-lessons, signing sessions, expanded editions, hang-outs, VIP experiences, the works. There’s plenty of good in all those things, but it’s impossible to not identify all of these things for what they are – making artists work extra to somehow justify the value of their art.

Art has value. Value is not cost. These two things should not be accidentally conflated. We started off with “the work” as having value, which was slowly shifted to the live performance. The live performance’s value was slowly shifted to the merchandise, begetting the slow productisation of art. When we push art into the purely commercial, it can’t help but be compromised.

The financial models behind streaming music are compromised. Technology companies have ascended to the place the record labels once occupied, with Spotify and other infrastructure companies being the bodies that profit the most.

I run a software company for a living; I speak for, and advocate for, technology because I care deeply about it. But there’s certainly something tragically wrong here, even if it’s as simple as Spotify subscriptions being too cheap.

What about the Underground?

I’ve only really hinted at my own personal tastes throughout this piece; music is subjective. But the underground, the small labels – those I fear deeply for in this climate.

I fear for the small labels because of what they do for discovery – in niche genres, labels are important, and grass-roots shows are important.

I grew up discovering music from soundtracks, from the second-hand CD shops in Manchester where you could buy music journalists’ discarded promos for £3-5 a CD. I’d go down on Saturday mornings with friends and we’d buy 4-5 albums a week. We’d take chances. We’d buy things based on the descriptions and the sound-alikes and the artwork.

It was culture, and culture shifts. The internet has been nothing but incredible for music discovery and access, and over the last decade it has replicated and bettered the experiences I had digesting weird metal albums on Saturday afternoons – but in doing so, it’s also completely conflated the concept of ownership with that of access.

It’s no shock to anyone that you don’t own the music you listen to on streaming services, but the more non-mainstream you get, the greater the risk of losing access to that music in its entirety.

We’ve seen how easy it is for records to disappear from Spotify at the behest of their owners, but what happens when the owners go bankrupt? When the labels go out of business?

What happens when nobody owns the thing they’ve been enjoying, and it vanishes?

The games industry has long been contending with a similar category of problem in the way it treats abandonware (games that are no longer sold or supported, often because their authors and publishers no longer exist) and video game emulation.

Games stuck in licensing hell have routinely vanished or become unplayable. We shouldn’t let our own culture erode and disappear.

We’ve slowly killed ownership, we’re slowly killing our DIY scenes by closing live venues, and we’re putting the music created in our scenes and undergrounds, across every genre that isn’t mainstream, at risk by outsourcing ownership to privately owned organisations that hardly pay the creators of the art we covet. Culture and art should not be kept behind the gates of collectors and inflated prices.

The music business is lucky, but not without its tragedies – the UMG media archive famously burnt down, losing original recordings and masters of some of the most important albums in history.

The British Library, thankfully, cares about this.

The “Sound and Moving Image” archive is vast and more varied than you might imagine – their stated aim is to collect a copy of each commercial release in the UK, of any and all genres. There’s no budget, and they rely on donations from labels and artists, along with private collections. The more esoteric “sounds” are digitised, but for most of the popular recordings you’ll have to go in person to listen, for free.

I fundamentally believe in the value of digital distribution and streaming platforms. They’ve opened the walls of music up, but as a community we need to be better at protecting our culture and music – because private organisations owe us nothing and are not always good actors.

Metallica were right about Napster. Let’s protect music, especially outside of the mainstream.

And go to a show!

Architecture for Everyone

12/12/2019 11:20:00

Someone said the dreaded architecture word and you’re Not-An-Architect?

Scared when someone challenges your understanding of computational complexity when all you’re trying to do is put a widget on a webpage? Never fear – it’s probably not nearly as sophisticated or as complex as you might think.

Architecture has a reputation for being unapproachable, gate-kept, and “hard computer science”. Most of the software architecture you run into, for average-to-web-scale web apps, is astonishingly similar. We’re going to cover the basics, some jargon, and some architectural patterns you’ll probably see everywhere in this brief architectural primer for not-architects and web programmers.

What even is a webserver?

Ok so let’s start with the basics. The “web”, or the “world wide web” – to use its hilariously antiquated full honorific – is just a load of computers connected to the internet. The web is a series of conventions that describe how “resources” (read: web pages) can be retrieved from these connected computers.

Long story short, “web servers” implement the “HTTP protocol” – a series of commands you can send to remote computers – that lets you say “hey, computer, send me that document”. If this sounds familiar, it’s because that’s how your web browser works.

When you type www.my-awesome-website.com into your browser, the code running on your computer crafts a “http request” and sends it to the web server associated with the URL (read: the website address) you typed into the address bar.

So, the web server - the program running on the remote computer, connected to the internet, that’s listening for requests and returning data when it receives them. The fact this works at all is a small miracle and is built on top of DNS (the thing that turns my-awesome-website.com into an IP address), and a lot of networking, routing and switching. You probably don’t need to know too much about any of that in real terms unless you’re going deep.

There are a tonne of general-purpose web servers out there – but realistically, you’ll probably just see a mixture of Apache, NGINX and Microsoft IIS, along with some development-stack-specific web servers (Node.js serves itself, as can things like ASP.NET Core for C#, and http4k for Kotlin).
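
To make that less abstract, here’s a tiny sketch of a web server written with the JDK’s built-in com.sun.net.httpserver package – picked purely because it ships with Java; every stack has an equivalent. It’s illustrative, not production-grade:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class TinyWebServer {
    public static void main(String[] args) throws Exception {
        // Listen for HTTP requests on port 8080.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // For any request to "/", return a small HTML document.
        server.createContext("/", exchange -> {
            byte[] body = "<html><body>Hello from a web server</body></html>"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        System.out.println("Listening on http://localhost:8080/");
    }
}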

How does HTTP work? And is that architecture?

If you’ve done any web programming at all, you’ll likely be at least a little familiar with HTTP. It stands for “HyperText Transfer Protocol”, and it’s what your browser talks when it talks to web servers. Let’s look at a simple raw HTTP “request message”:

GET http://www.davidwhitney.co.uk/ HTTP/1.1
Host: www.davidwhitney.co.uk
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64…
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

The basics of HTTP are easy to grasp – there’s a mandatory “request line” – that’s the first bit with a verb (one of GET, POST, PUT and HEAD most frequently), the URL (the web address) and the protocol version (HTTP/1.1). There’s then a bunch of optional request header fields – that’s all the other stuff – think of this as extra information you’re handing to the webserver about you. After your headers, there’s a blank line, and an optional body. That’s HTTP/1.1. We’re done here. The server will respond in similar form:

HTTP/1.1 200 OK
Cache-Control: public,max-age=1
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Server: Kestrel
X-Powered-By: ASP.NET
Date: Wed, 11 Dec 2019 21:52:23 GMT
Content-Length: 8479


<!DOCTYPE html>
<html lang="en">...

The first line being a status code, followed by headers and a response body. That’s it. The web server, based on the content of a request, can send you anything it likes, and the software that’s making the request must be able to interpret the response. There’s a lot of nuance in asking for the right thing, and responding appropriately, but the basics are the same.
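
If you want to see those messages from the code side, here’s a hedged sketch of issuing the same kind of GET request using Java’s standard java.net.http client – any HTTP library in any language does the same dance:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawRequest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The verb, URL and headers map directly onto the request message shown above.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://www.davidwhitney.co.uk/"))
                .header("Accept", "text/html")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());      // the status code, e.g. 200
        System.out.println(response.headers().map());   // the response headers
        System.out.println(response.body());            // the HTML body
    }
}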

The web is an implementation of the design pattern REST – which stands for “Representational State Transfer”. You’ll hear people talk about REST a lot – it was originally defined by Roy Fielding in his PhD dissertation, but more importantly was a description of the way HTTP/1.0 worked at the time, and was documented at the same time Fielding was working on HTTP/1.1.

So the web is RESTful by default. REST describes the way HTTP works.

The short version? Uniquely addressable URIs (web addresses) that return a representation of some state held on a machine somewhere (the web pages, documents, images, et al). Depending on what the client asks for, the representation of that state could vary.

So that’s HTTP, and REST, and an architectural style all in one.

What does the architecture of a web application look like?

You can write good software following plenty of different architectural patterns, but most people stick to a handful of common patterns.

“The MVC App”

MVC – model view controller – is a simple design pattern that decouples the processing logic of an application and the presentation of it. MVC was really catapulted into the spotlight by the success of Ruby on Rails (though the pattern was a couple of decades older) and when most people say “MVC” they’re really describing “Rails-style” MVC apps where your code is organised into a few different directories

/controllers
/models
/views

Rails popularised the use of “convention over configuration” to wire all this stuff together, along with the idea of “routing” and sensible defaults. This was cloned by ASP.NET MVC almost wholesale, and pretty much every other MVC framework since.

As a broad generalisation, by default, if you have a URL that looks something like

http://www.mycoolsite.com/Home/Index

An MVC framework, using its “routes” – the rules that define where things are looked up - would try and find a “HomeController” file or module (depending on your programming language) inside the controllers directory. A function called “Index” would probably exist. That function would return a model – some data – that is rendered by a “view” – a HTML template from the views folder.

All the different frameworks do this slightly differently, but the core idea stays the same – features grouped together by controllers, with functions for returning pages of data and handling input from the web.
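
As a deliberately toy illustration of what that routing convention is doing under the hood – a real framework does far more (parameter binding, view engines, middleware), and the controller and action names here are just the example from above:

import java.util.Map;
import java.util.function.Supplier;

public class TinyMvcRouter {

    // The "controllers": controller name -> action name -> a function that returns a model.
    static Map<String, Map<String, Supplier<Object>>> controllers = Map.of(
            "home", Map.of("index", () -> "Welcome!")   // HomeController.Index returns a model
    );

    public static void main(String[] args) {
        String url = "/Home/Index";
        String[] parts = url.toLowerCase().split("/");  // ["", "home", "index"]

        // Routing: find the controller, then the action, then invoke it to get a model.
        Object model = controllers.get(parts[1]).get(parts[2]).get();

        // The "view": a template that the model is rendered into.
        String view = "<html><body><h1>%s</h1></body></html>";
        System.out.println(String.format(view, model));
    }
}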

“The Single Page App with an API”

SPAs are incredibly common, popularised by client-side web frameworks like Angular, React and Vue.js. The only real difference here is we’re taking our MVC app and shifting most of the work it does to the client side.

There are a couple of flavours here – there’s client-side MVC, there’s MVVM (model-view-view-model), and there’s functional reactive programming (FRP). The differences might seem quite subtle at first.

Angular is a client side MVC framework – following the “models, views and controllers” pattern – except now it’s running inside the users web browser.

React – an implementation of functional reactive programming – is a little more flexible, but is more concerned with state-change events in data – often using some event store like Redux.

MVVM is equally common in single page apps where there’s two way bindings between something that provides data (the model) and the UI (which the view model serves).

Underneath all these client-heavy JavaScript frameworks is generally an API that looks nearly indistinguishable from “the MVC app”, but instead of returning pre-rendered pages, it returns the data that the client “binds” its UI to.

“Static Sites Hosted on a CDN or other dumb server”

Perhaps the outlier of the set – there’s been a resurgence of static websites in the 20-teens. See, scaling websites for high traffic is hard when you keep running code on your computers.

We spent years building relatively complicated and poorly performing content management systems (like WordPress), that cost a lot of money and hardware to scale.

As a reaction, moving the rendering of content to a “development time” exercise has distinct cost and scalability benefits. If there’s no code running, it can’t crash!

So static site generators became increasingly popular – normally allowing you to use your normal front-end web dev stack, but then generating all the files using a build tool to bundle and distribute to dumb web servers or CDNs. See tools like – Gatsby, Hugo, Jekyll, Wyam.

“Something else”

There are other archetypes that web apps follow – there’s a slowly rising trend in transpiled frameworks (Blazor for C# in WebAssembly, and Kotlin’s JavaScript compile targets) – but given the vast popularity of the dominant JavaScript frameworks of the day, they all try to play along nicely.

Why would I choose one over the other?

Tricky question. Honestly for the most part it’s a matter of taste, and they’re all perfectly appropriate ways to build web applications.

Server-rendered MVC apps are good for low-interactivity websites. Even though high fidelity frontend is a growing trend, there’s a huge category of websites that are just that – web sites, not web applications – and the complexity cost of a large toolchain is often not worth the investment.

Anything that requires high fidelity UX, almost by default now, is probably a React, Angular or Vue app. The programming models work well for responsive user experiences, and if you don’t use them, you’ll mostly end up reinventing them yourself.

Static sites? Great for blogs, marketing microsites, content management systems, anything where the actual content is the most valuable interaction. They scale well, basically cannot crash, and are cheap to run.

HTTP APIs, Rest, GraphQL, Backend-for-Frontends

You’re absolutely going to end up interacting with APIs, and while there are a lot of terms that get thrown around to make this stuff sound complicated, the core is simple. Most APIs you use, or build, will be “REST-ish”.

You’ll be issuing the same kind of “HTTP requests” that your browsers do, mostly returning JSON responses (though sometimes XML). It’s safe to describe most of these APIs as JSON-RPC or XML-RPC.

Back at the turn of the millennium there was a push for the standardisation of “SOAP” (Simple Object Access Protocol) APIs, and while that came with a lot of good stuff, people found the XML cumbersome to read, and SOAP diminished in popularity.

Ironically, lots of the stuff that was solved in SOAP (consistent message envelope formats, security considerations, schema verification) has subsequently had to be “re-solved” on top of JSON using emerging open-ish standards like Swagger (now OpenAPI) and JSON:API.

We’re good at re-inventing the things we already had on the web.

So, what makes a REST API a REST API, and not JSON-RPC?

I’m glad you didn’t ask.

REST at its core, is about modelling operations that can happen to resources over HTTP. There’s a great book by Jim Webber called Rest in Practice if you want a deep dive into why REST is a good architectural style (and it is, a lot of the modern naysaying about REST is relatively uninformed and not too dissimilar to the treatment SOAP had before it).

People really care about what is and isn’t REST, and you’ll possibly upset people who really care about REST, by describing JSON-RPC as REST. JSON-RPC is “level 0” of the Richardson Maturity Model – a model that describes the qualities of a REST design. Don’t worry too much about it, because you can build RESTish, sane, decent JSON-RPC by doing a few things.

First, use HTTP verbs correctly: GET for fetching (and never with side effects), POST for “doing operations”, and PUT for “creating stuff where the state is controlled by the client”. After that, make sure you organise your APIs into logical “resources” – your core domain concepts: “customer”, “product”, “catalogue”, etc.

Finally, use correct HTTP response codes for interactions with your API.

You might not be using “hypermedia as the engine of application state”, but you’ll probably do well enough that nobody will come for your blood.

You’ll also get a lot of the benefits of a fully RESTful API by doing just enough – resources will be navigable over HTTP, your documents will be cacheable, and your API will work in most common tools. Use a Swagger or OpenAPI library to generate a schema and you’re pretty much doing what most people are doing.
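
Sketched out with Spring-style annotations (purely as an example – it would live inside a typical Spring Boot app, and the “customer” resource and its fields are illustrative), those rules look something like this:

import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// One logical "resource" (customers), the right verbs, and the right status codes.
@RestController
@RequestMapping("/customers")
public class CustomerApi {

    public static class Customer {
        public String id;
        public String name;
    }

    private final Map<String, Customer> store = new ConcurrentHashMap<>();

    // GET is for fetching, never for side effects. 404 when the resource doesn't exist.
    @GetMapping("/{id}")
    public ResponseEntity<Customer> get(@PathVariable String id) {
        Customer customer = store.get(id);
        return customer == null ? ResponseEntity.notFound().build() : ResponseEntity.ok(customer);
    }

    // POST "does an operation" - here, creating a customer - and returns 201 Created
    // with the location of the new resource.
    @PostMapping
    public ResponseEntity<Customer> create(@RequestBody Customer customer) {
        store.put(customer.id, customer);
        return ResponseEntity.created(URI.create("/customers/" + customer.id)).body(customer);
    }

    // PUT creates or replaces a resource at an address the client controls.
    @PutMapping("/{id}")
    public ResponseEntity<Customer> put(@PathVariable String id, @RequestBody Customer customer) {
        boolean existed = store.containsKey(id);
        store.put(id, customer);
        return existed
                ? ResponseEntity.ok(customer)
                : ResponseEntity.created(URI.create("/customers/" + id)).body(customer);
    }
}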

But I read on hackernews that REST sux and GraphQL is the way to go?

Yeah, we all read that post too.

GraphQL is confusingly, a Query Language, a standard for HTTP APIs and a Schema tool all at once. With the proliferation of client-side-heavy web apps, GraphQL has gained popularity by effectively pushing the definition of what data should be returned to the client, into the client code itself.

It’s not the first time these kinds of “query from the front end” approaches have been suggested, and it likely won’t be the last. What sets GraphQL apart a little from previous approaches (notably Microsoft’s OData) is the idea that Types and Queries are implemented with Resolver code on the server side, rather than just mapping directly to some SQL storage.

This is useful for a couple of reasons – it means that GraphQL can be a single API over a bunch of disparate APIs in your domain, it solves the “over fetching” problem that’s quite common in REST APIs by allowing the client to specify a subset of the data they’re trying to return, and it also acts as an anti-corruption layer of sorts, preventing unbounded access to underlying storage.

GraphQL is also designed to be the single point of connection that your web or mobile app talks to, which is really useful for optimising performance – simply, it’s quicker for one API over the wire to call downstream APIs with lower latency, than your mobile app calling (at high latency) all the internal APIs itself.

GraphQL really is just a smart and effective way to schema your APIs, and provide a BFF – that’s backend for frontend, not a best friend forever – that’s quick to change.

BFF? What on earth is a BFF?

Imagine this problem – you’re working for MEGACORP where there are a hundred teams, or squads (you don’t remember, they rename the nomenclature every other week) – each responsible for a set of microservices.

You’re a web programmer trying to just get some work done, and a new feature has just launched. You read the docs.

The docs describe how you have to orchestrate calls between several APIs, all requiring OAuth tokens, and claims, and eventually, you’ll have your shiny new feature.

So you write the API calls, and you realise that the time it takes to keep sending data to and from the client, let alone the security risks of having to check that all the data is safe for transit, slows you down to a halt. This is why you need a best friend forever.

Sorry, a backend for front-end.

A BFF is an API that serves one, and specifically only one, application. It translates an internal domain (MEGACORP’S BUSINESS) into the language of the application it serves. It takes care of things like authentication, rate limiting, stuff you don’t want to do more than once. It reduces needless roundtrips to the server, and it translates data to be more suitable for its target application.

Think of it as an API, just for your app, that you control.
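
A minimal sketch of the idea – the internal services, their URLs, and the response shape here are all made up for illustration:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

// A toy BFF endpoint: one call from the app, several calls to internal APIs behind it.
public class ProfilePageBff {

    private static final HttpClient http = HttpClient.newHttpClient();

    private static CompletableFuture<String> fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                   .thenApply(HttpResponse::body);
    }

    // One round trip for the client; the fan-out happens server side, at low latency.
    public static String profilePage(String userId) {
        CompletableFuture<String> user = fetch("http://users.internal/api/users/" + userId);
        CompletableFuture<String> orders = fetch("http://orders.internal/api/orders?user=" + userId);

        // Merge the downstream responses into one payload shaped for this app's profile screen.
        return "{ \"user\": " + user.join() + ", \"recentOrders\": " + orders.join() + " }";
    }
}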

And tools like GraphQL, and OData are excellent for BFFs. GraphQL gels especially well with modern JavaScript driven front ends, with excellent tools like Apollo and Apollo-Server that help optimise these calls by batching requests.

It’s also pretty front-end-dev friendly – queries and schemas strongly resemble JSON, and it keeps your stack “JavaScript all the way down” without being beholden to some distant backend team.

Other things you might see and why

So now we understand our web servers, web apps, and our APIs, there’s surely more to modern web programming than that? Here are the things you’ll probably run into the most often.

Load Balancing

If you’re lucky enough to have traffic to your site, but unlucky enough to not be using a Platform-as-a-Service provider (more on that later), you’re going to run into a load balancer at some point. Don’t panic. Load balancers talk an archaic language, are often operated by grumpy sysops, or are just running copies of NGINX.

All a load balancer does, is accept HTTP requests for your application (or from it), pick a server that isn’t very busy, and forward the request.

You can make Load balancers do all sorts of insane things that you probably shouldn’t use load balancers for. People will still try.

You might see load balancers load-balancing a particularly “hot path” in your software onto a dedicated pool of hardware, to try to keep it safe or to isolate it from failure. You might also see load balancers used to take care of SSL certificates for you – this is called SSL termination.

Distributed caching

If one computer can store some data in memory, then lots of computers can store… well, a lot more data!

Distributed caching was pioneered by “Memcached” – originally written to scale the blogging platform Livejournal in 2003. At the time, Memcached helped Livejournal share cached copies of all the latest entries, across a relatively small number of servers, vastly reducing database server load on the same hardware.

Memory caches are used to store the result of something that is “heavy” to calculate, takes time, or just needs to be consistent across all the different computers running your server software. In exchange for a little bit of network latency, it makes the total amount of memory available to your application the sum of all the memory available across all your servers.

Distributed caching is also really useful for preventing “cache stampedes” – where a cache fails and all the cached data has to be recalculated by every client at once. By sharing memory across many computers, the odds of a full cache failure are reduced significantly, and even when part of the cache does fail, only some of the data has to be recalculated.

Distributed caches are everywhere, and all the major hosting providers tend to support memcached or redis compatible (read: you can use memcached client libraries to access them) managed caches.

Understanding how a distributed cache works is remarkably simple – when an item is added, the key (the thing you use to retrieve that item) that is generated includes the address or name of the computer that’s storing that data in the cache. Generating keys on any of the computers that are part of the distributed cache cluster will result in the same key.

This means that when the client libraries that interact with the cache are used, they understand which computer they must call to retrieve the data.

Breaking up large pools of shared memory like this is smart, because it makes looking things up exceptionally fast – no one computer needs to scan huge amounts of memory to retrieve an item.
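
A toy sketch of that routing decision – real client libraries use consistent hashing so that adding or removing a server doesn’t reshuffle every key, but the simple modulo version below shows the core idea (the server names are made up):

import java.util.List;

public class CacheKeyRouter {

    // The configured cache nodes - every client has the same list.
    static final List<String> servers = List.of("cache-01:11211", "cache-02:11211", "cache-03:11211");

    // Every client that hashes the same key picks the same server - no lookup table needed.
    static String serverFor(String key) {
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        System.out.println(serverFor("user:1234:profile"));     // always the same node...
        System.out.println(serverFor("homepage:latest-posts")); // ...for any given key
    }
}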

Content Delivery Networks (CDNs)

CDNs are web servers run by other people, all over the world. You upload your data to them, and they will replicate your data across all of their “edges” (a silly term that just means “to all the servers all over the world that they run”) so that when someone requests your content, the DNS response will return a server that’s close to them, and the time it takes them to fetch that content will be much quicker.

The mechanics of operating a CDN are vastly more complicated than using one – but they’re a great choice if you have a lot of static assets (images!) or especially big files (videos! large binaries!). They’re also super useful to reduce the overall load on your servers.

Offloading to a CDN is one of the easiest ways you can get extra performance for a very minimal cost.

Let’s talk about design patterns! That’s real architecture

“Design patterns are just bug fixes for your programming languages”

People will talk about design patterns as if they’re some holy grail – but all a design pattern is, is the answer to a problem that people solve so often, there’s an accepted way to solve it. If our languages, tools or frameworks were better, they would probably do the job for us (and in fact, newer language features and tools often obsolete design patterns over time).

Let’s do a quick run through of some very common ones:

  • MVC – “Split up your data model, UI code, and business logic, so they don’t get confused”
  • ORM – “Object-Relational Mapping” – “Use a mapping library and configured rules to manage the storage of your in-memory objects in relational storage. Don’t muddle the objects up with where you save them.”
  • Active Record – “All your objects should be able to save themselves, because these are just web forms, who cares if they’re tied to the database!”
  • Repository – “All your data access is in this class – interact with it to load things.”
  • Decorator – “Add or wrap ‘decorators’ around an object, class or function to add common behaviour like caching, or logging without changing the original implementation.”
  • Dependency Injection – “If your class or function depends on something, it’s the responsibility of the caller (often the framework you’re using) to provide that dependency”
  • Factory – “Put all the code you need to create one of these, in one place, and one place only”
  • Adapter – “Use an adapter to bridge the gap between things that wouldn’t otherwise work together – translating internal data representations to external ones. Like converting a twitter response into YourSocialMediaDataStructure”
  • Command – “Each discrete action or request, is implemented in a single place”
  • Strategy – “Define multiple ways of doing something that can be swapped in and out”
  • Singleton – “There’s only one of these in my entire application”.

That’s a non-exhaustive list of some of the pattern jargon you’ll hear. There’s nothing special about design patterns, they’re just the 1990s version of an accepted and popular stackoverflow answer.
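
To make one of those concrete, here’s a small sketch of the Decorator pattern – wrapping caching around an existing component without touching it. The PriceLookup names are invented for illustration:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecoratorExample {

    // The thing being decorated: some expensive lookup.
    interface PriceLookup {
        double priceFor(String productId);
    }

    // The decorator adds caching around any PriceLookup without changing the original implementation.
    static class CachingPriceLookup implements PriceLookup {
        private final PriceLookup inner;
        private final Map<String, Double> cache = new ConcurrentHashMap<>();

        CachingPriceLookup(PriceLookup inner) {
            this.inner = inner;
        }

        @Override
        public double priceFor(String productId) {
            return cache.computeIfAbsent(productId, inner::priceFor);
        }
    }

    public static void main(String[] args) {
        PriceLookup slow = productId -> {
            System.out.println("Hitting the database for " + productId);
            return 9.99;
        };

        PriceLookup cached = new CachingPriceLookup(slow);
        cached.priceFor("widget-1"); // hits the "database"
        cached.priceFor("widget-1"); // served from the cache
    }
}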

Microservice architectures

Microservice architectures are just the “third wave” of Service Oriented Design.

Where did they come from?

In the mid-90s, “COM+” (Component Services) and SOAP were popular because they reduced the risk of deploying things, by splitting them into small components – and providing a standard and relatively simple way to talk across process boundaries. This led to the popularisation of “3-tier” and later “n-tier” architecture for distributed systems.

N-Tier really was a shorthand for “split up the data-tier, the business-logic-tier and the presentation-tier”. This worked for some people – but suffered problems because horizontal slices through a system often require changing every “tier” to finish a full change. This ripple effect is bad for reliability.

Product vendors then got involved, and SOAP became complicated and unfashionable, which pushed people towards the “second wave” – Guerrilla SOA. Similar design, just without the high ceremony, more fully vertical slices, and less vendor middleware.

This led to the proliferation of smaller, more nimble services, around the same time as Netflix were promoting Hystrix – their latency and fault-tolerance library for distributed systems.

The third wave of SOA, branded as Microservice architectures (by James Lewis and Martin Fowler) – is very popular, but perhaps not very well understood.

What Microservices are supposed to be: Small, independently useful, independently versionable, independently shippable services that execute a specific domain function or operation.

What Microservices often are: Brittle, co-dependent, myopic services that act as data access objects over HTTP that often fail in a domino like fashion.

Good microservice design follows a few simple rules

  • Be role/operation based, not data centric
  • Always own your data store
  • Communicate on external interfaces or messages
  • What changes together, and is co-dependent, is actually the same thing
  • All services are fault tolerant and survive the outages of their dependencies

Microservices that don’t exhibit those qualities are likely just secret distributed monoliths. That’s ok, loads of people operate distributed monoliths at scale, but you’ll feel the pain at some point.

Hexagonal Architectures

Now this sounds like some “Real Architecture TM”!

Hexagonal architecture, also known as the “ports and adapters” pattern – as defined by Alistair Cockburn – is one of the better pieces of “real application architecture” advice.

Put simply – have all your logic, business rules, domain specific stuff – exist in a form that isn’t tied to your frameworks, your dependencies, your data storage, your message busses, your repositories, or your UI.

All that “outside stuff”, is “adapted” to your internal model, and injected in when required.

What does that really look like? All your logic is in files, modules or classes that are free from framework code, glue, or external data access.

Why? Well it means you can test everything in isolation, without your web framework or some broken API getting in the way. Keeping your logic clear of all these external concerns is a safe way to design applications.
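
A small sketch of the shape this takes – the domain defines a “port” (an interface), and the outside world is adapted to it and injected in. The order and repository names are illustrative:

// The domain: pure logic - no framework, no database, no HTTP.
public class OrderService {

    // A "port": the shape of the outside world, defined by the domain.
    public interface OrderRepository {
        void save(String orderId, int quantity);
    }

    private final OrderRepository repository;

    // The adapter (SQL, a message bus, an in-memory fake for tests...) is injected from outside.
    public OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    public void placeOrder(String orderId, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("Orders must contain at least one item");
        }
        repository.save(orderId, quantity);
    }

    public static void main(String[] args) {
        // In a test, the port is adapted with a trivial in-memory implementation.
        OrderService service = new OrderService((id, qty) -> System.out.println("Saved " + id + " x" + qty));
        service.placeOrder("order-1", 3);
    }
}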

There’s a second, quite popular approach described as “Twelve Factor Apps” – which mostly shares these same design goals, with a few more prescriptive rules thrown on top.

Scaling

Scaling is hard if you try to do it yourself, so absolutely don't try to do it yourself.

Use vendor-provided cloud abstractions like Google App Engine, Azure Web Apps or AWS Lambda, with autoscaling support enabled, wherever you possibly can.

Consider putting your APIs on a serverless stack. The further up the abstraction you get, the easier scaling is going to be.

Conventional wisdom says that “scaling out is the only cost-effective thing”, but plenty of successful companies managed to scale up with a handful of large machines or VMs. Scaling out gives you other benefits (often geo-distribution related, or cross-availability-zone resilience) but don’t feel bad if the only lever you have is the one labelled “more power”.

Architectural patterns for distributed systems

Building distributed systems is harder than building just one app. Nobody really talks about that much, but it is. It’s much easier for something to fail when you split everything up into little pieces, but you’re less likely to go completely down if you get it right.

There are a couple of things that are always useful.

Circuit Breakers everywhere

Circuit breaking is a useful distributed system pattern where you model out-going connections as if they’re an electrical circuit. By measuring the success of calls over any given circuit, if calls start failing, you “blow the fuse”, queuing outbound requests rather than sending requests you know will fail.

After a little while, you let a single request flow through the circuit (the “half open” state), and if it succeeds, you “close” the circuit again and let all the queued requests flow through.
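
A deliberately simplified sketch of that state machine – real libraries (Polly, resilience4j, Hystrix) handle thresholds, timeouts and metrics far more carefully, and this version fails fast while the circuit is open rather than queueing requests:

import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration cooldown;

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;

    public CircuitBreaker(int failureThreshold, Duration cooldown) {
        this.failureThreshold = failureThreshold;
        this.cooldown = cooldown;
    }

    public synchronized <T> T call(Supplier<T> outboundCall) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, Instant.now()).compareTo(cooldown) < 0) {
                throw new IllegalStateException("Circuit open - not attempting the call");
            }
            state = State.HALF_OPEN; // the cooldown has elapsed: let one trial request through
        }
        try {
            T result = outboundCall.get();
            consecutiveFailures = 0;
            state = State.CLOSED;    // the trial (or a normal call) succeeded: close the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;  // blow the fuse
                openedAt = Instant.now();
            }
            throw e;
        }
    }
}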

Circuit breakers are a phenomenal way to make sure you don’t fail when you know you might, and they also protect the service that is struggling from being pummelled into oblivion by your calls.

You’ll be even more thankful for your circuit breakers when you realise you own the API you’re calling.

Idempotency and Retries

The complementary design pattern to all your circuit breakers – you need to make sure that you wrap all outbound connections in a retry policy, and a back-off.

What does this mean? You should design your calls to be non-destructive if you double-submit them (idempotency), and if your calls are configured to retry on errors, you should back off a little (if not exponentially) when repeated failures occur – at the very least to give the thing you’re calling time to recover.
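
A simple sketch of a retry wrapper with exponential back-off – only safe around idempotent calls, and the flaky API here is invented for the example:

import java.util.function.Supplier;

public class Retry {

    public static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        RuntimeException lastFailure = null;
        long delay = initialDelayMillis;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                try {
                    Thread.sleep(delay); // back off before trying again
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw lastFailure;
                }
                delay *= 2; // exponential back-off: 100ms, 200ms, 400ms...
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) {
        // Only safe because the call is idempotent - submitting it twice must not do damage twice.
        System.out.println(withBackoff(Retry::callFlakyApi, 5, 100));
    }

    static String callFlakyApi() {
        if (Math.random() < 0.5) throw new RuntimeException("Transient failure");
        return "OK";
    }
}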

Bulkheads

Bulkheads are inspired by physical bulkheads in submarines. When part of a submarine’s hull is compromised, the bulkheads shut, preventing the rest of the sub from flooding. It’s a pretty cool analogy, and it’s all about isolation.

Reserved resources, capacity, or physical hardware can be protected for pieces of your software, so that an outage in one part of your system doesn’t ripple down to another.

You can set maximum concurrency limits for certain calls in multithreaded systems, make judicious use of timeouts (better to timeout, than lock up and fall over), and even reserve hardware or capacity for important business functions (like checkout, or payment).
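
A minimal sketch of a bulkhead that uses a semaphore to cap how many concurrent calls one dependency can soak up – reserving whole pools of hardware is the heavier-weight version of the same idea:

import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class Bulkhead {
    private final Semaphore permits;

    public Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    public <T> T call(Supplier<T> protectedCall) {
        if (!permits.tryAcquire()) {
            // Fail fast rather than queueing forever - the rest of the system stays responsive.
            throw new IllegalStateException("Bulkhead full - rejecting call");
        }
        try {
            return protectedCall.get();
        } finally {
            permits.release();
        }
    }
}

Wrapped around calls to, say, a slow payment provider, that dependency can only ever tie up a fixed number of threads, and everything else carries on.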

Event driven architectures with replay / message logs

Event / message-based architectures are frequently resilient, because by design the inbound calls made to them are not synchronous. By using events that are buffered in queues, your system can support outages, scaling up and down, and rolling upgrades without any special consideration. Its normal mode of operation is “read from a queue”, and this doesn’t change in exceptional circumstances.

When combined with the competing consumers pattern – where multiple processors race to consume messages from a queue – it’s easy to scale out for good performance with queue and event driven architectures.
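
In miniature, competing consumers looks something like the sketch below – a real system would use a broker (RabbitMQ, SQS, a Kafka consumer group) rather than an in-memory queue, but the shape is the same:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CompetingConsumers {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Three consumers racing for messages - scaling out is just "start more of these".
        for (int i = 1; i <= 3; i++) {
            int workerId = i;
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String message = queue.take(); // blocks until a message arrives
                        System.out.println("Worker " + workerId + " processed " + message);
                    }
                } catch (InterruptedException e) {
                    // shutting down
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // The producer doesn't care which worker handles a message, or whether one is mid-restart.
        for (int i = 0; i < 10; i++) {
            queue.put("event-" + i);
        }

        Thread.sleep(500); // give the workers a moment before the demo exits
    }
}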

Do I need Kubernetes for that?

No. You probably don’t have the same kind of scaling problems as Google do.

With the popularisation of Docker and containers, a lot of hype has gone into things that provide “almost platform-like” abstractions over Infrastructure-as-a-Service. These are all very expensive and hard work.

If you can in any way manage it, use the closest thing to a pure-managed platform as you possibly can. Azure App Services, Google App Engine and AWS Lambda will be several orders of magnitude more productive for you as a programmer. They’ll be easier to operate in production, and more explicable and supported.

Kubernetes (often irritatingly abbreviated to k8s), along with its wonderful ecosystem of esoterically named additions like Helm and Flux, requires a full-time ops team to operate, and even in “managed vendor mode” on EKS/AKS/GKE the learning curve is far steeper than the alternatives.

Heroku? App Services? App Engine? Those are things you’ll be able to set up, yourself, for production, in minutes to only a few hours.

You’ll see pressure to push towards “cloud neutral” solutions using Kubernetes in various places – but it’s snake oil. Being cloud neutral simply means you pay the cost of a cloud migration (maintaining abstractions, isolating yourself from useful vendor-specific features) perpetually, rather than only in the (exceptionally unlikely) scenario that you switch cloud vendor.

The responsible use of technology includes using the thing most suited to the problem and scale you have.

Sensible Recommendations

Always do the simplest thing that can possibly work. Architecture has a cost, just like every abstraction. You need to be sure you’re getting a benefit before you dive into some complex architectural pattern.

Most often, the best architectures are the simplest and most amenable to change.

Creating Spark Dataframes without a SparkSession for tests

03/26/2019 12:58:01

Back to the scheduled .NET content after this brief diversion into... Java.

I'm currently helping a team put some tests around a Spark application, and one of the big bugbears is testing, outside the cluster, the raw data transformations and functions that'll run inside it. It turns out that most of the core Spark types hang off a SparkSession and can't really be manually constructed - something a quick StackOverflow search appears to confirm. People just can't seem to create Spark DataFrames outside of a Spark session.

Except you can.

All a Spark DataFrame really is, is a schema and a collection of Rows - so with a little bit of digging, you realise that if you can just create a row and a schema, everything'll be alright. So you dig, and you discover no public constructors and no obvious way to create the test data you need.

Unless you apply a little bit of reflection magic - then you can create a schema with some data rows trivially:
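
The snippet that originally went with this post isn't reproduced here, but the shape of the trick looks roughly like this - a sketch assuming a reasonably recent Spark, where the catalyst GenericRowWithSchema constructor is public; on older versions you may genuinely need reflection to reach the same internals:

import java.util.Arrays;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class TestRowBuilder {

    // A schema, built by hand - no SparkSession required.
    private static final StructType SCHEMA = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("id", DataTypes.IntegerType, false),
            DataTypes.createStructField("name", DataTypes.StringType, true)
    ));

    // A Row that carries that schema, ready to feed into the functions under test.
    private static Row row(Object... values) {
        return new GenericRowWithSchema(values, SCHEMA);
    }

    public static void main(String[] args) {
        Row testRow = row(1, "test-user");
        String name = testRow.getAs("name");
        System.out.println(name); // "test-user"
    }
}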

Copy-pasta to your heart's content. Test that Spark code - it's not going to test itself.

Building .NET Apps in VSCode (Not .NetCore)

11/17/2016 10:09:50

With all the fanfare around .NET Core and VS Code, you might have been led to believe that you can't build your boring old .NET apps inside of VS Code, but that's not the case.

You can build your plain old .NET solutions (PONS? Boring old .NET projects? BONPS? God knows) by shelling out using the external tasks feature of the editor (http://code.visualstudio.com/docs/editor/tasks).

First, make sure you have a couple of plugins available

  • C# for Visual Studio Code (powered by OmniSharp)
  • MSBuild Tools (for syntax highlighting)
Now, "Open Folder" on the root of your repository and press CTRL+SHIFT+B.

VS Code will complain that it can't build your program, and open up a generated file, .vscode\tasks.json, in the editor. It'll be configured to use msbuild, but won't work unless MSBuild is in your path. With a trivial edit to set the full path, you'll be building straight away:

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "C:\\Program Files (x86)\\MSBuild\\14.0\\Bin\\msbuild.exe",
    "args": [
        "/property:GenerateFullPaths=true"
    ],
    "taskSelector": "/t:",
    "showOutput": "silent",
    "tasks": [
        {
            "taskName": "build",
            "showOutput": "silent",
            "problemMatcher": "$msCompile"
        }
    ]
}

CTRL+SHIFT+B will now build your code by invoking MSBuild.

Get Coding! - I wrote a book.

06/28/2016 13:58:46

Through late 2015 and the start of 2016 I was working on a “secret project” that I could only allude to in conjunction with Walker Books UK and Young Rewired State – to write a book to get kids coding. As a hirer, I’ve seen first hand the difficulty in getting a diverse range of people through the door into technology roles, and I thoroughly believe that the best way we solve the problem is from the ground up - changing the way we teach computer science.

We live in a world where the resources to teach programming are widely available if there’s an appetite for it, and the barrier to entry is lower than ever, so in some ways, this is my contribution to the movement.

 

Get Coding is a beautifully illustrated (courtesy of Duncan Beedie) book that takes a “broad not deep” approach to teaching HTML5, JavaScript and CSS to kids from ages 8 and up. It was edited by Daisy Jellicoe at Walker Books UK, and without her attention to detail and enthusiasm it wouldn’t have come out half as well as it did. It’s quite long, at 209 pages, and comes with a story that children can follow along to.

Learn how to write code and then build your own website, app and game using HTML, CSS and JavaScript in this essential guide to coding for kids from expert organization Young Rewired State. Over 6 fun missions learn the basic concepts of coding or computer programming and help Professor Bairstone and Dr Day keep the Monk Diamond safe from dangerous jewel thieves. In bite-size chunks learn important real-life coding skills and become a technology star of the future. Young Rewired State is a global community that aims to get kids coding and turn them into the technology stars of the future.

The book is available on Amazon and in major high street bookstores.

I’ve been thrilled by the response from friends, family and the community, and it made the sometimes quite stressful endeavour thoroughly worthwhile. After launch, Get Coding! ended up number 1 in a handful of Amazon categories, was in the top 2000 books on Amazon.co.uk for about a week, and has received a series of wonderfully positive reviews, not to mention being recently featured in The Guardian’s Best New Children’s Books guide for Summer 2016.

I’ll leave you with a few photos I’ve been sent or collected and hope that perhaps you’ll buy the book.

[Photos: an Amazon ranking screenshot and a few pictures of the book sent in by readers]

Why code reviews are important

04/07/2016 14:40:41

Making sure that more than one set of eyes has seen all the code we produce is an important part of software development - it helps us catch bugs, keep our code readable, and share patterns and practices across our teams.

A code review should answer the following questions:

  • Are there any logical errors?
  • Are the requirements implemented?
  • Are all the acceptance criteria of the user story met?
  • Do our unit tests and automation tests around this feature pass? Are we missing any?
  • Does the code match our house style?
The interesting thing about the effectiveness of code review is that it isn't just hearsay - its impact has been measured, and the results are summarised in the seminal book "Code Complete":
.. software testing alone has limited effectiveness -- the average defect detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 and 60 percent. Case studies of review results have been impressive:
  • In a software-maintenance organization, 55 percent of one-line maintenance changes were in error before code reviews were introduced. After reviews were introduced, only 2 percent of the changes were in error. When all changes were considered, 95 percent were correct the first time after reviews were introduced. Before reviews were introduced, under 20 percent were correct the first time.
  • In a group of 11 programs developed by the same group of people, the first 5 were developed without reviews. The remaining 6 were developed with reviews. After all the programs were released to production, the first 5 had an average of 4.5 errors per 100 lines of code. The 6 that had been inspected had an average of only 0.82 errors per 100. Reviews cut the errors by over 80 percent.
  • The Aetna Insurance Company found 82 percent of the errors in a program by using inspections and was able to decrease its development resources by 20 percent.
  • IBM's 500,000 line Orbit project used 11 levels of inspections. It was delivered early and had only about 1 percent of the errors that would normally be expected.
  • A study of an organization at AT&T with more than 200 people reported a 14 percent increase in productivity and a 90 percent decrease in defects after the organization introduced reviews.
  • Jet Propulsion Laboratories estimates that it saves about $25,000 per inspection by finding and fixing defects at an early stage.
We do code reviews because they help us make our code better, and measurably save a lot of money - the cost of fixing software issues only multiplies once they're in production.

Code review check-list

Not sure how to code review? Here's a check-list to get you started, derived from many excellent existing check-lists.

General

  • Does the code work?
  • Does it perform its intended function?
  • Is all the code easily understood?
  • Does it conform to house style, standard language idioms?
  • Is there any duplicate code?
  • Is the code as modular as possible?
  • Can any global variables be replaced?
  • Is there any commented out code?
  • Can any of the code be replaced with library functions?
  • Can any logging or debugging code be removed?
  • Has the "Boy scout rule" been followed? Is the code now better than before the change?
Security
  • Are all data inputs checked (for the correct type, length, format, and range) and encoded?
  • Where third-party utilities are used, are returning errors being caught?
  • Are output values checked and encoded?
  • Are invalid parameter values handled?
Documentation
  • Is any unusual behaviour or edge-case handling described?
  • Is there any redundant auto-documentation that can be removed?
Testing
  • Is the code unit tested?
  • Do the tests actually test that the code is performing the intended functionality?
  • Could duplication in test code be reduced with builder / setup methods or libraries?

Practical Tips

Don't try to be a human compiler

The first and most important thing to remember when you're doing a code review is that you're not meant to be a human compiler. Ensure you're reviewing the functionality, tests and readability of the code rather than painstakingly inspecting syntax. Syntax and house style are important, but style issues are a much better fit for automated tooling than humans. Don't waste time bickering over style and formatting.

Reviewers are born equal

Our teams are built around mutual trust and respect, and a natural extension of that is that anybody can code review. You may on occasion find yourself working on some code that is someone else's area of particular interest or expertise - but it's better to solicit their advice while you're working than to wait for a review. Code reviews aren't limited to your technical lead, and likewise, a lead should have their code reviewed all the same.

It's just code, you're not marrying it

Don't be precious about your code as a submitter, and as a reviewer be honest and open. It's just code - a code review is an opportunity to make it the best it can be, using the expertise of your colleagues.

That's fine!

Sometimes the code is just fine - don't be the person that nitpicks for change without any quantifiable benefit.

Tooling

Code reviews should be performed on change-sets via either a branch comparison URL, or ideally, a pull request.

Stash has replaced our legacy tool (FishEye) for code reviews using Git, and as projects migrate from SVN they are expected to migrate to using branch comparison or pull requests in Stash.

Pair programming and code reviews

As one of the founding practices of eXtreme Programming (XP), pair-programming can be seen as "extreme code review"

"code reviews are considered a beneficial practice; taken to the extreme, code can be reviewed continuously, i.e. the practice of pair programming."

Pair programming exhibits exactly the same qualities of a great code review, with the additional benefits of immediacy - it's impossible to ignore a pair critiquing a design in real time, poor choices barely survive, and work doesn't end up blocked in a code review queue somewhere.

Pairing is usually preferable to code review for regular work, and if code is produced as part of a pair it can be considered "code reviewed by default". Unfortunately, for reasons of availability, location or specialisation, you may need to rely on a more traditional code review for some of the code you write.

Realistic expectations

Like everything else, a code review is not a silver bullet - and while reviews have proven to protect us against a large number of obvious bugs, they aren't great at spotting performance problems or subtle threading and concurrency issues.

You'll need to rely on existing instrumentation and profilers for those kinds of issues - "mentally executing" the code and spotting them is unlikely, if not impossible. Just be wary of believing your code is free of errors simply because it's been through a code review or pairing session. Ironically, the bugs that slip through a code review or pairing session will, by definition, be these tricky and hard-to-detect edge cases.

Projections in Agile Software Development

11/28/2015 16:44:24

Preparing a timeline for the development of software is a difficult problem – teams have proven time and time again that they’re tremendously bad at estimating how long it takes to develop software at any kind of scale.

Unfortunately, there’s a conflict between the need for financial planning and budgeting, and the unpredictable nature of software delivery – to the extent that agile software advocates frequently dismiss both the accuracy of, and the need for, timelines and schedules.

This presents a difficult problem – organisations need to plan and budget, but their best technical people tell them that accurate planning is impossible.

We can make projections to try and close this gap.

Projections are common in financial planning and they serve the same purpose in software – they’re a forecast.

From http://www.entrepreneur.com/encyclopedia/financial-projections

Planning out and working on your company's financial projections each year could be one of the most important things you do for your business. The results--the formal projections--are often less important than the process itself. If nothing else, strategic planning allows you to "come up for air" from the daily problems of running the company, take stock of where your company is, and establish a clear course to follow.

Importantly

Variances from projections provide early warning of problems. And when variances occur, the plan can provide a framework for determining the financial impact and the effects of various corrective actions.

Projections are formal educated guesses – they don’t claim accuracy, but they intend to give a reasonable idea of progress.

With a well thought out projection, your team can continuously evaluate how their current progress lines up with it. As and when the team’s real-world progress and the projection start to diverge, they can realistically measure the difference between the two and adjust the projection accordingly.

Used in this way, projections are a useful tool for communicating with business stakeholders – suggesting a “best guess at when we might be doing this work” and a means of communicating changes in timelines divorced and abstracted away from the “coal face” of user stories, velocity, estimation and planning. Teams use their projections and actual progress to explain and adjust for changes in complexity, scope and delivery time.

If estimation and planning in software is hard, then projection is much harder – it’s “The Lord of the Rings” – a sprawling epic of fiction.

On a small scale, planning is predicated on the understanding that “similar things, with a similar team, take a similar amount of time” – and this generally works quite well.

People understand this quite intuitively – it’s the reason your plumber can tell you roughly how long it’ll take and how much it’ll cost to fit a bathroom. At a larger scale, however, planning starts to fall apart. The more complex a job, the more nuance is involved and the more room there is to lose details and accuracy.

The challenge in putting together a projection is to bridge the gap between the small scale accuracy of planning features, and the long term strategic objectives of a product.

The Simplest Possible Process

In order to put together a projection and make sure that it’s realistic, you need a base-line – some proven and known information to use as a basis of the projection.

You’ll need a few things

  • A backlog of “epics” – or large chunks of work
  • Some user stories of work you’ve already completed
  • A record of the actual time taken to finish that work
  • A big wall
  • Some sticky notes

With these things we’re going to

  • Establish a baseline from the work we’ve already completed
  • Place our completed work on the wall using sticky notes
  • Estimate our new work in relation to our completed work
  • Assign our new work a rough time-box

A note on scales of estimation

Agile teams generally use abstract estimation scales to plan work – frequently the modified Fibonacci sequence or T-shirt sizes. It’s recommended to use a different scale for your projections than the one you normally use for story estimation, to prevent people from accidentally equating the estimates of stories in a sprint with the estimates of epics on a roadmap.

I find it most useful to use Fibonacci for scoring user stories, and using t-shirt sizes for projections.

The trick to a successful projection is to find a completed epic or big chunk of work to “hang your hat on” – you can then make a reasonable estimate on new work by asking the question

“Is this new piece of work larger or smaller than the thing we already did?”

This relative estimation technique is called stack ranking – it’s simplistic, but very easy for an entire team to understand.

Take this first item, write it on a sticky note and position it somewhere in the middle of the wall, making a note of the time it took to complete the work as you stick it up.

You should repeat this exercise with a handful of pieces of work that you’ve already completed – giving you a baseline of the kinds of tasks your team performs, and their relative complexity to each other based on their ordering.

As you stick the notes on the wall, if something is significantly more difficult than something else, leave a bit of vertical space between the two notes as you place them on the wall.

[Diagram: completed work stack-ranked on the wall]

Now that you’ve established a baseline around work you’ve already completed, you can start talking – in broad terms – about the large stories or epics you want to project and schedule.

Work through your backlog of epics, splitting them down if you can into smaller chunks, and arranging them around the stack ranked work on your wall. If something is about as hard as something else, place it at the same level as the thing it’s as hard as.

[Diagram: backlog epics arranged around the stack-ranked completed work]

Once you’ve placed all of your backlog on the wall, arrange a time scale down the right-hand side of the wall, based on the work you’ve already completed, and a scale from "XXXS" to "XXXL" down the left-hand side.

[Diagram: the wall with a time scale down the right-hand side and T-shirt sizes down the left]

The whole team can then adjust their estimates based on the time windows, and the relative sizes assembled on the wall. Once the team is happy with their relative estimates, they should assign a T-shirt size to each sticky note of work – which in turn, relates to a rough estimate of time.

You can now remove the sticky notes from the wall, and arrange them horizontally into a timeline using the “medium size” sticky note as a guide.

[Diagram: the sticky notes rearranged horizontally into a timeline]

These timelines are based around a known set of completed work but, nonetheless, they are just projections. They’re not delivery dates, they’re not release dates – they’re best-guess estimates of the time window in which work will be completed.

Adjusting your projections

Projections are only successful when continually re-adjusted – they’re the early warning, a canary in the coal-mine, a way to know when you’re diverging from your expectations. Treating a projection like a schedule will cause nothing but unmet expectations and disappointment.

As you complete actual pieces of work from your projection, you should verify its original projection against its actual time to completion – making note of the percentage of over or under estimation on that particular piece.

Once you complete a piece, you should adjust your projection based on actual results – checking that your assertion that the “medium sized thing takes three months” holds true.
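
To make the arithmetic concrete – and this is only an illustrative sketch, with invented numbers and type names rather than anything prescribed by the process above – the adjustment boils down to tracking the ratio of actual to projected time on finished epics, and scaling whatever is left on the wall by it:

using System.Collections.Generic;
using System.Linq;

public class Epic
{
    public string Name { get; set; }
    public double ProjectedMonths { get; set; }
    public double? ActualMonths { get; set; }   // null until the work is actually finished
}

public static class ProjectionAdjuster
{
    // How far out have we been on the epics we've already finished?
    // e.g. projected 3 months, took 4 => factor of roughly 1.33
    public static double AdjustmentFactor(IEnumerable<Epic> completed)
    {
        return completed.Where(e => e.ActualMonths.HasValue)
                        .Average(e => e.ActualMonths.Value / e.ProjectedMonths);
    }

    // Scale a remaining projection by what reality has told us so far
    public static double AdjustedProjection(Epic remaining, double factor)
    {
        return remaining.ProjectedMonths * factor;
    }
}

If the “medium sized thing” you projected at three months actually took four, the factor comes out at roughly 1.33, and the honest conversation is that the rest of the medium-sized work on the wall is likely to run about a third longer than it currently suggests.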

Once armed with real information, you can have honest conversations about dates based in reality, not fiction, helping your product owners and stakeholders understand if they’re going to be delivering earlier or later than they expected, before it takes them by surprise.

Using C# 6 Language Features In Your Software Tomorrow

07/20/2015 20:42:41

Visual Studio 2015 dropped today, along with C# 6, and a whole host of new language features. While many new features of .NET are tied to framework and library upgrades, language features are specifically tied to compiler upgrades, rather than runtime upgrades.

What this means is that, so long as the build environment for your application supports C# 6, it can produce binaries that make use of the new language features and still run on earlier versions of the .NET Framework. The good news for most developers is that you can use these language features in your existing apps today – you don’t have to wait to roll out framework upgrades across your production servers.
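
As a quick illustration – just a sketch, with a class invented for the example – the snippet below leans on a handful of C# 6 features (an auto-property initialiser, an expression-bodied member, string interpolation, the null-conditional operator and nameof), and once compiled with the new compiler it runs quite happily against earlier versions of the framework:

using System;

public class Person
{
    // Auto-property initialiser
    public string Name { get; set; } = "Unknown";

    // Expression-bodied member using string interpolation
    public string Greeting => $"Hello, {Name}!";

    public static void Main()
    {
        Person nobody = null;

        // Null-conditional operator - no NullReferenceException here
        Console.WriteLine(nobody?.Name ?? "No person supplied");

        // nameof gives refactor-safe references to identifiers
        Console.WriteLine($"The variable '{nameof(nobody)}' was null");

        Console.WriteLine(new Person { Name = "Lars" }.Greeting);
    }
}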

You’ll need to install the latest version of Visual Studio 2015 on your dev machines, and you’ll need to make sure your build server has the latest Build Tools 2015 pack installed to be able to compile code that uses C#6.

You can grab the Build Tools 2015 pack from Microsoft here: https://go.microsoft.com/fwlink/?LinkId=615458

I’ve verified that C# 6 apps compiled under VS2015 will, at the very least, run on machines running .NET 4.5.
If you’re doing automated deployment with Kudu to Azure, you may need to wait until they roll out the tools pack to the Web Apps infrastructure.

Have fun!

Passenger - A C# Selenium Page Object Library (Video)

04/27/2015 13:37:20

A lap around Passenger - a page object library I’ve been working on for Selenium.

In this video, we’ll discuss the general concept of page objects for browser automation testing - then we'll work through a live demo comparing and contrasting traditional C# Selenium tests with tests using Passenger.

 

Resources:
Passenger on GitHub - https://github.com/davidwhitney/Passenger
Passenger on NuGet - https://www.nuget.org/packages/Passenger/

Page objects for browser automation - 101

04/01/2015 16:56:11

The page object model is a pattern used by UI automation testers to help keep their test code clean. The idea behind it is very simple - rather than directly using your test driver (Selenium / WatiN etc.), you should encapsulate the calls to its API in objects that describe the pages you're testing.

Page objects gained popularity with people that felt the pain of maintaining a large suite of browser automation tests, and people that practice acceptance test driven development (ATDD) who write their browser automation tests before any of the code to satisfy them - markup included - exists at all.

The common problem in both of these scenarios is that the markup of the pages changes, and the developer has to perform mass find/replaces to deal with the resulting changes. This problem only gets worse over time, with subsequent tests adding to the burden by re-using element selectors throughout the test codebase. In addition to the practical concern of modification, test automation code is naturally verbose and often hides the meaning of the interactions it's driving.

Page objects offer some relief - by capturing selectors and browser automation calls behind methods on a page object, the behaviour is encapsulated and need only be changed in one place.

Some page objects are anaemic, providing only the selector glue that the orchestrating driver code needs, while others are smarter, encapsulating selection, operations on the page, and navigation.

The simplest page object is easy to home roll, being little more than a POCO / POJO that contains a bunch of strings representing selectors, and methods that the current instance of the automation driver gets passed to. Increasingly sophisticated page objects capture behaviour and will end up looking like small domain specific languages (DSLs) for the page being tested.

The simplest page object could be no bigger than

public class Homepage
{
   public string Uri { get { return "/"; } }
   public string Home { get { return "#home-link"; } }
}

This kind of page object can be used to help clean up automation code where the navigation and selection code is external to the page object itself. It's a trivial sample to understand, and effectively serves as a kind of hardcoded configuration.
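
Used from a test, that might look something like this - a sketch that assumes NUnit, the Chrome driver and a locally hosted site purely for illustration. The important part is that the navigation and selection calls live in the test, and the page object only supplies the strings:

using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class HomepageTests
{
    [Test]
    public void Clicking_home_from_the_homepage()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            var homepage = new Homepage();

            // The page object supplies the Uri and selector - the test does the driving
            driver.Navigate().GoToUrl("http://localhost:8080" + homepage.Uri);
            driver.FindElement(By.CssSelector(homepage.Home)).Click();
        }
    }
}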

A more sophisticated page object might look like this:

public class Homepage
{
   public string Uri { get { return "/"; } }
   public string Home { get { return "#home-link"; } }

   public void VisitHomeLink(RemoteWebDriver driver)
   {
      driver.FindElement(By.CssSelector(Home)).Click();
   }
}

This richer model captures the behaviour of the tests by internalising the navigation glue code that would be repeated across multiple tests.

The final piece of the puzzle is the concept of Page Components - reusable chunks of page object that represent a site-wide construct - like a global navigation bar. Page components are functionally identical to page objects, with the single exception being that they represent a portion of the page so won't have a Uri of their own.
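
A page component for something like a global navigation bar ends up looking just like the page objects above, minus the Uri. A rough sketch (with selectors invented for illustration):

public class NavigationBar
{
   public string SearchBox { get { return "#nav-search"; } }
   public string AccountLink { get { return "#nav-account"; } }

   public void VisitAccount(RemoteWebDriver driver)
   {
      driver.FindElement(By.CssSelector(AccountLink)).Click();
   }
}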

By using the idea of page objects, you can codify the application centric portions of your automation suite (selectors, interactions), away from the repetitive test orchestration. You'll end up with cleaner and less repetitive tests, and tests that are more resilient to frequent modification.

Page objects alone aren't perfect - you still end up writing all the boilerplate driver code somewhere, but they're a step in the right direction for your automation suite.
