Doing Open Source Right

Doing Open Source Right

Monday 31 March 2014

A brief history of free* and open source software...

The rise of free and open source software is reasonably well documented with several significant projects built and released starting in the 1970s with Emacs and the first version of the GPL. The momentum gathered, pushed on by the popularity of Linux, Perl, Apache and the LAMP stack in the mid to late 1990s.

Free or open source software now drives a huge portion of the web, and during the late 2000’s and early 2010’s the popularisation of source code sharing sites like SourceForge, GitHub and BitBucket along with the realisation that billion dollar businesses could be built and operated on open source software pushed more software towards being “open by default”. What was once perceived as a risk (“giving my property away for free”) started to gain traction in private business and even traditionally open-source-hostile organisations such as Microsoft - started taking pull requests and publishing their source code.

This context is important - even if you’re not the kind of person who previously would’ve ended up writing and publishing your own software, there’s an increasing chance that you now will because the organisation you work for decides open source is worth investing in.

But I’m scared, I don’t understand why we’re doing this?

As a developer, there are lots of passive benefits to “coding in the open” - it’s a great way to learn, it’s a great way to contribute back, and as an individual, it’s the only real opportunity you have to legitimately “take your work with you” from job to job. Open source software can become your professional portfolio, and as it does, getting it right is important.

As an organisation, the motivations behind adopting open source software are obvious - “hey free software!” - but the benefit in publishing your own open source is a little more obfuscated. There’s a moral aspect to it - if you’re building your business on open source software, it’s perhaps the “right” thing to do to give back to the community you’re benefiting from. Giving back isn’t going to make you money - it’ll probably cost you some - but there are good reasons why publishing or contributing back to open source projects is a rational thing for your business to do.

Open source is a great way to attract talent - hiring excellent people is hard and developers who enjoy contributing to open source software will be drawn to businesses willing to pay them to do just that. Their enthusiasm is infectious and will make your teams better. It’s a solid publicity tool to raise the profile of your organisation in the tech community. It’s a good way to enhance confidence in your business amongst technical people - if they can see your code, and it’s good, you’ll win supporters. In the end if you get external contributions back to your open source projects, that’s a nice thing to have.

If that all sounds intimidating, it’s ok - the fear of continual evaluation and scrutiny is human, especially when you consider we’re an industry of professional amateurs learning much of what we do as we go to keep up with the pace of change in the tech industry. As you increase your contributions, you get more familiar with the kind of feedback cycles open source gifts you with, and hopefully it all becomes a lot less intimidating - everyone is in it together. Reading lots of code makes you a better developer, and contributing back makes you better still. You’ll learn from experts, and maybe teach somebody else along the way.

So lets run an open source project!

Like everything else about building great software, open sourcing software requires discipline and effort. But there are some real world, practical tips to making your open source software successful. Remember that open source is a commitment - you can’t trivially “un-open” your software - once it’s done it’s done.

Don’t surprise potential users or contributors

Follow a predictable repository layout. There are some strong language neutral conventions for open source project topology that people have grown to expect. People know to look for familiar signposts in plain text or markdown; README, LICENSE and CONTRIBUTING files are essential and the guidance in them should always be accurate.

The README serves as the top level overview and getting started guide for your project. Compilation instructions, quick-start examples and links to any deeper documentation are essential. The CONTRIBUTING guide should give potential contributors useful information and your LICENSE file will likely be standard and people will expect to be able to see it.

Make sure building and testing is easy

Regardless of language or platform, you should stick to the established language conventions in that ecosystem.

People will give up on your project if the barrier to entry of build and testing is difficult or requires a lot manual configuration. Practically all mainstream programming languages have mature package management solutions, so use them. If you’re doing Ruby, make sure you’ve got a working rakefile, if you’re in .NET, I’d expect “F5” to build and run your project. Keeping that barrier to entry low is essential.

Where possible, leverage cloud continuous integration services to provide confidence in the current build of your software - it’ll help potential contributors know if they’re dealing with “works on my machine” problems.

Guiding contributors

The contributing file is your contract with potential contributors. It should give them useful information. You should make sure that you guide them towards running your test suite, explain how you’d prefer any pull requests or code submissions to be delivered, outline the coding conventions, and highlight key contributors.

Obviously this is a two way street and in order to encourage high quality contributions, you have to keep your end of the deal. It’s common to require a failing test and a fix for any contribution - this’ll make your life easy, but if you don’t publish a decent test suite or set of unit tests, you can’t realistically expect it. A lack of tests will dissuade contributions - would you change some code without knowing what the impact could be?

Be responsive and communicative

You might not want all the contributions that come your way, and it’s perfectly fine as a project owner to say no to a change that isn’t relevant to the software so make it clear what kind of changes you’re interested in. The simplest way to do that, is to guide users to create an issue in an issue tracker before they start working on a code submission. This helps stop people spending time and effort working on code that you later reject, preventing any animosity between potential contributors.

It’s also useful to create a roadmap of issues in your issue tracker, flagging simple changes that may be suitable for first time contributors if you’re looking to encourage submissions. This is a great way to gain confidence in submissions and a clear way to communicate the direction of development.

Finally, it’s important to respond at all. Respond to issues and pull requests in a timely manner, make sure you have a few canned twitter searches or alerts for people struggling with your software, and if need be, use free tools like Google groups to encourage searchable discussions that might help others later rather than private email conversations.

Don’t be afraid of criticism

By publishing your code, you’re welcoming comments and feedback - it’s not always going to be positive, but you should do your best to steer it towards being constructive. It’s worth remembering that if your code made somebodies job easier, or life better, it was worth publishing, even if it’s not the best code you’ve ever written.

Selecting a reasonable license

If you’re releasing source code, you must license it, even if you just want people to be able to "do whatever they want" with it. Choose A License offers excellent overviews of the most popular open source licenses, but the really short version is this:

  • If you care about users of your code contributing back and enforcing "software freedom", choose a "copyleft" license, probably the latest version of the GPL. If you’re publishing a software library, you probably want the LGPL.
  • If you just want to put the code out in the open, and not have anyone try and use it against you in a court when they destroy their business with it, go with the MIT license.
  • If you’re worried about contributors submitting patented code and were considering the MIT license, you should probably go with the Apache license.

These are the most popular licenses, and they’ll probably cover what you’re trying to do. It’s worth noting that the GPL is a viral license, requiring software that includes GPL’d code to also be released under the GPL - it’s central to the philosophy of the FSF and the free software movement, but can be a barrier to adoption in for-profit organisations who don’t want to open source their own software.

Things to avoid

There are a few anti-patterns when it comes to sharing source code.

Using an open source repository as a “squashed, single commit mirror” defeats much of the purpose. Compressing your commits into single “Version 1.2”, “Version 1.3” commits hides the evolution of the software from people who might have a genuine interest in the changelog. This leads people to believe the the software is “open source in name only” and it’s hostile towards contributions.

Avoid pushing broken builds to the HEAD of your repository - if need be, maintain a development and a master branch, with only good, clean, releasable code going into master. This is just good practice, but when people who you don't know could well be building on your codebase it becomes a worse than just ruining your colleagues day.

A quick recipe for success

We’ve talked about a broad range of topics here that will help you run an open source project responsibly, and why you’d want to do that - but lets nail down a specific pattern for running your first open source project using GitHub.

  • Sign up for a GitHub personal or company account (free for open source)
  • Select a license (Apache or GPL are sane defaults)
  • Publish your code in a Git repository on GitHub
  • Publish tests with your code
  • Use GitHub issues to construct a roadmap of future features
  • Tag some future features as “trivial” and suitable for new contributors
  • Include a contributing.md file that asks for a test and fix in a pull request
  • Discourage people sending pull requests of refactors or rewrites without prior discussion
  • Include obvious scripts in the root of your repository called things like “build-and-run-tests” to give people the confidence to contribute
 
*Footnote: Free Software vs. Open Source Software

There has long been contention between the concepts of “free software” and “open source software”, and while “all free software qualifies as open source, but not all open source software is free as in freedom” - I’m going to be avoiding the distinction here. If you’re interested in the discussions around this, this summary on Wikipedia is a good place to start, along with GNU’s article on “Why open source misses the point”.

If you’re not familiar with the distinction, free in “free software” is free as in "liberated, independent, without restrictions", while many mistake it to mean "costs no money". This is often explained as "free as in speech, not free as in beer" which I’ve never thought as an especially informative one-liner.