Chipping away at legacy code

The net is full of best practices & methodologies that you are told to absolutely follow from the very first commit you do in a project. Armed with this knowledge you are ready to tackle any software project. Right? Wrong. Good luck inheriting a code base or joining a large and established project. In this post I will try to outline some of my personal observations and rules I derived from them. Hopefully this perspective will help you start chipping away at that blob of inherited code.

Inheriting a code base

Your first task when working with a new code base should always be building it yourself and running it. Get a feel for the program. How hard is it to build? What's the problem domain? How does the interface work? Can you drive it? Spend a good chunk of time on it. Depending on the size of the program a couple of days is not overkill. The moment you start working with the code you will loose perspective of the end goal and will be lost inside the guts of the software without a big picture overview. Jumping in code first is often the worst thing you can do as you won't know the impact your changes have on existing users of the software.

There is one more benefit in having a users perspective first. You will catch moments while looking at the code that will immediately remind you of a part of the interface that you already visited. It's harder to go the other way around - trust me. Even with a code base that I already know well or even wrote myself - I keep the UI part of the application open when I'm working on the code responsible for it's behavior.

So you've spent a couple of days playing around with the code. What's next? Stop and check if the code is under version control. If not immediately put it under any version control. I can't stress this enough. Even if the company you work for has a policy (for whatever sick reason) that prevents the team to work together on the code with version control - make a private repository and just contribute back plain changes like everyone else (but seriously try to change that).

The code already had version control? Great! You're off to a better start then some pour souls out there. Check the commit log and see what was changed recently. You don't have to read each commit in great detail but try to get a feel for the pulse of the project. How often changes are made? Are there any recurring patterns ('Fixed validation in field A', 'Fixed validation in Field C')? Is the code mostly receiving bug fixes or is it under active feature development? Take a look at some of the commits, hopefully the commit messages should help you understand what the code change was supposed to achieve. Is the author still around to ask questions? Try hitting him up and ask for a 5-10 minute walk-through for some of his most recent commits. This mostly applies to people working together in the same office but it also works nicely for remote projects if you are on good foot with the maintainers. Though don't abuse this, ask once on your first day and later on only occasionally on really complicated changes that you should really understand.

Enjoyed the tour? Now time to do some serious work. Does the project have any documentation?

Undocumented code is useless

It does? You keep amazing me. Believe me there are legions of programmers not as lucky as you. Wait, you didn't state which documentation. User docs and the technical documentation are both there? Unbelievable.

Open up the user documentation on the part of the interface you played with on your initial tour of the code. Is the application really behaving like the docs state? If not, make notes. Do not ever feel even tempted to change the code to behave according to the documentation. Blindly making changes in the code to match a possibly outdated document is never a good idea. People working on the code were probably changing the application behavior in a hurry with a client breathing down their necks - you don't want to meet the same client the moment you revert the changes in the code to match the documentation.

You are making notes because your first and possibly most useful contribution to the code base will be to update the documentation to actually match what the code base currently does. This may seem like a mundane and boring task but everyone will love you for it - trust me. Especially the new guy that will have to go through the same path as you (especially if they throw him a hot potato on the first day).

Don't try to go through the whole application. Depending on it's scale it might not even be possible. Pick two areas of the system you first want to learn and stick to it. This exercise should show you three important things:

documentation patterns - how are things named (trash, delete, remove - undo, restore, Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtan), layout of the document & covered level of details
interface patterns - is there a guideline on how the application interface is laid out and how common tasks are achieved?
behavioral patterns - are certain actions triggering the same behaviors? How are background tasks handled? How is information displayed to the user? What's the navigation pattern (back/forward buttons etc.)? Does a delete operation have an undo option? Are actions executed immediately or do they require a confirmation?

Done? Pick up your notes and update those documents. Remember to ask someone to peer review the changes. Yes, documentation should absolutely be 'code reviewed'. During those code reviews more experienced team mates will let you know how much more outdated the docs were - you won't even believe it.

Growing impatient? You want to get into the code don't you? Well we still have the technical documentation to go through - remember you are the lucky one to even have outdated docs.

You should more or less go through the same process. Locate a piece of code in a part of interface that you already covered in the previous steps. Take a look at it. Is it documented? Great. The docs are correct? Again, don't modify the code to match the docs. Make notes and go through the process again. This time also go through the functions called by your entry point. Don't go too deep on the first try, two levels down should be enough for now.

There is one extremely large difference with technical documentation. This is important so please remember it well. If you find that the docs are incorrect. Even if the code you are currently looking at is without any flaws (accounting for the current behavior of the called code). Stop. Note that function down and start searching (in an automated way - grep, ack, the silver searcher, intellisense etc) throughout the whole code base for other uses of that piece of code. You will probably uncover improper usage that results from blindly following the docs, or code written when the docs were still correct. This code should go under a strict review for potential flaws. You might need help from other team members that already know those parts - if you're alone then just note this down and do a throughout review when you have the time for it and a better understanding of the other parts of the system.

People will tell you that code should be self documenting. This is true but you should never take this for granted. Code is not cut in stone. It changes and evolves over time. Properly named functions will have a misleading and devilish names when pressure is applied to a bunch of developers. In my career I saw innocent getters that had horrific side effects. It doesn't matter that the function was previously named get_account_balance and did only that. Someone down the line added get_credit_card_cycles as a call in that function and it recalculated the cycles and stored the new results for caching purposes - that's a side effect added by accident. If you notice this - slap a big red warning in the tech docs on any getter that has undocumented side effects.

The biggest takeaway from this part of the article is that code without documentation is useless at best. More often using undocumented code is a recipe for disaster - the biggest contribution you can make to a project or a legacy code base that you inherited is documenting how it actually behaves now.

If you weren't as lucky to have even outdated docs. Do the same steps. Outdated documentation is practically undocumented code.

Understand your interface

The most dangerous bugs result from improper understanding of the called code. Most programmers will either blindly trust the documentation, word of mouth or use the code based on the names in the header of the function/interface they are calling. This is really sad but the most dangerous part is that the code will most probably work for the general use case. Before using any piece of code, you should understand all it's inputs and return values. You should check those return values for any errors and most of all - making sure the documentation is up to date with the code before relying on it.

If you ever make a mistake while using a specific function. Do yourself and others a favor and even just glance over other call sites for that function in your code base. You will most probably find many more places where people made a similar mistake.

Take a note of this function and think how can you eliminate that class of errors from the interface. If the proper call is so convoluted that it takes 50 lines of code to do a proper call and error handling - immediately create a single purpose wrapper - you will thank yourself later on. If you need examples of this happening in real code, then look no further than the new libtls API from OpenBSD. Using the original OpenSSL API properly was extremely hard to do correctly.

While designing interfaces, write programs using them before you write a single line of implementation code. This little exercise will quickly expose cumbersome pain points that should be addressed as quickly as possible, before the new API starts getting used and literally bolted down by backwards compatibility.

Broken by default

There's a fun thing I observed about programmers. We tend to think that a typical user is mostly en-shackled by our decisions. There are a lot of studies proving that most users never change the defaults that are set in systems. What makes me chuckle is the fact that we as programmers fall into the same trap all the time.

Sane defaults matter. Do yourself and others due diligence and spend a significant amount of time before settling on one. If your API has several optional parameters - most people will never set them. If you see a function with 80 obligatory parameters - that's really bad. If 70 of them are optional - then it's a recipe for disaster. When you inherit a code base look carefully at the defaults set for those monsters and identify what potentially unwanted behavior is triggered in them by default. Your next step is going through all the call sites and checking as much as you can for those that triggered something unintentionally.

You might find it funny but it is a real problem. Once while working on a data repair script I had tremendous performance issues with my fix. Profiling the code revealed that the reason was additional work done by a simple getter (it performed additional validations) - those validations were responsible for 80% of the runtime of my fix. During the hotfix I inlined the basic work I wanted to be performed inside my fix and reported the performance defect in order to get it fixed later on. Indeed the function was altered and introduced an optional parameter that disabled the validations, it even allowed us to gain a nice performance increase in some parts of the system. The problem though was that the new parameter was set to perform validations by default in order to be backwards compatible. That function was used all over the whole code base, when I did a throughout code review before leaving the company - just flipping that default bit in the parameter would increase performance across several critical tasks in the system by 15-25%. What would remain is several code sites that did require those validations but the proportions were 1000:1 (not needed : needed).

Code coverage is not a productivity index

If you inherit a code base, you will probably have zero code coverage and possibly no unit tests. My advice is to work on current issues, reported bugs and slowly adding tests in as time permits. You should prioritize regression tests in order to make sure that the just fixed issue won't come back.

Unfortunately legacy code tends to be monolithic. Possibly most of the code won't be easily testable (dependent on a lot of state). Please refrain from blindly refactoring parts of it just for the purpose of unit testing - unless you really know what you are doing. This is mostly a performance story. Adding layers to an old and established system with critical performance paths can have catastrophic consequences. Consult with someone who knows the performance characteristic of the code or profile it yourself before doing that.

The second warning is mostly for people in management. If you decide to implement unit tests & code coverage in your projects - don't treat it as productivity milestones for teams. You will do yourself and everyone else only harm. I was in a situation before, where a company introduced unit tests for a legacy system and required a 1,000 new tests as the quarterly milestone. People quickly gamed that system, instead of working on regression tests for currently burning issues in the unstable part of the application - everyone wrote external API tests (API exposed by the system to external developers) since all of them had 10-60 input parameters and simply doing input validation tests with a permutation of the inputs was easy enough to reach goals - this did not help quality in any way.

Static analysis paralysis

You might be tempted to run a static analysis tool on the newly inherited code base. I love those tools but they are a double edged sword. The worst thing you can do is go nuts on every warning raised by those tools and applying the most obvious fix without considering the consequences - I saw this happen too many times.

These tools will often warn you that some part of the code is dead (unreachable if block, statement without effect, uninitialized variable etc). Some people tend to go in and remove the if block, remove the statement and initialize the variable to NULL or 0. Yes it will satisfy the tool, but this warning could have far more serious implications. That if block - maybe it should execute? That variable - maybe initializing it to NULL will introduce an error (I saw this one on the OpenBSD mailing list). I also saw a company pushing interns with a goal to reduce the amount of warnings/errors reported by those tools by X. Guess what they did?

Static analysis is a tool. The only thing it will do is point you at code that you should analyze carefully. Don't fall into the trap of satisfying those tools without knowing why the changes you are making are correct.

Jenga architecture

The final bit of advice I want to give you is not to over engineer your solutions. Don't go into a code base with an intent to replace all the base mechanics of a system with extensible architecture filled with best programming patterns. I saw this happen in real life. Where people new to the code base were given free rain and turned by force a monolithic, structural code base into object oriented, DDD architecture with new untested mechanism. This had two significant impacts on the code base:

The remaining development team was separated from the new code as they were busy fixing actual business problems
The software was made unstable as a Jenga tower when blocks were fully pulled out from the foundation and replaced with untested new pieces

This was a disaster. I strongly believe that you should approach a code base with respect to it's current authors. If you want to make significant impact - chip away at it slowly. Point out no longer used functionality and find safe ways to remove it. You should feel like a surgeon cutting out only the sick parts of the organism with the least harm you can do to your patient.