When Good Code Goes Bad

Most systems start off with a nice clean design, but over time software starts to rot.

Does Design Matter?

  • Software design is about managing the complexity of a system. Unmanaged complexity leads to difficulty in making changes to the system, which is important because all software development is change.
  • Poor design leads to lower quality systems, less functionality for clients, higher costs (for everyone), loss of business and misery for developers.

Why Do People Tolerate Poor Design?

  • Business pressures:
    • Developers are often encouraged to deliver working software fast, so “quick and dirty” solutions go into production
  • Lack of knowledge:
    • Introductory courses / texts tend to focus on syntax rather than design
    • People learn from what they see, and there are a lot of poorly designed systems to learn from
  • Scale effects:
    • Larger systems benefit from good design more than smaller systems
    • It takes more effort to design larger systems well
    • Fewer developers have experience developing larger systems

What are Design Principles?

Design principles help us to understand the “rules” about the best way to manage complexity, and therefore maintain and increase the value that is delivered by our code. They are best viewed as guidelines. They shouldn’t be followed blindly, as there are costs involved in using them.

What is Design Rot?

Most systems start off with a nice clean design, but over time, “software starts to rot”. As “Uncle Bob” Martin says:

At first it isn’t so bad. An ugly wart here, a clumsy hack there, but the beauty of the design still shows through. Yet, over time as the rotting continues, the ugly festering sores and boils accumulate until they dominate the design of the application. The program becomes a festering mass of code that the developers find increasingly hard to maintain.

Robert C Martin, Clean Code

What are the Symptoms of Design Rot?

Martin identifies 4 symptoms of rot:

  1. Rigidity: Software becomes difficult to change, and changes cause other changes in dependent modules. Simple changes become expensive, developers become fearful, unknowns increase.
  2. Fragility: The tendency of the software to break in many places every time it is changed, sometimes in areas that have no conceptual relationship with the area that was changed. Fixes introduce new bugs. Developer credibility is lost.
  3. Immobility: The inability to reuse software from other projects or from parts of the same project. Drawing on existing modules is impossible because they bring in too many dependencies.
  4. Viscosity: Design Viscosity occurs when the design-preserving approaches are harder to use than hacks, i.e. it is easier to do the wrong thing than the right thing. Environmental Viscosity occurs when the development environment is slow and inefficient, leading to the temptation to take short cuts.

These issues tend to be more of a problem as the size of a software base increases.

What are the Causes of Design Rot?

Each of these symptoms is mainly caused by improper dependencies between the modules of the software. Therefore, managing dependencies between modules is at the core of good design. This applies at several levels: framework, library, package, class and method.

Where Does It Happen?

At all levels of abstraction:

  • Function / method level
  • Class / module level
  • Package level
  • API level
  • In the interfaces between systems

How Do We Avoid Design Rot?

Some of the general principles are:

  • Value design and pay attention to it
  • Iteratively improve design
  • Improve design at each layer of abstraction
  • Design modules that are small and focused (high cohesion)
  • Reduce dependencies between modules (low coupling)
  • Remove redundancy and repetition
  • Keep learning about design and architecture
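
As a minimal sketch of “small and focused” plus “low coupling”, here is a Python example. All the names (`MessageSender`, `OverdueNotifier`, `RecordingSender`) are hypothetical and invented for illustration. The notifier does one job and depends only on a narrow interface, so the thing that actually delivers messages can be swapped without touching it.

```python
from typing import Protocol


class MessageSender(Protocol):
    """The only thing the notifier needs from its collaborator."""

    def send(self, recipient: str, body: str) -> None: ...


class OverdueNotifier:
    """Small, focused module: decides who to notify and what to say."""

    def __init__(self, sender: MessageSender) -> None:
        # Depends on the narrow interface, not on a concrete mail client
        self._sender = sender

    def notify(self, overdue: dict[str, int]) -> int:
        """Send one reminder per customer with a positive overdue amount."""
        count = 0
        for customer, amount in overdue.items():
            if amount > 0:
                self._sender.send(customer, f"You owe {amount}")
                count += 1
        return count


class RecordingSender:
    """Stand-in sender that just records messages; a real one might wrap SMTP."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, body: str) -> None:
        self.sent.append((recipient, body))


sender = RecordingSender()
sent_count = OverdueNotifier(sender).notify({"alice": 30, "bob": 0})
```

Because `OverdueNotifier` never names a concrete sender, it can be reused in another project or tested in isolation, which is the opposite of the immobility symptom described earlier.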

Troubleshooting

These general steps can be used to troubleshoot anything.

Step 1: Ask Questions to Understand the Presenting Problem

  • Who is affected?
  • What happens?
  • What should happen?
  • Has it ever worked?
  • Where does it happen?
  • Does it happen for everyone?
  • When does it happen?
  • What were you doing?
  • Has anything changed?

Step 2: Gather Evidence

Seek out information that will help you investigate and that supports your understanding, for example from:

  • Other reports
  • Logs
  • Attempts to reproduce the issue

Step 3: Formulate a Theory

Step 4: Test to Disprove

It is important to try to disprove your theory, rather than confirm it, to overcome “Confirmation Bias”.

Step 5: Provide Recommendations for Further Investigation

Base these on your understanding of impact versus cause.

Sources of Data on IT Operations Performance

Data that tells you what is going on in your IT infrastructure comes from the following sources:

  • Wire Data (network packets)
  • Machine Data (e.g. CPU usage for workstations and servers)
  • Agent Data (which tells you the current number of users, which applications are used and when, etc.)
  • Synthetic Data (data that you check as required, e.g. “is it available now”)
  • Human Generated Data (e.g. knowledge bases)

It is essential to gather data, so decisions are made based on information rather than guesses.
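
As an illustration of synthetic data, here is a minimal Python sketch of an “is it available now” check. It only tests whether a TCP port accepts connections, which is one assumption about what “available” means; the function name `is_port_open` is my own.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

A scheduled job could call something like `is_port_open("intranet.example", 443)` (a made-up host) and log the result, so that availability becomes recorded data rather than a guess.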

Solved: Missing Target Framework in Visual Studio

Here is how I fixed my missing .Net target framework 4.0 issue in VS2010.

The Symptoms

Visual Studio 2010 stopped detecting the .Net framework 4.0. Specifically, this framework version was missing from the “Target framework” drop-down on a project’s property page. Versions 2.0, 3.0 and 3.5 were present in the list, but not 4.0. Reinstalling the framework and rebooting had no effect.

The Solution

The problem was that the file “FrameworkList.xml” was missing from the folder “C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0\RedistList”. Copying this file from another machine solved the problem.

I am now able to target .Net 4.0 again.
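
If you want to check for the same cause before copying anything, a small script can test whether the file is present. This is only a sketch in Python: the folder is the one quoted above, and the function name `has_framework_list` is my own invention.

```python
from pathlib import Path

# Folder quoted in this post; adjust it if your installation differs.
REFERENCE_ASSEMBLIES = Path(
    r"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework"
)


def has_framework_list(version: str = "v4.0", root: Path = REFERENCE_ASSEMBLIES) -> bool:
    """Return True if <root>/<version>/RedistList/FrameworkList.xml exists."""
    return (root / version / "RedistList" / "FrameworkList.xml").is_file()
```

Calling `has_framework_list()` on the affected machine should return `False` if this is the same problem I had.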

Further Reading

  • I found this solution at the end of this discussion.
  • There is also a stackoverflow question dedicated to this issue.

Solved: BizTalk Map Renaming Fails in Deployed Application

BizTalk map renaming isn’t quite as simple as renaming the map.

The Problem

I renamed a BizTalk map in Visual Studio’s Solution Explorer, compiled and deployed the application. However, in the BizTalk Management Console, the renamed map still shows up with its old name.

The Solution

This is one of those BizTalk things that we’ve all been caught out by at least once.
The solution is to right-click the map name in Solution Explorer and click Properties. In the Properties window, change the name of the Type to the name of the map. Then redeploy.

Everything is Suddenly Asking for Admin Permissions: Solved

The Problem

All of a sudden, odd things have stopped working and others are asking for admin permissions. Explorer won’t let me see things I can normally see, scripts aren’t running… This seems odd, as I’m already an admin on my box. What’s going on here?

The Conditions

This has happened to me several times now, and each time the circumstances have been the same:

  • I’m using Windows on a network
  • My password has changed recently
  • I haven’t logged in recently on the box where I’m having problems

This is quite a common scenario for me at work because most of my programming work is done on a VM that I very rarely restart, or even log off. At the same time, whenever my password expires I change it on my physical box rather than the VM.

The Solution

Log out and back in again.

The Reason

Your credentials for your current session are stale.

Solved: BizTalk Scripting Functoid Inline Script Issue

The Problem

I have been developing a rather complex map that includes various scripting functoids for manipulating dates. One of the in-line C# scripts started producing output that simply didn’t make sense. I ran the code in LINQPad, and it produced the expected output, but testing the map resulted in some bizarre behaviour.

My code looked like this:

Given an input node that contains:

1936-08-07T00:00:00

I expected:

19360807

But received:

193608071200

The Solution

I validated the map, which generated the XSLT that is actually run on the input. I was surprised to find that the C# code embedded in the XSLT looked like this (notice the difference in the string formatting on the 6th line):

Why is this different from the code in the scripting functoid?

MSDN documents the answer:

Avoid using the same method signature more than once. When several Scripting functoids have the same method signature, BizTalk selects the first implementation and disregards the others.

It turned out that I’d created a similar functoid elsewhere in the map that uses the same method signature (name and parameters), but had the implementation above. BizTalk recognized that more than one function was defined with the same signature, and silently ignored all but the first one.
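
To make the “first one wins” behaviour concrete, here is a small Python analogy. It is not BizTalk code: `register_first_wins` and both formatting functions are hypothetical, and the format that includes the time is just one guess that happens to reproduce the output I saw (a 12-hour clock shows midnight as 12).

```python
from datetime import datetime


def register_first_wins(registry: dict, name: str, func) -> None:
    """Keep only the first function registered under a name; ignore later ones silently."""
    registry.setdefault(name, func)


def format_date_with_time(value: str) -> str:
    # Hypothetical earlier functoid: date plus 12-hour clock hour and minute
    return datetime.fromisoformat(value).strftime("%Y%m%d%I%M")


def format_date_only(value: str) -> str:
    # Hypothetical later functoid: the date-only output I expected
    return datetime.fromisoformat(value).strftime("%Y%m%d")


scripts: dict = {}
register_first_wins(scripts, "FormatDate", format_date_with_time)
register_first_wins(scripts, "FormatDate", format_date_only)  # ignored: name already taken

result = scripts["FormatDate"]("1936-08-07T00:00:00")
```

Here `result` comes from the first registration, not the second, even though the second is the one I was editing. Giving each script a distinct method name avoids the clash.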

Solution: Users Don't Receive Email Sent by an Application

I don’t know much about Exchange, so was baffled when one of our applications couldn’t send emails to an Exchange-based distribution list.

Symptoms

An application sends regular emails to numerous users, and this has been working for some time. A new Exchange-based distribution list was set up so that a group of users could receive some of the emails.

  • Users in the distribution list receive the emails if the application is set to send them to their regular accounts.
  • If I send an email to the list from my account, it reaches everyone in the distribution list.
  • Everyone else receives the emails from the application.
  • Nobody in the distribution list receives emails that the application sends to the list.

Diagnosis

The new distribution list is the only thing that has changed, so there was obviously something wrong with the list.
I asked one of our Exchange admins to investigate. Exchange message tracking suggested that the emails were being bounced back to the sending account.
Looking at the sender’s returned email revealed the problem with the distribution list.

Explanation

The distribution list was configured to only accept emails from authenticated users. Mails to the list that come from me get through because I’m logged in. However, the application doesn’t authenticate with Exchange when it sends emails. As a result they were being blocked by Exchange.

Solution

Change the settings on the distribution list so that it will receive emails from non-authenticated users.

Lessons Learned from a Failed Deployment

Last week we were scheduled to replace a critical component in a complex, mission-critical hospital system. About two-thirds of the way through the deployment, it became clear that I had missed something during the preparatory work for the change (security, always check security). Additional work would be needed before we could complete the upgrade, and it was very likely that we wouldn’t finish the deployment on time…

Lessons from Previous Implementation Projects

Given the critical nature of this change, we knew that we needed to do things “properly”. Previous experience suggested that we needed to:

  1. Test the new solution thoroughly (we put 2,000,000 transactions through the new component and compared the results to the old solution).
  2. Write a sufficiently detailed implementation plan
    1. Include prep-work required prior to implementation
    2. Include enough detail so you don’t have to think during implementation. This helps under pressure, and ensures that energy is available to tackle the unexpected.
    3. Outline post-implementation work required
  3. Test the implementation plan (This was not possible for us due to differences between our test and live environments. Rectifying this would cost hundreds of thousands of pounds).
  4. Write a sufficiently detailed roll-back plan
  5. Test the roll-back plan (Again, not possible).
  6. Keep users and stakeholders informed… allowing plenty of time for them to make necessary arrangements for down-time.
  7. Define a change window
    1. When you’ll cause least disruption during the change
    2. When failure of the new component will cause least chaos
    3. When you have enough support from others
    4. When you’ll have enough time for post-implementation testing
  8. Get approval from stakeholders… in writing
    1. Explain the purpose of the thing you’re changing
    2. Explain why you’re making the change
    3. Say how things are at the moment
    4. Say how things will be in the future
    5. Explain how you will monitor the new solution
    6. Prepare your implementation and roll-back plans in advance
  9. Check the state of the system before changing it (so we could be sure that any faults were due to our changes and not existing faults)
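
The comparison testing in step 1 can be sketched roughly as follows. This is a simplified Python illustration, not our actual test harness: `compare_components` and the two stand-in components are hypothetical, and real transactions would be replayed from captured traffic rather than a short list.

```python
from typing import Callable, Iterable


def compare_components(
    old: Callable[[str], str],
    new: Callable[[str], str],
    transactions: Iterable[str],
) -> list[tuple[str, str, str]]:
    """Run every transaction through both components; return (input, old, new) per mismatch."""
    mismatches = []
    for tx in transactions:
        old_out, new_out = old(tx), new(tx)
        if old_out != new_out:
            mismatches.append((tx, old_out, new_out))
    return mismatches


# Hypothetical stand-ins for the old and new components
def old_component(tx: str) -> str:
    return tx.strip().upper()


def new_component(tx: str) -> str:
    return tx.upper()  # deliberately forgets to strip whitespace


diffs = compare_components(old_component, new_component, ["msg-1", " msg-2 "])
```

An empty result over a large, representative set of transactions is good evidence that the new component behaves like the old one. Any entries in `diffs` show exactly which inputs to investigate.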

Well, we had done all that, but I had made a minor mistake during prep, so things were going badly.
So, my team leader made the call: to roll back.

Lesson 1: Be Prepared to Roll Back

I don’t just mean having a written plan, although that was extremely useful. I mean psychologically. It is sometimes hard to admit defeat. However, it is better to roll back than to either (1) upset customers by breaching the change window or (2) make mistakes whilst working under pressure. It just isn’t worth it.

Lesson 2: A Successful Roll-Back Is Not a Failure

… it is a tactical retreat. As we had a good roll-back plan we were able to revert to the old module without loss of data, and to do so within the change window. We had maintained the status quo.

Lesson 3: Roll Back Completely

You really don’t want to leave a system in an indeterminate state. As it was, we left some things in place ready for our next roll-out attempt. This was a mistake, as (1) it caused some minor confusion, and (2) if we had forgotten and done more testing, we could have corrupted live data.

Lesson 4: Communicate

We explained the reasons for the roll-back and the steps we had taken to make sure that we wouldn’t experience the same difficulties again. This was trust-building, and others were supportive of our action.

Lesson 5: Rally and Retry

Once the roll-back was verified the others went home. I stayed late to fix the problem that had caused the implementation issues.

Conclusion

This was a great learning experience for me. Today we did the implementation again. But this time, it went smoothly.