Object-Oriented Design

Written (C) 2005-2008 by Wayne Pollock, Tampa Florida USA.  All rights reserved.

[Some of the following material adapted from Cay Horstmann’s excellent “Object-Oriented Design & Patterns”, (C)2004 John Wiley & Sons, Inc.]

Write your code as if the next person to maintain it is a homicidal maniac who knows where you live.  – “Head First Servlets & JSP” p.314

Following a design procedure to produce software (or anything) is an absolute must.  (I grant it can be hard for a student to see that from the simplistic programming assignments given in a classroom setting.)  Legally speaking, “software engineering” (a.k.a. programming) is not a recognized engineering discipline.  In other professional fields, not following standard procedures carries severe consequences, including legal penalties.  For programmers this means you can’t be sued for malpractice, but it also accounts in part for the poor quality of much of the software in use.  Even so, following proper design, testing, and auditing procedures is often required by government and industry regulations for software quality control.

There are many standards and certifications that apply to software.  Most concern the process of development, and rarely force a particular software design choice.  These are intended to ensure quality software.  Examples include ISO 9000 certification (required to sell software in the EU), DO-178B/DO-254 certification (required for avionic systems), etc.  There are also applicable security and safety standards for various industries (such as the PCI-DSS standard for handling credit card data, HIPAA for health data, etc.).

So why must you follow a design methodology?  Many common programming tasks seem simpler than they really are and can require a genius to get them right.  Some of the issues “production” or “professional” quality software should address include thread safety, resource management (e.g., memory or resource leaks), security (protecting confidential data), and safety.  Following standard design patterns and design procedures means your software will use the best known solutions.  Not following proper design and testing methodology has led to systems that bankrupted companies and endangered human life (see Wikipedia “Therac-25”).

From the point of view of a designer, objects are things that have identity, behavior, and state.  A class describes a group of similar objects.

The state is information that objects store in their fields (or properties, or attributes or instance variables or ...).  This state allows object methods to remember the results of previous method calls, and to cache data that would be expensive to look up or calculate each time.  Thus the required state is whatever is needed to implement the object’s methods.

In (nearly) every design the state is private and accessible only to the methods.  In this way the exact implementation of state is not part of the public interface to the object and can be changed as needed, without fear of breaking anything.

The behavior of an object is determined by the set of its public (and protected) methods and constructors.  This is determined by the design.  However some methods determined by the design may be too complex to be easily implemented or tested, or may contain duplicated functionality with other methods.  Such methods can be broken into several smaller private methods.  Note that only the public methods define the behavior; the rest of the code is implementation detail that can be changed as needed.
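As a minimal sketch of these ideas (the Thermometer class and its method names are hypothetical, chosen only for illustration), the class below keeps its state private, exposes its behavior through public methods, and pushes a calculation into a private helper that could be changed or removed without affecting any caller:

```java
// Hypothetical example: private state, public behavior, private helper.
class Thermometer {
    // Private state.  Callers cannot tell how the reading is stored, so
    // this could later become Kelvin without breaking anything.
    private double celsius;

    public Thermometer(double celsius) {
        this.celsius = celsius;
    }

    // Public behavior: these methods (and the constructor) are the design.
    public double getCelsius() {
        return celsius;
    }

    public double getFahrenheit() {
        return toFahrenheit(celsius);
    }

    // Private helper: an implementation detail, not part of the behavior.
    private static double toFahrenheit(double c) {
        return c * 9.0 / 5.0 + 32.0;
    }
}
```

A caller would write something like new Thermometer(100.0).getFahrenheit() and never see the conversion helper.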

Sometimes the best way to implement state and behavior is to define “helper classes” and use such objects to hold state information or to implement some behaviors.  Such classes should not be visible outside the current package.  In some cases using private nested classes is a more convenient or efficient implementation.

Software Projects are developed in phases: initial idea, requirements specification, design, coding, acceptance testing, delivery, and maintenance.  These phases often overlap in time, and some phases get repeated.  Broadly speaking you can define the first three phases as:

1.     Analysis (initial idea + overall design)  Here you need to determine what the software should do.  The results of this should be the requirements.  Initially you have a vague description (a project proposal, sometimes called an RFC or RFP), but you must determine complete and exact requirements, including performance and cost.  Remember that what is obvious to the VP of marketing who created the project proposal may not be obvious to you or other developers!  Two common ways of specifying the requirements (usually both are used) are:

·        Use cases (or scenarios)  A use case is a description of a sequence of actions to accomplish some task.  It is like a storyboard, showing each step of the user interaction with the system.  A set of use cases can fully describe the requirements of a system.  This can also be called task flow or work flow.

·        Functional (text) specification  A functional specification must fully describe all the terms and concepts needed, and then use these to describe every task.  It is difficult to make a functional specification that is complete, unambiguous, free from contradictions, and readable.

2.     Design   Here you determine the objects and classes required to fulfill the requirements.  The design should be as simple as possible, and extensible in light of the fact that requirements change over time.  (It often pays to think about the more likely potential changes and to plan a design that would make implementing such changes cheap and easy.)

The design process can be broken down into these three steps.  Note, however, that they are not done strictly in this order; rather, you jump around the steps, repeating as necessary.

A.   Identify the classes.  This really means identify the objects required.  This starts with looking for nouns in the requirements.  In the end, some proposed classes will be trivial and end up as simple primitive types (or Strings) and as fields of other classes.  Some proposed classes have no behavior and are really just database records.  Each proposed class should have a single responsibility, or a (very) small set of related and interdependent responsibilities.

After finding the easier proposed classes, you need to think about the big picture.  What else is needed to make the design complete?  Consider a voice-mail system.  What are the obvious objects?  (users, messages, mailboxes, passwords, phone extensions)  What else?  (Menus that can be altered over time, or perhaps multi-language menus, timestamps.)  These are the obvious objects, but how will messages be stored in mailboxes?  You need a queue (or FIFO), perhaps several (for new and saved messages).  Here is a list of categories of classes; which do you need?

·        Things (real-world items, devices, systems).  Whole systems (VoiceMailSystem) are convenient top-level classes that can contain a main and system initialization and shutdown code.

·        Actors, users, agents, and roles.  (Often have names ending in “or” or “er”, e.g., User, Administrator, Student)

·        Events and transactions (e.g., Sale, Update)

·        Collections and other foundational (standard library) classes (Set, String, Rectangle, Date)

Name your classes with names that are descriptive of (an) object of that class: Message, not Messages (unless the object is a collection) or MessageObject or Msg.  Avoid generic class names such as User, Agent, Item, Event, etc.  If the obvious name turns out to be a verb (e.g., BuildX, MakeX) then you should rethink the design.  (This can happen even if the name is a noun; the tip-off is if the class has a single “doit” method!)  If the obvious name contains the word “and” you should rethink the design.

Identifying classes is hard!  With experience you learn to avoid having one “master” overly complex class plus a bunch of trivial classes.  Each class should do some of the work.  (Qu: Why?)
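To make the voice-mail example from step A concrete, here is a minimal sketch of two of the proposed classes.  All names and methods are assumptions made for illustration (a String stands in for recorded audio); the point is only that a Mailbox needs two FIFO queues, one for new and one for saved messages:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of two classes from the voice-mail example.
class Message {
    private final String text;   // stands in for the recorded audio

    Message(String text) { this.text = text; }

    String getText() { return text; }
}

class Mailbox {
    // Two FIFO queues: one for new messages, one for saved ones.
    private final Queue<Message> newMessages = new ArrayDeque<>();
    private final Queue<Message> savedMessages = new ArrayDeque<>();

    void addMessage(Message m) { newMessages.add(m); }

    // Returns the oldest new message, or null if there is none.
    Message getCurrentMessage() { return newMessages.peek(); }

    // Moves the oldest new message (if any) to the saved queue.
    void saveCurrentMessage() {
        Message m = newMessages.poll();
        if (m != null) {
            savedMessages.add(m);
        }
    }

    int newCount()   { return newMessages.size(); }
    int savedCount() { return savedMessages.size(); }
}
```

Notice that both classes do some of the work: Message holds the content, while Mailbox manages ordering and the new/saved distinction.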

B.   Identify the responsibilities of the classes.  Responsibilities are at a high level of abstraction.  Don’t try to determine the methods required at this point.  A typical responsibility might be to “Manage Passwords” rather than listing “read password” and “update password” methods.  You find these by looking for the “verbs” in the requirements.

Responsibilities often fit into a natural ordering or layering (level) of abstraction; that is, some responsibilities are low-level (processing mouse-clicks), middle-level (Manage passwords) or high-level (initialize the whole system).  When assigning responsibilities to classes/objects you should make sure all the responsibilities are at the same level for any one class.

Each responsibility becomes one or more public methods of the class; everything else discovered later is private.  When doing this step you will often discover what state (instance variables) is needed to do the tasks identified.

Classes should have neither too many nor too few responsibilities.  A class with none is probably a Foundational class and is too trivial to worry about (it becomes an attribute, for example a class Point).

All the responsibilities of a class should be related.  If not, you probably should split the class into two or more simpler classes.  Ideally every class should have between 1 and 3 responsibilities.  The degree to which all the parts of a class relate to each other is known as cohesion.  One goal of your design is to have classes with a high degree of cohesion.

Don’t add responsibilities that aren’t required, even if it seems that implementing them would be easy!

C.   Identify the relationships between the objects.   A bunch of independent objects that do not interact would make design, implementation, and testing very simple, but the resulting system would not be useful!  The degree to which different objects depend on each other is known as coupling.  A goal of your design should be to minimize coupling, and to use standard designs when you must have two or more objects interact.

In general we say that interacting objects are collaborating and have an association or dependency or link between them.  Three common types of association are:

i.       Dependency:  (commonly called the uses-a relationship)  Consider a voice-mail Message class with the responsibility of playing itself.  One design of a method to play the message could be:

public class Message {
   private AudioClip message;
   public void play ( Device dev ) {
      dev.output( message );
}  }

But this design couples Message to both the Device and AudioClip classes.  What if you gave the responsibility of playing the message to a Device object?  The revised Message class might then look like this:

public class Message {
   private AudioClip message;
   public AudioClip getMessage() {
      return message;
}  }

Both designs would work, but the second has less coupling, and would thus be a better design.  (Qu: What would the Device.play method look like?)
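One possible answer to the question above, offered as a sketch rather than the intended solution: let Device.play accept the AudioClip directly, so that Device never needs to know about Message at all.  The AudioClip class here is a hypothetical stand-in defined just for this example (it is not the AudioClip type from any Java library), and the output method simply records and prints what it was given:

```java
// Hypothetical stand-in for whatever type holds the recorded audio.
class AudioClip {
    private final String description;

    AudioClip(String description) { this.description = description; }

    String getDescription() { return description; }
}

// The revised Message class from the text: it only hands out its clip.
class Message {
    private final AudioClip message;

    Message(AudioClip message) { this.message = message; }

    public AudioClip getMessage() { return message; }
}

class Device {
    // Remembers the last clip sent to the (imaginary) speaker.
    private AudioClip lastPlayed;

    // Device depends only on AudioClip, not on Message.
    public void play(AudioClip clip) {
        output(clip);
    }

    private void output(AudioClip clip) {
        lastPlayed = clip;
        System.out.println("Playing: " + clip.getDescription());
    }

    AudioClip getLastPlayed() { return lastPlayed; }
}
```

The caller ties the two together with dev.play( msg.getMessage() ), which keeps the knowledge of both classes in one place.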

ii.     Aggregation:  (the has-a relationship) This is the association when one object contains objects of some class over a period of time (longer than a single method call; we’re not talking about parameters to methods or local variables).  For example, a MessageQueue has-a Message (usually many of them).

Not all instance fields of a class correspond to aggregation.  A single instance of some foundation class (String, Date) is really just an attribute of one class, not an association between classes.  This distinction is very subjective, but if in doubt prefer an attribute to an aggregation.

In some cases you can distinguish between regular aggregation and composition.  If some object X is created and destroyed when the containing object Y is, and is never referred to by other objects, the relationship between X and Y is composition.  (Examples:  “Person has-a Name” is composition, but “a Doctor has-a Patient” is aggregation.)  Think MANAGES-A not HAS-A for non-composition aggregation.
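The two examples above might be sketched as follows.  The fields and method names are assumptions made for illustration.  The Name is created inside Person and never shared (composition), while Patient objects are created elsewhere and are merely managed by a Doctor (aggregation):

```java
import java.util.ArrayList;
import java.util.List;

class Name {
    private final String first;
    private final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public String toString() { return first + " " + last; }
}

class Person {
    // Composition: the Name is created with the Person and never shared.
    private final Name name;

    Person(String first, String last) {
        this.name = new Name(first, last);
    }

    String getName() { return name.toString(); }
}

class Patient {
    private final String name;

    Patient(String name) { this.name = name; }

    String getName() { return name; }
}

class Doctor {
    // Aggregation: Patients exist independently; the Doctor MANAGES them.
    private final List<Patient> patients = new ArrayList<>();

    void admit(Patient p)     { patients.add(p); }
    void discharge(Patient p) { patients.remove(p); }
    int patientCount()        { return patients.size(); }
}
```

Discharging a Patient from one Doctor leaves the Patient object intact, and possibly still on another Doctor's list; nothing comparable can happen to a Person's Name.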

iii.  Inheritance: (the is-a relationship) This is actually an association between classes, not individual objects.

With both dependency and aggregation, the association has a multiplicity.  This can be 1:1 (A student object has an ID number) or 1:n (one to many, as in a MessageQueue has up to n  messages).  A 1:* means an unlimited number (or none at all).  1:0..1 would mean zero or one; the zero is often represented by setting the instance variable to null.  (Aside: Using Integer instead of int allows you to distinguish between not having the attribute (null) and having it with a value of zero.)
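As a rough illustration of how these multiplicities tend to show up in Java fields (the Student and Course classes and the parking-permit attribute are invented for this sketch):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating multiplicity.
class Course {
    private final String title;

    Course(String title) { this.title = title; }

    String getTitle() { return title; }
}

class Student {
    private final int idNumber;      // 1:1, every student has exactly one
    private Integer parkingPermit;   // 1:0..1, null means "has none"
    private final List<Course> courses = new ArrayList<>();   // 1:*

    Student(int idNumber) { this.idNumber = idNumber; }

    int getIdNumber() { return idNumber; }

    boolean hasParkingPermit() { return parkingPermit != null; }

    // The int argument is autoboxed, so permit number 0 is still "has one".
    void setParkingPermit(int number) { parkingPermit = number; }

    Integer getParkingPermit() { return parkingPermit; }

    void enroll(Course c) { courses.add(c); }

    int courseCount() { return courses.size(); }
}
```

Had parkingPermit been declared as int, a student with permit number 0 would be indistinguishable from a student with no permit.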

Often there are many possible sets of classes that would appear to fulfill the requirements.  Rather than go with your first design, spend a little time thinking of the alternatives.  In the end it may not matter which design you choose, but often one design is simpler, more efficient, or more extendable than another.  (Another school of thought, extreme programming or “XP”, says the opposite:  Just do the minimum now, and add only as needed.)

CRC cards:  On the top of a 3x5 index card, write the proposed class’ name.  Down the left side list the responsibilities, and on the right list the other classes that collaborate with this one.  Make new cards, rip up old ones, add and cross-out responsibilities and collaborators freely.  (Show resource.)

3.     Implementation        This phase starts off with building the “stubs” or skeleton of all identified classes and methods.  Using an IDE such as Eclipse, you can use a wizard to do this for you.  Note that stubs are empty methods, except that methods that return values must have a return statement (that returns some fake but realistic value).  Once you have enough stubs the whole project can in fact be compiled.  As you implement classes and methods, this skeleton becomes a prototype, a not quite production-ready version of the project.  The prototype is expanded and filled in until the whole project is complete.  This works well for small and medium sized projects, but not as well for larger ones.
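A stub skeleton for one class might look like the following sketch.  The PasswordManager class and its three methods are hypothetical, loosely based on the “Manage Passwords” responsibility mentioned earlier.  Every method compiles, but none does any real work yet:

```java
// Hypothetical stub skeleton for a class identified during design.
class PasswordManager {

    // Stub: reports success until the real check is written.
    public boolean checkPassword(String user, String password) {
        return true;   // fake but plausible value
    }

    // Stub: a void method can simply be left empty.
    public void updatePassword(String user, String newPassword) {
    }

    // Stub: returns a fixed, realistic-looking count.
    public int failedAttempts(String user) {
        return 0;
    }
}
```

Other classes can already call these methods, so the rest of the project compiles and can be exercised while the real implementation is still pending.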

It sometimes pays to “do one to throw away”, just to gain insight into the problem, and possibly to tweak the design before building the production (non-prototype) version.

Before starting the implementation the extreme programming method (and other agile methodologies such as rapid prototyping) would have you write a set of test cases that “cover” the design.  This means there should be enough test cases to test every method of every class in your design, except perhaps for the trivial getter/setter (accessor/mutator) methods.  The test cases should also cover the interactions between classes.
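Real projects usually write such tests with a framework such as JUnit.  To keep this sketch self-contained it uses plain Java instead, and both the MessageQueue class and the test names are assumptions made for illustration.  In a true test-first workflow the test class would be written against stubs and would fail until the methods are implemented; here the implementation is already filled in so the example runs:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical class under test.
class MessageQueue {
    private final Queue<String> messages = new ArrayDeque<>();

    void add(String message) { messages.add(message); }

    String remove() { return messages.poll(); }   // null when empty

    int size() { return messages.size(); }
}

// One small test per behavior; each returns true on success.
class MessageQueueTest {
    static boolean removesInFifoOrder() {
        MessageQueue q = new MessageQueue();
        q.add("first");
        q.add("second");
        return "first".equals(q.remove()) && "second".equals(q.remove());
    }

    static boolean removeOnEmptyQueueReturnsNull() {
        return new MessageQueue().remove() == null;
    }

    static boolean sizeTracksAddsAndRemoves() {
        MessageQueue q = new MessageQueue();
        q.add("only");
        boolean oneAfterAdd = (q.size() == 1);
        q.remove();
        return oneAfterAdd && q.size() == 0;
    }

    public static void main(String[] args) {
        System.out.println("FIFO order:       " + removesInFifoOrder());
        System.out.println("Empty remove:     " + removeOnEmptyQueueReturnsNull());
        System.out.println("Size bookkeeping: " + sizeTracksAddsAndRemoves());
    }
}
```

Note that the trivial size method gets no test of its own; it is covered indirectly by the tests of the methods that matter.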

You should schedule the implementation.  It is common to implement methods in an order that facilitates testing.  You often build some methods and classes now, and defer the others until later.  The order may depend on developers’ schedules.

Don’t make the (beginner’s) mistake of implementing the “easy” methods first, and saving the “hard” ones until later!  Start with the hard ones; they are hard because the design may not be complete or well structured, and you may have to go back and change the design to fix that.  A design that leads to classes and methods that can’t be implemented (and still meet performance requirements and project deadlines) is a bad design.

When you write code in an IDE such as Eclipse, the IDE can automatically write much of the code for you.  A common example is having the IDE generate all the get and set methods for the properties of a class.  Even once the code is written, a good IDE can help re-work the code, a process today called refactoring.

Source Code Control Systems

Two other factors can affect implementation.  One is the ability to pull up previous versions of classes/packages (in general, software modules) and to show the differences.  Another is the ability to work in a group on the same code.  Both factors can be met using Source Code Control Systems, or Revision Control Systems, Version Control Systems, Content Management Systems, Source Code Management systems, Software Configuration Management systems, and no doubt many other marketing buzz-terms.  These systems also provide change tracking and logging features, often tied into bug-tracking systems.

Today the most popular is CVS, which allows one to set up a code repository that all developers can access across the network.  Each developer checks out a complete copy of the code, works on it, and periodically checks in any changes they have made.  The check in process is actually a merge operation, as other developers have been working on the same code.

Despite its popularity CVS has some problems.  Newer systems, such as Subversion, address many of those issues.  Eclipse has a plug-in to support CVS, and a Subversion plug-in is available from subclipse.tigris.org.

For working privately on projects any VCS is good enough.  I use the simple RCS.  But most projects have more than one developer, and they must all work together as a team.  Originally this meant using a centralized VCS such as CVS or Subversion, the two most popular.

In a centralized VCS there’s only one repository.  Only those with “submit” access can make changes, saving their work frequently as “checkpoints”.  Other developers must submit patches to someone with such rights.

It’s awkward to work privately on a shared project because it means building up a huge patch without saving any checkpoints, then submitting the whole thing at once as a surprise to the other developers, behavior referred to as “dropping a bomb”.

In a distributed version control system (DVCS) every user starts with a fully legitimate fork or repository of the project.  The one ‘official’ tree is purely a matter of social convention.  It can be hard to even know about other repositories (for git projects you can publish on github.com).  The maintainer of the official repository will frequently merge the changes from the other developers’ repos.  DVCSs include Git (most popular for Linux but hardest to learn), Bazaar (my favorite) and Mercurial.

To some extent each of these phases is independent of the later ones.  In general you shouldn’t worry about the design when doing the analysis, or the implementation when doing the design.  In reality the design will be influenced by the implementation choices available to you, and the requirements will be influenced by design considerations.  (If not, you often end up with unrealistic requirements, or requirements that would cost a lot more than they should.)

Design Patterns

The purpose of design patterns is to capture software design know-how and make it reusable.  Design patterns can improve the structure of software and simplify maintenance.  Design patterns also improve communication among software developers and empower less experienced personnel to produce better designs.

There are several catalogs of useful design patterns on the Internet (Show ootips.org, ...).  By becoming familiar with some of these, one can read project proposals and instantly see a pattern that might fit.  The book that started it all, Design Patterns, was written by the Gang of Four (GoF): Gamma, Helm, Johnson, and Vlissides, published by Addison-Wesley.  Many new books are available including Object-Oriented Design & Patterns by Cay Horstmann (Wiley).

Consider classes Person, Employee, Customer, or classes Person, Student, Instructor.  It seems logical to have Employee and Customer extend Person, but what if an Employee buys something, how can you add them to your Customer list?  What if an Instructor took a course at HCC?  These issues are better understood as roles.  A Person at any given time may be either an Employee or Customer or both.

This is a common situation, but tricky to get right in the design.  Qu: How would you do this?  (Ans: Look up the adaptive role playing pattern and the role modeling pattern, both variants of the classic decorator pattern.)  The answer is not inheritance at all, but dependency: have Student and Instructor classes, each with a final Person attribute, and (tediously) add forwarding methods.

If you think your design, implementation, and testing would be easier if you could treat Students and Instructors as People, then use a second design pattern too: have a Personlike interface that Person, Student, and Instructor all implement, with the Student and Instructor classes merely forwarding each method call (e.g., String getName() { return thePerson.getName(); }).  This is known as the delegation design pattern (and is related to the proxy pattern).  Note that a single design can be thought of as an example of several different design patterns; the various patterns overlap somewhat.
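Here is a minimal sketch of the role design just described.  The Personlike interface and the thePerson field come from the text; the other fields, constructors, and the getPerson accessor are assumptions added so the example is complete:

```java
interface Personlike {
    String getName();
}

class Person implements Personlike {
    private final String name;

    Person(String name) { this.name = name; }

    @Override
    public String getName() { return name; }
}

// A role: a Student is not a Person, it refers to the Person playing the role.
class Student implements Personlike {
    private final Person thePerson;
    private final int studentId;

    Student(Person thePerson, int studentId) {
        this.thePerson = thePerson;
        this.studentId = studentId;
    }

    // Forwarding (delegating) method.
    @Override
    public String getName() { return thePerson.getName(); }

    Person getPerson() { return thePerson; }

    int getStudentId() { return studentId; }
}

class Instructor implements Personlike {
    private final Person thePerson;
    private final String department;

    Instructor(Person thePerson, String department) {
        this.thePerson = thePerson;
        this.department = department;
    }

    @Override
    public String getName() { return thePerson.getName(); }

    Person getPerson() { return thePerson; }

    String getDepartment() { return department; }
}
```

With this design, an instructor who takes a course is simply one Person object referred to by both an Instructor and a Student, and code that only needs a name can accept any Personlike.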

If your design has many objects that must do DNS lookups, or get a random number, or in general, you have client objects that use a common service, you only want a single instance of the service object.  This ensures the service can cache results, reuse DB connections, ensure unique random numbers, etc.  The pattern is known as the singleton pattern.  You can ensure this by making the constructor private (and providing a readResolve() method, which is a pseudo-constructor used in serialization).  You then provide the single instance as a static final field of the class:

class Foo
{  private static final Foo foo = new Foo();
   private Foo() {...}
   public static Foo getInstance() { return foo; }
   ...
}

The singleton pattern isn’t commonly used.  The problem with singletons is that they introduce global state into a program, which allows any code to access it at any time (this is called high coupling).  This has proven to be bad, often doubling (or worse) development and testing costs.  J2EE doesn’t support it well either (you can’t always know which JVM will be used, and you can only make a singleton per JVM).

The best way to make a singleton as of Java 5 is to use an enum with one element:

enum Foo
{  INSTANCE;
   ...
}
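To show what might go in place of the “...”, here is a sketch of an enum singleton acting as a caching lookup service.  The NameCache name, its methods, and the placeholder “lookup” are all invented for illustration; a real service would do an actual (expensive) lookup where the comment indicates:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical singleton service that caches lookups.
enum NameCache {
    INSTANCE;

    private final Map<String, String> cache = new HashMap<>();
    private int misses = 0;

    // synchronized so that concurrent callers see a consistent cache.
    public synchronized String lookup(String host) {
        String addr = cache.get(host);
        if (addr == null) {
            misses++;
            addr = "addr-of-" + host;   // placeholder for a real, expensive lookup
            cache.put(host, addr);
        }
        return addr;
    }

    // Number of lookups that were not answered from the cache.
    public synchronized int getMisses() { return misses; }
}
```

Clients call NameCache.INSTANCE.lookup(...) and the JVM guarantees there is only ever one INSTANCE, even in the presence of serialization.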

One aspect of design is the common confusion in situations such as classes Box and Rectangle.  It may seem as if a Box is-a Rectangle with a depth (a specialization).  But what if I have a collection of Rectangles, one is a Box, and I try to invoke a getArea() method on each?  (Boxes don’t have a clearly defined area the way Rectangles do.)  By extending Rectangle with Box we have broken our definition of Rectangles.

In practice, objects are defined by their methods.  (Eiffel is an exception in that you can specify pre-, post-, and invariant conditions.)  But most computer languages only require the name and parameter list be the same to over-ride a method.  Unfortunately that makes it easy to mess up!

·           pre-conditions  – The requirements which a method requires its caller to fulfill.

·           post-conditions – The promises made by a method to its caller.

·           class invariants – The object state (values of variables at various points) is always valid.

In Java you can (and should) document pre- and post-conditions in the Javadoc comments.  Proper design leads to the following important principle:

An extension of a class should never modify the behavior (the semantics of existing methods) when you treat the extension as though it were one of its superclasses (that is, when using over-ridden methods).  This rule is called the Liskov Substitution Principle (or Substitutability Principle), or LSP.  It is also known as subsumption.  Mathematically you can state this as “Let p be any property of objects of type T.  Then p should be true for objects of type S where S is a subtype of T”.
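The pre-conditions, post-conditions, and class invariants defined above could be written down in Javadoc comments along these lines.  BoundedQueue is a hypothetical class made up for this sketch, and checking the pre-conditions with exceptions is just one reasonable choice.  Any subclass that over-rides add or remove would have to keep these same promises to satisfy the LSP:

```java
/**
 * A fixed-capacity FIFO queue of messages (hypothetical example).
 * Class invariant: 0 <= size() <= capacity.
 */
class BoundedQueue {
    private final String[] items;
    private int head = 0;    // index of the oldest item
    private int count = 0;   // number of items currently stored

    /** Pre-condition: capacity > 0. */
    BoundedQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        items = new String[capacity];
    }

    /**
     * Pre-condition: !isFull().
     * Post-condition: size() is one larger than before the call.
     */
    void add(String item) {
        if (isFull()) {
            throw new IllegalStateException("queue is full");
        }
        items[(head + count) % items.length] = item;
        count++;
    }

    /**
     * Pre-condition: size() > 0.
     * Post-condition: returns the oldest item; size() is one smaller.
     */
    String remove() {
        if (count == 0) {
            throw new IllegalStateException("queue is empty");
        }
        String item = items[head];
        items[head] = null;
        head = (head + 1) % items.length;
        count--;
        return item;
    }

    int size() { return count; }

    boolean isFull() { return count == items.length; }
}
```

A subclass whose add silently dropped items when full, for example, would break the documented post-condition even though the method name and parameter list are unchanged.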

The proper relationship between classes is often a matter of debate.  For instance few people agree on the relationship between Square and Rectangle.  One good design (ignoring other shapes including Rhombuses, Parallelograms, and Quadrilaterals) is to have Square extend Rectangle, with both classes immutable.  A square is-a rectangle with an additional property that the width equals the height; in LSP terms, all the properties of Rectangles apply to Squares too.  Something like the following:

public class Rectangle {
    private double w;
    private double h;
    public Rectangle (double w, double h) { ... }
    public Rectangle scale (double scaleFactor) {
        return new Rectangle( (w * scaleFactor),
                  (h * scaleFactor) );
    }
    public Rectangle setWidth (double w) {
        return new Rectangle( w, this.h );
    }
    public Rectangle setHeight (double h) {
        return new Rectangle( this.w, h );
}   }

public class Square extends Rectangle {
    private double s;
    public Square ( double s ) { ... }
    public Rectangle scale (double scaleFactor) {
        return new Square( s * scaleFactor );
    }
    public Rectangle setWidth ( double w ) {
        return new Rectangle( w, this.s ) ;
    }
    public Rectangle setHeight ( double h ) {
        return new Rectangle( this.s, h );
}   }

Another design possibility is to consider there is no IS-A relationship here.  Instead have two classes.  The Rectangle class can have an isSquare method, a squareValue method to convert a square Rectangle into a Square (this is a factory method), and a constructor taking a Square.
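A sketch of this second design follows.  The isSquare and squareValue method names come from the text; everything else (field names, accessors, the exception thrown for a non-square) is an assumption made for illustration:

```java
// No IS-A relationship: Square and Rectangle are unrelated classes.
class Square {
    private final double side;

    Square(double side) { this.side = side; }

    double getSide() { return side; }
}

class Rectangle {
    private final double w;
    private final double h;

    Rectangle(double w, double h) {
        this.w = w;
        this.h = h;
    }

    // Constructor taking a Square: every Square can become a Rectangle.
    Rectangle(Square s) {
        this(s.getSide(), s.getSide());
    }

    // Exact comparison of doubles; a real design might allow a tolerance.
    boolean isSquare() { return w == h; }

    // Factory method: only a square Rectangle can become a Square.
    Square squareValue() {
        if (!isSquare()) {
            throw new IllegalStateException("width and height differ");
        }
        return new Square(w);
    }

    double getWidth()  { return w; }
    double getHeight() { return h; }
}
```

The conversions are explicit in both directions, so no Rectangle method ever has to behave differently because the object is “really” a Square.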

A final common example of LSP is cars.  Humans associate “A Ford is a Car” so they make Ford extend Car.  They also then say “ModelT is a Ford”, so ModelT extends Ford. The mistake is confusing Make (Brand) with Type (Class), and further confusing Model with Type.  This kind of mistake can lead to a very ugly class hierarchy when you end up with classes like FordModelTWithRedPaintAndLeatherSeatsAndPowerWindows.

A better hierarchy would be to have a Car that “has-a” Make (Ford is a Make), “has-a” Model, “has-a” list of Options (SeatMaterial is an Option) and “has-a” PaintColor.
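Such a has-a hierarchy might be sketched as below.  The enum constants, the Option marker interface, WindowType, and the use of a plain String for the model are all assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

enum Make { FORD, TOYOTA }

enum PaintColor { BLACK, RED }

// Marker interface so that different kinds of options can share one list.
interface Option { }

enum SeatMaterial implements Option { CLOTH, LEATHER }

enum WindowType implements Option { MANUAL, POWER }

class Car {
    private final Make make;             // has-a Make
    private final String model;          // has-a Model
    private final PaintColor color;      // has-a PaintColor
    private final List<Option> options;  // has-a list of Options

    Car(Make make, String model, PaintColor color, List<Option> options) {
        this.make = make;
        this.model = model;
        this.color = color;
        // Defensive copy, so later changes to the caller's list have no effect.
        this.options = Collections.unmodifiableList(new ArrayList<>(options));
    }

    Make getMake()            { return make; }
    String getModel()         { return model; }
    PaintColor getColor()     { return color; }
    List<Option> getOptions() { return options; }

    boolean hasOption(Option o) { return options.contains(o); }
}
```

A red Ford Model T with leather seats and power windows is now just one Car object with particular field values, rather than a new class.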

Basic UML

Today it is common to design with pictures and diagrams.  For example, each class could be a box with the name of the class at the top and a list of the class members inside.  The boxes might be connected with lines to show relationships between the classes (or objects).  Many types of diagrams are possible to show various things, such as class diagrams, object diagrams, sequence diagrams, state diagrams, etc.  (You rarely use all types for a particular project.)

Originally every book author invented their own graphical notation.  Since they all do the same job it was sensible to merge them all into a single notation.  Surprisingly this is exactly what happened.  The standard is called the Unified Modeling Language or UML.  There are many books and on-line resources that describe UML.  (Show pdf guide.)  A simplified class diagram:

The (hollow) diamond line shows that a WaitingList object contains (or aggregates) one or more Student objects.  A solid diamond shows composition, which for Java is about the same thing.  Here’s a list for a class diagram:

·        Dashed line with “>” (open arrowhead) shows dependency.

·        Solid line with no arrows shows association; with a “>” open arrowhead it is a directed association.

·        Solid line with hollow triangle arrowhead shows inheritance.

·        Dashed line with hollow triangle arrowhead shows interface implementation  (show interfaces by using guillemets, i.e., French quotes around the name <<name>>).

·        Dashed line to a dashed box shows a comment.

Each box represents a class, with the class name at the top, then a list of attributes in the middle and the (public) methods at the bottom.

The lines may have the multiplicity shown if desired, at each end; if omitted it is assumed to be “1”:

* (any number), 0..* (same), 1..* (one or more), 0..1 (either zero or one), n (exactly n)

The association lines can be labeled (at each end) to show the association details:

[student]----registers for--------------------[course]

Don’t show foundation classes in your UML diagrams.