Unix/Linux Administration:
Overview of Help Desk Organization

Written 2004–2008 by Wayne Pollock, Tampa Florida USA.  All Rights Reserved.

Running a Help Desk (a.k.a. support desk, a.k.a. call center)

How To Get Fired:

  1. No matter what the problem, just tell them to reboot and close the ticket.
  2. Go through your standard help-desk script, and then put the caller on terminal hold or switch them to another SA.  Repeat as often as possible.
  3. Tell the caller that theirs is a non-supported version of that application, no matter what the problem.
  4. To boost your call completion rates, tell all callers “Sorry, my manager said I can't help you with that” and hang up quickly.
  5. Tell callers “the system isn't slow, you just drink too much caffeine”.

All organizations have a help desk, whether or not they officially acknowledge it.  The help desk is often a virtual place (and defaults to the SA's office or cell phone number) where people expect to receive answers to computer-related questions, to report problems, and to request changes in service or tutoring for services.  The help desk is often the first contact for new employees, and must be able to answer policy and procedure questions for users in addition to providing technical information.

The help desk also provides reports to management, who can use the data to track how well various services are working out, what new services might be requested, feedback for policy and procedures, and workload information (to adjust staffing levels and/or SA salaries).

Security is important for a help desk, so that confidential information isn't given out without proper authorization and authentication.  If the staff is not careful, social engineering can be employed to have accounts created or passwords changed.  (E.g., the scenario in which an imposter gains the trust of some SA, then asks for a password or other access.)

Escalation

Escalation is the process of having an issue moved from the current personnel to another with more experience, and eventually to management.

A common solution in large organizations is to have two (or more) levels or tiers of support.  But this can be quite annoying unless handled carefully!  Having the lowest level handle routine requests (password resets, software updates, etc.) allows other SAs to specialize in more complex parts of your system: network infrastructure, routing issues, security, email, database, printing, etc.  It is also a cost-effective way to expand the help desk support hours (up to 24x7).  This first level should be able to handle 80 to 90 percent of all support requests.

Automatic escalation is a very good idea.  When a support call has lasted (say) 5 to 15 minutes, the issue gets automatically escalated up to level two support.  (E.g., one local Tampa company requires escalation after 9.2 minutes.)  To escalate you should say something such as “I don't think I'm helping you fast enough; let me get a more experienced SA to help you, please hold.”

Issues left unresolved for a working day should get escalated up to level two.  If not resolved after another day (“resolved” doesn't mean completed), management is informed.

Another form of automatic escalation is when a user is put on hold too long.  The call should then be routed to a level two technician.  If none are available (say it is after hours), a message should provide email and web alternatives plus voicemail.
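
The escalation rules above can be sketched as a small function.  This is only an illustration: the names (Action, escalation_action) and the exact thresholds are assumptions, since the text gives a range of 5 to 15 minutes for calls and says only “too long” for hold time.

```python
from enum import Enum


class Action(Enum):
    NONE = "no escalation needed"
    LEVEL_TWO = "escalate to level two"
    MANAGEMENT = "inform management"
    ALTERNATIVES = "offer email, web, and voicemail"


# Assumed thresholds; pick values that suit your organization.
MAX_CALL_MINUTES = 15   # the text suggests somewhere from 5 to 15
MAX_HOLD_MINUTES = 3    # hypothetical; the text says only "too long"


def escalation_action(call_minutes=0.0, hold_minutes=0.0,
                      unresolved_working_days=0, level_two_available=True):
    """Return the escalation step for one issue."""
    # Unresolved after a second working day: management is informed.
    if unresolved_working_days >= 2:
        return Action.MANAGEMENT
    held_too_long = hold_minutes >= MAX_HOLD_MINUTES
    # Caller held too long and nobody at level two (say, after hours).
    if held_too_long and not level_two_available:
        return Action.ALTERNATIVES
    if (held_too_long or call_minutes >= MAX_CALL_MINUTES
            or unresolved_working_days >= 1):
        return Action.LEVEL_TWO
    return Action.NONE
```

A phone system or ticketing system would call something like this periodically for each open call or ticket.  What to do when a long call needs level two but nobody is available is not covered by the text, so the sketch simply returns LEVEL_TWO in that case.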

The Face of the Help Desk

The perception of the users/customers is important.  (Often SA raises depend on what management hears about them from users.)  Always have a friendly and cheerful (but professional) attitude.  If possible, have a single point of contact for each user.  (This helps enormously in a large organization with several SAs staffing a help desk at once.  When the user calls back to add more information it is convenient not to have to explain that “Wayne is already handling this issue, please forward this call”.)  Lacking a single point of contact, make sure all help desk staff provide consistent responses.

Have a set routine (called “scripts”) for dealing with support issues and requests, which should include a professional demeanor.  Have some training (including “dry-runs”) and mentoring (observing more experienced staff) for new staff.  Although running the help desk may be a tiny part of your job, the users will have no idea of your real activities and will assume the help desk is all you do.

Attitude is very important.  You can get very unhappy at this job if you have the wrong attitude!  Remember that one reason you were hired is to handle certain problems that your users don't want to be bothered to handle themselves, and to answer questions whose answers they don't want to bother remembering.

It is silly to get mad at how “dumb” your users are, or how often they ask the same question, or how they obviously didn't read the manual, or when they ask why you can't just bypass the official policy/procedure “just this once”.

Depending on the local corporate culture, tutoring/training the users may or may not help.  But if it might, hold training seminars and offer tutoring services.

At a minimum there should be a (protected) web site with policies, procedures, contact information (phone numbers, IM links, trouble-ticketing system links, and hours that the help desk is staffed), forms, and FAQs (Frequently Asked Questions) that employees can access.  (FAQs should be compiled based on experiences at the help desk.)  New user documentation can be there too.  An unprotected web page should be available for customers to use from the Internet (with a link to the protected page that requires a login).  Make sure the main web page (and selected other pages, perhaps in a navbar) have a link to the help desk web site.

Having some instant messaging to the SA on help desk duty is a nice extra.  (You can add an icon to web pages if you use Yahoo IM.)

A phone number that can be forwarded to the (cell) phone of the SA on duty is great, provided it includes voice-mail.  This is especially useful when the network/web server is down!

Finally, your users should have a clear idea of what support is provided by your help desk.  Make sure your staff know the scope (range of services) of the help they are to provide, and refer users elsewhere (say to management) when the request is outside this scope.  (This might be called the help desk policy.)  Users need to know how long during normal hours and outside of normal hours both routine and non-routine requests might take.

How to Handle a Support Call

(Adapted from Limoncelli & Hogan, chapter 16)

  1. The greeting:  This depends mostly on local (corporate) culture but something like “This is Wayne at the ABC help desk, how may I help you?” is effective.  You might personalize the greeting if you know the user who is calling.

    The greeting should identify you and calm the caller (if necessary).  You need to determine what kind of service is being requested and how urgent the request is before proceeding.

    If the greeting is a pre-recorded phone message, it should include an accurate current status (“all systems up”, “web server is down”, etc.) and options (“please hold and your call will be answered in the order received”, “press 1 to leave a voice mail message”, ...).  Avoid annoying phrases such as “I am not available now”, and avoid putting jokes in your message.  The whole phone message should be very short, 15 seconds or less.

    Telephone Techniques

    (Adapted from call_handling.pdf found at auditnet.org)

    The following are recommendations to enhance the perception of a help desk:

    • Be courteous at all times no matter how annoying the caller and whatever your mood.
    • Sit up and take a deep breath when tense, frustrated, or upset.  This is relaxing and removes tension from your voice.
    • Speak clearly and respond as quickly as possible, but never interrupt the caller.
    • Listen carefully and think about what the caller wants you to understand—the caller may not always say what they mean, so stop and think before responding.

      For example, suppose a user calls to complain their system is slow.  Ask what is slow, how slow is it, is anything else slow, when did you first notice it was slow, etc.  It is unlikely the customer is complaining about an application running on their PC.  More likely is that some server or the network appears slow.
    • Try to smile; it alters your voice.
    • Understand the urgency of the customer's issue, by asking questions if necessary.  Escalate the call if very urgent and you can't solve it immediately; don't waste the caller's time just to finish some “script”.

      Politically you may need to handle some VIPs faster than other callers.  (When the boss says to jump, you should jump!)  Most problems need to be fixed in a scheduled maintenance window, and you need to explain that so callers don't wonder why their problem isn't being addressed.
    • Do not place a caller on hold unless it is absolutely necessary.  Obtain permission for putting them on hold, and explain how long you expect to be away from the telephone.  (Check back with the caller if you are longer than expected.)  Ask if they would prefer you to call them back by a certain time.  Always thank the caller for holding.
    • Fully document all of the details on the trouble ticket (the problem report).
    • When the call is done, be sure to thank the caller.
  2. Problem Identification:  The SA needs to determine what the problem really is and how to classify it.  This can be done with a decision tree that the SA follows.  At each point the SA asks a question or questions, and takes one branch or another depending on the result.  When the decision tree can't handle the issue, it might be time to escalate.  In a large organization, the user might use phone menus to classify the problem; in this case make sure the choices use language the users expect to hear.

    The user may report a slow application, but that could be caused by network problems, database issues, malware running on either the customer's end or on the server, a shortage of resources, or misconfiguration.  The trouble-shooting steps vary depending on what you think is the real problem.  Normally you monitor the suspected culprit (e.g., the network connection, the DB response time, etc.).  If nothing is found, try monitoring another culprit until the problem is identified.

    It is okay not to know the answer (or even the exact problem).  It is not okay to pretend you do know and make a guess.

  3. Problem Recording:  This is where the gory details of the issue get written down.  This problem statement should have enough details to provide the clues needed to understand the problem and to reproduce it.  Don't forget to include such details as when the problem first was noticed, and what steps the user has taken already.
  4. Problem Verification:  The SA should try to verify there is a problem by reproducing it.  For example if the problem is “the web server is down”, “the mail server is down”, or “the network is down”, the SA can easily try these things from their location.  Other problems (or insistent complainers) may require either a personal visit to their site, or (more commonly today) remote control software (such as VNC), where the SA can attempt exactly what the user tried from the user's computer, or watch the user try themselves, with that user on the phone at the time.  Note these reproduction steps should be added to the problem statement.

    No problem is understood until it is reproducible.

    If this is a performance problem, check logs and run monitoring tools to see what the problem is.  Normally a good SA monitors performance routinely and knows well in advance of projected RAM or other resource shortages.  Poor performance may not be a resource shortage but a misconfiguration or some other serious problem.

    If the problem goes away by itself, log it anyway; a trend may be spotted over time of such intermittent problems.

  5. Solution Proposal List:  Once the problem is understood the SA may feel there are several different ways to resolve the matter.  If not, it may be that more information is needed, either from the user, by examining the system (say the log files), or by running experiments.  Almost always there is more than one way to tackle some issue.  This is especially true for requests for enhancement (RFEs).  Speaking with others (such as other experienced SAs, those responsible for the facility/sub-system with the problem, and the vendor who sold/manufactured the product) and searching the Internet can often provide ideas and pointers.

    In the case of poor performance due to a shortage of resources such as RAM or network bandwidth, tuning the systems or re-configuring the application(s) is one possible solution.  Increasing the resource (e.g., installing more RAM) is another.  The last-resort solution for poor performance (that nobody likes) is to lower your level of service: if your customer's needs have outgrown what your organization can provide, there is nothing you can do about it.  (Growing the organization is a decision for management and not an SA.)

  6. Select Solution:  There likely will be political issues when selecting a solution.  For example, solving DNS problems by adding a secondary (“slave”) server at each site may be the best technical solution, but your management may think that remote servers mean giving up control, or there may be problems in how to budget the servers.  Ditching a Windows mail server for a Linux or Unix one may not be politically wise if your management has made a deal with a vendor for some Windows-only software (say one that requires the Exchange mail protocol instead of IMAP or POP, for calendaring).

    See what resources are available for testing your solutions, including similar servers you can take off-line and the ability to generate a (simulated) heavy load.  Ideally you will have an isolated lab setup with appropriate test hosts and tools, but that might be beyond the budget of most organizations.  Then you can try several solutions and see which one might be best in your situation.

    Technically, try to pick the solution that requires the least work to set up and maintain, and that will scale up well when your needs increase in the future.  Using open standards means more choices for interoperability.  Using open software means no licensing fees.

  7. Implement the Solution:  This usually means scheduling the solution for the next maintenance window, or delaying until management approval (usually needed if the solution costs money or if it requires a policy/procedure change).  If the solution requires a visit to the user's site, that visit must be scheduled.

    Figuring out exactly how to deploy and configure a new DNS slave server might take some time even for a DNS expert.  The proposed solution and implementation schedule should be added to the trouble ticket.

  8. Solution Verification:  Never close the trouble ticket until the solution has been verified as working correctly.  This may involve experiments or waiting for events and examining log files afterward.  Sometimes this means a dialog with the user who reported the problem.
  9. Closing the Ticket:  Part of closing the ticket involves informing the originator that the issue has been resolved.  This can be either a phone call or an email (say with a link to the web interface for viewing that specific ticket).  Some tickets may require additional work, such as documentation updates or management reports.

    A customer should not have to call the help desk to find out the status of an open trouble ticket.  Status updates should be provided when service times exceed the service level agreement (SLA).

    Once the issue is resolved, be certain that the customer is satisfied with the resolution.
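
The decision tree mentioned in step 2 (Problem Identification) can be sketched as nested data that the SA, or a phone menu, walks one question at a time.  The questions and classifications below are hypothetical examples built around the “slow application” complaint, not a real help desk script.

```python
# Each inner node asks a yes/no question; each leaf is a classification.
# The questions and categories are made up for illustration.
TREE = {
    "question": "Is only one application slow?",
    "yes": {
        "question": "Is it slow for other users too?",
        "yes": "server or database issue",
        "no": "local misconfiguration or malware",
    },
    "no": {
        "question": "Are sites outside the company also slow?",
        "yes": "network problem",
        "no": "shortage of resources on the user's machine",
    },
}

ESCALATE = "escalate"


def classify(tree, answers):
    """Walk the tree using a dict that maps question text to True/False.

    Returns a classification, or ESCALATE when a question has no answer,
    i.e. when the decision tree can't handle the issue.
    """
    node = tree
    while isinstance(node, dict):
        answer = answers.get(node["question"])
        if answer is None:
            return ESCALATE
        node = node["yes"] if answer else node["no"]
    return node
```

Keeping the tree as data rather than as code means that experienced SAs can extend the script without touching the program that walks it.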

Staffing

Staffing levels vary widely!  In an academic environment the typical ratio of users to SAs is about 50:1 (at HCC it is closer to 300:1, which causes all sorts of problems).  Sometimes in a large organization (say amazon.com) the ratio might be millions to one.  Metrics used to calculate staffing levels include: the ratio of call volume to SAs, time to call completion (TCC), and time to problem resolution (TPR).
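
As a rough illustration, these staffing metrics are simple ratios and averages.  The sketch below assumes each call record carries two durations in minutes; the field names and the sample numbers are made up.

```python
from statistics import mean


def users_per_sa(num_users, num_sas):
    """Ratio of users to SAs (e.g. 50 means 50:1)."""
    return num_users / num_sas


def mean_minutes(durations):
    """Average of a list of durations; 0.0 if there are none."""
    return mean(durations) if durations else 0.0


# Hypothetical call records.
calls = [
    {"call_minutes": 4, "resolution_minutes": 4},
    {"call_minutes": 12, "resolution_minutes": 480},
    {"call_minutes": 8, "resolution_minutes": 60},
]

calls_per_sa = len(calls) / 2          # call volume per SA, assuming 2 SAs
tcc = mean_minutes([c["call_minutes"] for c in calls])
tpr = mean_minutes([c["resolution_minutes"] for c in calls])
```

Note how TPR can be far larger than TCC: a call may end in minutes while the fix waits for the next maintenance window.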

In a small organization the SAs should take turns staffing the help desk.  If someone contacts the off-duty SA directly, have the SA say “I'm in the middle of another issue now, so I can't handle your problem until later.  Let me forward your call to [name of on-duty SA], who I think can help you right away.”

Even if you are the only SA, you should set up a schedule with management approval.  Your schedule should include some quiet time when you won't respond to help requests that take longer than one minute to complete.  (Except for emergencies of course.)  If you don't do this you'll constantly be interrupted and you'll never get your work done.  You may find that most help requests come in the morning, so set aside the afternoons for your quiet time and do work in the mornings that you can afford to be interrupted while doing.  (Or vice-versa.)

Help Desk Software

The alternative to decent software is post-it® notes.  That doesn't work.  One possibility is a PDA but this won't work except in the smallest organizations.  For one thing management has no chance to manage the process.  Good software can provide “scripts” and search/index facilities to aid less experienced SAs.  Logs and reports are other useful features.

The most common type of software is known as a trouble-ticketing system (or service ticketing system) and allows one to enter in support request details, assign SAs, assign priorities, and automatically log details such as user, SA who handled the call, the date and time of the call.  Ideally such a system can be tied into the phone system, so these details can be logged automatically rather than having the SA type them in.  This can also provide call routing (press one to reset password, press two for an on-going issue, ...) and call escalation.
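
A minimal sketch of the record such a system might keep is shown below.  The class and field names are assumptions for illustration, not the schema of any real product; the one rule it enforces comes from the call-handling steps: a ticket cannot be closed until the solution has been verified.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Ticket:
    user: str                               # who reported the issue
    details: str                            # the problem statement
    priority: int = 3                       # assumed scale: 1 = most urgent
    assigned_sa: Optional[str] = None
    opened_at: datetime = field(default_factory=datetime.now)
    log: List[str] = field(default_factory=list)
    verified: bool = False
    closed: bool = False

    def note(self, sa: str, text: str) -> None:
        """Log an entry, stamped automatically with date, time, and SA."""
        self.log.append(f"{datetime.now():%Y-%m-%d %H:%M} {sa}: {text}")

    def assign(self, sa: str) -> None:
        self.assigned_sa = sa
        self.note(sa, "assigned")

    def verify(self, sa: str, how: str) -> None:
        self.verified = True
        self.note(sa, f"solution verified: {how}")

    def close(self, sa: str) -> None:
        # Never close a ticket until the solution is verified as working.
        if not self.verified:
            raise ValueError("solution not verified; ticket stays open")
        self.closed = True
        self.note(sa, "closed; originator informed")
```

For example, calling close() on a freshly assigned ticket raises an error, while calling verify() first and then close() succeeds and leaves three log entries.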

A really good system has web interfaces where users can request support (IM, email links, FAQs, ...), open a trouble ticket (without using a phone call or IM, that is non-face-to-face support), and track previously opened trouble tickets.  The ability to open a ticket is especially useful for users to request new services, or for developers to request enhancements (RFEs) or to report bugs.  (See www.bugzilla.org and bugzilla.redhat.com.)

The software should allow SAs to log into the system and see what issues have been assigned to them, and/or to view issues in priority order.  It should allow management to obtain useful reports on call volumes, TCC (time to call completion), SA workload, the rate of escalation, volume trends, etc.  Some software also allows customer satisfaction surveys (that only the management can see).
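
A management report of this kind is mostly counting.  The sketch below assumes tickets are available as simple records with an “sa” name and an “escalated” flag; both field names and the sample data are hypothetical.

```python
from collections import Counter


def summarize(tickets):
    """Summarize call volume, escalation rate, and per-SA workload."""
    total = len(tickets)
    escalated = sum(1 for t in tickets if t["escalated"])
    return {
        "call_volume": total,
        "escalation_rate": escalated / total if total else 0.0,
        "workload_by_sa": Counter(t["sa"] for t in tickets),
    }


# Hypothetical sample data.
sample = [
    {"sa": "wayne", "escalated": False},
    {"sa": "wayne", "escalated": True},
    {"sa": "wayne", "escalated": False},
    {"sa": "chris", "escalated": False},
]
report = summarize(sample)
```

Running the same summary over each week's tickets gives the volume trends mentioned above.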

Other popular features include the ability to schedule system maintenance window tasks (by having each task entered as a separate ticket), and the ability for server/network monitoring tools (such as HP OpenView or IBM Tivoli) to create and enter trouble tickets automatically.

There is free/open source software available.  Of these, Request Tracker is the most recommended.  However, a good commercial package can be well worthwhile.  You can find these with a Google search for “trouble ticketing system”, and locate scripts with a Google search for +("Help Desk" OR "Call Center") +scripts +open.

Much of this material was adapted from The Practice of System and Network Administration by Limoncelli and Hogan.  ©2002 by Lumeta Corp. and Christine Hogan.  Published by Addison-Wesley.