CTS 2301 (Unix/Linux Administration I) Project #3
Hard Disk Administration

 

Due: by the start of class on the date shown on the syllabus

Background:

When using a modern journaling filesystem such as ext4, by default fsck (filesystem check) is never forced.  With a traditional filesystem such as ext2, the system defaults to checking the filesystem every so many reboots (technically after X number of mounts, but since all filesystems are usually mounted at boot time it comes to the same thing), or every so many months, or after an improper shutdown, whichever comes first.

A problem with traditional filesystem setup occurs with large disks.  Since all partitions have the same default setup, once a check is forced for any reason all filesystems are checked at once.  This can take a very long time!  A staggered schedule can be used to avoid this problem.  An issue with journaling filesystems is that an occasional error can still occur, and if no checking is ever performed the error can snowball, causing other problems.  (Also, hackers can induce errors even with a journaling filesystem.)

A staggered schedule means that each filesystem is still checked at the same frequency but not all on the same day.  For example, if you have only two filesystems you can have each one checked every six months, the first filesystem every January and July 1st, and the second filesystem every February and August 1st.  This is done by changing the date of the last check for each filesystem to different values, while keeping the six month interval between checks the same on both filesystems.
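The two-filesystem example above can be sketched as a small shell helper. This is only a dry run: it prints tune2fs commands instead of running them. The function name stagger_by_date, the year 2024, and the device names /dev/sda1 and /dev/sda2 are placeholders chosen for illustration, not values from your system. The sketch assumes the tune2fs options -i (interval between checks, here 6m for six months) and -T (time last checked, given as YYYYMMDD).

```shell
#!/bin/sh
# Print (do not run) tune2fs commands that keep the same six-month interval
# on every filesystem but give each one a different "last checked" date.
# Usage: stagger_by_date YEAR DEVICE...
# Assumption: at most 12 devices, so the month number stays valid.
stagger_by_date() {
    year=$1
    shift
    month=1
    for dev in "$@"; do
        # First device gets January 1st, the next February 1st, and so on.
        printf 'tune2fs -i 6m -T %04d%02d01 %s\n' "$year" "$month" "$dev"
        month=$((month + 1))
    done
}

# Hypothetical devices; substitute the names shown by the mount command.
stagger_by_date 2024 /dev/sda1 /dev/sda2
```

Review the printed commands before running any of them as root.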

The same staggering can be done if you force checks by the number of reboots.  If both filesystems are checked every 20 reboots, say, set the current mount count so that the first filesystem thinks it has been mounted X times and the second X+1 times.
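A similar dry-run sketch covers staggering by mount count. Again the function name stagger_by_count, the limit of 20, and the device names are illustrative assumptions. It relies on the tune2fs options -c (maximum mount count) and -C (current mount count).

```shell
#!/bin/sh
# Print (do not run) tune2fs commands that give every filesystem the same
# maximum mount count but a different current mount count.
# Usage: stagger_by_count MAX START DEVICE...
stagger_by_count() {
    max=$1
    count=$2
    shift 2
    for dev in "$@"; do
        # Each later device pretends to have been mounted one more time.
        printf 'tune2fs -c %d -C %d %s\n' "$max" "$count" "$dev"
        count=$((count + 1))
    done
}

# Hypothetical devices, checked every 20 mounts, starting from a count of 0.
stagger_by_count 20 0 /dev/sda1 /dev/sda2
```

With these values the second filesystem reaches its limit one reboot before the first, so the two checks never fall on the same boot.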

This staggering of disk checking generalizes to any number of filesystems.  This way, except after an improper shutdown, only a single filesystem gets checked at any one time.

In addition to setting a check schedule, you will explore other disk related commands: hdparm to check drive capabilities, change settings, and perform timing tests;  and smartctl to perform health checks on your disks.  Note that this part of the project requires real hardware; this won't work on a virtual machine.

Description:

Answer the following questions and perform the indicated tasks:

  1. For each filesystem on your computer, when are checks forced by default?  Use the tune2fs utility to find out.

    Note!  There is a bug with SELinux that may prevent you from running tune2fs on some partitions (logical volumes).  If you are root, and are using the correct device name as shown by the mount command, and you still get various “permission denied” errors, then you may need to disable SELinux to complete this project.

    Run the command “getenforce”.  If the output shows “Enforcing” then you should run the command “setenforce 0” to change the mode to “Permissive”.  Now the tune2fs commands should work!

  2. Next adjust the parameters that control when fsck checks are forced.  You should make sure all filesystems are checked regularly, but on a staggered schedule as described above.  Note that tune2fs works for ext4 filesystems too, but not for filesystem types outside the ext2/ext3/ext4 family.  (Other tools may or may not be available for other filesystem types.)  What commands did you use?
  3. Why would it not be a good idea to run fsck on a mounted filesystem?  How (or when) can the root disk volume be safely checked?
  4. Unmount your /home partition with umount, and manually check it (and no other partition) for errors using fsck.

    Note!  You can't unmount a volume (using the umount command) if it is in use by any process.  /home will be in use if you logged in as a non-root user.  You must log out and log in as root.  To do this, it may be easier to use a virtual console rather than the GUI: after logging out, hit control+alt+F2 to switch to a non-GUI console window.  Later you can switch back to the GUI console using control+alt+F7 (sometimes F1 is used as the GUI console).  If you still can't unmount /home, try the command “fuser -m /home” to see which processes are using that volume.  Kill those processes, and the umount command should then work.

    What was the exact fsck command you used?  What was the output?
  5. Run the tune2fs utility on the partition for /home.  What are the mount-count and last-time-checked values now?

    If the “last checked” date hasn't changed, it is because fsck won't actually check if it doesn't think the filesystem needs it.  If this happened to you, repeat the previous step using the “-f” option to fsck.

    Remount /home using mount and examine the mount-count again.  Has it increased by one?
  6. Reboot your computer into single user mode.  (That is, reboot into run level 1; don't just change run-levels at the command line.)  How exactly did you do this?

    Hint:  You can add the word single to the GRUB boot prompt if you have configured GRUB to show one; by default on Fedora, it doesn't show the GRUB menu.  To get a GRUB prompt at boot time, edit /etc/grub.conf and change “timeout=0” to “timeout=5” (which is the number of seconds to display the GRUB menu before automatically booting), and comment out the “hiddenmenu” line (or you have to hit the escape key to show the menu).

    Booting into single user mode usually causes the boot process to stop while the root partition is mounted in read-only mode, making it safe to run fsck on it (but don't do that yet).  However this varies by distro, so you can't count on it; some systems mount all filesystems, even in single user mode, and some even ask for the root password.  (Sometimes such distributions have an “emergency” mode that doesn't mount anything or require any password.)

  7. Once booted in single user mode, run the commands “mount”, “findmnt”, and “lsblk”.  What storage volumes (if any) were mounted, other than root and swap?  Before you can run fsck safely, you must first un-mount any mounted filesystems.  Depending on your version of Unix or Linux, you may or may not be able to un-mount the root volume while in single user mode.  If you cannot, you can probably remount it as read-only.

    You can un-mount (with umount) most filesystems if not busy.  But you may find some filesystems are busy (the one holding /var/log/* for example) and those can't be un-mounted until you stop (“kill”) whatever processes are using files on it.  Or wait for them to finish on their own.  One way to find those processes is the command:

       fuser /var/log/*

    As for the root filesystem, if you can't un-mount it you can remount it as read-only with the correct mount options.  The command is:

       mount -no remount,ro /

    Now you can safely run fsck.

    Note that the output of “mount” won't show the root filesystem mounted as read-only; it will still show it as “rw”!  This is because that status is saved in the file /etc/mtab, which is updated when you run mount.  But once you change the root filesystem to read-only, /etc/mtab can't be updated, so the old “rw” status can't be changed.  However the system does know the filesystem is mounted as read-only; view /proc/mounts instead for accurate status.

  8. Run fsck.  What is the output?  What option to the Linux fsck command causes it to silently skip any mounted filesystems?
  9. Bring your system fully up to your normal run-level (“3” for non-GUI, “5” for GUI) using the telinit command to change the run-level.  Note this won't work unless you remount any previously un-mounted filesystems, and remount the root filesystem as read-write!
  10. SATA, PATA, SAS, and older IDE disk drives can be examined and controlled by the hdparm command.  While designed for (E)IDE disk drives, many of the options will work for SCSI drives as well.  Using hdparm, determine the disk geometry for your disk (and show it).  What option(s) did you use?
  11. Determine the drive identity using both the “-i” and “-I” options.  What is the identity data for each of your drives, as shown using each option?  When might this information be useful when configuring your system?
  12. Using hdparm disable the write cache on your disks.  What was the exact command used?  When would this be a good setting to use?
  13. Perform a timing test on your disk to determine the throughput, using “hdparm -t disk”.  Record the MB/sec value.  Repeat the test 9 more times, recording all ten values.  Now bring the system into single user mode (so that nothing else is running) using telinit.  Repeat the previous test another 10 times.  Explain your results.
  14. If you have modern ATA or SCSI disks (but not hardware RAID), you can get all sorts of information about your disk using the smartctl command, part of the smartmontools package.  Run the command (as root) “smartctl -i /dev/sda”.  Is SMART support enabled for this drive?  (If not but it is available, attempt to enable it with “smartctl -s on /dev/sda”, and try again.)  What is the make, model, and capacity reported for your disk?
  15. Perform a drive test.  (Note that these can be run regularly using smartd, if you configure that.)  Run the command “smartctl -t short /dev/sda”.  When completed, check on the result of the test: “smartctl -l selftest /dev/sda”.  Were any problems reported?
  16. Next run a drive health check using the command “smartctl -H /dev/sda”.  Did your drive pass?
  17. Finally, examine the data maintained by your drive, using the command “smartctl -A /dev/sda”.  How many times has your drive been power-cycled (attribute 12)?  How many hours has it been powered up (attribute 9)?  Which attributes (if any) indicate the drive is about to fail?
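For the measurement steps above (the ten hdparm timing runs and the smartctl attribute table), two small filters can save some hand copying. This is a sketch under assumptions: the function names avg_mb_per_sec and smart_raw are made up for illustration, the first assumes each hdparm -t result line ends in a number followed by “MB/sec”, and the second assumes the ATA-style attribute table in which the ID is the first column and the raw value is the tenth. Output for other drive types may be laid out differently.

```shell
#!/bin/sh
# Average the throughput figures from "hdparm -t" output read on stdin.
# Assumption: result lines end with "... = <number> MB/sec".
avg_mb_per_sec() {
    awk '/MB\/sec/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Print the raw value of one SMART attribute from "smartctl -A" output.
# Assumption: column 1 is the attribute ID and column 10 the raw value.
# Usage: smart_raw ID
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Intended use on real hardware, as root (left commented out here):
#   for i in 1 2 3 4 5 6 7 8 9 10; do hdparm -t /dev/sda; done | avg_mb_per_sec
#   smartctl -A /dev/sda | smart_raw 9     # power-on hours
#   smartctl -A /dev/sda | smart_raw 12    # power cycle count
```

You should still record all ten individual values for the project; the average is only a convenience when comparing the two sets of runs.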

To be turned in:

The answers to the questions above and the portion of your system journal describing the changes you made to the disk.  Be sure to include a chart that lists for each filesystem when checking will be forced.  (That is, list the schedule for checking—the disk checking policy—in an easy to read way, and don't merely write down the commands you typed.)

You can type or send as email to .  Please use a subject similar to “Unix/Linux Admin I, Project 3 (Hard Disk Administration) Submission”, so I can tell which emails are submitted projects.

Send questions about the assignment to .  Please use a subject similar to “Unix/Linux Admin I, Project 3 (Hard Disk Administration) Questions” so I can tell which emails are questions about the assignment (and not submissions).

Please see your syllabus for more information about project grading and also about submitting projects.

Information on tune2fs utility:

tune2fs is a Linux utility that allows you to examine and change the settings in the superblock, which is the name given to the part of a filesystem that holds the filesystem label, its size and type, and other information.  This tool only works for ext2, ext3, and ext4 filesystem types.  You must be logged on as root in order to use tune2fs.  The command to examine the values in the superblock of some partition such as /dev/sda1 is:

     # tune2fs -l partition
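The listing printed by tune2fs -l is long, so it can help to filter it down to the fields that matter for check scheduling. The sketch below assumes the field labels “Mount count”, “Maximum mount count”, “Last checked”, and “Check interval” as they appear in the listing; the function name check_fields and the sample superblock excerpt are made up for illustration.

```shell
#!/bin/sh
# Keep only the check-related fields from "tune2fs -l" output read on stdin.
check_fields() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval):'
}

# Illustrative, made-up excerpt; a real listing has many more lines.
check_fields <<'EOF'
Filesystem volume name:   home
Mount count:              7
Maximum mount count:      20
Last checked:             Mon Jan  1 09:00:00 2024
Check interval:           15552000 (6 months)
Block size:               4096
EOF
```

On a real system, as root, the equivalent would be “tune2fs -l partition | check_fields”.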

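There are four parameters that control when a check is forced, each set with its own tune2fs option:

     -c max-mount-counts
     -C mount-count
     -i interval-between-checks
     -T time-last-checked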

Do not attempt to change any other values!  This can be a dangerous command so be careful what you change!

The max-mount-counts parameter is the number of times the filesystem can be mounted before it will be automatically checked for errors using fsck.  Since most partitions are mounted once each time the system is booted, this often is a count of reboots.  The mount-count parameter is the number of mounts since the last check.

The interval-between-checks parameter is the amount of time that is allowed to pass before a check will be forced (at the next mount).  The time-last-checked parameter is the date and time at which the filesystem was last checked.
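How the four parameters combine can be sketched as a small decision function. This is a simplified model, not the actual fsck logic: it treats time-last-checked as a timestamp in seconds, assumes a limit of zero or less means that limit is disabled, and ignores the improper-shutdown case. The function name check_due and all the numbers are illustrative.

```shell
#!/bin/sh
# Report whether a forced check would be due under a simplified model.
# Usage: check_due MOUNT_COUNT MAX_MOUNT_COUNT LAST_CHECKED INTERVAL NOW
# LAST_CHECKED and NOW are seconds since the epoch; INTERVAL is in seconds.
check_due() {
    if [ "$2" -gt 0 ] && [ "$1" -ge "$2" ]; then
        echo "due: mount count reached"
    elif [ "$4" -gt 0 ] && [ $(( $5 - $3 )) -ge "$4" ]; then
        echo "due: interval elapsed"
    else
        echo "not due"
    fi
}

check_due 20 20 0 0 100               # prints "due: mount count reached"
check_due 3 20 0 15552000 15552000    # prints "due: interval elapsed"
check_due 3 -1 0 0 999                # prints "not due" (both limits disabled)
```

The middle call uses 15552000 seconds, which is 180 days, as a stand-in for a six-month interval.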

Why two schemes?  Because many reboot cycles in a short interval of time often means problems or changes are occurring, so checking every so many reboots is reasonable.  But normally a Unix system stays up for long periods of time without requiring any reboots, often months or years, so waiting for 10 reboots before checking for errors may allow some error to go undetected for long periods of time.  So it makes sense to scan the disk for errors every few months as well.  (If the system doesn't shut down normally, a scan is forced at the next reboot.)

See the man pages for more information about tune2fs, hdparm, smartctl, and smartd.