When using a modern journaling filesystem such as
ext4, by default fsck
(filesystem check) is never forced.
With a traditional filesystem such as ext2,
the system defaults to checking the filesystem every so many
reboots (technically after X number of mounts, but since
all filesystems are usually mounted at boot time it comes to
the same thing), or every so many months, or after an improper
shutdown, whichever comes first.
A problem with the traditional filesystem setup occurs with large disks. Since all partitions have the same default settings, once a check is forced for any reason, all filesystems are checked at once. This can take a very long time! A staggered schedule can be used to avoid this problem.
A problem with journaling filesystems is that an occasional error can still occur, and if no checking is ever performed, the error can snowball and cause other problems. (Also, hackers can induce errors even with a journaling filesystem.)
A staggered schedule means that each filesystem is still checked at the same frequency but not all on the same day. For example, if you have only two filesystems you can have each one checked every six months, the first filesystem every January and July 1st, and the second filesystem every February and August 1st. This is done by changing the date of the last check for each filesystem to different values, while keeping the six month interval between checks the same on both filesystems.
The same staggering can be done if you force checks by the number of reboots. If you have both filesystems checked every 20 reboots, say, change the current mount count so that the first filesystem thinks it has been mounted X times and the second X+1 times.
This staggering of disk checking generalizes to any number of filesystems. This way, except after an improper shutdown, only a single filesystem gets checked at any one time.
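The staggering described above can be sketched as a small shell function. It only prints the tune2fs commands it would run, so nothing changes until you have reviewed them. The device names, the 20-mount limit, and the six-month interval are illustrative assumptions, not required values.

```shell
# Sketch: print (do not run) tune2fs commands that stagger fsck checks.
# Usage: stagger_checks START_YEAR START_MONTH DEVICE...
# Assumptions: every DEVICE holds an ext2/ext3/ext4 filesystem, and
# START_MONTH is a plain integer from 1 to 12 (no leading zero).
stagger_checks() {
    year=$1
    month=$2
    shift 2
    count=0
    for dev in "$@"; do
        # Pretend the last check was on the 1st of a different month
        # for each device, so the time-based checks never coincide.
        stamp=$(printf '%04d%02d01' "$year" "$month")
        # -i 6m  interval between checks (six months)
        # -T     time last checked (YYYYMMDD)
        # -c 20  maximum mount count
        # -C     current mount count (X, X+1, ... per device)
        echo "tune2fs -i 6m -T $stamp -c 20 -C $count $dev"
        count=$((count + 1))
        month=$((month + 1))
        if [ "$month" -gt 12 ]; then
            month=1
            year=$((year + 1))
        fi
    done
}
```

For example, `stagger_checks 2024 1 /dev/sda1 /dev/sda2` prints one command per filesystem with last-check dates one month apart. Run the printed commands as root only after checking that the device names match the output of mount.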
In addition to setting a check schedule, you will explore other
disk-related commands: hdparm to check drive capabilities,
change settings, and perform timing tests;
and smartctl to perform health checks on your disks.
Note that this part of the project requires real
hardware; this won't work on a virtual machine.
Answer the following questions and perform the indicated tasks:
Note!
There is a bug with SELinux that may prevent you from
running tune2fs on some partitions (logical
volumes).
If you are root, and are using the correct device
name as shown by the mount command, and you still
get various “permission denied” errors, then you
may need to disable SELinux to complete this project.
Run the command “getenforce”.
If the output shows “Enforcing” then you should
run the command “setenforce 0” to change
the mode to “Permissive”.
Now the tune2fs commands should work!
Use tune2fs to control when fsck
checks are forced.
You should make sure all filesystems are checked regularly,
but on a staggered schedule as described above.
Note this utility works for ext4 filesystems too,
but not for other types of filesystems.
(Other tools may or may not be available for other filesystem
types.)
What commands did you use?
fsck on a mounted filesystem?
How (or when) can the root
disk volume be safely checked?
Unmount the /home partition with umount,
and manually check it (and no other partition) for errors using
fsck.
Note! You can't unmount a volume (using the umount
command) if it is in use by any process.
/home will be in use if you logged in as a non-root
user.
You must log out and log in as root.
To do this, it may be easier to use a virtual console rather than
the GUI: after logging out, hit control+alt+F2
to switch to a non-GUI console window.
Later you can switch back to the GUI console using
control+alt+F7 (sometimes F1 is used as the
GUI console).
If you still can't unmount /home, try using the command
“fuser -m /home” to see what process
is using that volume.
Then you can kill that process, and then the umount
command should work.
What was the fsck
command you used?
What was the output?
/home.
What are the mount-count
and last-time-checked values now?
If the “last checked” date hasn't changed, it is because
fsck won't actually check if it doesn't think
the filesystem needs it.
If this happened to you, repeat the previous step using the
“-f” option to fsck.
Remount /home using mount and examine the
mount-count again.
Has it increased by one?
Hint:
You can add the word single to the GRUB boot prompt
if you have configured GRUB to show one; by default on Fedora,
it doesn't show the GRUB menu.
To get a GRUB prompt at boot time, edit /etc/grub.conf
and change “timeout=0” to “timeout=5”
(which is the number of seconds to display the GRUB menu before
automatically booting), and comment out the “hiddenmenu”
line (or you have to hit the escape key to show the menu).
Booting into single user mode usually causes the boot process
to stop while the root partition is mounted in
read-only mode, making it safe to run fsck on it
(but don't do that yet).
However, this varies by distro, so you can't count on it.
Some systems mount all filesystems, even in single user mode, and
some even ask for the root password.
(Sometimes such distributions have an “emergency”
mode that doesn't mount anything or require any password.)
Run “mount”, “findmnt”, and
“lsblk”.
What storage volumes (if any) were mounted,
other than root and swap?
Before you can run fsck safely, you must first un-mount
any mounted filesystems.
Depending on your version of Unix or Linux, you may or may not be
able to un-mount the root volume while in single user mode.
If not, you can probably remount it as read-only.
You can un-mount (with umount) most filesystems
if not busy.
But you may find some filesystems are busy (the one holding
/var/log/* for example) and those can't be
un-mounted until you stop (“kill”)
whatever processes are using files on it.
Or wait for them to finish on their own.
One way to find those processes is the command:
fuser /var/log/*
As for the root filesystem, if you can't un-mount it
you can remount it as read-only
with the correct mount options.
The command is:
mount -no remount,ro /
Now you can safely run fsck.
Note that the output of “mount” won't
show the root filesystem mounted as read-only; it will still show
it as “rw”!
This is because that status is saved in the file
/etc/mtab which is updated when you run mount.
But, once you change the root filesystem to read-only
/etc/mtab can't be updated,
so the old “rw” status can't be changed.
However the system does know the filesystem is
mounted as read-only; view /proc/mounts
instead for accurate status.
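One way to read that status is to look up the mount point in /proc/mounts and pick out the ro or rw option. The function below is a minimal sketch; the name mount_mode and the optional second argument (an alternate file to read, handy for trying the function out) are assumptions made for illustration.

```shell
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Print "ro" or "rw" for MOUNTPOINT as recorded in /proc/mounts.
# If the mount point is listed more than once, the last entry wins.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw")
                    mode = opts[i]
        }
        END { if (mode != "") print mode }
    ' "${2:-/proc/mounts}"
}
```

After remounting the root filesystem read-only, `mount_mode /` should print ro even when the output of mount still says rw.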
Run fsck.
What is the output?
What option to the Linux fsck
command causes it to silently skip any mounted filesystems?
Use the telinit command to change the run-level.
Note this won't work unless you remount any previously
un-mounted filesystems, and remount the root filesystem as
read-write!
hdparm command.
While designed for (E)IDE disk drives, many of the
options will work for SCSI drives as well.
Using hdparm, determine the
disk geometry for your disk (and show it).
What option(s) did you use?
Run hdparm with the “-i”
and “-I” options.
What is the identity data for each of
your drives, as shown using each option?
When might this information be useful when configuring your
system?
Using hdparm, disable the write cache
on your disks.
What was the exact command used?
When would this be a good setting to use?
Run the command “hdparm -t disk”.
Record the MB/Sec value.
Repeat the test 9 more times, recording all ten values.
Now bring the system into single user mode (so that
nothing is running) using telinit.
Repeat the previous test another 10 times.
Explain your results.
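Recording ten values by hand is error-prone, so a small filter can collect them. The sketch below assumes the usual hdparm -t result line, which ends in a number followed by “MB/sec”; the function name average_mbps is made up for this example.

```shell
# average_mbps: read "hdparm -t" output on stdin, print each MB/sec
# value on its own line, then the mean of all values seen.
average_mbps() {
    awk '
        /MB\/sec/ {
            v = $(NF - 1)    # the number just before "MB/sec"
            print v
            sum += v
            n++
        }
        END {
            if (n > 0)
                printf "average: %.2f MB/sec over %d runs\n", sum / n, n
        }
    '
}

# Example (as root; /dev/sda is an assumed device name):
#   for i in 1 2 3 4 5 6 7 8 9 10; do hdparm -t /dev/sda; done | average_mbps
```

Running the same loop before and after switching to single user mode gives two averages to compare.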
smartctl command, part of the
smartmontools
package.
Run the command (as root)
“smartctl -i /dev/sda”.
Is SMART support enabled for this
drive?
(If not but it is available, attempt to enable it with
“smartctl -s on”, and try again.)
What are the make, model, and capacity
reported for your disk?
smartd,
if you configure that.)
Run the command
“smartctl -t short /dev/sda”.
When completed, check on the result of the test:
“smartctl -l selftest /dev/sda”.
Were any problems reported?
Run the command “smartctl -H /dev/sda”.
Did your drive pass?
Run the command “smartctl -A /dev/sda”.
How many times has your drive been power-cycled (attribute 12)?
How many hours has it been powered up (attribute 9)?
Which attributes (if any) indicate the drive is about to fail?
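The attribute table printed by “smartctl -A” can be long, so it may help to filter it. The sketch below assumes the ten-column table that smartctl prints for ATA drives (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); other drive types report health data in a different layout. The function names are hypothetical.

```shell
# smart_raw ID: read "smartctl -A" output on stdin and print the raw
# value (column 10) of the attribute with the given ID number.
smart_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}

# smart_failed: list attributes whose WHEN_FAILED column (column 9)
# is anything other than "-".
smart_failed() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $9 }'
}

# Example (as root; /dev/sda is an assumed device name):
#   smartctl -A /dev/sda | smart_raw 12    # power cycle count
#   smartctl -A /dev/sda | smart_raw 9     # power-on hours
#   smartctl -A /dev/sda | smart_failed
```

An empty result from smart_failed only means no attribute is currently flagged; you still need to read the table yourself to answer the questions above.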
The answers to the questions above and the portion of your system journal describing the changes you made to the disk. Be sure to include a chart that lists for each filesystem when checking will be forced. (That is, list the schedule for checking—the disk checking policy—in an easy to read way, and don't merely write down the commands you typed.)
You can type or send as email to . Please use a subject similar to “Unix/Linux Admin I, Project 3 (Hard Disk Administration) Submission”, so I can tell which emails are submitted projects.
Send questions about the assignment to . Please use a subject similar to “Unix/Linux Admin I, Project 3 (Hard Disk Administration) Questions” so I can tell which emails are questions about the assignment (and not submissions).
Please see your syllabus for more information about project grading and also about submitting projects.
tune2fs is a Linux utility that allows you to
examine and change the settings in the superblock,
which is the name given to the part of a filesystem that holds
the filesystem label, its size and type, and other information.
This tool only works for ext2, ext3,
and ext4 filesystem types.
You must be logged on as root in order to use tune2fs.
The command to examine the values in the superblock of some
partition such as /dev/sda1 is:
# tune2fs -l partition
There are four parameters that control when a check is forced:
max-mount-counts
mount-count
interval-between-checks
time-last-checked
Do not attempt to change any other values! This can be a dangerous command so be careful what you change!
The max-mount-counts parameter is the number of times the
filesystem can be mounted before it will be automatically checked for
errors using fsck.
Since most partitions are mounted once each time the system is booted,
this often is a count of reboots.
The mount-count parameter is the number of mounts since the
last check.
The interval-between-checks parameter is the amount
of time that is allowed to pass before a check will be forced (at the
next mount).
The time-last-checked parameter is the date and time
of the last check.
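The full “tune2fs -l” listing contains many lines, so a small filter makes the four check-related values easier to find. The sketch below assumes the labels that tune2fs prints for them (“Mount count”, “Maximum mount count”, “Last checked”, and “Check interval”); the function name check_params is made up for this example.

```shell
# check_params: read "tune2fs -l" output on stdin and keep only the
# four lines that control when a check is forced.
check_params() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval):'
}

# Example (as root; /dev/sda1 is an assumed device name):
#   tune2fs -l /dev/sda1 | check_params
```

Running this before and after each change is a quick way to confirm that only the intended values were modified.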
Why two schemes? Many reboot cycles in a short interval of time often mean that problems or changes are occurring, so checking every so many reboots is reasonable. But normally a Unix system stays up for long periods of time without requiring any reboots, often months or years, so waiting for 10 reboots before checking for errors may allow some error to go undetected for a long time. So it makes sense to scan the disk for errors every few months as well. (If the system doesn't shut down normally, a scan is forced at the next reboot.)
See the man pages for more information about tune2fs,
hdparm, smartctl, and smartd.