Setting up SMART monitoring


First, read the article at http://blog.shadypixel.com/monitoring-hard-drive-health-on-linux-with-smartmontools/. The instructions below are based on that article, with some additions of my own.

Find out which drives are on your system. Since all our devices are SATA (which show up as SCSI sd devices), you can list them with ls /sys/block | grep '^sd'
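The same listing can be wrapped in a small function that prints full /dev paths, which is handy for feeding the later steps. This is only a sketch: the function name list_sd_drives is made up for illustration, and the optional directory argument exists only so it can be tried against a scratch directory.

```shell
# List sd* block devices as /dev paths.
# $1 is the sysfs block directory (defaults to /sys/block).
list_sd_drives() {
    blockdir="${1:-/sys/block}"
    for path in "$blockdir"/sd*; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$path" ] || continue
        echo "/dev/${path##*/}"
    done
}
```

Calling list_sd_drives with no argument reads /sys/block on the running machine.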

Determine which drives are used for what. Use the mount command and cat /proc/mdstat to see what they are used for.
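If the machine uses software RAID, the members of each md array can be pulled out of /proc/mdstat instead of reading it by eye. The sketch below assumes the usual mdstat layout, where each array line starts with the md name and lists members as name[number]; the function name md_members is hypothetical.

```shell
# Print "array member" pairs, one per line, from an mdstat-style file.
# $1 is the file to read (defaults to /proc/mdstat).
md_members() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '/^md/ {
        for (i = 3; i <= NF; i++) {
            # Member entries look like sda1[0]; other fields are skipped.
            if ($i ~ /\[[0-9]+\]/) {
                name = $i
                sub(/\[.*/, "", name)
                print $1, name
            }
        }
    }' "$mdstat_file"
}
```

Comparing this output with the output of mount shows which drives hold arrays and which are mounted directly.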

Now, check to make sure smartmontools is installed on the machine. Look for a directory named /etc/smartmontools and a file called /etc/smartd.conf.

Once that is done, for each drive you want to monitor, enter the following commands:

smartctl -i /dev/sda # check if it is SMART enabled
smartctl -s on -o on -S on /dev/sda # turn on SMART, auto offline tests and autosave
smartctl -H /dev/sda # view general health of drive
smartctl -c /dev/sda # check for how long it takes for short and long tests
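With more than a couple of drives, it may be easier to generate the four commands above for every drive rather than typing them by hand. The following is a sketch only; smart_enable_cmds is a made-up name, and it just prints the commands so they can be reviewed before anything is run.

```shell
# Print the four smartctl commands from above for each drive given.
# Nothing is executed here; the output is plain text.
smart_enable_cmds() {
    for dev in "$@"; do
        echo "smartctl -i $dev"
        echo "smartctl -s on -o on -S on $dev"
        echo "smartctl -H $dev"
        echo "smartctl -c $dev"
    done
}
```

For example, smart_enable_cmds /dev/sda /dev/sdb prints eight commands. Once they look right, the output can be piped to sh as root to run them.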

Edit /etc/default/smartmontools:
   DO NOT TOUCH the enable_smart line. Leave that commented out.
   Uncomment the line #start_smartd=yes so that it reads start_smartd=yes.
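The edit to /etc/default/smartmontools can also be done with sed. This is a minimal sketch, assuming the line appears exactly as #start_smartd=yes (possibly with spaces after the #); the function name enable_start_smartd is hypothetical, and it writes the result to standard output rather than changing the file.

```shell
# Print the given file with the start_smartd line uncommented.
# All other lines, including enable_smart, are passed through unchanged.
enable_start_smartd() {
    sed 's/^#[[:space:]]*start_smartd=yes/start_smartd=yes/' "$1"
}
```

Redirect the output to a temporary file, compare it with the original, and only then copy it over /etc/default/smartmontools.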

Edit /etc/smartd.conf:
   Comment out the line that starts with DEVICESCAN (around line 22).
   Add a line like the following for every device to be monitored:
   /dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
   At the end of the first of those lines, add -M test so that smartd sends a test e-mail when it starts. It will look like this:
   /dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner -M test
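Commenting out DEVICESCAN is a one-line sed substitution. The sketch below assumes the directive starts in the first column, as it does in the stock file; comment_out_devicescan is a made-up name, and the function prints the result instead of editing /etc/smartd.conf in place.

```shell
# Print the given smartd.conf with any line starting with DEVICESCAN
# commented out. Lines that are already comments are left alone.
comment_out_devicescan() {
    sed 's/^DEVICESCAN/#&/' "$1"
}
```

The per-device lines can then be appended to the output before it replaces the original file.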

You will need to change three things:
First, change the drive name (/dev/sda) to whatever you are using.
Second, arrange for each drive's short test to run at a different time (not critical, but nice). The short test above (the part after the S) runs every day at 2am. Look at the output from smartctl -c /dev/sda to see how long that takes, and choose some other time for the other drives.
Third, schedule the long tests (the part after the L) so that they do not overlap with the short tests or with other long tests. The long test above runs on Saturday (day 6) at 3am.
NOTE: the backup servers are running their drives hard at night, so the self tests on the data drives should take place during the day.
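To keep the schedules from colliding, the smartd.conf lines can be generated with the test hours filled in per drive. In the -s pattern the fields are test type, month, day of month, day of week and hour, so S/../.././02 means a short test every day at 02, and L/../../6/03 means a long test on day 6 at 03. The sketch below is one possible way to stagger them; the function names and the two-hour step are assumptions, and the step should be checked against the durations reported by smartctl -c.

```shell
# Print one smartd.conf line.
# $1 = device, $2 = short test hour, $3 = long test day of week (1-7),
# $4 = long test hour. Pass hours without a leading zero (8, not 08),
# because printf would read 08 as an invalid octal number.
smartd_line() {
    printf '%s -a -d sat -o on -S on -s (S/../.././%02d|L/../../%d/%02d) -m root -M exec /usr/share/smartmontools/smartd-runner\n' \
        "$1" "$2" "$3" "$4"
}

# Print a line for each drive given, starting at 2am and moving two
# hours later per drive. Long tests stay on Saturday, one hour after
# that drive's short test. Hours are not wrapped, so this only works
# for up to 11 drives.
staggered_smartd_lines() {
    hour=2
    for dev in "$@"; do
        smartd_line "$dev" "$hour" 6 "$((hour + 1))"
        hour=$((hour + 2))
    done
}
```

For data drives on the backup servers, call smartd_line directly with daytime hours instead of using the 2am starting point.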

Start smartd with:
/etc/init.d/smartmontools start

You should see messages from smartd, and there should be a test e-mail message in the root e-mail account.
If all is well, remove the -M test from smartd.conf and restart smartmontools.
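Removing the trailing -M test can be done with sed as well. This sketch assumes -M test sits at the very end of the line, as in the example above, so the earlier -M exec directive is not affected; strip_m_test is a hypothetical name, and the function prints the cleaned file rather than editing it in place.

```shell
# Print the given smartd.conf with a trailing "-M test" removed from
# any line that ends with it. "-M exec ..." is left untouched.
strip_m_test() {
    sed 's/[[:space:]]*-M test[[:space:]]*$//' "$1"
}
```

After the cleaned file is in place, restart the service through the same init script used to start it.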

Last update: 2012-04-11 02:02
Author: Rod
Revision: 1.2