Monitoring Hard Disks with SMART

One of your hard disks might be trying to tell you it's not long for this world. Install software that lets you know when to replace it.
______________________

Comments


Hard Disks

UK's picture

Bruce's original answer worked for me...
Your disk has one or more unreadable sectors. This does NOT mean that the disk is failing, but it has lost some information on those sectors. Run an extended self test:

smartctl -t long /dev/hda (PATA disk)
smartctl -t long -d ata /dev/sda (SATA disk)

After the test is over, the self-test log (-l) will show what sector is unreadable. This will probably agree with what is shown in SYSLOG. Then look at BadBlockHowTo (linked from smartmontools home page) for instructions about how to identify if there is a file stored on that bad sector. If you have no data that you need, you can fix the problem by overwriting the bad partition with zeros using dd.

But be careful not to zero out regions of the disk that store data that you need!

Bruce Allen

*****************

Nigel UK
Fylde Computer Repairs

How can I get source for

Update News's picture

How can I get source for SMART tools

Help, my computer isn't working

Anonymous's picture

Last night my computer froze, then all of a sudden it came up with "failure loading operator disk", and today it has started saying "disk read error". Please can somebody help me?

i had the same problem. but i

sarees's picture

I had the same problem, but I bought a new HDD and the freezing has stopped since then.

Re: HDD Freezing

china phones's picture

It happened to me as well. I tried everything on the net but nothing worked for me. Finally, I went to a repair shop and they said the HDD had to be replaced. It is replaced now and works well. Anyway, no complaints; it was an old HDD.

ID# ATTRIBUTES

Anonymous's picture

When I run smartctl on Windows XP, I do not get any of the following messages.
Are there any files missing on my computer?

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 068 049 006 Old_age - 116459253
3 Spin_Up_Time 0x0003 096 095 000 Old_age - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age - 28
5 Reallocated_Sector_Ct 0x0033 100 100 036 Old_age - 18
7 Seek_Error_Rate 0x000f 083 075 030 Old_age - 223581632
9 Power_On_Hours 0x0032 096 096 000 Old_age - 3778
10 Spin_Retry_Count 0x0013 100 100 097 Old_age - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age - 29
194 Temperature_Celsius 0x0022 047 049 000 Old_age - 47
195 Hardware_ECC_Recovered 0x001a 068 048 000 Old_age - 116459253
197 Current_Pending_Sector 0x0012 100 100 000 Old_age - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age - 0
202 Unknown_Attribute 0x0032 100 253 000 Old_age - 0

hdsentinel

Anonymous's picture

also check out hdsentinel, a program which takes the SMART data and analyzes it a bit more intelligently in the hopes of providing better warning of impending disk failure.

windows version is only $23 ($35 for professional version) and for us linux users there is a free command-line linux version (as of today, 2009-02-13 they are up to version 0.03).

the developers also have a nice writeup on their position on the problems with analyzing SMART data.

device status unreliable -- "2001" error

Frank Poole's picture

The 9000 system indicated our AE-35 unit had a high probability of failure within 72 hours. We replaced it and could find no indication of problems in the original unit. HAL said the replacement was about to fail, too. The ground recommendation was to leave the unit in service and let it fail. HAL now says it has failed, and it acts like it has, but I am wondering what I will find when I go out to replace it. Dave is monitoring the status from inside, and our link to the base is still down. I will report back when I learn more.

SMART on SSD -- be careful of recent bug

Tommy's picture

Some of us with SSD (solid state disk) units have been bitten by filesystem corruption after updating to the latest Ubuntu and Fedora. We have recently determined the corruption was (ironically) triggered by the SMART utilities.

I don't believe we know at this moment whether to expect a firmware update for the SSDs or some other remedy, but beware if you notice strange behavior in your device, especially immediately after probing its SMART status.

Note that SMART may be invoked without your direct knowledge -- Beginning with Ubuntu 9.10 libatasmart is being invoked automatically by DBUS. Also in Ubuntu, running GNU parted probes the disk's SMART status, though running fsck does not.

https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/445852

Follow-up to SMART on SSD -- beware

Tommy's picture

It's now looking like the problems affecting some SSDs in Ubuntu 9.10 were triggered by new code -- libatasmart, replacing the legacy smartmontools. Details in http://bugzilla.kernel.org/show_bug.cgi?id=14583

Thanks!

Phoinx's picture

Well, I'm still discovering smartctl, but your post was very helpful already! Very detailed and easy to read.
Thanks a lot!

I am getting the following error, from my dedicated server

Karen's picture

I just got moved from another server to this one

I am concerned about this error message and the hosting
people are telling me that it is fine, and the only way
to fix it is to turn off the temperature monitor of smart

That the control is set too low

I need to know if this is correct or not.

Here is the error:

S.M.A.R.T Errors on /dev/sda
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Temperature_Celsius 0x0022 065 039 045 Old_age Always In_the_past 622854179

Your help would be very much appreciated,

Thanks
Karen
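For what it's worth, the row above can be read mechanically. The sketch below is only an illustration of the usual rule of thumb (an attribute is failing now if VALUE is at or below THRESH, and failed in the past if only WORST is at or below THRESH). The function name attribute_state is made up for this example, and treating a threshold of 0 as "no threshold" is an assumption.

```python
def attribute_state(line):
    """Classify one row of the smartctl attribute table.

    Assumes the usual column order: ID NAME FLAG VALUE WORST THRESH ...
    """
    fields = line.split()
    name = fields[1]
    value, worst, thresh = (int(f) for f in fields[3:6])
    if thresh == 0:
        state = "no threshold"   # assumption: a zero threshold never trips
    elif value <= thresh:
        state = "FAILING_NOW"    # current normalized value is at/below threshold
    elif worst <= thresh:
        state = "In_the_past"    # only the worst recorded value dipped that low
    else:
        state = "ok"
    return name, value, worst, thresh, state


row = "190 Temperature_Celsius 0x0022 065 039 045 Old_age Always In_the_past 622854179"
print(attribute_state(row))
```

For the row quoted above, the current value (65) is above the threshold (45), while the worst value ever recorded (39) is below it, which matches the In_the_past flag that smartctl printed.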

didn't the article mention

Anonymous's picture

didn't the article mention to ignore 194 (temperature) as the variable changes so often?

"Studies have shown that

Anonymous's picture

"Studies have shown that lowering disk temperatures by as little as 5°C significantly reduces failure rates, though this is less of an issue for the latest generation of fluid-drive bearing drives. One of the simplest and least expensive steps you can take to ensure disk reliability is to add a cooling fan that blows cooling air directly onto or past the system's disks."

Which studies are those? Google's study of over 100,000 drives found that disks failed MORE often when they were cooled, and ran better hot:

"In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates. Only at very high temperatures is there a slight reversal of this trend."

-----------

The results from smartctl are very confusing and hard to understand. Wikipedia clarifies what some of the values mean, though there's still a lot of uncertainty:

http://en.wikipedia.org/wiki/Self-Monitoring,_Analysis,_and_Reporting_Te...

Another tool is http://gsmartcontrol.berlios.de/ , which adds a GUI to smartctl, and provides helpful descriptions when you hover over attributes.

You shouldn't quote articles

Anonymous's picture

You shouldn't quote articles without reading the whole piece. Further in that article, they said the failures were suspected not to come from the lower drive temperatures, but rather from the power fluctuations that were coming from the increased electrical load of the air conditioners...

Sense key errors, important or not?

Anonymous's picture

I have a critical server running our mail system which has lately been spewing SCSI "sense key errors" to the console. Is this important?

Taking a backup of this server will be a real pain, so do you think the hard drives are OK?

If your mail server is

Anonymous's picture

If your mail server is "critical" to your operation, shouldn't you be doing regular backups anyway? Or, at the very least, use redundant storage like RAID 1?

If you are getting errors, I would back it up without hesitation. Which is the bigger pain, an inconvenient backup or permanent data loss?

Backup and replace the hard drives, the sooner the better. Also, incorporate some redundancy in there.

Hardware Error

Alex507's picture

I'm having the following error.

ce: /dev/sda, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS

I was wondering if someone has the correct solution for this issue or knows the main cause of this message.

block reassigns

Anonymous's picture

In very simple terms, your disk has had problems in the past and has either repaired itself or been repaired by some external utility. However, this repair has left your disk vulnerable to further failure. Back up immediately.

ID 194 shows strange value

bambid's picture

I have WDC WD5000YS-01MPB0 and when I read ID 194 from HDD I get this :

194 Temperature_Celsius 0x0022 253 253 000 Old_age Always - 101

which is totally wrong; my HDD can't be at 101 degrees Celsius.

David

Thank you very much for

Anonymous's picture

Thank you very much for posting this! I already feared that my hard disk is dying because of the strange noises the PC made on start up. (I don't even know now where the noise came from. Could be the other things too, right?)

USB Harddrives

Anonymous's picture

I was wondering if it's ever going to be possible to get the SMART info off a USB storage device?

Would it need a redesigned USB/ATA interface?

Seems a real shame I can't monitor the health of my many USB drives.

SMART for USB Harddrives?

mehereno's picture

I miss SMART for USB disks too. I wonder why I cannot monitor my disk connected over USB in Linux. Is it a Linux driver limitation? Or a hardware limitation? My USB/ATA controller is based on Genesys Logic (05e3:0702), and my disk supports SMART; I know I can read SMART statistics when I connect my disk over a PATA cable.

Monitoring USB Hard Disks with SMART

D. L. Sneddon's picture

Bruce Allen's reference to SMART's ability to "query the disk's health status, run disk self-tests..." suggests that you could at least get some kind of condition report by removing your hard disk from its USB housing, connecting it to an ATA cable in a desktop, then running the query utility. I use a cheap ($6) adapter to connect my 2.5 inch laptop drives to my desktop. Though this ritual does not allow continuous monitoring of the USB drive, it may give a clue as to its current status.

Sad to say, other distractions, such as picking a distribution, have prevented me from trying out my own suggestion.
cheers...

Dead date

StuartH's picture

Is SMARTD able to calculate a dead date?

I came across another SMART tool that did this after it was left running for several weeks.

It gave an estimated dead date.

2 instances of smartd.conf

Vic's picture

I'm confused as to why there are 2 smartd.conf files. One in /etc/ and the other in /usr/local/etc/

Why are there 2? Which one do I need to edit? Lastly, how do I make smartctl email me once a week with the SMART results?

Thanks.

interpreting results of smartctl?

richard's picture

I ran smartctl -a /dev/hda and got the following error report (one of 8 - all similar on the same day):

Error 4 occurred at disk power-on lifetime: 9060 hours (377 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 41 00 00 e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000041 = 65

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 02 40 00 00 e0 00 00:01:01.697 READ DMA
c8 00 02 40 00 00 e0 00 00:01:01.685 READ DMA
10 00 3f 00 00 00 e0 00 00:01:01.685 RECALIBRATE [OBS-4]
c8 00 02 40 00 00 e0 00 00:01:01.685 READ DMA
c8 00 02 40 00 00 e0 00 00:01:01.681 READ DMA

My question is "What does this mean exactly and should I be worried/how can I fix it?"

Thanks for a brilliant piece of diagnostic software. I only wish I was good enough to do full justice to it!!

Regards

Richard
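In case it helps with reading such an entry, here is a small sketch that decodes the numbers in the report above. The helper names (error_flags, lba28, hours_to_days) are invented for this example, and the bit names in ERROR_BITS are my reading of the ATA error register layout, so treat them as an assumption rather than as smartctl's own tables.

```python
# Bit names of the ATA error register, highest bit first
# (assumed layout; older ATA revisions name some bits differently).
ERROR_BITS = [
    (0x80, "ICRC"),   # interface CRC error
    (0x40, "UNC"),    # uncorrectable data
    (0x20, "MC"),     # media changed
    (0x10, "IDNF"),   # sector ID not found
    (0x08, "MCR"),    # media change requested
    (0x04, "ABRT"),   # command aborted
    (0x02, "NM"),     # no media
]


def error_flags(er):
    """Return the names of the bits set in the ER (error) register."""
    return [name for mask, name in ERROR_BITS if er & mask]


def lba28(sn, cl, ch, dh):
    """Combine the SN, CL, CH and low nibble of DH into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def hours_to_days(hours):
    """Split a power-on lifetime in hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Register values taken from the "84 51 01 41 00 00 e0" line above.
print(error_flags(0x84))
print(lba28(0x41, 0x00, 0x00, 0xE0))
print(hours_to_days(9060))
```

With the registers from the report, this gives the ICRC and ABRT flags, LBA 65, and 377 days plus 12 hours, the same figures smartctl printed next to the raw bytes.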

Summary of bad (pending) sectors

Kitty's picture

Hello,
why doesn't smartctl show a summary of bad or pending sectors? One such message can be found in /var/log/messages like "Aug 27 12:17:51 91-64-143-104-dynip smartd[4483]: Device: /dev/hdb, 7 Currently unreadable (pending) sectors", however, it would be more convenient to get this information directly from smartctl. How can I get this information?

Thank u!!!
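One way to get such a summary is to filter the attribute table yourself. The following is a minimal sketch, not part of smartmontools: sector_summary is a hypothetical helper that picks the raw values of three sector-related attributes out of the text printed by smartctl -A (or -a). It assumes the raw value is the last field on the row, which holds for the tables posted in this thread.

```python
# Attribute names as they appear in the tables quoted in these comments.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def sector_summary(smartctl_output):
    """Map each watched attribute name to its RAW_VALUE (last column)."""
    summary = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] in WATCHED:
            summary[fields[1]] = int(fields[-1])
    return summary


# Sample rows in the format shown elsewhere on this page.
sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 038 045 000 Old_age Always - 38
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
"""
print(sector_summary(sample))
```

In practice you would pass in the saved output of something like smartctl -A /dev/hdb instead of the sample string.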

When do you replace a disk

Michael Janich's picture

I've seen all these attributes and things, but my question
is "when do you replace a disk?" I think that is the only
question a typical sysadmin has.

THANKS

Michael

The day before it fails,

Dotgain's picture

The day before it fails, obviously.

Absolutely

Anonymous's picture

Absolutely

Nicely written & informative article

fromport's picture

Thank you for a nicely written and informative article.
I tried it on one of my SCSI drives, which tends to be busy.
It gave me some other information.
The overall status is OK; should I worry about the errors?

# smartctl -a /dev/sda|less
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: MAXTOR ATLAS10K5_147SCA Version: JNZ3
Serial number: D404M6EK
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Mon Jul 31 07:47:36 2006 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 26 C
Manufactured in week 04 of year
Current start stop count: 1074003968 times
Recommended maximum start stop count: 1124401151 times
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   13957055        0         0         0          0      10856.427           0
write:         0        0         0         0          0      21552.894           0

Non-medium error count: 564

Well, this is great

PhilG's picture

Well, this is great information (certainly the parts I understand are.....)

Anyway, I am using SMARTMON to monitor the health of the Seagate drive in my Tivo

The last run produced THIS:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 068 049 006 Old_age - 116459253
3 Spin_Up_Time 0x0003 096 095 000 Old_age - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age - 28
5 Reallocated_Sector_Ct 0x0033 100 100 036 Old_age - 18
7 Seek_Error_Rate 0x000f 083 075 030 Old_age - 223581632
9 Power_On_Hours 0x0032 096 096 000 Old_age - 3778
10 Spin_Retry_Count 0x0013 100 100 097 Old_age - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age - 29
194 Temperature_Celsius 0x0022 047 049 000 Old_age - 47
195 Hardware_ECC_Recovered 0x001a 068 048 000 Old_age - 116459253
197 Current_Pending_Sector 0x0012 100 100 000 Old_age - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age - 0
202 Unknown_Attribute 0x0032 100 253 000 Old_age - 0

********************************************************************************

There are some BIG numbers for attributes 1, 7 and 195.

I do have a fairly good understanding of disk architectures, but I cannot get a handle on what these fields might mean so any assistance would be GREATLY appreciated

Basically, I just want to know whether I have LOTS of errors on this disk or whether I just have a small number of "bad spots" that I am hitting very often

Many thanks

Phil G

with attrib 1,7,195I find

Dave Rave's picture

with attrib 1,7,195
I find this with all my seagate drives
which worries me where I read that part about ata4 standard and drives not keeping the attributes anymore

i think my non-seagate drives are now just too dumb to realise they are failing.
if my seagate drives get that error value down in the 60's, they are going out soonish
not real quick today soon
but the system is just iffy and had to play with
if you get spinrite to run over the drive, it will improve, some, for a while

Cannot get rid of SMART warning on startup

Mark F.'s picture

This is a great article, and the questions following it make it even better. I now understand what that SMART error warning I get whenever my machine starts up means. Thanks for the great tools too!

Now my question: I get the following error every time the machine starts up (I am paraphrasing a bit):

SMART monitoring error
Please backup your data!
Press F1 to continue

Strangely, over several years I simply ignored this error and dutifully pressed F1. Slackware Linux (my former OS) and now NetBSD 2.0 and 3.0 have worked without any problem.

After reading the article and doing a long test, I have the following error report:
watson:~#smartctl -l selftest /dev/wd0d
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
Warning: device does not support Self Test Logging
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 29145 -
# 2 Short offline Completed without error 00% 29144 -

If I immediately give the following command:
watson:~#smartctl -l error /dev/wd0d |sed -n '/Error /p'
Warning: device does not support Error Logging
SMART Error Log Version: 1
ATA Error Count: 133 (device log contains only the most recent five errors)
ER = Error register [HEX]
Error 133 occurred at disk power-on lifetime: 29144 hours (1214 days + 8 hours)
Error 132 occurred at disk power-on lifetime: 29144 hours (1214 days + 8 hours)
Error 131 occurred at disk power-on lifetime: 29144 hours (1214 days + 8 hours)
Error 130 occurred at disk power-on lifetime: 29144 hours (1214 days + 8 hours)
Error 129 occurred at disk power-on lifetime: 29144 hours (1214 days + 8 hours)

The problem is I always have to be around to press F1 whenever the system boots up. Other than that, the disk (and the OSes) seem to work fine. I tried disabling BIOS harddrive monitoring but that did not help. Also disabling smart through smartctl and rebooting but that did not help either. Somehow the disk always remembers the SMART error.

The disk is a Maxtor 91531U3.

Is there any way I can get rid of that SMART warning at startup? Any help would be much appreciated.

Mark

Can I switch off SMART detection using this tool?

Mike's picture

I get a message from the BIOS when I switch on the laptop: "HDD status bad, back up and replace". I want to stop this message appearing so that Windows will load normally. I can't disable it via the BIOS as it has no such option. Will this tool help me?

Thanks!!

Lifetime

Denis's picture

First of all, congratulations on the article.

I've been intrigued by some data shown by smartctl -a about lifetime. I've read around, and I still have a question.

194 Temperature_Celsius 0x0022 043 049 000 Old_age Always - 43 (Lifetime Min/Max 0/20)

How am I supposed to read this Lifetime Min/Max?
What does 0/20 mean? Does anyone know?

Thanks

Seagate ST340014A 3.06

Eugene Dzhurinsky's picture

First of all - thank you for great article!
I just have a question - smartctl reports PASSED for my drive, but it also reports
Extended offline Completed: read failure 0% 1895 39965820

Does this mean I just have a bad sector which could be remapped, given that the other sections report no errors?

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 071 067 006 Pre-fail Always - 182149325
3 Spin_Up_Time 0x0003 099 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 16
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 077 060 030 Pre-fail Always - 54648620
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1895
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 098 098 020 Old_age Always - 2198
194 Temperature_Celsius 0x0022 038 045 000 Old_age Always - 38
195 Hardware_ECC_Recovered 0x001a 071 067 000 Old_age Always - 182149325
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

Yes, this probably means

ballen's picture

Yes, this probably means that your disk has a bad sector. Read the BadBlocksHowTo linked from the smartmontools home page, to see how to identify if there is a file being stored on that bad part of the disk, and how to force the drive to reallocate that sector.

Bruce
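As a rough illustration of the first step in that how-to, the sketch below takes the LBA from a self-test log line and converts it to a filesystem block number using the relation b = (L - S) * 512 / B, with L the bad LBA, S the partition's starting sector and B the filesystem block size. That formula is quoted from memory of the how-to, the function names are invented here, and the partition start (63) and block size (4096) are purely hypothetical values; use the figures from your own fdisk and filesystem tools.

```python
def first_error_lba(selftest_line):
    """Return the LBA_of_first_error from a self-test log row, or None for '-'."""
    last = selftest_line.split()[-1]
    return None if last == "-" else int(last)


def filesystem_block(lba, partition_start, fs_block_size, sector_size=512):
    """Convert a disk LBA into a block number inside a partition's filesystem.

    Assumed relation: b = (L - S) * sector_size / B, rounded down.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of this partition")
    return (lba - partition_start) * sector_size // fs_block_size


line = "Extended offline Completed: read failure 0% 1895 39965820"
lba = first_error_lba(line)
print(lba)
# Hypothetical partition layout: starts at sector 63, 4096-byte blocks.
print(filesystem_block(lba, partition_start=63, fs_block_size=4096))
```

The resulting block number is what the filesystem tools described in the how-to would then be asked about, to see whether a file lives on the unreadable sector.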

worst value

Web Hosting Tech's picture

Great article, many thanks!

I have one thing that I cannot quite understand. If I read it correctly, the value is the current snapshot of what smartctl sees. In the case below, that is 045. The funny thing is it states that the "worst" it has seen is 054.

194 Temperature_Celsius 0x0022 045 054 000

Is the temperature attribute the exception to the rule that the worst value is the "smallest value attained since SMART was enabled on the disk"?

I suppose this would make sense as the worst temperature in a real life system would be a high temperature in most cases. Either that or I am way off base!

This must be a SEAGATE disk.

ballen's picture

This must be a SEAGATE disk. Seagate ignores the smart standard and just stores the temperature (in Celsius) in these variables. So your current disk temperature is 45C and the hottest it has ever been is 54C.

Note: this info can also be found in the smartmontools FAQ page.

Bruce Allen

SMART for SATA drives

Tracy R T's picture

I am running CentOS release 4 with SATA drives on the digital video recorders we are building. I want to utilise the SMART suite but I have found that the SMART daemon fails to start during bootup. Do SATA drives support SMART?

regs TT

just what i encountered last

tomas's picture

just what i encountered last week. thanks for the info.

security systems

Yes, smartmontools supports

ballen's picture

Yes, smartmontools supports SATA drives via libata. You need a Linux 2.6.15 or greater kernel. A typical command line is:

smartctl -a -d ata /dev/sda

Starting with release 5.37 smartmontools will also support a SCSI to ATA translation layer (SAT). The code is already in CVS. With this you can also use:

smartctl -a -d sat /dev/sda

The latter form allows extra functionality, for example running selective self-tests.

Bruce
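If you want to script the two command lines above, a thin wrapper is enough. This is only a sketch: smartctl_command and run_smartctl are names made up for the example, and it assumes smartctl is on the PATH and that you have the privileges to query the device.

```python
import subprocess


def smartctl_command(device, device_type=None, options=("-a",)):
    """Build the argument list, e.g. smartctl -a -d ata /dev/sda."""
    cmd = ["smartctl", *options]
    if device_type:
        cmd += ["-d", device_type]   # e.g. "ata" or "sat", as in the post above
    cmd.append(device)
    return cmd


def run_smartctl(device, device_type=None, options=("-a",)):
    """Run smartctl and return its text output (requires smartctl installed)."""
    result = subprocess.run(
        smartctl_command(device, device_type, options),
        capture_output=True,
        text=True,
    )
    return result.stdout


print(smartctl_command("/dev/sda", "ata"))
print(smartctl_command("/dev/sda", "sat"))
```

Only the command builder is exercised here; call run_smartctl("/dev/sda", "ata") on a machine with the drive attached to get the actual report.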

SMART support on SATA drives

sensovision from WKey's picture

Unfortunately right now official libata library in kernel doesn't support ATA-passthrough calls and the only way to check SMART status right now is to use patches like this: http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/

Here is the quote from developers of smartmontools:
"Smartmontools should work correctly with SATA drives under both Linux 2.4 and 2.6 kernels, if you use the standard IDE drivers in drivers/ide. If you use the new libata drivers, it won't work correctly because libata doesn't yet support the needed ATA-passthrough ioctl() calls. Jeff Garzik, the libata developer, says that this support will be added to libata in the future. When this happens, we'll add support to smartmontools for a new SATA/libata device type '-d sata'. Typically, to force an SATA disk to run using the standard (non-libata) drivers, you must use the BIOS to select "legacy mode" for the controller. If the IDE driver doesn't support your particular SATA controller, or the controller doesn't have a legacy interface, then only libata can be used. Unless the hard disk controller on the system motherboard is Intel, VIA or nVidia, standard IDE drivers may not work.

Note: an unofficial patch to libata that allows smartmontools to be used with the standard '-d ata' device type was posted to the linux kernel mailing list at the end of August 2004. The patch is included in the libata-dev patchset that can be applied to a recent Linux kernel (>= 2.6.9). With a SATA disk driven by a libata driver, smartmontools can now be used by specifying both the device type 'ata' and the SCSI device corresponding to this disk, for example, smartctl -i -d ata /dev/sda. The patch is still under development and it is probably best to make sure that the disk is idle before trying smartmontools. "

http://smartmontools.sourceforge.net/#testinghelp

Hope this helps.

good work

Guest's picture

Thanks very much for this article. I feel better when I know how my HD's health is. Good work!

S.M.A.R.T

Thomas Rice's picture

Well well - I had some ECS-AMD-Mainboard and activated S.M.A.R.T. ... but actually 2 Seagate hard disks died (the slow way - losing information) ... without SMART telling me that there was a problem 8-)

Unfortunately in the real

ballen's picture

Unfortunately in the real world SMART only detects about 2/3 of disk problems. The other 1/3 go undetected until the disk fails.

Bottom line: even with SMART you MUST back up data that you need and can not replace.

Bruce Allen

Kernel I/O Error and SMART test result?

Anonymous's picture

I would like to know whether there is any direct relationship between the kernel I/O error report and the SMART test report.

I had a hard disk in a Linux server for which the kernel reported an I/O Seek Complete Error nearly a year ago. I just left that partition unused, used another hard disk to replace the mount point for that partition, and let the server continue running.

After reading this article, I ran a test with -a on that "kernel-reported problematic" hard disk.

The result is:
SMART overall-health self-assessment test result: PASSED

What does this mean? Is my hard disk healthy despite the Seek Complete Error?
Or do I not understand the test result properly?

First part of the Error report is:
Error 9 occurred at disk power-on lifetime: 7557 hours
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c5 ee 52 e0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 08 c4 ee 52 e0 00 62302.187 READ DMA EXT
25 00 08 7c ee 52 e0 00 62302.186 READ DMA EXT
35 00 08 c9 8f f4 e0 00 62302.186 WRITE DMA EXT
25 00 08 bc ee 52 e0 00 62302.184 READ DMA EXT
25 00 10 7c 5f 53 e0 00 62302.184 READ DMA EXT

Your disk has one or more

ballen's picture

Your disk has one or more unreadable sectors. This does NOT mean that the disk is failing, but it has lost some information on those sectors. Run an extended self test:

smartctl -t long /dev/hda (PATA disk)
smartctl -t long -d ata /dev/sda (SATA disk)

After the test is over, the self-test log (-l) will show what sector is unreadable. This will probably agree with what is shown in SYSLOG. Then look at BadBlockHowTo (linked from smartmontools home page) for instructions about how to identify if there is a file stored on that bad sector. If you have no data that you need, you can fix the problem by overwriting the bad partition with zeros using dd.

But be careful not to zero out regions of the disk that store data that you need!

Bruce Allen
