Re: [Hampshire] Laptop Hardrive

Author: Vic
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] Laptop Hardrive

>    The main route to discovering you've got a failed drive or part of
> drive is when you can't read the data that was originally put on
> it. This will come to light either when a checksum is computed and
> fails comparison, or when part of the hardware is operating outside
> the parameters that are expected of it. When that happens, you have
> already lost data.


I'm not so sure that's the "main route". It's the most likely route, given
the attention span of most computer users I see, but it has only hit my
supported customers when they have ignored advice to buy a new drive. I
suspect (but have insufficient data to prove) that much of the apparent
ineffectiveness of auto-reallocation is down to the fact that modern
computer users are so often conditioned to ignore warnings...

>> The purpose of SMART is to notice impending failures before they get to
>> such a critical level,
>
>    It's not very good at it, though.


I disagree.

> The famous Google paper on disk
> failures quotes a model (under "related work", page 11) with only a
> 30% success rate based on SMART information.


30% would do me. That's three in every ten failures that can be
intercepted prior to catastrophe, and the drive swapped out with minimal
interference to operation, and no data loss.

30% might not be as good as 100%, but it's a damn sight better than 0%.

> They also state (section
> 3.5.6, page 10) that 56% of failed drives show no failure indicators
> at all in the four main SMART fields, and 36% of the failed drives
> show no failure indicators in SMART _at all_.


Yes. These figures are higher than I would expect - and certainly don't
mesh with my personal experience. I suspect (but again, can't prove) that
the effectiveness of the technology depends on the duty cycle of the
drives; they have quite a bit of computation to do with very little
processing grunt...
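
To put the quoted percentages side by side, here is a rough sketch in Python. It takes the figures exactly as quoted in this thread at face value; the count of 50 failures is a made-up number purely for illustration.

```python
# Percentages as quoted in this thread from the Google disk-failure paper.
PREDICTED_BY_MODEL = 30       # failures the cited model predicted from SMART data
NO_SIGNAL_MAIN_FIELDS = 56    # failed drives showing nothing in the four main fields
NO_SIGNAL_AT_ALL = 36         # failed drives showing nothing in SMART at all

# Hypothetical number of drive failures, chosen only to make the sums concrete.
FAILURES = 50

def share(percent, total):
    """Number of drives corresponding to a whole-number percentage of total."""
    return total * percent // 100

predicted = share(PREDICTED_BY_MODEL, FAILURES)
flagged_main = share(100 - NO_SIGNAL_MAIN_FIELDS, FAILURES)
flagged_any = share(100 - NO_SIGNAL_AT_ALL, FAILURES)

print(f"Of {FAILURES} failures:")
print(f"  predicted by the cited model:       {predicted}")
print(f"  some signal in the four main fields: {flagged_main}")
print(f"  some signal anywhere in SMART:       {flagged_any}")
```

On those numbers, SMART gives at least some warning in roughly two failures out of three, even though the cited model only turns three in ten into a usable prediction.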

>    Detecting failures and fixing them before they're going to occur is
> a nice fairy-tale, but in the real world, it's just not going to
> happen unless you're very lucky.


I appear to be the luckiest man in the multiverse.

>    SMART is simply a reporting process (plus a self-test feature) --
> the drive still does sector reallocations even if SMART itself is
> turned off.


Yes. It is the drive firmware that does the heavy lifting. It is the SMART
feature that warns the user, who then fails to do anything until the drive
has failed completely.
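
As a sketch of what acting on the warning could look like, the Python below scans an attribute table in the column layout that `smartctl -A` prints and reports any watched counter with a non-zero raw value. The sample table and its values are made up for illustration, and the choice of which attributes to watch is an assumption of mine, not something taken from the paper.

```python
# Hypothetical sample in the column layout printed by `smartctl -A`.
# The values are invented for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Assumption: these are the counters worth nagging the user about.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def nonzero_watched(report):
    """Return {attribute_name: raw_value} for watched attributes whose raw count is non-zero."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

warnings = nonzero_watched(SAMPLE)
for name, count in warnings.items():
    print(f"WARNING: {name} = {count}; consider replacing the drive")
```

The parsing is deliberately crude; the hard part is getting anyone to read the output.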

> If there's damage to the sector
> (physical or checksum), then the data that's read and rewritten may
> not be the data that was originally put on the disk. In this instance,
> it may not be 512 bytes of zeroes, but it's not guaranteed to be
> identical.


Drives have ECC on the data surface; there is a mathematical probability
that a random data failure could lead to a successful ECC check, but I
don't think we need worry too much about that.

In the event of an ECC failure, the sector will not be reallocated - it
has already failed.

If the ECC check passes, the data is almost certainly correct, so a
reallocation will work correctly.
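
To illustrate the pass/fail logic, here is a minimal sketch in Python. It uses CRC-32 from zlib as a stand-in for the drive's ECC, which is an assumption made for simplicity: real drive ECC is a different and stronger code that can also correct errors, and none of the function names below correspond to anything in real firmware.

```python
import os
import zlib

SECTOR_SIZE = 512  # bytes, the sector size mentioned in this thread

def write_sector(data):
    """Store the data together with a check value computed at write time."""
    return data, zlib.crc32(data)

def read_ok(data, stored_crc):
    """True if the data read back still matches the stored check value."""
    return zlib.crc32(data) == stored_crc

original = os.urandom(SECTOR_SIZE)
data, crc = write_sector(original)

# Simulate surface damage by flipping one bit in the middle of the sector.
damaged = bytearray(data)
damaged[SECTOR_SIZE // 2] ^= 0x01
damaged = bytes(damaged)

print("undamaged sector passes:", read_ok(data, crc))
print("damaged sector passes:  ", read_ok(damaged, crc))

# If damage randomised the whole sector, a 32-bit check would still pass
# by chance about once in 2**32 reads.
false_pass_probability = 2.0 ** -32
print("chance of a random false pass:", false_pass_probability)
```

A sector that passes the check is safe to copy to a spare; one that fails has nothing trustworthy left to copy.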

Vic.