Highly Reliable Systems: Removable Disk Backup & Recovery


Monthly Archives: August 2012

3.5 Inch Hard Drives Use Less Power than 2.5 for Backup

August 31st, 2012

We spend a lot of time studying backup to removable disk.  Would it surprise you to learn that, on a per-gigabyte-stored basis, 3.5 inch hard drives use less power than 2.5 inch drives for backup?

3.5″ disks have around twice the surface area of 2.5″ disks, and they typically have 1.5 to 2 times as many platters.  It's no surprise, then, that total storage per drive runs about 3 to 4 times larger for 3.5″ drives than for 2.5″ drives.  For the past two years (2010-2012) the largest available 2.5″ drive has been 1TB, while the largest 3.5″ drives have been 3TB, recently going to 4TB.

Let’s compare the power usage of a Seagate Barracuda 3TB 3.5″ drive running at 7200 RPM to a 1TB Seagate Constellation (ST9100640NS) running at 7200 RPM.  For this comparison we assume neither drive spins down nor goes into standby mode:

1TB 2.5″ drive – idle 3.31 W, typical operating 5.21 W, PowerChoice (sleep) mode 1.2 W
3TB 3.5″ drive – idle 5.4 W, typical operating 8.0 W, sleep mode 0.75 W

Let’s assume our backup job takes 8 hours per night and we need to back up 3TB:

Using 2.5″ drives – 3 drives * (5.21W * 8hrs + 3.31W * 16hrs) = 283.9 watt-hours per day
Using 3.5″ drives – 1 drive * (8W * 8hrs + 5.4W * 16hrs) = 150.4 watt-hours per day

Conclusion: Backing up to a 3.5″ hard drive uses about half the power
(assuming the drives don’t go into standby mode).

Obviously there are a lot of other scenarios we can imagine.  If the drives were in use 24 hours a day:
2.5″ drives – 3 drives * (5.21W * 24hrs) = 375.1 watt-hours per day
3.5″ drive – 1 drive * (8W * 24hrs) = 192 watt-hours per day
That's still about a 2-to-1 power savings in favor of 3.5″.

What if the drives spin down (sleep) when not in use?
Using 2.5″ drives – 3 drives * (5.21W * 8hrs + 1.2W * 16hrs) = 182.6 watt-hours per day
Using 3.5″ drives – 1 drive * (8W * 8hrs + 0.75W * 16hrs) = 76 watt-hours per day
Now it's a 2.4-to-1 power savings in favor of 3.5″.

What if the 2.5″ drives spin down but the 3.5″ drive doesn't?
Using 2.5″ drives – 3 drives * (5.21W * 8hrs + 1.2W * 16hrs) = 182.6 watt-hours per day
Using 3.5″ drives – 1 drive * (8W * 8hrs + 5.4W * 16hrs) = 150.4 watt-hours per day
So even if the 2.5″ drives spin down, we use less power with the 3.5″ drive.
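
For readers who want to plug in their own drive specs, here is a minimal Python sketch that reproduces the daily watt-hour figures above (the wattages are the Seagate data-sheet numbers quoted earlier and should be treated as assumptions):

```python
# Daily watt-hours for a backup set of identical drives.
# Wattage figures are the ones quoted above; substitute your own
# drives' data-sheet numbers as needed.

def daily_watt_hours(drives, active_w, idle_w, active_hrs=8, total_hrs=24):
    """Energy per day for `drives` disks that run at `active_w` watts
    during the backup window and `idle_w` watts the rest of the day."""
    return drives * (active_w * active_hrs + idle_w * (total_hrs - active_hrs))

# 3TB backed up nightly: three 1TB 2.5" drives vs. one 3TB 3.5" drive
print(daily_watt_hours(3, 5.21, 3.31))  # 283.9 Wh/day - 2.5", never sleeps
print(daily_watt_hours(1, 8.00, 5.40))  # 150.4 Wh/day - 3.5", never sleeps
print(daily_watt_hours(3, 5.21, 1.20))  # 182.6 Wh/day - 2.5", sleeps when idle
print(daily_watt_hours(1, 8.00, 0.75))  #  76.0 Wh/day - 3.5", sleeps when idle
```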


Posted in Blog

Karl Palachuk Podcast from SMBNation 2012 in Las Vegas Oct 12

August 30th, 2012

We just got back (October 12-14) from SMBNation 2012, where we introduced the new RAIDFrame Plus to great reseller interest.  The RAIDFrame Plus is a NAS with 4 Individual removable RAIDPacs.

While we were there, Karl W. Palachuk of smallbizthoughts.com came by and did a nice podcast, which you can find here and for which we are very grateful.  Karl is one of those guys who gets it.


Posted in Blog

What Does Enterprise Class Backup and Data Storage Really Look Like?

August 30th, 2012

Recently Joseph Walker, a blogger at SMBNation, asked what enterprise-class backup and storage really looks like.  My response was to discuss how small business servers keep hard drives inside the server chassis, whereas enterprises usually use one or more external SANs.  Joseph published the entire response almost verbatim here.  My portion of the text is reprinted below as a good summary of what’s happening in storage:

Due to the increased use of virtualized servers in the enterprise, storage has migrated over the last 10 years from inside the server to external SANs.  While small businesses still largely build their servers with mirrored boot drives and RAID5 or RAID6 SAS drives installed inside the server chassis, enterprise customers like the flexibility of using shared storage among multiple physical and virtual servers.  By centralizing storage the enterprise gains several benefits.

Virtualization platforms like VMware, with features such as vMotion, allow enterprise customers to move running virtual machines from one physical server to another with zero downtime, continuous service availability and complete transaction integrity.  In addition, they minimize wasted drive space compared to having storage spread across individual physical servers.  In an environment with 100 servers and storage physically installed inside each server, every server has to be configured with enough empty space to allow for growth.  With overhead of two to three times the current data size per server, it’s not hard to see how an enterprise wastes a tremendous amount of purchased hard drive space by putting it in the server.  By contrast, with a centralized SAN and shared storage, disks can be virtualized just like machines are.  Space can be allocated to each server based on need without wasting it.

Several issues arise when storage is moved from inside the server to a SAN.  The first is performance.  Anyone who has ever replaced a 7200 RPM drive with a 10,000 RPM drive or an SSD in a server knows that I/O speed largely dictates the end-user experience when running multi-user database applications.  Users will report a night-and-day difference after the upgrade when doing I/O-intensive things like running large reports.  How can a SAN keep up with locally attached SAS storage and retain performance?  The answer is that in many cases it can’t.  SANs do have the advantage of being more highly engineered and having more spindles (more hard drives), which can make up for some of the performance gap.  Using faster file systems and interfaces like Fibre Channel has also been a traditional part of the performance answer.

Enterprise SANs must ensure redundancy and reliability.  SANs are typically the domain of specialty manufacturers like EMC, Hitachi, HP and now Dell.  Many of these vendors use redundant power supplies, RAID arrays, and redundant “controllers” (think of a controller as the motherboard inside the SAN).  Much thought goes into making sure the SAN is highly available.  In large enterprises, it’s not unusual to see multiple SANs spread over several offices.  Software in the SAN then allows them to replicate or “snapshot” to one another for backup and redundancy.

The line between Network Attached Storage (NAS) and a Storage Area Network (SAN) device has blurred over the last 2 to 3 years.  NAS has traditionally shared storage at the file level, whereas SANs share their storage at the block level.  Software protocols like iSCSI are showing up in many NAS boxes, allowing them to be used more like traditional SANs, usually over Gigabit Ethernet, which may not perform as well as Fibre Channel or other traditional SAN hardware interfaces.

One key differentiator between lower-end iSCSI implementations and true SANs is the ability for more than one server to share the same hard drive space or volume.  This functionality is important for failing a virtual machine over to another physical machine while retaining connectivity to the shared storage.  At Highly Reliable Systems, our iSCSI implementations require the user to use an entire physical drive rather than allowing them to sub-divide the drive and allocate it to different servers.  This restriction exists only because we’re focused on drive removability and creating a transportable backup medium, which makes sharing drive space between servers an undesirable feature in our case.

Posted in Blog

A 700TB Backup drive?

August 20th, 2012

OK, it is a little out there.  By storing data in DNA, Harvard scientists George Church and Sri Kosuri have stored 700TB in 1 gram of material (70 billion copies of a book), which makes us wonder how long it will be before there is a High-Rely DNA backup drive.  The medium is very dense and stable (data lasts for years).  Access times, however, would be long: hours or even days to retrieve data.  But while you couldn’t read it very quickly, what an opportunity to store large amounts of data!

Posted in Blog

Why RAID-5 Stops Working in 2009 – Not Necessarily

August 13th, 2012

Could you write and then read an entire 3TB drive five times without an error?

Suppose you were to run a burn-in test on a brand new Seagate 3TB SATA drive, writing 3TB and then reading it back to confirm the data.   Our standards are such that if a drive fails during 5 cycles we won’t ship it.  Luckily, all 20 of the 20 drives we tested last night passed.  In fact, most of the 3TB drives we test every week pass this test.  Why is that a big deal?  Because there is a calculation floating around out there showing that when reading a full 3TB drive there is a 21.3% chance of getting an unrecoverable read error.  Clearly the commonly used probability equation isn’t modeling reality.  To me this raises red flags about previous work discussing the viability of both stand-alone SATA drives and large RAID arrays.

It’s been five years since Robin Harris pointed out that the sheer size of RAID-5 volumes, combined with the manufacturer’s Bit Error Rate (how often an unrecoverable read error occurs while reading a drive), made it more and more likely that you would encounter an error while trying to rebuild a large (12TB) RAID-5 array after a drive failure.  Robin followed up his excellent article with another, “Why RAID-6 stops working in 2019,” based on work by Leventhal.  Since RAID-5 is still around, it seems Mark Twain’s quote “The reports of my death are greatly exaggerated” is appropriate.

Why hasn’t it happened?  Certainly RAID-6 has become more popular in server storage systems.  But RAID-5 is still used extensively, even on the 12TB and larger volumes that Robin predicted don’t recover well from drive failures.  Before I get into some mind-numbing math, let me give away what I think might be an answer: the Bit Error Rate (BER) for some large SATA drives is clearly better than what the manufacturer says.  The spec is expressed as a worst-case scenario, and real-world experience is different.

Seagate’s BER on 3TB drives is stated as 1 error per 10^14 bits, but may be understated.  Hitachi’s bit error rate on their 4TB SATA drives is 1 in 10^15, and in my experience the two drives perform similarly from a reliability perspective.  That order of magnitude makes a big difference in the calculations of expected read errors.  Let me set the stage by going back over the probability equation used by Robin Harris and Adam Leventhal.

The probability equation they use for a successful read of all bits on a drive is
(1 - 1/b)^a
“b” = the Bit Error Rate (BER), also known as the Unrecoverable Read Error (URE) rate
“a” = the number of bits read (the amount of data on an entire volume or drive)

We can use sectors, bytes or bits for this calculation as long as we stay consistent.  In this article Leventhal uses sectors, which I think just complicates the calculation, but let’s confirm his numbers.  He calculates that a 100GB disk array has 200 million 512-byte sectors, so a = 2.0×10^8.  He uses b = 24 billion (2.4×10^10) because he says the bit error rate is 1 in 10^14 bits, which you divide by 512 bytes per sector and 8 bits per byte.  He determines the chance of array failure during a rebuild is only 0.8 percent.  From his article:  “(1 - 1/(2.4 x 10^10)) ^ (2.0 x 10^8) = 99.2%. This means that on average, 0.8 percent of disk failures would result in data loss due to an uncorrectable bit error.”

Now we can confirm whether we get the same results by typing the values into our formula on http://web2.0calc.com/.  I wanted to flip this equation from the probability of success to the probability of a failure directly, and then express it as a percentage, so I subtract the success rate from 1 and multiply by 100:

100*(1-(1-1/b)^a)

Cut and paste the string below into the website calculator and hit the equal key:
100*(1-(1-1/(2.4E10))^(2E8))       = 0.83% (I’ve rounded off)

This is the same number the author got, indicating less than 1% chance of failure while reading an entire 100GB array or volume.
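
If you'd rather skip the web calculator, the same sector-based check takes a couple of lines of Python (a sketch; the sector counts and error rate are the ones from Leventhal's example above):

```python
# Chance (as a percent) of at least one unrecoverable read error while
# reading a 100GB array, using Leventhal's sector-based figures.
b = 2.4e10  # sectors read per unrecoverable error (~10^14 bits / 4096 bits per sector)
a = 2.0e8   # sectors read (100GB / 512 bytes per sector)

failure_pct = 100 * (1 - (1 - 1 / b) ** a)
print(round(failure_pct, 2))  # -> 0.83
```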

Personally I think it is easier to do all this by leaving all the numbers in bits, because the hard drive vendors express “b” (the BER) in bits rather than sectors.  For example, this Seagate data sheet shows that for Barracuda SATA drives the number for “Nonrecoverable Read Errors per Bits Read, Max” is 10^14.  This is the same number the author used above.  We express this in scientific notation for the variable “b” as 1E14 (1 times 10 to the 14th); 10^14 bits is about 12.5 terabytes.  The value for “a” on a 100GB volume can be written as 100 followed by 9 zeros, multiplied by 8 bits per byte to get the number of bits.

Probability of a read error while reading all of a 100GB volume using SATA drives
100*(1-(1-1/(1E14))^(100E9*8))      = 0.80% (rounded off)

So we’re getting about the same answer using bits instead of sectors (subject to some rounding errors), and I think using bits is a little less confusing, don’t you?

Robin Harris did the calculation on a 12TB array and got a whopping 62% chance of data loss during a RAID rebuild.  Can we confirm his math using bits instead of sectors and the same formula?  As before copy and paste this into web2.0calc.com:

100*(1-(1-1/(1E14))^(12000E9*8)) = 61.68%
Yep, at least our math is tracking.  Now that we’re getting the same results as the experts, we’re ready to try our own calculations using real-world hard drives.  Let’s not even talk about RAID.  Let’s just take a stand-alone Seagate 3TB drive and see what the probability is that we’ll get a single non-recoverable read error if we fill and read the whole drive.

Probability of a read error while reading all of a Seagate 3TB SATA drive
100*(1-(1-1/(1E14))^(3000E9*8)) = 21.3% (rounded off)
So after all this math I get the number I started this article with: a 21.3% chance of a single read failure.  But wait a minute!  Does that sound right to you?  Doesn’t that mean that if I fill a 3TB drive and read all the bits about 5 times, I will likely encounter an error that the drive can’t recover from?  If that were true it would mean 3TB drives would be unusable!  Just for grins, let’s try a Hitachi 4TB drive with its slightly better BER of 1 in 10^15:

Probability of a failure while reading all of a 4TB Hitachi SATA drive

100*(1-(1-1/(1E15))^(4000E9*8)) = 3.14% (honest, I didn’t try to get Pi)

Well that’s better!  But if my 4TB drive is going to fail 3% of the time when I read the whole thing I’m still pretty concerned.
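
The same formula, written in bits, reproduces every number above in one short Python sketch (the capacities and vendor BER figures are the ones quoted in this article, taken as given):

```python
# Chance (percent) of at least one unrecoverable read error (URE) when
# reading `capacity_bytes` of data from a drive rated at one URE per
# `ber_bits` bits read.

def ure_chance_pct(capacity_bytes, ber_bits):
    bits_read = capacity_bytes * 8
    return 100 * (1 - (1 - 1 / ber_bits) ** bits_read)

print(ure_chance_pct(100e9, 1e14))    # ~0.8%  : 100GB volume, 10^14 BER
print(ure_chance_pct(12000e9, 1e14))  # ~61.7% : 12TB array, 10^14 BER
print(ure_chance_pct(3000e9, 1e14))   # ~21.3% : 3TB Seagate, 10^14 BER
print(ure_chance_pct(4000e9, 1e15))   # ~3.1%  : 4TB Hitachi, 10^15 BER
```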

Summary
I previously pointed out that our burn-in testing alone disproves the calculated failure rates.  We use these drives (both 3TB and 4TB) extensively in a mirroring backup system where drives are swapped nightly and re-mirrored.  Since the mirroring is done at the block level, the entire drive is ALWAYS read to create a new copy with its mirror partner, which means we should see some of these read errors on a regular basis.  In fact, our media often goes for years without a problem.

Similarly, our failure rates rebuilding large 8TB RAIDPacs are nowhere near what this probability formula suggests (6%).  So should we believe real-world results or the math?  Maybe a hard drive expert can suggest why the formula isn’t properly modeling the real world.  I’m not the only one to notice that the probability formula doesn’t map to real-world results: http://www.raidtips.com/raid5-ure.aspx

There is no question that the probability of read errors during rebuilds, along with other concerns such as large arrays taking a long time to rebuild due to limitations of the interface and drive speed, should play into future planning and product design.  Right now our RAID-5 three-drive RAIDPacs rebuild at about 250-300 gigabytes per hour, so an 8TB RAIDPac can take about 26 hours to rebuild.  Luckily, we use RAID for backup media rather than primary storage, so speed of rebuild isn’t as large an issue for our clients.  Drive failures are relatively infrequent (about 3 drive failures out of 100 per year) and multiple RAIDPac backup appliances are used to duplicate data.  Even when we raise the size of our RAIDPac to 12TB using three 6TB drives, the issues will be manageable, though rebuild time will go to about 1.67 days.  We’ll continue to watch this issue and continue to make backup reliable, but for now it’s safe to say RAID-5 is alive and well.
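
For anyone who wants to rerun the rebuild-time arithmetic with a different throughput, here is the trivial calculation (a sketch; the 250-300 GB/hour rate is the one quoted above):

```python
# Rough rebuild-time estimate from the rebuild throughput quoted above.
def rebuild_hours(capacity_gb, gb_per_hour=300.0):
    return capacity_gb / gb_per_hour

print(rebuild_hours(8000))   # ~26.7 hours for an 8TB RAIDPac
print(rebuild_hours(12000))  # 40.0 hours (~1.67 days) for a 12TB RAIDPac
```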

2015 UPDATE: Using RAID-5 Means the Sky is Falling!

Posted in Blog

Backup to Cloud – 17 Tips for Doing it Right

August 9th, 2012

Backing up to the cloud makes a lot of sense.  There is a broad spectrum of vendors and prices, and we always like to encourage people to do both local and cloud backup (hybrid cloud).  Cloud vendors differ from country to country, and it makes sense to pick one nearby (but not so nearby that a single disaster affects both your primary site and your backup location simultaneously).  You may want one that accepts (and sends) seed drives if you have more than 1TB of data.  You might also want to consider a distributed cloud vendor such as Symform or Cleversafe.  This article has some good reasons why.  I’m not saying these vendors are always cost-effective, but the concept is worthy of review and the white paper is well reasoned.

Some things to think about with cloud backup:

  1. Always supplement with a local backup (preferably two) to a local appliance or drive. Look to our DAS or NAS products for your local storage.
  2. Always encrypt with AES or better encryption. Protect keys and keep copies.
  3. Supplement local backup a second time by nightly or weekly copying data to a workstation using simple Robocopy-style software, even if that means you have to install a big SATA drive or two in the workstation. We have recommended batch files, ways to log results, and Windows scheduling instructions.
  4. Ask about the cost and ability to create and send seed drives to the data center, and the cost of creating and shipping you a hard drive via FedEx in a server-down emergency.  It takes too long to move large amounts of data when you’re completely down and need a bare-metal restore.
  5. Speaking of which, ask about bare metal restore unless file restore is enough for you.
  6. Ask if their incremental backup engine is file or block based.  In other words, if a file changes does the whole file get re-transmitted or just the blocks that change?  Programs like Exchange might have a huge .edb file that changes constantly.  Retransmitting the whole file on every change means you never “catch up”.
  7. Ask if defragging your server hard drive messes up the incremental backup scheme (hint: yes, it does).
  8. If you care, ask about virtualization features.  In other words, can I spin up your backup “in the cloud” to verify my backup is good and/or use it in an emergency?  Note this is a very high-end feature that very few vendors have and that costs a lot.
  9. Ask about SAS 70 or other security and redundancy certifications of the data center, how they back themselves up, and whether they include a second location (those with one location might be able to charge a low price like 25 cents per gigabyte per month; those with multiple locations and lots of support might charge $1 per gigabyte per month or more).
  10. Ask “Where is my data?”  Is it outside the country?  Do I care?
  11. Ask about version retention, i.e., can I restore a file from last January, or do I have only one copy of my backup (the latest)?
  12. Ask how you would know if one bit gets flipped in transit or when reading off the drive (i.e., does the software do a CRC check or an optional verification pass?).
  13. Ask whether the cloud vendor is profitable and whether it is venture-capital funded.  For example, Carbonite lost money last quarter, even though it has a very high advertising profile here in the States.
  14. Ask if the cloud backup software supports Active Directory, Exchange, granular or brick-level Exchange restores (the ability to restore single messages or mailboxes), open files, SQL, and VSS.
  15. Ask how they recommend testing that the backup is restorable.
  16. Consider bandwidth costs and whether you need to pay for more.  This is part of the hidden cost of cloud.  You may have to schedule backups to run only at night to avoid slow internet during the day.
  17. Ask when they were last down, what their SLA looks like (ask for a copy), and whether they have a data rider on their E&O insurance that protects them if they lose your data (for example, we carry $1 million per “glitch” and we don’t even offer a managed backup service at this time).  You will find most vendors include language that says “we make best effort to back up your data but ultimately we are not responsible if you can’t restore it”.
Posted in Blog

Apple co-founder Wozniak: Add Local Disk to your Backup Strategy

August 7th, 2012

Industry visionaries are concerned about storing everything in the cloud.  Besides the numerous outages we’ve seen over the last few years, concern is growing about liability and ownership issues.  In this article Apple co-founder Wozniak is quoted: “I really worry about everything going to the cloud,” he said. “I think it’s going to be horrendous. I think there are going to be a lot of horrible problems in the next five years.”

We advocate a multiple-backup approach in which removable disk backup systems are combined with the cloud, using multiple backup software packages to ensure the recoverability of your data.

Posted in Blog

Grab your Data and Go – Removable Disk Backup Systems

August 1st, 2012

Joseph Walker, a writer for SMBNation and an IT consultant, recently wrote an article about supplementing a backup strategy with a local backup that can be pulled and taken with you in an emergency.  Removable disk backup systems provide a way to implement a hybrid cloud-plus-local backup strategy and improve disaster recovery time in an emergency.

The article can be found here.  Joseph winds up his musings this way: “So yes, use the cloud if you’d like—embrace it even—but don’t count on it to “save your bacon” (as one of our community partners likes to say). Me, after talking to Mr. McBride, I have my eye on one of his company’s RAIDPac removable drives. If worst comes to worst, I figure I could grab the three-drive “pac” and stash it in my wife’s purse. I’m certain it would fit, and I can’t imagine a more secure location.   ”

What Joseph didn’t mention is that the RAIDPac was designed with USB3, SATA, and power ports so that it can be used on its own in an emergency by plugging it into any host that supports large partitions.  The photo nearby shows this more clearly.  Look for network-connected RAIDFrames in the near future.

Posted in Blog

Removable Disk Backup for the other 60% of businesses

August 1st, 2012

Sage Software, a leading provider of business management software, recently determined that 60% of small businesses don’t “offsite” their data.  The study, discussed here, says that while nearly all U.S. small businesses back up critical financial data, 6 in 10 don’t get it off site.  Removable disk backup systems can help overcome the high cost of moving data offsite by providing an easy way to do so without monthly fees.

“Backing up on-site may not be sufficient to protect small businesses from natural disasters – particularly if the business is located in an area prone to earthquakes, hurricanes, fires or flooding – or more common crises, such as theft or hardware malfunction,” said Connie Certusi, executive vice president and general manager of Sage Small Business Solutions. “Data loss could have a serious impact on operations and crisis recovery. The development of a preparedness plan that includes solutions for protecting critical information, such as backing up off-site, could be the difference between getting a business on its way to recovery and worrying about its survival.”

With Highly Reliable Systems removable disk backup systems like the 2-bay RAIDFrame or 2-bay Netswap Plus, you can send data to a local drive while the system produces a duplicate copy that you can easily take off site.  Consultants, managed service providers and others responsible for backup should consider forwarding the Sage article to their clients to generate conversation about improving backups and creating disaster recovery plans.


Posted in Blog