Why backing up SQL databases to multiple files is a waste of your time

If you are dealing with restoring a SQL backup and your backup is in 100 pieces, well, you can thank the internet for that pain-in-the-ass SQL statement. There is exactly one very limited situation where backing up to (and restoring from) many different files gives you a real performance win: when you stripe the backup across many different physical drives (or remote drives), so the I/O is actually spread out. But do that and you are now one drive failure away from losing the whole backup. It’s the classic you-say-tomato, I-say-fuck-you debate.
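For anyone who hasn’t had the pleasure, here is roughly what that statement looks like. This is only a sketch: the database name and UNC paths are placeholders, and I’ve shown 3 stripes instead of 100 out of mercy.

-- Striped backup: every piece has to be listed again, in full, at restore time.
-- MyDb and the \\backup01 paths are made up for illustration.
RESTORE DATABASE MyDb
FROM DISK = N'\\backup01\sql\MyDb_01.bak',
     DISK = N'\\backup01\sql\MyDb_02.bak',
     DISK = N'\\backup01\sql\MyDb_03.bak'   -- ...now imagine 100 of these lines
WITH REPLACE, STATS = 10;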

After much ado, here are the copy tests. These are Windows Explorer SMB 2.0 copy tests to see whether copying one single large file (20 GB) or many smaller files (20-30 files, roughly 700 MB-2 GB each) is more efficient. Is one big file faster, or are many smaller files faster?

Lab setup: both VMs share a 1 Gbps link, so roughly 125 MB/sec is the theoretical max if one server gets the whole pipe to itself.

Copying a 20 GB SQL backup:

Test #1 (10 files, ~2 GB each): ~98 MB/sec, 3 min 32.6 sec

Test #2 (1 file, 20 GB): ~100 MB/sec, 3 min 29.3 sec

Test #3 (30 files, ~700 MB each): ~100 MB/sec, 3 min 24 sec

OK, the results without further ado: pipe speed = pipe speed. You do not increase it by changing how many files you push through it, at least not over SMB 2.0.

So we want single-file backups, because then we can actually throttle the SQL restore. The number of restore files does make a difference to how the BUFFERCOUNT / MAXTRANSFERSIZE parameters you provide behave. 1 file = what we want. Many files = unexpected behavior.
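For reference, this is the shape of the single-file restore I mean. Sketch only: the path and the BUFFERCOUNT / MAXTRANSFERSIZE values are placeholders to tune for your own hardware, not a recommendation.

-- Single-file restore: the throttle knobs behave predictably here.
RESTORE DATABASE MyDb
FROM DISK = N'\\backup01\sql\MyDb.bak'
WITH REPLACE,
     BUFFERCOUNT = 8,              -- number of I/O buffers
     MAXTRANSFERSIZE = 1048576,    -- 1 MB per read/write request
     STATS = 5;

With one file, turning those two knobs up or down moves the throughput the way you would expect, which is the whole point.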

BOOM DONE!

We need single-file backups, one file per DB, and that goes whether it’s HD attach or BTS copy.

AKA 1 file transfer method
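And the backup side of that looks like this (again a sketch; the name and path are placeholders):

-- One backup file per database. Simple to copy, simple to restore, simple to throttle.
BACKUP DATABASE MyDb
TO DISK = N'\\backup01\sql\MyDb.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;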

One more direct single-file 20 GB copy, just to see…

Time to copy: 3:14.7 (3 minutes, 14.7 seconds)

OK, so clearly it is all within the same range, and damn close. Unless you have a 100% isolated (and fully understood) network, there is always a little bit of noise, and the numbers above are about as dead-on as you can get. Sanity check: 20 GB at ~100 MB/sec works out to roughly 200 seconds, a bit under three and a half minutes, which is right where every test landed. I can assure you there was nothing but a little noise on these links.

These tests say a single-file backup transfers just as efficiently as X number of backup files. So why are we splitting backups into 30 files?

The reason this matters is that we appear to have native T-SQL options that help throttle network usage during restores… but when the restore spans multiple files, those options are far less effective, resulting in almost no throttling.
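To be concrete, this is the statement where the throttling falls apart on us: the exact same placeholder WITH options as the single-file sketch above, just spread across multiple files.

-- Multi-file restore with the same knobs. In our testing the throttling
-- effect largely disappears once the restore spans several files.
RESTORE DATABASE MyDb
FROM DISK = N'\\backup01\sql\MyDb_01.bak',
     DISK = N'\\backup01\sql\MyDb_02.bak',
     DISK = N'\\backup01\sql\MyDb_03.bak'
WITH REPLACE,
     BUFFERCOUNT = 8,
     MAXTRANSFERSIZE = 1048576,
     STATS = 5;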

As an important side note, I want to get 10 Gbps NICs on our hosts. I want Oleg to know that I do not think having a shared 1 Gbps NIC between SQL-01 and AP-01 is acceptable. If you are going to distribute anything in the farm, distribute that. Christ. Can’t you put AP-18 on that? JK but that pairing seems not right. Officemate chat tomorrow!

But seriously, we also need to swap out our 1 Gbps NICs for 10 Gbps NICs and make sure our switches and uplinks support that throughput. Sharing a 1 Gbps NIC between 2 VMs (the current situation) is just not realistic.

We can easily saturate a 1 Gbps NIC by doing a restore over the network. Yes, file copies are now insanely fast.

Bottom line:

Having 2 VMs share a host is not a new concept, but it all depends on function and operation. In the world of a web server (front end / back end / mid tier), the bottom line is that it is a real-time service: a user is sitting there waiting for the response.

The current design limits each VM to a maximum network speed of 500 Mbps (2 × 500 Mbps). 1 Gbps can easily be saturated by backing up remotely to another server (like the backup server) or by restoring backups to production servers. If your restore operation consumes all of your bandwidth… your users are impacted. How do you run these operations in the background without impacting users? Always the question.

We need the shared VM NICs on the host machines to be 10 Gbps, and we need to make sure the switches and gateways up the chain can support that speed. Network speed is turning out to be the bottleneck.

Welcome to the 21st century.

P.S. One last test still needs to happen here:

Let’s test a DB of a different size and content as a sanity check.

god the corporate buzzshit has got me sucked in
