Thursday, May 20, 2010

Fast NDMP backups revealed :)

I use NetBackup 6.5 to back up NetApp filers via NDMP. The tape drives are SAN-attached and are visible to, and shared between, the NetApp filers and the NetBackup master server. Yes, I use the SSO (Shared Storage Option) license so that the tape drives can be shared.

As my backup environment grew (to more than 150 TB backed up per week), I started facing backup performance problems. Some backup jobs ran very fast (more than 70 MB/s throughput on LTO4 tape drives), while others (about 30% of all policies) ran at only 1 MB/s to 5 MB/s. Because these jobs were far too slow for an LTO4 drive, I could not back up everything within the designated backup window, even with 15 LTO4 drives. While troubleshooting the performance, I found some important things:

1) Backup performance depends on file size in general (it is one factor among many; others include the type of tape drive and the backup path, SAN or LAN)
2) The smaller the files, the slower the backup throughput
3) The larger the files, the faster the backup throughput
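To see why the slow jobs break the backup window, here is a rough back-of-the-envelope sketch using the figures above (150 TB per week, 15 drives). It assumes, for simplicity, that every drive streams continuously at the same rate and that 1 TB = 1,000,000 MB; the function name is just for illustration.

```python
def hours_to_back_up(total_tb: float, drives: int, mb_per_sec: float) -> float:
    """Hours needed to stream total_tb across `drives` drives at mb_per_sec each.

    Assumption: all drives run in parallel, non-stop, at the same rate.
    """
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (drives * mb_per_sec)
    return seconds / 3600


# Compare a fast job rate with the slow rates seen on the problem policies.
for rate in (70, 5, 1):
    hours = hours_to_back_up(150, 15, rate)
    print(f"{rate:>3} MB/s per drive: {hours:8.1f} hours")
```

At 70 MB/s the whole weekly load fits in roughly 40 hours of drive time, while at 5 MB/s it would need far more than the 168 hours a week has. In practice only part of the data ran at the slow rates, so treat this as an upper and lower bound rather than a prediction.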

Now, I have to finish the backups within the designated window while also staying out of business hours, so that the NDMP jobs do not hurt filer performance. Theoretically, the filer gives NDMP the lowest priority while it is serving user data (be it HTTP, NFS, or CIFS).

Also, note that during a default NDMP backup, the filer sends all the file and directory information to the NetBackup catalog. This was the turning point in the whole troubleshooting effort! Why? Here is the explanation:

For each small file, the filer sends the file/directory information to the catalog and then (or at the same time) writes the file to tape. If the catalog takes as long as, or longer than, the tape drive to handle that file, the whole purpose of having a faster tape drive is nullified. The NDMP backup is then slow because the catalog has to be updated with the information for every file being backed up.
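The effect can be illustrated with a toy model. This is only a sketch under stated assumptions: the per-file catalog cost of 10 ms is a made-up number, not something I measured, and the model assumes the catalog update and the tape write overlap, so the slower of the two sets the pace for each file.

```python
def effective_mb_per_sec(file_size_mb: float,
                         tape_mb_per_sec: float,
                         catalog_sec_per_file: float) -> float:
    """Throughput for a stream of equally sized files.

    Assumption: catalog update and tape write run in parallel, so each
    file takes as long as the slower of the two.
    """
    tape_time = file_size_mb / tape_mb_per_sec
    per_file_time = max(tape_time, catalog_sec_per_file)
    return file_size_mb / per_file_time


TAPE_RATE = 70.0      # MB/s, the rate seen on the fast jobs
CATALOG_COST = 0.01   # seconds per file, hypothetical value

for size_mb in (0.05, 1.0, 100.0):
    rate = effective_mb_per_sec(size_mb, TAPE_RATE, CATALOG_COST)
    print(f"{size_mb:>7.2f} MB files: {rate:6.1f} MB/s")
```

With these assumed numbers, 50 KB files are limited to about 5 MB/s by the catalog alone, while 100 MB files run at the full tape rate. That matches the pattern of small files being slow and large files being fast.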

Solution: The solution was to use the "set HIST=n" option in each of the NDMP backup policies, so that the NDMP job does not need to update the catalog with what it is backing up. Result: faster backups! While the backups got faster with this method, it has its own advantages and disadvantages. Here is the list:

Advantages:
1) Backup performance is significantly improved
2) Catalog size is now under control (because it has to capture less info)
3) Catalog backup size is significantly reduced
4) All backup jobs now run at 30-40 MB/s or better, while the faster jobs are unchanged
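For reference, the change behind these gains is a single directive placed ahead of the paths in each policy's backup selection list. The sketch below just builds such a list as text. The helper name and the volume paths are hypothetical, and the exact spelling of the directive should be checked against the NetBackup for NDMP guide for your version.

```python
def build_selection_list(paths, file_history=True):
    """Return the lines of an NDMP backup selection list.

    When file_history is False, the HIST directive is placed first so
    that it applies to every path that follows it.
    """
    lines = []
    if not file_history:
        lines.append("SET HIST = N")  # directive from the post: set HIST=n
    lines.extend(paths)
    return lines


# Hypothetical volume paths, for illustration only.
selection = build_selection_list(["/vol/vol1", "/vol/vol2"], file_history=False)
print("\n".join(selection))
```

Keeping the directive at the top matters because it only needs to be stated once per policy, ahead of the volumes it should cover.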

Disadvantages:
1) No file history is captured!
2) This means that to restore a single file, I have to restore the entire filesystem
3) The restore volume must be at least as large as the largest volume being backed up
4) The restore volume must also have at least as many inodes as the largest volume in the environment
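The last two points amount to a sizing rule for the scratch volume used for restores. Here is a minimal sketch of that rule; the volume names and figures are invented for illustration. It takes the maximum size and the maximum inode count independently, since the largest volume is not necessarily the one with the most files.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: float
    inodes: int


def restore_target_requirements(volumes):
    """Return (min_size_gb, min_inodes) a restore volume needs so that
    any one of the given volumes can be restored in full."""
    min_size_gb = max(v.size_gb for v in volumes)
    min_inodes = max(v.inodes for v in volumes)
    return min_size_gb, min_inodes


# Hypothetical volumes, for illustration only.
volumes = [
    Volume("vol_home", size_gb=2000, inodes=60_000_000),
    Volume("vol_media", size_gb=8000, inodes=1_500_000),
]
size_gb, inodes = restore_target_requirements(volumes)
print(f"Restore volume needs at least {size_gb:.0f} GB and {inodes:,} inodes")
```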

Enjoy! :)

5 comments:

Anonymous said...

Don't forget to increase NUMBER_DATA_BUFFERS to something like 64 (which will be the default when multiplexing), 128, or 256, depending on how much shared memory you can spare/support. I would expect LTO4 to hit 80-90 MB/sec.

Anonymous said...

Very useful info. Thanks!

Sunnyvale_home_for_rent said...

Excellent finding. Very useful. - Deb

Sunnyvale_home_for_rent said...

Very useful. Thanks! -Deb

Anonymous said...

Very useful. Thanks! -Deb