
16 May 2012

Scrap the SCP. How to copy data fast using pigz and nc

Have you ever heard that the speed of a system is determined by its slowest component? I am made painfully aware of that every time I do data migrations.

That is, it doesn’t matter if you have 64-core systems with 100+ GB of memory on either end if the majority of the time is spent waiting for data to trickle across a slow 1 Gb network link.

Watching data trickle for hours while the rest of the system is doing nothing is a pretty frustrating experience. But limitations breed creativity … so lately, I’ve been experimenting with several different copy techniques to see whether transfer speed can be improved, perhaps by using some of that idle capacity to speed things up.

Here is a short summary of my experiments (transferring a 16 GB ORACLE data file across the WAN), presented as a “speed and effect comparison” table. You can judge for yourself (with all the usual caveats that your results will depend on your system configuration and will probably vary, yada yada yada) …



Method                                  Transfer Time    Network Capacity Used   CPU Used   Effective Rate
scp                                     4 min 50 sec     ~ 55 MB per second      ~ 5%       ~ 55 MB per second
bbcp                                    2 min 27 sec     ~ 108 MB per second     ~ 2%       ~ 108 MB per second
ncp / gzip                              2 min 1 sec      ~ 10 MB per second      ~ 15%      ~ 132 MB per second
ncp / pigz                              30 sec           ~ 20 MB per second      ~ 50%      ~ 533 MB per second
ncp / pigz (parallel degree limited)    1 min 15 sec     ~ 15 MB per second      ~ 20%      ~ 214 MB per second

And here is the longer explanation if you are really interested 🙂

Copying data using SCP

Traditionally, many people use scp to copy files between systems, something like:

> scp /u02/databases/mydb/data_file-1.dbf remote_host:/u02/databases/mydb

The problem is that scp is NOT very fast: typical speeds achieved by scp are in the range of ~ 20-50 MB per second. To put it in perspective, that is ~ 4 to 13 minutes to copy a 16 GB file between systems. Multiply it by, say, 8 files and now you are wasting 0.5 to 1.5 hours on a simple copy.

Which begs the question: how can we do better?

Why SCP is slow by default

The first observation with scp is that even at the top of that range the transfer speeds do NOT approach the true NIC capacity (which, for a 1 Gb NIC, is slightly more than 100 MB per second).

So, we should do much better if we are able to “fill the pipe” completely.
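A quick sanity check on the numbers: a 16 GB file at scp’s ~ 55 MB per second needs roughly 16 * 1024 / 55 ≈ 298 seconds (just under 5 minutes), while the same file at a “full pipe” ~ 110 MB per second needs roughly 149 seconds (about 2.5 minutes) – which is exactly the gap between the scp and bbcp rows in the table above.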

Filling the pipe: Remote copy with BBCP

“Filling the pipe” is precisely what the bbcp command does – it opens multiple network streams and transfers a file in parallel, using most of the network capacity in the process. In my tests, bbcp consistently outperformed scp, reaching speeds of ~ 100-115 MB per second and cutting transfer time by a factor of 2.

There are, however, two problems with bbcp.

First of all, its default syntax is pretty scary. In my example, it looked like this:

> bbcp -P 10 -f -T 'ssh -x -a %I -l %U %H bbcp' \
/u02/databases/mydb/data_file-1.dbf remote_host:/u02/databases/mydb/data_file-1.dbf

But more importantly, using that much of the network for a copy is dangerous, as it does not leave much bandwidth for anything else on the host (e.g. regular ORACLE connections from apps).

Plus, it may affect other unrelated hosts if you happen to have multiple machines using the same network path and a slightly oversubscribed network.

In other words, bbcp should be used only if you do not care whether the database on the box is accessible and you also do not share the host/rack/routers with anybody else.

To be fair, you can use bbcp options to limit how much bandwidth it uses. But if you do that, the copy speed essentially reverts back to scp levels, as it directly correlates with how much data you are pushing over the wire.
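For completeness, here is roughly what a throttled bbcp run might look like – a minimal sketch, assuming your bbcp build supports the -s (number of parallel streams) option; option sets do vary between bbcp versions, so check the help output on your system:

# Reduce bbcp to a single TCP stream, which roughly caps it at what one stream can push
> bbcp -s 1 -P 10 -f -T 'ssh -x -a %I -l %U %H bbcp' \
  /u02/databases/mydb/data_file-1.dbf remote_host:/u02/databases/mydb/data_file-1.dbf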

Bottom line: bbcp = not good if your system is actually used!

Is there another alternative ?

The magic bullet: Compression

Yes. Apparently, ORACLE data files are pretty compressible. We can gzip them on the source, transfer 5-10x less data over the wire (yes, that seems to be the average compression rate; sometimes rates are even better) and unpack them on the destination.
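If you want a rough idea of how compressible your own data files are before committing to this approach, one quick test is to compress a sample of the file and compare byte counts (the 1 GB sample size below is arbitrary, and the -c 1G syntax assumes GNU head):

# Compare the original file size ...
> ls -l /u02/databases/mydb/data_file-1.dbf
# ... with the size of a quick gzip pass over the first 1 GB of the file
> head -c 1G /u02/databases/mydb/data_file-1.dbf | gzip -1 | wc -c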

The problem, however, is that instead of running a simple scp command, we need to run 3 commands on 2 separate systems:

  • Source: gzip
  • Source/Target: transfer, i.e. scp
  • Target: gunzip

which is a bit too complex if we just want to copy a bunch of files.
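For reference, the manual three-step version looks something like this (a minimal sketch, using the same paths as the scp example above; note that, unlike the streaming approach below, it needs scratch space for the compressed copy on both ends):

# SOURCE: compress into a separate .gz file, leaving the original in place
> gzip -1 -c /u02/databases/mydb/data_file-1.dbf > /u02/databases/mydb/data_file-1.dbf.gz

# SOURCE/TARGET: transfer the (much smaller) compressed copy
> scp /u02/databases/mydb/data_file-1.dbf.gz remote_host:/u02/databases/mydb

# TARGET: unpack it on the destination
> gunzip /u02/databases/mydb/data_file-1.dbf.gz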

Network streaming

Fortunately, this technique can be simplified and generalized by using network streaming tools.

Here is an example of copying the same file using gzip and netcat. We still need to run 2 commands, but they are pretty simple:

# SOURCE:
> tar -cf - /u02/databases/mydb/data_file-1.dbf | gzip -1 | nc -l 8888

# TARGET:
> nc <source host> 8888 | gzip -d | tar xf - -C /

nc here is a network streamer: it sends data over the wire on the sending end (port 8888) and reads data from the wire on the receiving end.

I ran many such copies and every single time md5sum confirmed that the data files were transferred correctly. Moreover, when something breaks (such as when a certain DBA hits Ctrl+C on either end), the failure is very visible – you will know that an error has occurred and that you need to re-transfer.
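The verification itself is nothing fancy – just checksum the file on both ends after the copy and compare the output:

# SOURCE:
> md5sum /u02/databases/mydb/data_file-1.dbf

# TARGET:
> md5sum /u02/databases/mydb/data_file-1.dbf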

In most of my tests this combination of commands was even faster than bbcp (giving me an additional 15-25% improvement), but, more importantly, it utilized only ~ 1/4 of the available network bandwidth, usually putting ~ 20-30 MB per second over the wire (even scp puts ~ 60).

And, finally, parallelism

But we are still not done. Transfer speeds can be improved further if we are willing to use a bit of CPU on the source host.

As you might know, gzip is a sequential, single-threaded application, but there is also a parallel gzip, named rather expressively “pigz”:

# SOURCE:
> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | nc -l 8888

# TARGET:
> nc <source host> 8888 | pigz -d | tar xf - -C /

Pigz is essentially gzip that can use multiple parallel threads to compress/decompress the data. If we replace gzip with pigz, we can achieve fantastic speeds and cut the transfer time by a factor of ~ 2-10 compared to scp.
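By default pigz grabs all available cores, which is where the ~ 50% CPU figure in the table comes from. The “parallel degree limited” row corresponds to capping that parallelism – pigz has a -p option for exactly this; the value 4 below is just illustrative, not necessarily the degree used in my tests:

# SOURCE: cap pigz at 4 compression threads (-p 4) to leave CPU headroom for the database
> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz -p 4 | nc -l 8888

# TARGET:
> nc <source host> 8888 | pigz -d | tar xf - -C /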

A few notes and observations

Bbcp compression

If you can fill the network pipe completely (i.e. you are the only user of the system), the question naturally becomes: “Can we combine compression and multi-stream transfer for even faster speeds?”

As it happens, the bbcp command has a “compress me” option for input streams, so it seems a natural candidate here … However, as hard as I tried, I couldn’t make it work properly. In all of my tests, when bbcp compression was turned on there was a definite improvement in network utilization, but the transfer itself was dead slow … much slower than even the original scp. If anybody knows how to use bbcp compression efficiently, I’d appreciate the learning experience.

Still, a rather straightforward workaround is to keep using tar/pigz/nc and just run several copies in parallel, along the lines of the sketch below.
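A minimal sketch of what that might look like – two independent tar/pigz/nc pipes, each on its own port (the second file name and the port numbers are purely illustrative):

# SOURCE: start two streams on different ports, one per file
> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | nc -l 8888 &
> tar -cf - /u02/databases/mydb/data_file-2.dbf | pigz | nc -l 8889 &

# TARGET: pull both streams in parallel
> nc <source host> 8888 | pigz -d | tar xf - -C / &
> nc <source host> 8889 | pigz -d | tar xf - -C / &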

Monitoring transfer progress with nc

A pigz/nc transfer might be significantly faster, but it is not the easiest thing to monitor. While scp has a nice progress bar, pigz/nc just gives you a blank screen for the entire duration of the transfer. Fortunately, this is very easy to correct by dropping the pipe viewer tool (pv) into the pigz/nc pipe:

> tar -cf - /u02/databases/mydb/data_file-1.dbf | \
  pv -s `du -sb /u02/databases/mydb/data_file-1.dbf | awk '{s += $1} END {print s}'` | \
  pigz  | nc -l 8888

which should give you a nice progress bar, quite similar to scp:

1.12GB 0:00:15 [86.5MB/s] [===>                      ]  7% ETA 0:03:18

Using one command for copy, instead of two

While the source and target commands are not too complex to master, there are still two commands that you need to run, on two different hosts. To make things easier, it makes sense to script them together into a single command and remove another advantage of scp:

> ncp! /u02/databases/mydb/data_file-1.dbf remote_host
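The actual ncp script is not reproduced in this post, but a minimal sketch of such a wrapper might look roughly like the following (assuming passwordless ssh to the remote host, a fixed port, and tar/pv/pigz/nc available on both ends – a reader-contributed variant with gpg encryption appears in the comments below):

#! /bin/bash
# Illustrative ncp-style wrapper: ./ncp <file> <remote_host>
# Starts the compressing sender locally, then drives the receiving pipe over ssh.

FILE_FULL=$1
REMOTE_HOST=$2

FILE_DIR=$(dirname $FILE_FULL)
FILE_NAME=$(basename $FILE_FULL)
LOCALHOST=$(hostname)
NC_PORT=8888

# SOURCE side: tar + progress bar + parallel compression + network stream
# (depending on your nc version, you may need "nc -l -p $NC_PORT" instead)
tar -cf - -C $FILE_DIR $FILE_NAME | \
  pv -s `du -sb $FILE_FULL | awk '{print $1}'` | \
  pigz | nc -l $NC_PORT &
sleep 1

# TARGET side, driven over ssh: receive + decompress + unpack into the same path
ssh $REMOTE_HOST "nc $LOCALHOST $NC_PORT | pigz -d | tar xf - -C $FILE_DIR"

wait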

tar/pigz/nc transfer is not secure

Finally, there is one advantage that scp still holds: its transfer is secure while pigz/nc transfers data in clear text. So, if you are using unsecured networks, this option is probably not for you.
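If you do need encryption but still want the compression win, one compromise (suggested by a reader in the comments below) is to pipe the pigz stream through ssh instead of nc – you pay the ssh cipher overhead, but transfer far fewer bytes:

# SOURCE (single command; the remote side is driven over ssh):
> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | \
  ssh remote_host "pigz -d | tar xf - -C /"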

Cheers,
Maxym Kharchenko

Comments (36)
  1. Nice post, thanks!
    As you’re interested in non-secure file transfer, it would be nice to see how the good ol’ FTP fits in this comparison 😉

    Cheers


    Kamal

  2. Thanks, Kamal.
    Yes, it would be interesting to compare speeds with FTP. The trouble is – these days FTP is almost never to be found in production networks.
    Theoretically, I have no doubt that FTP will be faster than SCP. Will it be faster than NCP? Probably not, as NCP has two big things going for it: compression and parallelism, which standard FTP does not have.

  3. Very good comparison.

    Can you try rsync with compression?

    regards
    loh

  4. Thank you very much for this post. I was looking for a solution to slow scp transfer speeds, and this works for me. The problem with scp is that it does not parallelize file transfers. If you use graphical scp/sftp clients like Filezilla or Transmit (OS X), they are very fast compared to the scp command line client, but only because they transfer several files simultaneously. However, the speed problem will still be there if, for example, you have a folder with lots of small files and one very big file (20GB): the speed will stall when transferring that file. You’ll notice a small improvement, as the small files will be transferred quickly, but as soon as the transfer is reduced to the single big file, you’ll encounter the usual problem.

    I didn’t know about pigz and pv, so I learned a couple of things from your script that will be very useful in the future. I had used nc and tar for transfers in the past, but I didn’t know how to show the progress (speed, ETA, % done,…) or that you could use a multi-threaded compressor like pigz.

  5. Thanks, Sergio.

    Glad that you could make it useful.

    Cheers

  6. Hi,

    nice one.
    Speed is awesome when using both pigz and lzop. I first thought the script had got stuck after I initiated a second file transfer to the same filesystem, but checking the target server with nmon, I found that dirty pages were still being flushed to disk, and this caused the “freeze” of the second copy, as the device was 100% busy for some time. I will do some testing with tuning the target ext3 filesystem (writeback / commit) and use xfs (noatime,nodiratime,nobarrier,logbufs=8) for comparison. It might also be worth doing some tests with dd.

    Regards

    Efstathios

  7. Hi Efstathios,

    Very interesting, do keep us posted. It would be nice to know how much delay is caused by the file system (I had regular ext3 noatime in my tests) and whether different zippers have better timing (do zippers other than gzip have parallel counterparts or options?)

    Regards

  8. Hi Maxym,

    as usual, it depends on your individual situation. It plays a big role how your I/O stack looks and how your array is configured. A good starting point is to figure out whether you are CPU or I/O bound. I normally do a simple test by setting up a tmpfs or RAM disk on the target server and copying data from source to destination to check how things would look without disks involved. For measurement you can use various tools; I personally can recommend nmon for Linux, and collectl.
    Having performed these tests, you can start to tune your system accordingly.

    Our setup consists of HP Blade Server and EMC VMAX Storage.
    The array is tuned for best response time (good for oltp) rather than throughput.
    We use buffered ext3 on LVM with dm-multipath, so our requests physically get written to disk at a 4k block size at the end, when the dirty pages are flushed, which means a lot of small requests committed into the SAN’s write cache. So the art is to tweak your system to use the file system cache efficiently, meaning we want some writes to be buffered, but not so much that the system gets a large backlog and freezes up.

    Keep in mind the following:

    – Scope (transfer as fast as possible from A to B only, or also to the right destination; one-time or iterative)
    – General transfer strategy based on scope (set size, compression tool, buffering, risk management of data loss on target due to tuning)
    – Schedule (place the transfer at an idle time to use the target server’s resources optimally)

    In our setup we have found ourselves to be limited by sequential write performance and medium network bandwidth, while having sufficient CPU and memory available. We had the requirement to always transfer files from scratch, so using netcat for the initial transfer and then rsync for deltas was not required. It is, however, a good starting point for a simple database backup to a remote server using alter database begin backup / rsync / end backup (be sure you back up properly, as you can read in the various Metalink notes on hot backups of databases).

    In such scenarios it is important to find a proper transfer set size, as a “read a bit / write a bit” strategy seems appropriate:

    Example

    – tar the files (split at 1GB size)
    – transfer each 1GB chunk from source to destination (the target system cache should be able to hold the write), while you create the next chunk at the source
    – write these chunks to multiple file systems backed by different disks/LUNs, so you have more queues

    In most cases I personally tested, I had good results with lzop on database files. Lzop suited our needs better than pigz: it has faster compression/decompression, which is why ORACLE uses it in its 11g Advanced Compression Option for RMAN (for which you get a free special-use license if you use Oracle Secure Backup, by the way).

    As general starting point:

    Network bound: start with pigz, also try pbzip2
    CPU bound: start with lzop
    I/O bound:
    – split into small chunks that fit easily into the fs cache and can be handled by the backend
    – write chunks to multiple different file systems
    – if possible, increase the queue size at the block device (/sys/block/sda/queue/nr_requests)
    – increase the ext3 commit interval at the target system (e.g. commit=180)
    – test with ext2 or xfs as well
    – on SATA, check that the write cache is enabled

    In most cases, how fast you can actually get the data written to disk will be the limiting factor for most of us.

    I hope this feedback is useful to somebody. If you have any corrections or find mistakes, please give feedback, as we are all willing to learn and nobody is perfect 😉

    Regards

    Efstathios

  9. Great research, and very good of you to share your findings. Another option, if you are forced to use scp, is to use a faster encryption cipher. For scp it’s the encryption that really slows it down.

    For scp, if you use the -c flag you can select a cipher other than the default (not sure what the default cipher is, but it’s not the fastest). If you select the medium-weight “arcfour” cipher you can greatly increase your transfer speeds on large files (larger than 1000k).

    example :
    scp -rp -c arcfour source1:$PWD .

  10. Thanks Alan,

    I’ll definitely consider (and test) “arcfour” option as some of my environments are “scp only”.

    Maxym Kharchenko

  11. Excellent write-up. I definitely appreciate this website.
    Keep it up!

  12. Another option:

    try using multiple rsyncs with lowest compression and lowest encryption level:

    rsync --archive --delete --compress-level=1 --files-from=FILE_WITH_LISTOF_SOURCE_FILES --rsh="ssh -c arcfour" YourSOURCE YourTarget

    The --files-from option is necessary if multiple sessions are used, so you would not have conflicting transmissions. This requires some scripting (especially convenient with perl).

  13. Thanks Rosty, I will definitely try that.

  14. Hello Maxym,

    It is a great post!! Just a question: what software did you use to monitor the transfer rate, and what software to do the graphs? I would like to deliver a report and want it to look somewhat like yours 🙂

    thank you in advance!!!

    Happy new year 2014 😀

  15. Hello Antonio,

    Thanks, I’m glad you find my post useful. As for the software – this is an internal thing at work, unfortunately not exposed to the outside world.

    If you want the same look and feel in the reports, however, Tableau will likely be very similar (as long as you pre-collect the data).

    Have a happy new year!
    Maxym Kharchenko

  16. the source command needs to be “… | nc -l -p8888” … the “-p” is missing above.
    At least this is what I needed to make it work here.

  17. Hello Stefan, thanks for the note.

    In my case it worked without the “-p”. Might be slightly different version of “nc”.

    Regards,
    Maxym Kharchenko

  18. Hi,

    I just performed another series of tests and I would like to add the following remarks / recommendations:

    1. For initial copies/transfers do not use rsync; use parallel scp instead:
    nohup scp -C -c arcfour256 $file $target_server:$PWD &

    It’s way faster than rsync.

    2. To sync the copy use rsync with the following options to save space for large files:

    nohup rsync -avz --inplace --partial --progress $file $target_server:$PWD &

    If you want to do this on a running source database, put it in backup mode to prevent file header changes on non-system tables (optional).

    I recommend measuring source and target using sar/nmon/collectl.

    Using this method I was able to copy a 1.5TB database over a 1 Gbit link in about 1 hour 20 minutes.

    Regards

    Efstathios

  19. One additional thing:

    Since all options go through the fs cache, I recommend adding a “sync” command at the end of the copy function to flush dirty pages to disk:

    # Kill any running rsync / scp transfers
    f_kill_rsync ()
    {
    ps -ef | grep rsync | grep -v grep | awk '{print $2}' | xargs kill -9
    }

    f_kill_scp ()
    {
    ps -ef | grep scp | grep -v grep | awk '{print $2}' | xargs kill -9
    }

    f_rsync ()
    {
    RSYNC_PARALLEL_DEGREE=$1

    COUNTER=0

    for dbf in $(ls)
    do
    # Once the parallel degree is reached, wait for the running transfers to finish
    if [ ${COUNTER} -ge ${RSYNC_PARALLEL_DEGREE} ]; then
    wait
    COUNTER=0
    fi
    nohup rsync -avz --inplace --partial --progress $dbf $target_server:$PWD &
    sleep 2
    let COUNTER=COUNTER+1
    done
    wait
    sync
    }

    f_scp ()
    {
    SCP_PARALLEL_DEGREE=$1

    COUNTER=0

    for dbf in $(ls)
    do
    if [ ${COUNTER} -ge ${SCP_PARALLEL_DEGREE} ]; then
    wait
    COUNTER=0
    fi
    nohup scp -C -c arcfour256 $dbf $target_server:$PWD &
    sleep 2
    let COUNTER=COUNTER+1
    done
    wait
    sync
    }

  20. awesome post! 🙂

  21. Confirmed, getting much higher speeds using the nc/pigz combo, thanks!

  22. I am glad that it worked for you 🙂

  23. The section “Why SCP is slow by default” doesn’t answer the question in its own title. SCP is slow because it relies on SSH whose internal encryption code is single-threaded by design. That means that your bottleneck is neither your disk nor network IO, but the speed of a single core.

    It’s also worth noting that “by design” != “by necessity”. While it is true that CBC cipher modes are only capable of encrypting in a single thread, CTR modes can be easily parallelized and have in fact been the default cipher mode in OpenSSH since 5.2 was released in 2009.

    However, the OpenSSH/SSL team has never actually implemented multi-threading for CTR. Even when the Pittsburgh Supercomputing Center offered a patch for OpenSSH 6.2 [2013] that implemented CTR multi-threading, it was never accepted due to nebulous concerns about the security of multi-threading.

  24. What about scp -C, and nc+xz?

  25. I’ve had good luck with pigz and mbuffer.

  26. scp supports compression with the -C option. Have you tried that? It should be on par with the performance of gzip+nc but is much simpler to use.

  27. There is a paper about using data compression to speed up data transfer here: http://www.slac.stanford.edu/~abh/proj.html
    In that paper it gives a formula which can be used to decide when it is better to use compression, based on CPU speed, network bandwidth, compression ratio, etc.
    Now we just need a script to do the calculation, and choose the appropriate transfer method automatically…

  28. Sammitch,

    Security concerns about multithreading aren’t “nebulous”. They’re very real!

    You wouldn’t think they’re “nebulous” when it’s your network being compromised, and it’s your data being modified, deleted, or even stolen by foreign computer crackers!

    Safety first. Speed second. There’s no use having blindingly fast data transfer if all you’re doing is speeding up the ability of a cracker to compromise your system and more rapidly violate the sanctity of your data!

    And before I leave, I need to request that you please go into the kitchen and make me a sammitch.

    Your friend in computing,
    Jordan

  29. To get scp to compress before transmitting, just use scp -C. That gives a speedup over slow networks, might not help with a LAN, but I’d like to have seen it in the benchmarks.

  30. Thanks for the writeup. I always use nc instead of scp for files > 1G. For data that is known not to be incompressible I use lzop on local networks, because its CPU footprint is really low; for slow networks (<10MBit/s) I use xz with varying compression ratios – even xz -0 is a lot better than, for example, bzip2.

  31. Hi,
    why aren’t you piping the whole thing through the ssh session?
    That way the file gets compressed via pigz and the transfer is secure via ssh:

    tar -cf - -C $FILE_DIR $FILE_NAME | pv -s `du -sb $FILE_FULL | awk '{s += $1} END {printf "%d", s}'` | $ZIP_TOOL | ssh $REMOTE_HOST "$ZIP_TOOL -d | tar xf - -C $FILE_DIR"

    regards
    Martin

  32. Hi,

    Your script is very promising. However, I usually transfer a whole set of files from a folder using the wildcard *. I tried adapting your script for that case, but failed. Could you give me a hint on how to implement that?

    Regards,

  33. Hi,

    was stumbling again on the problem how to transfer data.

    Now a bit wiser (not much :-p), I used ppss (find it on SourceForge) to transfer stuff reliably via scp in parallel:

    ppss -d ${SOURCE_DIR} -p 12 -c 'scp -o CompressionLevel=5 -C -c arcfour256 $ITEM ${DEST_HOST}:${DEST_DIR}'

    That spawns 12 parallel SCP threads, giving you an average transfer speed of 50-60 MB/s over 1 Gbit while processing data at a rate of around 200-240 MB/s of actual throughput.

    Encryption burns a lot of CPU though 🙁

    The source was an RMAN backup of an 817 GB database done with the section size 1g parameter and 24 channels on the source machine to fully saturate all process slaves (backup speed 1.5 GB/s).

    Adding section size 1g improved the backup time from 11 min 20 secs to 7 min 15 secs, as the data files were not evenly sized.

    So now there were 805 backup pieces to be transferred.

    scp with ppss worked well for that, as I did not have to do much fancy coding for the transfer.

    Next I will try to parallelize the tar with lzop by using / initiating multiple ssh control sessions:

    Job-Server
      Sender
        Session1  8888  Put
        Session2  8889  Put
      Receiver
        Session1  8888  Get
        Session2  8889  Get

    I am doing this as I noticed there are some spare resources on the server when switching to lzop compression, so I can further tune the read throughput on the source and hopefully manage to pump as much data through the 1 Gbit link as possible. Note that lzop -1 on a tar will saturate the line, but CPU and disk are not yet saturated. The key to rapid transfer is to find the sweet spot: max out the read performance on the source, compress at a level that leaves some CPU spare, and saturate the link ;)

    Will keep you posted on the tar settings.

    Regards

    Efstathios

  34. Is it possible to copy large data files using pigz if the volumes are mounted on the same server? I see all the examples are for a remote server. If so, what would be the syntax for tar/pigz? Thank you in advance.

  35. I improved upon this script by adding gpg encryption via a pipe, and a remote directory to place the file in.

    It now takes three options, and you have to create a gpg key and have it available on both the client and the server.

    Here is how to create the gpg keys. I didn’t use any passwords…
    https://fedoraproject.org/wiki/Creating_GPG_Keys

    I just used SCP to move the keys.
    http://www.phildev.net/pgp/gpg_moving_keys.html

    You have to edit the variable ENCRYPT_KEY='your-keyname' in the script; the key name must be inside ' '.

    It takes three options: the name of the file you want to transfer, the name of the server you want to transfer to, and the directory where you want the file placed on that server.

    Enjoy.

    #! /bin/bash

    FILE_FULL=$1
    REMOTE_HOST=$2
    REMOTE_FILE_PATH=$3

    FILE_DIR=$(dirname $FILE_FULL)
    FILE_NAME=$(basename $FILE_FULL)
    LOCALHOST=$(hostname)

    ZIP_TOOL=pigz

    ENCRYPT_KEY='your-keyname'
    CRYPT_TOOL=gpg
    ENCRYPT_OPTS=" --encrypt -r $ENCRYPT_KEY"
    DECRYPT_OPTS=" --decrypt -r $ENCRYPT_KEY"
    ENCRYPT_TOOL="$CRYPT_TOOL $ENCRYPT_OPTS"
    DECRYPT_TOOL="$CRYPT_TOOL $DECRYPT_OPTS"
    NC_PORT=8888

    tar -cf - -C $FILE_DIR $FILE_NAME | pv -s `du -sb $FILE_FULL | awk '{s += $1} END {printf "%d", s}'` | $ZIP_TOOL | $ENCRYPT_TOOL | nc -l -p $NC_PORT -q 0 &
    ssh $REMOTE_HOST "nc $LOCALHOST $NC_PORT | $DECRYPT_TOOL | $ZIP_TOOL -d | tar xf - -C $REMOTE_FILE_PATH/$FILE_DIR"

  36. Nice hack with a neat analysis, which makes the article really good.

