I have used dd to move lvm partitions around quite often. In the past, the procedure has been
- stop all processes using lvm
- dd lv to image file, possibly compressing in the process
- copy image file to new server using ssh or rsync
- on new server, use dd to populate lv
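The old procedure, sketched as shell commands. This is only an illustration of the steps above; the volume group, lv name, host, and paths are placeholders, not from the original:

```shell
# On the source server: stop anything using the lv, then capture it
# to an image file, compressing as we go (vg0/mylv is a placeholder).
dd if=/dev/vg0/mylv bs=4M | gzip -c > /tmp/mylv.img.gz

# Ship the image to the new server (newserver is a placeholder host).
rsync -av /tmp/mylv.img.gz newserver:/tmp/

# On the new server: populate an lv of at least the same size.
gunzip -c /tmp/mylv.img.gz | dd of=/dev/vg0/mylv bs=4M
```

Note the extra disk space this needs: the compressed image sits on both machines at once until you clean up.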
I have often wished there were a way to copy directly from one lv to another on separate machines, without the intermediate steps and the extra file space requirements. Thanks to the following web sites, I have a new procedure. I recommend reading those articles, as they go into more detail than I do and show additional ways of saving time using the power of ssh.
In my new procedure, the lv image is copied directly to the remote server's lv, requiring less time and less disk space. It is most simply performed if root on the source computer has ssh access to the target computer, which I enable only for the duration of the actual transfer (allowing root ssh login is not a good idea normally). In its simplest form, it is one command issued from the source computer:
dd if=/path/to/source | ssh root@targetServer 'dd of=/path/to/target'
Example: If you are logged into the source as root, root has remote access to the target at 192.168.1.55, and you have created an lv of the same size in Volume Group vg0 on the target, with both named 'temp', issue the following command:
dd if=/dev/vg0/temp | ssh 192.168.1.55 'dd of=/dev/vg0/temp'
I have read, in the above articles and other places, that compressing the image during transfer will increase the speed of the transfer, and I had taken this as gospel without checking it. In preparation for this article, I performed a most unscientific test of this assumption and transferred the same 10G disk image with and without compression, to wit:
dd if=/dev/vg0/temp | ssh 192.168.1.55 'dd of=/dev/vg0/temp' # no compression
dd if=/dev/vg0/temp | gzip -c -1 | ssh 192.168.1.55 'gunzip -c | dd of=/dev/vg0/temp' # with compression
In the second line, I'm simply passing the output of dd to gzip, then letting ssh send it over the network, where on the target machine it is uncompressed (gunzip -c), then piped to dd. Imagine my surprise when it actually took longer!
I then decided to "help" gzip by writing zeros to every unused space on the volume before executing the command:
mount /dev/vg0/temp /mnt && dd if=/dev/zero of=/mnt/deleteme && rm /mnt/deleteme && umount /mnt
dd if=/dev/vg0/temp | gzip -c -1 | ssh 192.168.1.55 'gunzip -c | dd of=/dev/vg0/temp'
Discounting the time it took to write the zeros (188s, i.e. 3m8s, plus about 3s to delete the file), it still took almost as long using compression, and if you add that time in, it actually takes the longest of all (1633s total). I can only postulate that the process overhead of doing the compression outweighs the savings in transfer time. Note: the source volume only contained about 700M of jpg images, so 9.3G was nothing but zeros.
zeroed, with compression (93% zeros; ignoring the time to zero unused space)
10737418240 bytes (11 GB) copied, 1445.55 s, 7.4 MB/s
10737418240 bytes (11 GB) copied, 1485.98 s, 7.2 MB/s
10737418240 bytes (11 GB) copied, 1517.37 s, 7.1 MB/s
As I said, these tests were run most unscientifically. On a gigabit network, with two switches between the machines, and each machine under some load:
- My workstation: quad core AMD Phenom(tm) II X4 965 Processor, 4G RAM (2G used by virtualbox DOMU's under a little load)
- quad core Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1G RAM, running Xen DOM0, but with no DOMU's running at the time
On a slower network, compression may help you, but if you are doing your work on a network of 1G or so, and your source and/or target are under load, it may be faster to forgo compression completely. Under the following circumstances, I would definitely not use compression:
- Source and Target are running virtual servers (i.e., other processes require a lot of cpu)
- Target is running Software RAID (the source does not matter that much, but writing to a RAID does take processor time)
- Network is 1G or greater.
Thanks to Clay (see comments below), who suggested weakening the encryption on ssh to gain some additional speed, and tweaking your block size (the bs= parameter on the dd command). From what I have read (I did not know this), the arcfour cipher appears to be a much less secure form of encryption, but significantly faster. Use it only over secured networks.
Clay also pointed out that the block size will make a difference. I remember reading someplace that the block size should be some multiple of the sector size of your physical hard drive (512 bytes on mine). So, the optimized command would be:
dd if=/dev/vg0/temp bs=4M | ssh -c arcfour 192.168.1.55 'dd of=/dev/vg0/temp bs=4M'
I tested this (and forgot to record the results) and there was an obvious speedup using -c arcfour. Just don't do this over an insecure network if your data is sensitive. And every time I've used dd without setting bs=4M, the process takes longer. Thanks, Clay.
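If you want to check the alignment yourself, blockdev (from util-linux) reports a device's sector sizes; the device path below is a placeholder for whatever disk sits under your volume group. A block size of 4M is a clean multiple of any common sector size:

```shell
# Sector sizes of the underlying disk (replace /dev/sda with yours).
blockdev --getss /dev/sda    # logical sector size, typically 512
blockdev --getpbsz /dev/sda  # physical sector size, often 4096

# 4M = 4194304 bytes, an exact multiple of both common sector sizes:
echo $((4 * 1024 * 1024 % 512))
echo $((4 * 1024 * 1024 % 4096))
```

Both remainders are zero, so bs=4M never forces a partial-sector write.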
Watching what you're doing!
I love dd, and have ever since I "discovered" it in the 90's. It is fast, efficient, and can copy almost anything to anything else. But it gives no indication of how long it will take (after all, screen I/O would cut into that efficiency, right?).
So, type in dd if=bigvolume of=output and just wait. How long? Who knows. You can log into another terminal, find the pid of the dd, then type kill -USR1 pid, and dd will print its progress (or background the original command with &, then repeatedly send the signal), but then you have to figure out how large the move is, how much is left to go, and calculate the remaining time yourself. BORING!
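A minimal demonstration of the signal trick; GNU dd prints a status line to STDERR when it receives SIGUSR1 (the /dev/zero copy here is just a stand-in for your real transfer):

```shell
# Start a long-running dd in the background...
dd if=/dev/zero of=/dev/null bs=1M &
DD_PID=$!

# ...then ask it for a progress report whenever you like.
sleep 2
kill -USR1 $DD_PID   # dd prints "... bytes ... copied, ..." to stderr
sleep 2
kill $DD_PID         # done peeking; stop the example
```

Note this is GNU dd behavior; BSD dd uses SIGINFO instead.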
About 10 years ago, I found a cute little solution for this, and fell in love. Then, I lost it, but recently, I have found my lost love. Her name is "pv" and she is available on almost all Linux systems (debian: apt-get install pv). pv is designed to be part of a pipe group, and it displays a progress bar (or other options) going through the pipe. Note that I said a pipe, any pipe, something that moves data around. So
command | pv | command ....
displays a progress indicator on STDERR while the pipeline runs.
Let's give a good example. We are doing a dd from an lv to a file. Normally, you would type:
dd if=/dev/vg0/temp of=/tmp/temp.img
Press enter, then wait. Is it running? Well, the machine is certainly slower. When will it be done? Who knows. Rewrite the command as:
dd if=/dev/vg0/temp | pv | dd of=/tmp/temp.img
and you will at least get an indicator the thing is running.
However, pv has several flags you can pass to it, one of the most important being --size (-s), telling it how much data it can expect to receive from STDIN (think about it: it has no way of knowing beforehand). By setting this to even the most approximate of values, you get an indication of whether you have time for a cup of coffee, or even a full movie, before your process finishes. My favorite flags are:
- -p show a progress bar
- -e give an ETA (completion time)
- -t show total elapsed time
- -r show rate during transfer
So, I just remember it as "peters" without one of the e's: pv -petrs SIZE.
Note, the position of pv in a string of pipes matters.
dd if=/dev/vg0/temp | pv -petrs 10G | gzip -c -1 | ssh 192.168.1.52 'gunzip -c | dd of=/dev/vg0/temp'
is NOT the same as
dd if=/dev/vg0/temp | gzip -c -1 | pv -petrs 10G | ssh 192.168.1.52 'gunzip -c | dd of=/dev/vg0/temp'
The latter measures the stream after compression, so against a stated size of 10G it will not report anything correctly: transfer rate, percent complete, ETA, and so on.
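Putting the pieces together, here is one sketch of the full transfer, combining the examples above: block size, progress before compression (so -s 10G is meaningful), and the faster cipher. The host, lv names, and size are the placeholders used throughout this article; note also that recent OpenSSH releases have dropped the arcfour ciphers entirely, so check what your ssh supports:

```shell
# Everything in one pipe. pv sits before gzip so its percentage,
# rate, and ETA are measured against the uncompressed 10G image.
dd if=/dev/vg0/temp bs=4M \
  | pv -petrs 10G \
  | gzip -c -1 \
  | ssh -c arcfour 192.168.1.55 'gunzip -c | dd of=/dev/vg0/temp bs=4M'
```

And as the timings above show, on a fast network under load you may well be better off dropping the gzip and gunzip stages entirely.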