Re: [Hampshire] Replicating directory tree and filenames

Author: Hugo Mills
Date:
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] Replicating directory tree and filenames

On Mon, Aug 10, 2009 at 03:21:53PM +0100, Hugo Mills wrote:
> On Mon, Aug 10, 2009 at 03:00:14PM +0100, Keith Edmunds wrote:
> > I want to replicate a huge (multiple TBs) directory tree such that the
> > replica has the same files, same GIDs/UIDs as the original, same paths, but
> > with all the files 0 bytes. In other words, copy the directory and file
> > structure but not the data. It feels as if this should be easy to do, but
> > I haven't thought of an easy way yet...
>
> I'm assuming that your terabytes of stuff consists of a large (>1e6
> or so) number of smallish files, rather than a few large files.
>
> I can think of a couple of ways of doing this via some bash
> scripts, but doing it purely in bash is going to involve invoking at
> least one external application per file, and you'll have to swallow a
> relatively large overhead for process initialisation each time. So,
> for performance reasons, I'd suggest doing it all in something a bit
> more capable.

For the record, one (icky) way of doing it with just the shell
would be something like:

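# Directories: generate the commands with stat, then run them all in one bash.
# %a is the octal mode (%f would be the raw mode in hex); -p copes with "dest/.".
find -type d | for d in $(cat); do
    stat -c "mkdir -p -m %a dest/$d; chown %u:%g dest/$d" $d
done | bash
# Files: touch makes the empty file; chown goes first as it can clear setuid bits.
find -type f | for f in $(cat); do
    stat -c "touch dest/$f; chown %u:%g dest/$f; chmod %a dest/$f" $f
done | bash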

There are other ways, for example:

find -type d | for d in $(cat); do
    MODE=$(stat -c %a $d)        # octal mode; %f would give raw hex
    OWNER=$(stat -c %u:%g $d)    # numeric uid:gid
    mkdir -p -m $MODE dest/$d
    chown $OWNER dest/$d
done
...

but this has two extra invocations of bash and an extra invocation of
stat per file, which is going to slow you down. If you have the
inclination, I'd be interested to know what the overheads involved
are, and how it compares to the Python code I sketched out earlier.
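
One rough way to put a number on that overhead is to collect the same
metadata twice, once with a stat process per file and once from a single
find process, and time both. This is only a sketch: it assumes GNU find
(for -printf) and GNU stat (for -c), the function names meta_via_stat and
meta_via_find are made up for illustration, and it says nothing about
the Python version.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for measuring per-file process overhead.
# Assumes GNU find (-printf) and GNU stat (-c).

# One stat process per file, much like the loops above.
meta_via_stat() {
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        stat -c '%a %u:%g %n' -- "$f"
    done
}

# The same fields from a single find process, with no stat at all:
# %m = octal mode, %U/%G = numeric uid/gid, %p = path.
meta_via_find() {
    find "$1" -type f -printf '%m %U:%G %p\n'
}
```

Running "time meta_via_stat /some/tree >/dev/null" and then
"time meta_via_find /some/tree >/dev/null" (second runs, so the cache is
warm for both) should give a feel for how much of the time is process
start-up rather than filesystem work.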

   Note that none of the above will cleanly handle filenames with
whitespace in, and you should probably be using -print0 on the find
commands, double quotes around all the filename substitution, and
read -d '' to split the names on NUL (IFS=$'\0' won't do it, since bash
can't hold a NUL in a variable). Oh, and I've ignored fifos, sockets,
devices and links again.
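
For what it's worth, a whitespace-safe version of the same idea might
look something like the sketch below. It assumes bash, GNU find and GNU
stat; replicate_tree and copy_meta are names invented here, and it still
ignores fifos, sockets, devices and links. Changing ownership to other
users will need root.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: recreate SRC's directories and files under DEST,
# with every file empty but modes and numeric owners preserved.
# Assumes bash (read -d '') and GNU stat (-c).

# copy_meta SRC_PATH DEST_PATH: copy owner and mode with one stat call.
copy_meta() {
    stat -c '%a %u:%g' -- "$1" | {
        read -r mode owner
        chown -- "$owner" "$2"   # chown first: it can clear setuid/setgid
        chmod -- "$mode" "$2"
    }
}

# replicate_tree SRC DEST
replicate_tree() {
    local src=$1 dest=$2
    mkdir -p -- "$dest" || return 1
    dest=$(cd -- "$dest" >/dev/null && pwd) || return 1   # absolute path
    (
        cd -- "$src" || exit 1
        # 1. Directories, with default modes so we can write into them.
        find . -type d -print0 |
        while IFS= read -r -d '' rel; do
            mkdir -p -- "$dest/$rel"
        done
        # 2. Empty files, then their owner and mode.
        find . -type f -print0 |
        while IFS= read -r -d '' rel; do
            : > "$dest/$rel"
            copy_meta "$rel" "$dest/$rel"
        done
        # 3. Directory owner and mode last, deepest first, so a
        #    read-only directory can't block creating its contents.
        find . -depth -type d -print0 |
        while IFS= read -r -d '' rel; do
            copy_meta "$rel" "$dest/$rel"
        done
    )
}
```

Usage would be along the lines of "replicate_tree /data /mnt/skeleton".
It still costs three processes per file (stat, chown, chmod), so it
fixes the quoting problem rather than the speed one.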

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---      
                     headline (possibly apocryphal)