
Module CedarBackup2.util

Provides general-purpose utilities.

Author: Kenneth J. Pronovici <pronovic@ieee.org>

Classes
AbsolutePathList Class representing a list of absolute paths.
ObjectTypeList Class representing a list containing only objects with a certain type.
RestrictedContentList Class representing a list containing only objects with certain values.
UnorderedList Class representing an "unordered list".

Function Summary
  convertSize(size, fromUnit, toUnit)
Converts a size in one unit to a size in another unit.
  getUidGid(user, group)
Get the uid/gid associated with a user/group pair.
  changeOwnership(path, user, group)
Changes ownership of path to match the user and group.
  splitCommandLine(commandLine)
Splits a command line string into a list of arguments.
  executeCommand(command, args, returnOutput, ignoreStderr, doNotLog, outputFile)
Executes a shell command, hopefully in a safe way (UNIX-specific).
  calculateFileAge(file)
Calculates the age (in days) of a file.
  encodePath(path)
Safely encodes a filesystem path.
  deviceMounted(devicePath)
Indicates whether a specific filesystem device is currently mounted.
  displayBytes(bytes, digits)
Format a byte quantity so it can be sensibly displayed.
  getFunctionReference(module, function)
Gets a reference to a named function.

Variable Summary
float ISO_SECTOR_SIZE: Size of an ISO image sector, in bytes.
float BYTES_PER_SECTOR: Number of bytes (B) per ISO sector.
float BYTES_PER_KBYTE: Number of bytes (B) per kilobyte (kB).
float BYTES_PER_MBYTE: Number of bytes (B) per megabyte (MB).
float BYTES_PER_GBYTE: Number of bytes (B) per gigabyte (GB).
float KBYTES_PER_MBYTE: Number of kilobytes (kB) per megabyte (MB).
float MBYTES_PER_GBYTE: Number of megabytes (MB) per gigabyte (GB).
int SECONDS_PER_MINUTE: Number of seconds per minute.
int MINUTES_PER_HOUR: Number of minutes per hour.
int HOURS_PER_DAY: Number of hours per day.
int SECONDS_PER_DAY: Number of seconds per day.
int UNIT_BYTES: Constant representing the byte (B) unit for conversion.
int UNIT_KBYTES: Constant representing the kilobyte (kB) unit for conversion.
int UNIT_MBYTES: Constant representing the megabyte (MB) unit for conversion.
int UNIT_SECTORS: Constant representing the ISO sector unit for conversion.

Function Details

convertSize(size, fromUnit, toUnit)

Converts a size in one unit to a size in another unit.

This is just a convenience function so that the functionality can be implemented in just one place. Internally, we convert values to bytes and then to the final unit.

The available units are:
  • UNIT_BYTES - Bytes
  • UNIT_KBYTES - Kilobytes, where 1kB = 1024B
  • UNIT_MBYTES - Megabytes, where 1MB = 1024kB
  • UNIT_SECTORS - Sectors, where 1 sector = 2048B
Parameters:
size - Size to convert
           (type=Integer or float value in units of fromUnit)
fromUnit - Unit to convert from
           (type=One of the units listed above)
toUnit - Unit to convert to
           (type=One of the units listed above)
Returns:
Number converted to new unit, as a float.
Raises:
ValueError - If one of the units is invalid.
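
The "convert to bytes, then to the final unit" approach described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the name convert_size and the lookup table are hypothetical, and the constant values are the ones listed on this page.

```python
# Hypothetical stand-ins for the module's unit constants (values from this page).
UNIT_BYTES, UNIT_KBYTES, UNIT_MBYTES, UNIT_SECTORS = 0, 1, 2, 3

# Number of bytes in one of each unit.
_BYTES_PER_UNIT = {
    UNIT_BYTES: 1.0,
    UNIT_KBYTES: 1024.0,
    UNIT_MBYTES: 1048576.0,
    UNIT_SECTORS: 2048.0,
}

def convert_size(size, from_unit, to_unit):
    """Convert size from from_unit to to_unit, going through bytes."""
    if from_unit not in _BYTES_PER_UNIT or to_unit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown unit: %r -> %r" % (from_unit, to_unit))
    size_in_bytes = float(size) * _BYTES_PER_UNIT[from_unit]
    return size_in_bytes / _BYTES_PER_UNIT[to_unit]

# 4096 bytes is two 2048-byte ISO sectors.
print(convert_size(4096, UNIT_BYTES, UNIT_SECTORS))   # 2.0
```

Going through bytes means each unit needs only one factor, rather than one factor per pair of units.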

getUidGid(user, group)

Get the uid/gid associated with a user/group pair.
Parameters:
user - User name
           (type=User name as a string)
group - Group name
           (type=Group name as a string)
Returns:
Tuple (uid, gid) matching passed-in user and group.
Raises:
ValueError - If the ownership user/group values are invalid.
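
One plausible way to perform this lookup is through the standard pwd and grp modules. The sketch below assumes that approach; the page does not say how the function is actually implemented, and the name get_uid_gid is hypothetical.

```python
import grp
import pwd

def get_uid_gid(user, group):
    """Return (uid, gid) for the named user and group, or raise ValueError."""
    try:
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
    except KeyError as e:
        # pwd and grp raise KeyError when the name is not known to the system.
        raise ValueError("Invalid ownership user/group: %s" % e)
    return (uid, gid)
```

The resulting tuple is in the form that os.chown() expects for its uid and gid arguments.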

changeOwnership(path, user, group)

Changes ownership of path to match the user and group.
Parameters:
path - Path whose ownership to change.
user - User who should own the path.
group - Group that should own the path.

splitCommandLine(commandLine)

Splits a command line string into a list of arguments.

Unfortunately, there is no "standard" way to parse a command line string, and it's actually not an easy problem to solve portably (essentially, we have to emulate the shell argument-processing logic). This code only respects double quotes (") for grouping arguments, not single quotes ('). Make sure you take this into account when building your command line.

Incidentally, I found this particular parsing method while digging around in Google Groups, and I tweaked it for my own use.
Parameters:
commandLine - Command line string
           (type=String, e.g. "cback --verbose stage store")
Returns:
List of arguments, suitable for passing to popen2.Popen4.
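
As a rough sketch of the double-quote-only behavior described above, a regular expression can pick out either unquoted runs or double-quoted groups. The name split_command_line and the exact pattern are assumptions made for illustration, not necessarily what the module does.

```python
import re

def split_command_line(command_line):
    """Split on spaces, keeping double-quoted runs together."""
    if command_line is None:
        raise ValueError("Cannot split a command line of None.")
    # Match either a run with no space or double quote, or a double-quoted run.
    fields = re.findall(r'[^ "]+|"[^"]+"', command_line)
    # Drop the double-quote characters themselves from each field.
    return [field.replace('"', '') for field in fields]

print(split_command_line('scp -B "my file" host:/tmp'))
# ['scp', '-B', 'my file', 'host:/tmp']
```

With this pattern, single quotes get no special treatment: "echo 'a b'" splits into three fields, with the quote characters left in place.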

executeCommand(command, args, returnOutput=False, ignoreStderr=False, doNotLog=False, outputFile=None)

Executes a shell command, hopefully in a safe way (UNIX-specific).

This function exists to replace direct calls to os.popen() in the Cedar Backup code. It's not safe to call a function such as os.popen() with untrusted arguments, since that can cause problems if the string contains non-safe variables or other constructs (imagine that the argument is $WHATEVER, but $WHATEVER contains something like "; rm -fR ~/; echo" in the current environment).

It's safer to use popen4 (or popen2 or popen3) and pass a list rather than a string for the first argument. When called this way, popen4 will use the list's first item as the command and the remainder of the list's items as arguments to that command.

In the normal case, this function will return a tuple of (status, None) where the status is the wait-encoded return status of the call per the popen2.Popen4 documentation. If returnOutput is passed in as True, the function will return a tuple of (status, output) where output is a list of strings, one entry per line in the output from the command. Output is always logged to the outputLogger.info() target, regardless of whether it's returned.

By default, stdout and stderr will be intermingled in the output. However, if you pass in ignoreStderr=True, then only stdout will be included in the output. This is implemented by using popen2.Popen4 in the normal case and popen2.Popen3 if stderr is to be ignored.

The doNotLog parameter exists so that callers can force the function to not log command output to the debug log. Normally, you would want to log. However, if you're using this function to write huge output files (e.g. database backups written to stdout) then you might want to avoid putting all that information into the debug log.

The outputFile parameter exists to make it easier for a caller to push output into a file, i.e. as a substitute for redirection to a file. If this value is passed in, each time a line of output is generated, it will be written to the file using outputFile.write(). At the end, the file object will be flushed using outputFile.flush(). The caller maintains responsibility for closing the file object appropriately.
Parameters:
command - Shell command to execute
           (type=List of individual arguments that make up the command)
args - List of arguments to the command
           (type=List of additional arguments to the command)
returnOutput - Indicates whether to return the output of the command
           (type=Boolean True or False)
ignoreStderr - Indicates whether stderr should be left out of the output
           (type=Boolean True or False)
doNotLog - Indicates that output should not be logged.
           (type=Boolean True or False)
outputFile - File object that all output should be written to.
           (type=File object as returned from open() or file())
Returns:
Tuple of (status, output) as described above.

Notes:

  • I know that it's a bit confusing that the command and the arguments are both lists. I could have just required the caller to pass in one big list. However, I think it makes some sense to keep the command (the constant part of what we're executing, e.g. "scp -B") separate from its arguments, even if they both end up looking kind of similar.
  • You cannot redirect output (e.g. 2>&1, 2>/dev/null, etc.) using this function. The redirection string would be passed to the command just like any other argument. However, you can implement 2>/dev/null by using ignoreStderr=True, as discussed above.
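
The list-instead-of-string idea can be illustrated with a short sketch. It swaps in the subprocess module (Python 3) for popen2, so it is an analogue and not the function documented here: the name execute_command is hypothetical, logging is left out entirely, and the status it returns is the plain exit code rather than a wait-encoded status.

```python
import subprocess
import sys

def execute_command(command, args, return_output=False,
                    ignore_stderr=False, output_file=None):
    """Run command + args without a shell; return (status, output-or-None)."""
    fields = list(command) + list(args)
    # Either discard stderr or merge it into stdout.
    stderr = subprocess.DEVNULL if ignore_stderr else subprocess.STDOUT
    # A list with no shell=True means no shell ever parses the arguments.
    pipe = subprocess.Popen(fields, stdout=subprocess.PIPE, stderr=stderr,
                            universal_newlines=True)
    output = []
    for line in pipe.stdout:
        if return_output:
            output.append(line)
        if output_file is not None:
            output_file.write(line)
    pipe.stdout.close()
    status = pipe.wait()
    if output_file is not None:
        output_file.flush()   # the caller still closes the file
    return (status, output if return_output else None)

# The argument reaches the child process verbatim; nothing expands $HOME or ';'.
print(execute_command([sys.executable, "-c"],
                      ["print('$HOME; echo hi')"], return_output=True))
# (0, ['$HOME; echo hi\n'])
```

Because no shell is involved, a redirection string such as 2>/dev/null would likewise be handed to the command as an ordinary argument, which is why ignore_stderr exists as a separate switch.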

calculateFileAge(file)

Calculates the age (in days) of a file.

The "age" of a file is the amount of time since the file was last used, per the most recent of the file's st_atime and st_mtime values.

Technically, we only intend this function to work with files, but it will probably work with anything on the filesystem.
Parameters:
file - Path to a file on disk.
Returns:
Age of the file in days.
Raises:
OSError - If the file doesn't exist.
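
The age calculation described above can be sketched with os.stat(). This is only an illustration: calculate_file_age is a hypothetical name, and since the page does not say whether the real function truncates to whole days, this version returns fractional days.

```python
import os
import time

SECONDS_PER_DAY = 86400   # value listed on this page

def calculate_file_age(path):
    """Return the age of path in days, using the later of atime and mtime."""
    stats = os.stat(path)   # raises OSError if path does not exist
    last_used = max(stats.st_atime, stats.st_mtime)   # "most recent" is "largest"
    return (time.time() - last_used) / SECONDS_PER_DAY
```

Since os.stat() works on directories and other filesystem objects as well as regular files, a version like this would accept those too.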

encodePath(path)

Safely encodes a filesystem path.

Many Python filesystem functions, such as os.listdir, behave differently if they are passed unicode arguments versus simple string arguments. For instance, os.listdir generally returns unicode path names if it is passed a unicode argument, and string pathnames if it is passed a string argument.

However, this behavior often isn't as consistent as we might like. As an example, os.listdir "gives up" if it finds a filename that it can't properly encode given the current locale settings. This means that the returned list is a mixed set of unicode and simple string paths. This has consequences later, because other filesystem functions like os.path.join will blow up if they are given one string path and one unicode path.

On comp.lang.python, Martin v. Löwis explained the os.listdir behavior like this:
  The operating system (POSIX) does not have the inherent notion that file
  names are character strings. Instead, in POSIX, file names are primarily
  byte strings. There are some bytes which are interpreted as characters
  (e.g. '\x2e', which is '.', or '\x2f', which is '/'), but apart from
  that, most OS layers think these are just bytes.

  Now, most *people* think that file names are character strings.  To
  interpret a file name as a character string, you need to know what the
  encoding is to interpret the file names (which are byte strings) as
  character strings.

  There is, unfortunately, no operating system API to carry the notion of a
  file system encoding. By convention, the locale settings should be used
  to establish this encoding, in particular the LC_CTYPE facet of the
  locale. This is defined in the environment variables LC_CTYPE, LC_ALL,
  and LANG (searched in this order).

  If LANG is not set, the "C" locale is assumed, which uses ASCII as its
  file system encoding. In this locale, '♪♬' is not a
  valid file name (at least it cannot be interpreted as characters, and
  hence not be converted to Unicode).

  Now, your Python script has requested that all file names *should* be
  returned as character (ie. Unicode) strings, but Python cannot comply,
  since there is no way to find out what this byte string means, in terms
  of characters.

  So we have three options:

  1. Skip this string, only return the ones that can be converted to Unicode. 
     Give the user the impression the file does not exist.
  2. Return the string as a byte string
  3. Refuse to listdir altogether, raising an exception (i.e. return nothing)

  Python has chosen alternative 2, allowing the application to implement 1
  or 3 on top of that if it wants to (or come up with other strategies,
  such as user feedback).
As a solution, he suggests that rather than passing unicode paths into the filesystem functions, I should sensibly encode the path first. That is what this function accomplishes. Any function which takes a filesystem path as an argument should encode it first, before using it for any other purpose.
Parameters:
path - Path to encode
Returns:
Path, as a string, encoded appropriately
Raises:
ValueError - If the path cannot be encoded properly.

Note: As a special case, if path is None, then this function will return None.
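
To make the idea concrete, here is a rough Python 3 analogue, where str plays the role of Python 2's unicode and bytes plays the role of the simple string. It is an assumption-laden sketch, not the function documented here: the name encode_path and the optional encoding parameter are hypothetical.

```python
import sys

def encode_path(path, encoding=None):
    """Encode a text path to bytes; pass bytes and None through unchanged."""
    if path is None:
        return None   # special case noted above
    if isinstance(path, bytes):
        return path   # already a byte string, nothing to do
    encoding = encoding or sys.getfilesystemencoding() or "utf-8"
    try:
        return path.encode(encoding)
    except UnicodeError as e:
        raise ValueError("Path could not be safely encoded as %s: %s" % (encoding, e))

print(encode_path("\u266a\u266c", encoding="utf-8"))   # b'\xe2\x99\xaa\xe2\x99\xac'
```

Under an ASCII encoding the same path cannot be encoded at all, and this sketch reports that as ValueError.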

deviceMounted(devicePath)

Indicates whether a specific filesystem device is currently mounted.

We determine whether the device is mounted by looking through the system's mtab file. This file shows every currently-mounted filesystem, ordered by device. We only do the check if the mtab file exists and is readable. Otherwise, we assume that the device is not mounted.
Parameters:
devicePath - Path of device to be checked
Returns:
True if device is mounted, False otherwise.
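
The mtab check can be sketched as below. The file location /etc/mtab, the name device_mounted, and the mtab_file parameter are assumptions for illustration; the page only refers to "the system's mtab file".

```python
import os

MTAB_FILE = "/etc/mtab"   # assumed location of the mtab file

def device_mounted(device_path, mtab_file=MTAB_FILE):
    """Return True if device_path appears as a device in mtab_file."""
    if not (os.path.exists(mtab_file) and os.access(mtab_file, os.R_OK)):
        return False   # no readable mtab: assume the device is not mounted
    with open(mtab_file) as f:
        for line in f:
            fields = line.split()
            # The first whitespace-separated field of each entry is the device.
            if fields and fields[0] == device_path:
                return True
    return False
```

Comparing the whole first field, rather than searching the line for a substring, keeps /dev/sda1 from matching an entry for /dev/sda10.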

displayBytes(bytes, digits=2)

Format a byte quantity so it can be sensibly displayed.

It's rather difficult to look at a number like "72372224 bytes" and get any meaningful information out of it. It would be more useful to see something like "69.02 MB" (using this module's megabyte of 1048576 bytes). That's what this function does. Any time you want to display a byte value, e.g.:
  print "Size: %s bytes" % bytes
Call this function instead:
  print "Size: %s" % displayBytes(bytes)
What comes out will be sensibly formatted. The indicated number of digits will be listed after the decimal point, rounded based on whatever rules are used by Python's standard %f string format specifier.
Parameters:
bytes - Byte quantity.
           (type=Integer number of bytes.)
digits - Number of digits to display after the decimal point.
           (type=Integer value, typically 2-5.)
Returns:
String, formatted for sensible display.
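
A minimal sketch of this kind of formatting follows, assuming the 1024-based constants listed on this page and the unit suffixes "bytes", "kB", "MB" and "GB". The name display_bytes and the exact thresholds are assumptions, not the module's confirmed behavior.

```python
BYTES_PER_KBYTE = 1024.0
BYTES_PER_MBYTE = 1024.0 * 1024.0
BYTES_PER_GBYTE = 1024.0 * 1024.0 * 1024.0

def display_bytes(num_bytes, digits=2):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    if num_bytes is None:
        raise ValueError("Cannot display byte value of None.")
    value = float(num_bytes)
    if abs(value) < BYTES_PER_KBYTE:
        return "%.0f bytes" % value   # small values are shown as whole bytes
    if abs(value) < BYTES_PER_MBYTE:
        divisor, suffix = BYTES_PER_KBYTE, "kB"
    elif abs(value) < BYTES_PER_GBYTE:
        divisor, suffix = BYTES_PER_MBYTE, "MB"
    else:
        divisor, suffix = BYTES_PER_GBYTE, "GB"
    # "%.*f" takes the number of decimal places from the argument tuple.
    return "%.*f %s" % (digits, value / divisor, suffix)

print(display_bytes(72372224))   # 69.02 MB
```

With 1 MB taken as 1048576 bytes, 72372224 bytes comes out as 69.02 MB; a decimal (1000-based) megabyte would give 72.37 MB instead.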

getFunctionReference(module, function)

Gets a reference to a named function.

This does some hokey-pokey to get back a reference to a dynamically named function. For instance, say you wanted to get a reference to the os.path.isdir function. You could use:
  myfunc = getFunctionReference("os.path", "isdir")

Although we won't bomb out directly, behavior is pretty much undefined if you pass in None or "" for either module or function.

The only validation we enforce is that whatever we get back must be callable.

I derived this code based on the internals of the Python unittest implementation. I don't claim to completely understand how it works.
Parameters:
module - Name of module associated with function.
           (type=Something like "os.path" or "CedarBackup2.util")
function - Name of function
           (type=Something like "isdir" or "getUidGid")
Returns:
Reference to function associated with name.
Raises:
ImportError - If the function cannot be found.
ValueError - If the resulting reference is not callable.
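
For comparison, a dynamically named function can also be looked up with importlib and getattr(). This sketch (Python 3) is not the unittest-derived code mentioned above: the name get_function_reference is hypothetical, and mapping a missing attribute to ImportError is an assumption made to match the Raises list.

```python
import importlib
import os

def get_function_reference(module, function):
    """Look up function by name inside the named module and return it."""
    mod = importlib.import_module(module)   # raises ImportError if missing
    try:
        ref = getattr(mod, function)
    except AttributeError:
        raise ImportError("No function %s in module %s" % (function, module))
    if not callable(ref):
        raise ValueError("%s.%s is not callable" % (module, function))
    return ref

myfunc = get_function_reference("os.path", "isdir")
print(myfunc is os.path.isdir)   # True
```

As with the documented function, the only check on the result is that it is callable, so a class name would be accepted just as readily as a function name.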

Variable Details

ISO_SECTOR_SIZE

Size of an ISO image sector, in bytes.
Type:
float
Value:
2048.0                                                                

BYTES_PER_SECTOR

Number of bytes (B) per ISO sector.
Type:
float
Value:
2048.0                                                                

BYTES_PER_KBYTE

Number of bytes (B) per kilobyte (kB).
Type:
float
Value:
1024.0                                                                

BYTES_PER_MBYTE

Number of bytes (B) per megabyte (MB).
Type:
float
Value:
1048576.0                                                             

BYTES_PER_GBYTE

Number of bytes (B) per gigabyte (GB).
Type:
float
Value:
1073741824.0                                                          

KBYTES_PER_MBYTE

Number of kilobytes (kB) per megabyte (MB).
Type:
float
Value:
1024.0                                                                

MBYTES_PER_GBYTE

Number of megabytes (MB) per gigabyte (GB).
Type:
float
Value:
1024.0                                                                

SECONDS_PER_MINUTE

Number of seconds per minute.
Type:
int
Value:
60                                                                    

MINUTES_PER_HOUR

Number of minutes per hour.
Type:
int
Value:
60                                                                    

HOURS_PER_DAY

Number of hours per day.
Type:
int
Value:
24                                                                    

SECONDS_PER_DAY

Number of seconds per day.
Type:
int
Value:
86400                                                                 

UNIT_BYTES

Constant representing the byte (B) unit for conversion.
Type:
int
Value:
0                                                                     

UNIT_KBYTES

Constant representing the kilobyte (kB) unit for conversion.
Type:
int
Value:
1                                                                     

UNIT_MBYTES

Constant representing the megabyte (MB) unit for conversion.
Type:
int
Value:
2                                                                     

UNIT_SECTORS

Constant representing the ISO sector unit for conversion.
Type:
int
Value:
3                                                                     

Generated by Epydoc 2.1 on Tue Mar 8 13:38:06 2005 http://epydoc.sf.net