Module CedarBackup2.util

Provides general-purpose utilities.

Author: Kenneth J. Pronovici <pronovic@ieee.org>

Classes
AbsolutePathList Class representing a list of absolute paths.
ObjectTypeList Class representing a list containing only objects with a certain type.
RestrictedContentList Class representing a list containing only objects with certain values.
UnorderedList Class representing an "unordered list".

Function Summary
  convertSize(size, fromUnit, toUnit)
Converts a size in one unit to a size in another unit.
  getUidGid(user, group)
Gets the uid/gid associated with a user/group pair.
  splitCommandLine(commandLine)
Splits a command line string into a list of arguments.
  executeCommand(command, args, returnOutput, ignoreStderr)
Executes a shell command, hopefully in a safe way (UNIX-specific).
  calculateFileAge(file)
Calculates the age (in days) of a file.
  encodePath(path)
Safely encodes a filesystem path.
  deviceMounted(devicePath)
Indicates whether a specific filesystem device is currently mounted.
  getFunctionReference(module, function)
Gets a reference to a named function.

Variable Summary
float ISO_SECTOR_SIZE: Size of an ISO image sector, in bytes.
float BYTES_PER_KBYTE: Number of bytes (B) per kilobyte (kB).
float KBYTES_PER_MBYTE: Number of kilobytes (kB) per megabyte (MB).
float BYTES_PER_MBYTE: Number of bytes (B) per megabyte (MB).
float BYTES_PER_SECTOR: Number of bytes (B) per ISO sector.
int SECONDS_PER_MINUTE: Number of seconds per minute.
int MINUTES_PER_HOUR: Number of minutes per hour.
int HOURS_PER_DAY: Number of hours per day.
int SECONDS_PER_DAY: Number of seconds per day.
int UNIT_BYTES: Constant representing the byte (B) unit for conversion.
int UNIT_KBYTES: Constant representing the kilobyte (kB) unit for conversion.
int UNIT_MBYTES: Constant representing the megabyte (MB) unit for conversion.
int UNIT_SECTORS: Constant representing the ISO sector unit for conversion.

Function Details

convertSize(size, fromUnit, toUnit)

Converts a size in one unit to a size in another unit.

This is a convenience function so that unit conversion is implemented in only one place. Internally, we convert the value to bytes and then from bytes to the final unit.

The available units are:
  • UNIT_BYTES - Bytes
  • UNIT_KBYTES - Kilobytes, where 1kB = 1024B
  • UNIT_MBYTES - Megabytes, where 1MB = 1024kB
  • UNIT_SECTORS - Sectors, where 1 sector = 2048B
Parameters:
size - Size to convert
           (type=Integer or float value in units of fromUnit)
fromUnit - Unit to convert from
           (type=One of the units listed above)
toUnit - Unit to convert to
           (type=One of the units listed above)
Returns:
Number converted to new unit, as a float.
Raises:
ValueError - If one of the units is invalid.
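The convert-through-bytes approach described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: it assumes the constant values listed under Variable Details, and the _BYTES_PER_UNIT lookup table is a hypothetical helper introduced for the sketch.

```python
# Unit constants and sizes, using the values listed under Variable Details.
UNIT_BYTES = 0
UNIT_KBYTES = 1
UNIT_MBYTES = 2
UNIT_SECTORS = 3

BYTES_PER_KBYTE = 1024.0
BYTES_PER_MBYTE = 1048576.0
BYTES_PER_SECTOR = 2048.0

# Hypothetical helper: number of bytes in one of each unit.
_BYTES_PER_UNIT = {
    UNIT_BYTES: 1.0,
    UNIT_KBYTES: BYTES_PER_KBYTE,
    UNIT_MBYTES: BYTES_PER_MBYTE,
    UNIT_SECTORS: BYTES_PER_SECTOR,
}

def convertSize(size, fromUnit, toUnit):
    """Converts size from fromUnit to toUnit, returning a float."""
    if fromUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown from unit %s." % fromUnit)
    if toUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown to unit %s." % toUnit)
    # First convert the value to bytes, then from bytes to the target unit.
    byteSize = float(size) * _BYTES_PER_UNIT[fromUnit]
    return byteSize / _BYTES_PER_UNIT[toUnit]
```

For example, convertSize(650, UNIT_MBYTES, UNIT_SECTORS) gives 332800.0, the number of 2048-byte sectors in 650 MB.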

getUidGid(user, group)

Gets the uid/gid associated with a user/group pair.
Parameters:
user - User name
           (type=User name as a string)
group - Group name
           (type=Group name as a string)
Returns:
Tuple (uid, gid) matching passed-in user and group.
Raises:
ValueError - If the ownership user/group values are invalid
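On UNIX, this kind of lookup can be done with the standard pwd and grp modules. The sketch below is an assumption about how such a function could work, not necessarily how Cedar Backup does it; it converts the KeyError raised for unknown names into the documented ValueError.

```python
import grp
import pwd

def getUidGid(user, group):
    """Returns (uid, gid) for the named user and group (UNIX only)."""
    try:
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        # pwd/grp raise KeyError for names that do not exist.
        raise ValueError("Unable to look up uid/gid for user %s, group %s." % (user, group))
    return (uid, gid)
```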

splitCommandLine(commandLine)

Splits a command line string into a list of arguments.

Unfortunately, there is no "standard" way to parse a command line string, and it's actually not an easy problem to solve portably (essentially, we have to emulate the shell argument-processing logic). This code only respects double quotes (") for grouping arguments, not single quotes ('). Make sure you take this into account when building your command line.

Incidentally, I found this particular parsing method while digging around in Google Groups, and I tweaked it for my own use.
Parameters:
commandLine - Command line string
           (type=String, e.g. "cback --verbose stage store")
Returns:
List of arguments, suitable for passing to popen2.Popen4.
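A double-quote-only splitter can be sketched with a single regular expression. This is an illustrative assumption, not necessarily the Google Groups method the author adapted: each field is either a run of characters containing no spaces or double quotes, or a double-quoted group, and the grouping quotes are then stripped.

```python
import re

def splitCommandLine(commandLine):
    """Splits commandLine on spaces, grouping on double quotes only."""
    # A field is either a run of non-space, non-quote characters,
    # or a double-quoted group that may contain spaces.
    fields = re.findall(r'[^ "]+|"[^"]+"', commandLine)
    # Remove the grouping quotes from each field.
    return [field.replace('"', '') for field in fields]
```

Note how single quotes are not treated specially: splitCommandLine("echo 'a b'") yields ["echo", "'a", "b'"], which is exactly the caveat described above.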

executeCommand(command, args, returnOutput=False, ignoreStderr=False)

Executes a shell command, hopefully in a safe way (UNIX-specific).

This function exists to replace direct calls to os.popen() in the Cedar Backup code. It's not safe to call a function such as os.popen() with untrusted arguments, because the command string is interpreted by the shell, which causes problems if the string contains unsafe variables or other constructs (imagine that the argument is $WHATEVER, but $WHATEVER contains something like "; rm -fR ~/; echo" in the current environment).

It's safer to use popen4 (or popen2 or popen3) and pass a list rather than a string for the first argument. When called this way, popen4 will use the list's first item as the command and the remainder of the list's items as arguments to that command.

In the normal case, this function returns a tuple of (status, None), where status is the wait-encoded return status of the call per the popen2.Popen4 documentation. If returnOutput is passed in as True, the function instead returns a tuple of (status, output), where output is a list of strings, one entry per line of output from the command. Output is always logged to the outputLogger.info() target, regardless of whether it's returned.

By default, stdout and stderr will be intermingled in the output. However, if you pass in ignoreStderr=True, then only stdout will be included in the output. This is implemented by using popen2.Popen4 in the normal case and popen2.Popen3 if stderr is to be ignored.
Parameters:
command - Shell command to execute
           (type=List of individual arguments that make up the command)
args - List of arguments to the command
           (type=List of additional arguments to the command)
returnOutput - Indicates whether to return the output of the command
           (type=Boolean True or False)
ignoreStderr - Indicates whether to leave stderr out of the output
           (type=Boolean True or False)
Returns:
Tuple of (status, output) as described above.

Notes:

  • I know that it's a bit confusing that the command and the arguments are both lists. I could have just required the caller to pass in one big list. However, I think it makes some sense to keep the command (the constant part of what we're executing, e.g. "scp -B") separate from its arguments, even if they both end up looking kind of similar.
  • You cannot redirect output (e.g. 2>&1, 2>/dev/null, etc.) using this function. The redirection string would be passed to the command just like any other argument. However, you can get the effect of 2>/dev/null by using ignoreStderr=True, as discussed above.
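The same list-based, no-shell approach can be sketched with the modern subprocess module in place of popen2, which no longer exists in Python 3. This is a substitution, not the documented implementation: in particular, subprocess reports the plain exit code (negative for a signal) rather than a wait-encoded status, and the logger name "CedarBackup2.output" is an assumption.

```python
import logging
import subprocess

# Assumed logger name; the documentation only calls it outputLogger.
outputLogger = logging.getLogger("CedarBackup2.output")

def executeCommand(command, args, returnOutput=False, ignoreStderr=False):
    """Runs command + args without a shell; returns (status, output-or-None)."""
    fields = list(command) + list(args)
    # Merge stderr into stdout (like Popen4), or discard it (like 2>/dev/null).
    stderr = subprocess.DEVNULL if ignoreStderr else subprocess.STDOUT
    proc = subprocess.Popen(fields, stdout=subprocess.PIPE, stderr=stderr,
                            universal_newlines=True)
    output = []
    for line in proc.stdout:
        # Every line is logged, whether or not it is returned.
        outputLogger.info(line.rstrip("\n"))
        if returnOutput:
            output.append(line)
    proc.stdout.close()
    status = proc.wait()  # exit code, not a wait-encoded status
    return (status, output if returnOutput else None)
```

Because the argument list never passes through a shell, executeCommand(["echo"], ["$HOME; echo injected"], returnOutput=True) prints the string literally instead of expanding $HOME or running a second command.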

calculateFileAge(file)

Calculates the age (in days) of a file.

The "age" of a file is the amount of time since the file was last used, per the most recent of the file's st_atime and st_mtime values.

Technically, we only intend this function to work with files, but it will probably work with anything on the filesystem.
Parameters:
file - Path to a file on disk.
Returns:
Age of the file in days.
Raises:
OSError - If the file doesn't exist.
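The age calculation described above amounts to comparing the current time against the later of st_atime and st_mtime. A minimal sketch, assuming the result is returned as a float number of days (the documentation does not say whether it is rounded):

```python
import os
import time

SECONDS_PER_DAY = 86400

def calculateFileAge(file):
    """Returns days since the most recent of the file's atime and mtime."""
    # os.stat raises OSError (FileNotFoundError) if the path does not exist.
    info = os.stat(file)
    lastUse = max(info.st_atime, info.st_mtime)
    return (time.time() - lastUse) / SECONDS_PER_DAY
```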

encodePath(path)

Safely encodes a filesystem path.

Many Python filesystem functions, such as os.listdir, behave differently if they are passed unicode arguments versus simple string arguments. For instance, os.listdir generally returns unicode path names if it is passed a unicode argument, and string pathnames if it is passed a string argument.

However, this behavior often isn't as consistent as we might like. As an example, os.listdir "gives up" if it finds a filename that it can't properly encode given the current locale settings. This means that the returned list is a mixed set of unicode and simple string paths. This has consequences later, because other filesystem functions like os.path.join will blow up if they are given one string path and one unicode path.

On comp.lang.python, Martin v. Löwis explained the os.listdir behavior like this:
  The operating system (POSIX) does not have the inherent notion that file
  names are character strings. Instead, in POSIX, file names are primarily
  byte strings. There are some bytes which are interpreted as characters
  (e.g. '\x2e', which is '.', or '\x2f', which is '/'), but apart from
  that, most OS layers think these are just bytes.

  Now, most *people* think that file names are character strings.  To
  interpret a file name as a character string, you need to know what the
  encoding is to interpret the file names (which are byte strings) as
  character strings.

  There is, unfortunately, no operating system API to carry the notion of a
  file system encoding. By convention, the locale settings should be used
  to establish this encoding, in particular the LC_CTYPE facet of the
  locale. This is defined in the environment variables LC_CTYPE, LC_ALL,
  and LANG (searched in this order).

  If LANG is not set, the "C" locale is assumed, which uses ASCII as its
  file system encoding. In this locale, '♪♬' is not a
  valid file name (at least it cannot be interpreted as characters, and
  hence not be converted to Unicode).

  Now, your Python script has requested that all file names *should* be
  returned as character (ie. Unicode) strings, but Python cannot comply,
  since there is no way to find out what this byte string means, in terms
  of characters.

  So we have three options:

  1. Skip this string, only return the ones that can be converted to Unicode. 
     Give the user the impression the file does not exist.
  2. Return the string as a byte string
  3. Refuse to listdir altogether, raising an exception (i.e. return nothing)

  Python has chosen alternative 2, allowing the application to implement 1
  or 3 on top of that if it wants to (or come up with other strategies,
  such as user feedback).
As a solution, he suggests that rather than passing unicode paths into the filesystem functions, I should sensibly encode the path first. That is what this function accomplishes. Any function that takes a filesystem path as an argument should encode it first, before using it for any other purpose.
Parameters:
path - Path to encode
Returns:
Path, as a string, encoded appropriately
Raises:
ValueError - If the path cannot be encoded properly.

Note: As a special case, if path is None, then this function will return None.
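The encoding step can be sketched using the filesystem encoding reported by sys.getfilesystemencoding(). The sketch assumes Python 3, where bytes plays the role of the "simple string" path discussed above; it uses strict encoding so that an unencodable path raises the documented ValueError instead of being silently escaped.

```python
import sys

def encodePath(path):
    """Encodes a text path to bytes using the filesystem encoding."""
    if path is None:
        return None  # special case documented above
    if isinstance(path, bytes):
        return path  # already a byte string; nothing to do
    encoding = sys.getfilesystemencoding()
    try:
        # Strict encoding: fail loudly rather than produce a mixed result.
        return path.encode(encoding)
    except UnicodeError:
        raise ValueError("Path could not be safely encoded as %s." % encoding)
```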

deviceMounted(devicePath)

Indicates whether a specific filesystem device is currently mounted.

We determine whether the device is mounted by looking through the system's mtab file, which lists every currently-mounted filesystem, one per line, with the device as the first field. We only do the check if the mtab file exists and is readable; otherwise, we assume that the device is not mounted.
Parameters:
devicePath - Path of device to be checked
Returns:
True if device is mounted, False otherwise.
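The mtab check can be sketched as a scan of the first field of each line. The mtabPath parameter and its default of "/etc/mtab" are assumptions added for illustration and testability; they are not part of the documented signature.

```python
import os

def deviceMounted(devicePath, mtabPath="/etc/mtab"):
    """Returns True if devicePath appears as a mounted device in mtabPath."""
    # Only check if the mtab file exists and is readable; else assume unmounted.
    if not (os.path.exists(mtabPath) and os.access(mtabPath, os.R_OK)):
        return False
    with open(mtabPath) as mtab:
        for line in mtab:
            fields = line.split()
            # The first field of each mtab entry is the device.
            if fields and fields[0] == devicePath:
                return True
    return False
```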

getFunctionReference(module, function)

Gets a reference to a named function.

This does some hokey-pokey to get back a reference to a dynamically named function. For instance, say you wanted to get a reference to the os.path.isdir function. You could use:
  myfunc = getFunctionReference("os.path", "isdir")

Although we won't bomb out directly, behavior is pretty much undefined if you pass in None or "" for either module or function.

The only validation we enforce is that whatever we get back must be callable.

I derived this code based on the internals of the Python unittest implementation. I don't claim to completely understand how it works.
Parameters:
module - Name of module associated with function.
           (type=Something like "os.path" or "CedarBackup2.util")
function - Name of function
           (type=Something like "isdir" or "getUidGid")
Returns:
Reference to function associated with name.
Raises:
ImportError - If the function cannot be found.
ValueError - If the resulting reference is not callable.
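The same lookup can be expressed with importlib.import_module and getattr. This swaps in a different technique from the unittest-derived code described above, so treat it as a sketch; it maps a missing attribute to ImportError and a non-callable result to ValueError, matching the documented exceptions.

```python
import importlib

def getFunctionReference(module, function):
    """Returns a reference to module.function, which must be callable."""
    try:
        # import_module raises ImportError if the module cannot be found.
        obj = importlib.import_module(module)
        ref = getattr(obj, function)
    except AttributeError:
        raise ImportError("Unable to find function %s in module %s." % (function, module))
    if not callable(ref):
        raise ValueError("Reference to %s.%s is not callable." % (module, function))
    return ref
```

Usage matches the example above: myfunc = getFunctionReference("os.path", "isdir").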

Variable Details

ISO_SECTOR_SIZE

Size of an ISO image sector, in bytes.
Type:
float
Value:
2048.0                                                                

BYTES_PER_KBYTE

Number of bytes (B) per kilobyte (kB).
Type:
float
Value:
1024.0                                                                

KBYTES_PER_MBYTE

Number of kilobytes (kB) per megabyte (MB).
Type:
float
Value:
1024.0                                                                

BYTES_PER_MBYTE

Number of bytes (B) per megabyte (MB).
Type:
float
Value:
1048576.0                                                             

BYTES_PER_SECTOR

Number of bytes (B) per ISO sector.
Type:
float
Value:
2048.0                                                                

SECONDS_PER_MINUTE

Number of seconds per minute.
Type:
int
Value:
60                                                                    

MINUTES_PER_HOUR

Number of minutes per hour.
Type:
int
Value:
60                                                                    

HOURS_PER_DAY

Number of hours per day.
Type:
int
Value:
24                                                                    

SECONDS_PER_DAY

Number of seconds per day.
Type:
int
Value:
86400                                                                 

UNIT_BYTES

Constant representing the byte (B) unit for conversion.
Type:
int
Value:
0                                                                     

UNIT_KBYTES

Constant representing the kilobyte (kB) unit for conversion.
Type:
int
Value:
1                                                                     

UNIT_MBYTES

Constant representing the megabyte (MB) unit for conversion.
Type:
int
Value:
2                                                                     

UNIT_SECTORS

Constant representing the ISO sector unit for conversion.
Type:
int
Value:
3                                                                     
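Several of the values above follow arithmetically from the others. One way to express those relationships is shown below; this page only confirms the resulting values, so writing the derived constants as expressions is an assumption about presentation, not about the module source.

```python
ISO_SECTOR_SIZE = 2048.0
BYTES_PER_KBYTE = 1024.0
KBYTES_PER_MBYTE = 1024.0
BYTES_PER_MBYTE = BYTES_PER_KBYTE * KBYTES_PER_MBYTE  # 1048576.0
BYTES_PER_SECTOR = ISO_SECTOR_SIZE                     # 2048.0

SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
SECONDS_PER_DAY = SECONDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY  # 86400

# Unit identifiers used by convertSize().
UNIT_BYTES = 0
UNIT_KBYTES = 1
UNIT_MBYTES = 2
UNIT_SECTORS = 3
```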

Generated by Epydoc 2.1 on Sat Feb 26 23:21:10 2005 http://epydoc.sf.net