Interrogating the Infallible Secretary

Numerous applications in GNOME exhibit magically wonderful behavior, like they remember everything and know what you want.  One example of such an application is the excellent PDF reader Evince; every time I open a PDF it opens to the same page as the last time I looked at that document.  This means if I get my morning coffee, switch to the GNOME Activity Journal, see that it was the document "Informix_Python_Carsten_Haese.pdf" that I was reading at 16:59 the previous day, I click on that document and it opens to the same slide it was displaying when I closed it the previous day.  And GNOME applications do this kind of thing all day, like an infalliable secretary.

This reminds me of the now very cliche Niven's law: "Any sufficiently advanced technology is indistinguishable from magic" [no, that is not a quote from Arthur C. Clarke, as commonly attributed].  I could not longer resist looking behind the curtain, so I set off to discover how my infallible secretary accomplishes this.  The answer is "GVFS" - the GNOME Virtual Filesystem which layers an extensible meta-data system on top of application I/O.

GVFS provides a command line tool, of course [this is UNIX!], that allows the savy user to see into the filing cabinet of their infallible secretary.

$ gvfs-info -a "metadata::*" file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf
  metadata::evince::page: 7
  metadata::evince::dual-page-odd-left: 0
  metadata::evince::zoom: 1
  metadata::evince::window_height: 594
  metadata::evince::sizing_mode: fit-width
  metadata::evince::sidebar_page: links
  metadata::evince::window_width: 1598
  metadata::evince::sidebar_size: 249
  metadata::evince::dual-page: 0
  metadata::evince::window_x: 1
  metadata::evince::window_y: 91
  metadata::evince::show_toolbar: 1
  metadata::evince::window_maximized: 0
  metadata::evince::inverted-colors: 0
  metadata::evince::continuous: 1
  metadata::evince::sidebar_visibility: 1
  metadata::evince::fullscreen: 0
And there it is - "metadata::evince::page: 7" - how Evince takes me back to the same page I left from.  As well as lots of other information.

Command line tools are indespensible, but the immediate next question.... can I access this data from Python?  Answer - of course!  With the GIO module the data is there ready to be explored.
>>> import gio
>>> handle = gio.File('/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf')
>>> meta = handle.query_info('metadata')
>>> meta.has_attribute('metadata::evince::page')
>>> meta.get_attribute_string('metadata::evince::page')

Now knowing that, the System Administrator part of my psyche needs to know: where is all this metadata?  His first guess what that it was being stored in the filesystems using extended attribites:
getfattr --dump "/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
Bzzzzt! Nothing there.  Enough with guessing, every System Administrator worth his or her salt knows that guessing [ugh!] is for PC jockeys and web developers.  The correct approach is to drag the appropriate application out to the sheds and ... make it talk.  It turns out that gvfs-info doesn't put up much of a fight - one glimpse of strace and he's confessing everything.

$ strace -e trace=open gvfs-info -a "metadata::*" "file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
open("/home/awilliam/.local/share/gvfs-metadata/home", O_RDONLY) = 6
Yes, there it is.

$ file  /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home: data
$ fuser -u /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home:  2517m(awilliam)  2678m(awilliam) 26624m(awilliam)
$ ps -p 2517
  PID TTY          TIME CMD
 2517 ?        00:08:13 nautilus
$ ps -p 2678
  PID TTY          TIME CMD
 2678 ?        00:01:56 gvfsd-metadata
$ ps -p 26624
  PID TTY          TIME CMD
26624 ?        00:00:17 gedit
A memory-mapped database file [see the "m" after the PID in the output of fuser - that means memory mapped].  And PIDs or of the applications currently performing operations via GIO.   The use of memory mapped files means that read operations require no IPC [inter-process communications] or even syscalls for multiple applications to see the same state.  Now I had to do a little digging for GVFS documentation to understand how they manage concurrency - as multiple writers to memory mapped files is a dicey business [and GIO applications feel rock solid].  The answer is the gvfsd-metadata process.  Applications using GIO push all there writes / changes to that process over D-BUS; so only one process writes, everyone else reads through the memory mapped file.  Concurrency issues are elegantly side-stepped.  Brilliant. 

Now that the geek in me is sated I can go back to letting GNOME and its infallible secretary facilitate my productivity.


Setting a course for UTC

Timezones and daylight savings times are confusing; it is much more complicated that offset-from-UTC.  There are times that occur more that once a year [yep, it hurts] as well as times that are between two valid times but never happen.  It probably requires a Tardis to understand why anyone would want it to work this way.  But, sadly, it does work this way. 
If you stick to the rules, you can safely manage times... so long as those times are all localized.   Naive times, times that are not localized, are the enemy.
Unfortunately there is a lot of code out there, even in important and widely used modules, that uses nieve datetimes.  If you try to use a virtuously localized datetime object with those modules you will likely encounter the dreaded "Cannot compare naive and localized values".
One hack is to make sure the time is localized to the system's timezone, then make it naive, call the module's function, and then re-localize the result (again). Tedious and very prone to error.  The one real problem with this hack is that on most systems the Python process has not @*^$&*@* clue what time zone it is in.  Don't believe me? Try it:
>>> import time
>>> time.tzname
('EST', 'EDT')
Eh, that's a tuple.  And while "EST" is a time zone "EDT" is not a timezone.  Yes, I can determine that I am in daylight savings time locally using time.daylight; but I can't localize a datetime to a daylight timezone because daylight is an attribute of a timezone, not a timezone itself.  That is true regardless of what time.tzname says.  And the "EST" doesn't have daylight savings time, "US/Eastern" does.  "EST" is "US/Eastern" when not it daylight savings time. Gnarly.
But I want to use datetime obejcts reliably and safely with modules that require naive datetime objects....  The answer is to make the timezone known!  I cannot reliably get it from the system but I can make it what I want, and what I want is UTC!  Then my naive datetime objects do not have to be concerned with daylight savings time.  I can just localize them to UTC and subsequently convert them to whatever timezone the user needs to see.  This is accomplished using a combination of the os and time modules.  Early on in my Python application I move myself to UTC.  Here is an example that demonstrates the ugliness of naive times in an unknown timezone, and the beauty of the process being in UTC.
from datetime import datetime
import pytz, time, os

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format(datetime.utcnow( ) ) )
# What timezone is local?  Problem is, most of the time we just do not know.
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )

#Change to UTC
os.environ[ 'TZ' ] = 'UTC'
time.tzset( )

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format( datetime.utcnow( ) ) )
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )
And the output:
NOW: 2012-10-26 07:03:31.285486
UTCNOW: 2012-10-26 11:03:31.285570
LOCALIZEDNOW: 2012-10-26 07:03:31.285632+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285705+00:00
NOW: 2012-10-26 11:03:31.285787
UTCNOW: 2012-10-26 11:03:31.285812
LOCALIZEDNOW: 2012-10-26 11:03:31.285848+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285875+00:00

Now the danger of somehow getting a naive datetime into the mix is completely avoided - I can always safely localize a naive time to UTC.


Simple NAT With Cisco IOS

Performing NAT with any variety of a LINUX box is possibly one of the most redundantly documented applications on the Web.  Attempting to do the same with a Cisco IOS router is not documented in so straight-forward a way.
This little snippet shows the configuration for an IOS router where vLAN 13 is a public network and vLAN 12 is a private network.  The router has a public IP address of A.B.C.D [netmask: E.F.G.H] and the gateway address is A.B.C.I.  The private network is a with multiple /24 segments which all route to this NAT gateway.
interface FastEthernet0/0.12
 encapsulation dot1Q 12
 ip address 10.66.x.y
 ip nat inside
interface FastEthernet0/0.13
 encapsulation dot1Q 13
 ip address A.B.C.D E.F.G.H
 ip nat outside
ip nat inside source list 1 interface FastEthernet0/0.13 overload
ip classless
ip route A.B.C.I
access-list 1 permit
The access-list 1 matches all traffic and is used by the ip nat policy which causes the NATing of all matching traffic with the source IP address of the vLAN 13 interface.  The template for the ip nat inside source command is:
ip nat inside source {list {access-list-number | access-list-name} | route-map name} {interface type number | pool name} [mapping-id map-name | vrf name] [overload]
The "overload" option is what enables the routers use of a single address to NAT many local addresses; this corresponds to the default behavior of most iptables configuration tools (does iptables have a "default" behavior?)

One nice feature of using a Cisco for NAT, rather than a host (besides the simplicity of no moving parts) is the very concise reporting provided by "show ip nat translations" and "show ip nat statistics" commands.
Router#show ip nat statistics 
Total active translations: 208 (0 static, 208 dynamic; 208 extended)
Outside interfaces:
Inside interfaces:
Hits: 4890142  Misses: 52844
Expired translations: 52640
Dynamic mappings:
-- Inside Source
[Id: 3] access-list 1 interface FastEthernet0/0.13 refcount 208

Similar to "iptables -t nat -L -v" in LINUX.
Additional, and much more technical, documentation for this feature can be found here.


Counting weekdays between dates

Need a method to accurately count the number of weekdays between two days?  [The key here is "accurately", it is a bit harder than it seems at first].  In Python there are several ways to do this, but most involve some iteration or list comprehension.  In my opinion, if you have to do that, you are probably violating the Python idiom of "use the batteries".
A better way to solve this is to use recurrence rules - used every day in scheduling software, groupware, and anything that supports iCalendar.  Recurrence rules in Python are handled by the dateutil module's rrule component.  Here is the code:

from datetime import date, timedelta
from dateutil.rrule import rrule, MO, TU, WE, TH, FR, DAILY
start = date.today( ) - timedelta( days=90 )
end = date.today( ) + timedelta( days=3 )
rule = rrule( DAILY,
              byweekday=( MO, TU, WE, TH, FR ),
              until=end )
print( 'Days: {0}'.format( rule.count( ) )

The date range here is inclusive, it includes both the start and end dates.  One caveat is that if your end date is prior to the start date you will not get an error or exception - you'll just get a recurrence with zero elements.


Idjit's Guide To Installing RabbitMQ On openSUSE 12.2

The RabbitMQ team provides a generic SUSE RPM which works on openSUSE 11.x, openSUSE 12.1, and I presume on the pay-to-play versions of SuSE Enterprise Server. About the only real dependency for RabbitMQ is the erlang platform which is packaged in the erlang language repo. So the only real trick is getting the RabbitMQ package itself [from this page].  Then install and start is as simple as:
zypper ar http://download.opensuse.org/repositories/devel:/languages:/erlang/openSUSE_12.2 erlang
zypper in erlang
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.8.6/rabbitmq-server-2.8.6-1.suse.noarch.rpm
rpm -Uvh rabbitmq-server-2.8.6-1.suse.noarch.rpm
Now before you start the rabbitmq-server you need to modify the /etc/init.d/rabbitmq-server file changing "LOCK_FILE=/var/lock/subsys/$NAME" to "LOCK_FILE=/var/run/rabbitmq/$NAME".  The directory "/var/lock/subsys/$NAME" doesn't exist on openSUSE, so this change puts the lock file over under /var/run/rabbitmq along with the PID file.  Otherwise you can create the /var/lock/subsys/$NAME directory with the appropriate permissions.

Every time you modify a service script in /etc/init.d you need to then run systemctl --system daemon-reload so that systemd knows to anticipate the changed file.
If you want to use Rabbit's management interface you now need to enable the appropriate plugins:
rabbitmq-plugins enable rabbitmq_management
rabbitmq-plugins enable rabbitmq_management_visualiser

By default RabbitMQ will listen on all your hosts interfaces for erlang kernel, AMQP, and HTTP (management interface) connections.  Especially in the case of a developement host you may want to restrict the availability of one or all of these services to the local machine.

In order to keep the erlang kernel and Rabbit's AMQ listeners restricted to the local host you'll need to add two exported environment variables to the service script - just put them in following the definition of PID_FILE.
export RABBITMQ_NODENAME=rabbit@localhost
For the management inteface and other components you'll need to modify [and possibly create] the /etc/rabbitmq/rabbitmq.config configuration file.  RabbitMQ violates the only-have-one-way-to-configure rule of system administration; this is in part due to its reliance on the Erlang runtime - controlling the behavior of the run-time is a [poorly documented] black art.  Both the environment variables and the configuration file are required to restrict all the components to the local interface.  The following configuration file restricts HTTP [management interface] and AMQP services to the localhost and informs the RabbitMQ application that it should find the Erlang kernel at the address
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {rabbit, [{tcp_listeners, [{"", 5672}]}]},
  {rabbitmq_management,  [ {http_log_dir,   "/tmp/rabbit-mgmt"} ] },
  {rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
  {rabbitmq_mochiweb, [ {listeners, [{mgmt, [{port, 55672},
                                             {ip, ""}]}]},
                        {default_listener, [{port, 60000} ] } ] }
Always modify the configuration file when the RabbitMQ service is shutdown.  A botched configuration file can render the broker unable to shutdown properly leaving you to have to manually kill the processes old-school.

With the RABBITMQ_NODENAME defined in the services file you will either need to add that same variable to the administrator's and application's environment or specify the node name when attempting to connect to or manage the RabbitMQ broker service [your application probably already refers to a configured broker, but you'll certainly have to deal with this when using the rabbitmqclt command].

Now the service should start:
service rabbitmq-server start
The broker service should now be running and you can see the components' open TCP connections using the netstat command.  The management interface should also be available on TCP/55672 [via your browser of choice] unless you specified an alternative port in the rabbitmq.config file.

linux-nysu:/etc/init.d # netstat --listen --tcp --numeric --program
Active Internet connections (only servers)
Proto Local Address    Foreign Address State  PID/Program name  
tcp*       LISTEN 23180/beam.smp     
tcp*       LISTEN 23180/beam.smp     
tcp*       LISTEN 22681/epmd         
tcp*       LISTEN 23180/beam.smp     
Now you probably want to do some configuration and provisioning using the rabbitmqctl command; but your RabbitMQ instance is up and running.


Recommended GNOME3 Extensions

I'm a GNOME3 / GNOME Shell user, and a big time fan of this elegant new desktop environment.  Gone are the clumsy panels, task bars, and the applets.  GNOME Shell replaces all that with an extension system that allows developers to extend and modify the working environment using only JavaScript and CSS [leave your compiler at home].  Extensions can be installed on the fly, and enable or disabled at will.  Extensions can be browsed and installed just by visiting the extensions.gnome.org website with your Epiphany or Firefox web browser. 

This is a list of extensions I find most useful.

Tracker Search
Tracker is an efficient and fast desktop search engine.  Open Source desktop search experienced a painful set-back when faux Open Source advocates ignorantly crusading against the Mono project bludgeoned the reputation of the Beagle project based on a few bugs experienced in early releases [as if every project and product doesn't have those].  Tracker stepped in to replace Beagle, and being implemented in C, avoided the ire of the trolls [or at least that set of trolls].  It has taken a l-o-n-g time for Tracker to match Beagle's level of awesome, but that day has arrived.  And to put this amazing little search engine work for you is the Tracker Search extension.  This extension adds search results derived from all your data to the search feature of Shell's overview mode;  you can see applications, recent items, and the top matches from your data all in one dynamic view.  This extension is like having your own personal secretary with a degree in library science - and who doesn't want that?

Disable Hot Corners
GNOME Shell features hot corners so that it can claim to support the hip new thing known as "gestures".  Gestures are an awful idea and impede usability.  This extension disables hot corners - win!.  If you have a keyboard you can get to overview mode using either Alt-F1 or the Windows key; what could be faster?  Nothing. If you do not have a keyboard you almost certainly are not doing anything productive anyway - go outside, get some exercise, make some friends who don't live in their mother's basement.

Zeitgeist is the activity hub of the Open Source desktop.  It correlates and records your activity and the data you access.  In conjunction with Tracker and the Tracker Search this provides a nearly full-fledged secretarial service.  Often times I can resume what I ended working on yesterday directly from the GNOME Activity Journal.  This extension adds all that knowledge and context to Shell's overview mode.

Frequently I just need to run something, or check something, and I to do so I need a terminal window.  This extension puts an icon on Shell's top bar that with a single click always gives me a shiny new shell.  Simple.

Advanced Settings in UserMenu
GNOME hackers haven't quite settled on where settings belong.  It appears that between all the various work environments that may just be an eternal question.  And people have strong opinions about it.  So GNOME Shell provides "System Settings" in the drop down menu.  But... a lot of settings aren't there.  Including the ability to enable and disable extensions.  This extension just adds an "Advanced Settings" option which shortcuts to the gnome-tweak-tool where numerous [officially unsupported] settings can be tweaked (hence the name).  In gnome-tweak-tool it is also possible to enable and disable extensions.  This extension just makes it faster to get to the tool.  Once you have things the way you really want you won't use it much, but getting to that point you'll possibly be searching for and running gnome-tweak-tool on a regular basis.

Dash to Dock

This extension makes the dash [the dashboard for launching favorite applications] a bit more like a dock or toolbar.  The dash will stick around, even when not in overview, until a window presses it out of the way.  Sometimes it is a bit too sticky but most of the time it works as expected.  The best part of the modified behavior is that every time you navigate to a new [empty] workspace the dash is ready and waiting for you to summon some applications.

Connection Manager

This great little extension drops a new drop-down menu into the top bar of the Shell from which you can create, via one-click, a new SSH session from a predefined lists of hosts.  And there is not froggin' about in a configuration file to setup the hosts - the extension provides a handy configuration dialog to add and remove host entries.  It even integrates with the GNOME Terminal profiles so that you can select what profile you'd like for each SSH host entry.  This is a must-have for the beleaguered system adminstrator.


If a record exists

A common action when synchronizing data between some source and a database is to check if such-and-such record already exists and needs to be updated or if a new record needs to be created.  The SQLAlchemy's one() method [of the query object] provides an easy way to check to see if such-and-such record exists;  but it doesn't return either an ORM object or None - if no record is found it raises an exception.  This is surprising at first as x=do;if-not-x is possibly the most common of all Python constructs.  The corresponding SQLAlchemy construct is just to catch the NoResultFound exception.

from sqlalchemy import and_
from sqlalchemy.orm.exc import NoResultFound

    db.query( VendorCross ).\
        filter( and_( VendorCross.vendor_code == vendor_code,
                      VendorCross.oem_code = oem_code,
                      VendorCross.oem_partcode = oem_partcode ).one( )
except NoResultFound:
    # no such record exists
    # record exists


Recovery From Half An MD Mirrored Pair

In the process of decommissioning an old physical server I wanted to recover some data from the server's drives.  The failing server was configured with a SATA RAID1 pair that contained a logical volume group (LVM).  So I could either boot up the old server, change it's IP address, and recover the data over the network.... or I could just recover the data directly from one of the drives [they are a mirrored pair after all].   But only having a USB caddy for one SATA drive the trick was to get the RAID1 array to come up on my laptop with only one drive.

Step#1) Connect the drive.
beast:~ # fdisk -l
Disk /dev/sdd: 80.0 GB, 80026361856 bytes
Disk identifier: 0x0007236b
   Device Boot    Start       End   Blocks Id  System
/dev/sdd1   *        63    530144   265041 fd  Linux raid autodetect
/dev/sdd2        530145   8916074  4192965 fd  Linux raid autodetect
/dev/sdd3       8916075 156296384 73690155 fd  Linux raid autodetect
Text 1: Connect the drive and look at the partition table of the new SD device.
Yep.  There are three RAID partitions.  I know the first tiny one was for "/boot" and the second larger one was for swap - the one I wanted was the third, and largest, partition which was the physical volume group that contained the data I wanted to recover.

Step#2) Create a "new" RAID device.
beast:~ # mdadm --build --force --level=0 --raid-devices=1 /dev/md3 /dev/sdd3
mdadm: array /dev/md3 built and started.
Text 2: Create a new level 0 [no redundancy] RAID device with mdadm from a redundant portion of the previous RAID device.
The only gotcha here is to use an MD device that is not already in use.  Since I already have an md0, md1, and md2 on my laptop I create this new device as md3.

Step#3) Activate the volume group.
beast:/home/awilliam # vgchange -ay
  4 logical volume(s) in volume group "Beast" now active
  3 logical volume(s) in volume group "KPAYL17" now active
Text 3: Scan for and activate the old volume group.
vgchange -ay of course finds the volume group on my laptop "Beast" but also discovers and activates the volume group that existed on the LVM physical volume in the new/old RAID volume. 
Aside: Note that prior to this point I never wrote anything to the recovered volume, I'm just building up around and on top of it.   This is also a good point to point out why every server, SAN, or JBOD should use an LVM volume group with a unique name.  If every volume group is named "System" then recovery just becomes that much more tedious - trying to activate a "System" volume group on a host that already has a "System" volume group...
Step#4) Mount a recovered logical volume.
beast:/home/awilliam # mount /dev/KPAYL17/SRV /mnt
Text 4: Now I can mount the SRV logical volume from the recovered volume group from the physical volume on the new/old RAID device from the physical drive from the old server.  Whew!
Data recovered!  Have coffee.


rrdtool lastupdate

A variety of tools use RRD databases which are ideal for recording a sequence of values over time.  So one interesting question is when the last time the rrd file was updated - rrdtool's lastupdate mode will tell you.
$ rrdtool lastupdate probe302Value_probe302Value.rrd
1334603585: 190.0
Text 1:   The most recent value in the RRD is 190.0 recorded at 1334603585
The value of the time is a UTC timestamp.  Converting this value to a readable date/time can be performed using the UN*X date command.
$ date -d '@1334603681'
Mon Apr 16 15:14:41 EDT 2012
Text 2: Converting a timestamp to a date/time.
With knowing the last update and most recent value you can verify that the database is being updated at the correct interval and with the expected values.


Informix Dialect With CASE Derived Polymorphism

I ran into an interesting issue when using SQLAlchemy 0.7.7 with the Informix dialect.  In a rather ugly database (which dates back to the late 1980s) there is a table called "xrefr" that contains two types of records: "supersede" and "cross".  What those signify doesn't really matter for this issue so I'll skip any further explanation.  But the really twisted part is that while a single field distinquishes between these two record types - it does not do so based on a consistent value.  If the value of this field is "S" then the record is a "supersede", any other value (including NULL) means it is a "cross".  This makes creating a polymorphic presentation of this schema a bit more complicated.  But have no fear, SQLAlchemy is here!

When faced with a similar issue in the past, on top of PostgreSQL, I've created polymorphic presentations using CASE clauses. But when I tried to do this using the Informix dialect the generated queries failed. They raised the dreaded -201 "Syntax error or access violation" message. 
The Informix SQLCODE -201 is in the running for "Most useless error message ever!".  Currently it is tied with PHP's "Stack Frame 0" message.  Microsoft's "File not found" [no filename specified] is no longer in the running as she is being held at the Hague to face war crimes charges.
Rant#1: Why do developers get away with such lazy error messages?
The original [failing] code that I tried looked something like this:
class XrefrRecord(Base):
    __tablename__  = 'xrefr'
    record_id      = Column("xr_serial_no", Integer, primary_key=True)
    _supersede     = Column("xr_supersede", String(1))
    is_supersede   = column_property( case( [ ( _supersede == 'S', 1, ), ],
                                            else_ = 0 ) )
    __mapper_args__ = { 'polymorphic_on': is_supersede }   
class Cross(XrefrRecord): 
    __mapper_args__ = {'polymorphic_identity': 0} 
class Supsersede(XrefrRecord): 
    __mapper_args__ = {'polymorphic_identity': 1}
Text#1: Code snippet that does not work.
The generated query looked like:
  SELECT xrefr.xr_serial_no AS xrefr_xr_serial_no,
           WHEN (xrefr.xr_supersede = :1) THEN :2 ELSE :3
           END AS anon_1
  FROM xrefr
  WHERE xrefr.xr_oem_code = :4 AND
        xrefr.xr_vend_code = :5 AND
          WHEN (xrefr.xr_supersede = :6) THEN :7
          ELSE :8
         END IN (:9) <--- ('S', 1, 0, '35X', 'A78', 'S', 1, 0, 0)
Text#2: Query SQLAlchemy generates for that construct.
It would seem that this would work.  If you substitute the values for their place holders in an application like DbVisualizer - it works.

The condition raising the -201 error is the use of place holders in a CASE WHEN structure within the projection clause of the query statement; the DBAPI module / Informix Engine does not [or can not] infer the type [cast] of the values.  The SQL cannot be executed unless the values are bound to a type.  Why this results in a -201 and not a more specific data-type related error... that is beyond my pay-grade.
Notice that when used like this in the projection clause the values to be bound are both input and output values.
Aside#1: An existential dilemma
The trick to get this to work is to explicitly declare the types of the values when constructing the case statement for the polymorphic mapper. This can be accomplished using the literal_column expression.
from sqlalchemy import literal_column

class XrefrRecord(Base):
    _supersede    = Column("xr_supersede", String(1))
    is_supersede  = column_property( case( [ ( _supersede == 'S', literal_column('1', Integer) ) ],
                                               else_ = literal_column('0', Integer) ) )
    __mapper_args__     = { 'polymorphic_on': is_supersede }
Text#3: A working CASE based polymorphic declaration.
Visually if you log or echo the statements they will not appear to be any different than before; but SQLAlchemy is now binding the values to a type when handing the query off to the DBAPI informixdb module. 

Happy polymorphing!


Using GNOME Terminal Custom Commands

There are numerous terminal session managers and profile managers, etc... for use by people [typically network and system administrators] who have to SSH or telnet to lots of hosts.   But most of these aren't packages or very well maintained - fortunately there is an almost stealth solution built into the familiar gnome-terminal application.
Multiple profiles can be created in gnome-terminal and profiles can be assigned to "Run a custom command instread of my shell";  this command can be anything.  So that profile can automatically telnet or SSH to another host, potentiall even launch an application on that host (such as firing up GNU screen).

The defined profiles are conveniently available in "Files" / "Open Terminal" and "File" / "Open Tab" menu.  Simply create a profile for each host or device that you typically jump on to. If SSH is in use the automatic tie-in to the GNOME passphrase and keyring will kick in.
Generally you don't want a terminal emulator to grab your keystrokes - you want them to go to the application or session.  Under gnome-terminal's "Edit" / "Keyboard Shortcuts" you can enable F10 to activate the ring-menu which will allow you to conveniently navigate to profiles using just the keyboard.


A JSONDateTime TypeDecorator

JSON doesn't provide a date or date-time construct;  therefore every application is burdened with implementing a solution of it's own for receiving date-time information over the wire.  On common issue receiving JSON and serializing that data into some type of database - but the database knows the value is a date-time and you want to be able to perform date-time like operations on the value (locally).
Fortunately if your database is accessed via an SQLAlchemy ORM you can out-source this deserialization of the the values into your ORM by creating a TypeDecorator that wraps the datetime value and knows how to recognize and parse date-time values.
This example will allow an actual date-time value to be assigned to the attribute, or a 19 or 16 character string, or a NULL (None).  Conversion happens automatically when the value is assigned, but when the value is accessed an actual datetime is always seen.
from datetime import datetime
from sqlalchemy import TypeDecorator
from sqlalchemy.types import DateTime

class JSONDateTime(TypeDecorator):
    """ Allow storing a string into a datetime value, this allows 
        for automatically conversion of the JSON date strings into 
        date values"""
    impl = DateTime

    def __init__(self, *arg, **kw):
        TypeDecorator.__init__(self, *arg, **kw)

    def process_bind_param(self, value, dialect):
        if value:
            if isinstance(value, basestring):
                if (len(value) == 19):
                    return datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
                elif (len(value) == 16):
                    return datetime.strptime(value, '%Y-%m-%dT%H:%M')
                elif (len(value) == 0):
                    return None
            elif isinstance(value, datetime):
                return value
            raise Exception('Cannot store value "{0}" as
        return None

# Create a class/table

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class LogEntry(Base):
    __tablename__ = 'logentry'
    actiondate = Column(JSONDateTime)
Text1: A TypeDecorator class for automatically converting string datetime representations to their datetime values; and a snippet of a table utilizing the new type.

One possible improvement is to make sure to assign a time-zone to the date value as comparing zoned and nieve datetime values will raise an exception.  But to do that you need to know the timezone you expect the JSON values to represent (in most cases this isn't present in the date-time string representation). The Python datetime's replace(tzinfo={tzinfoclass}) method is used to assign time zone information to a value. [Note: time zone information in Python.... not especially graceful or fun].


Using Local Update Publisher; WSUS, but for you.

One of my colleagues has published an excellent and comprehensive article about using Local Update Publisher,a project hosted at SourceForge, that enables using your local Windows Update Services to distribute other and custom software. For example, your workstations can get deployments and updates of FireFox, Java, Flash, LibreOffice, etc... through the same mechanism used to stay up-to-date with Microsoft products.  LUP is a powerful tool for steamlining the managemento third-party software on you Microsoft Windows workstations;  if you've got lots of workstations be sure to check it out.


Sound Converter

A common issue is to have an audio file in one format at to need it in another for compatibility with some specific application or device.  And how to covert the file and know the quality of the result, etc...?  Well... there is a simple application called ... wait for it .... "soundconverter".  It is packaged for just about every distribution and available on openSUSE through the normal repositories.  How obvious can it get?  Apparently not so obvious I couldn't have overlooked it for a long long time.
The soundconverter application.
With soundconverter you can convert between FLAC, MP3, OGG, and WAV.  Add the files you want to covert, use preferences to select the output format, and away you go.  Nice job.


Sequestering E-Mail

When testing applications one of the concerns is always that their actions don't effect the real-world. One aspect of that this is sending e-mail; the last thing you want is the application you are testing to send a paid-in-full customer a flurry of e-mails that he owes you a zillion dollars. A simple, and reliable,  method to avoid this is to adjust the Postfix server on the host used for testing to bury all mail in a shared folder.  This way:
  • You don't need to make any changes to the application between production and testing.
  • You can see the message content exactly as it would ordinarily have been delivered.
To accomplish this you can use Postfix's generic address rewriting feature;  generic address rewriting processes addresses of messages sent [vs. received as is the more typical case for address rewriting] by the service.  For this example we'll rewrite every address to shared+myfolder@example.com using a regular expression.


Create the regular expression map.  Maps are how Postfix handles all rewriting; a match for the input address is looked for in the left hand [key] column and rewritten in the form specified by the right hand [value] column.
echo "/(.)/           shared+myfolder@example.com" > /etc/postfix/generic.regexp
Text 1: Create the new regular expression map.

Configure Postfix to use the new map for generic address rewriting.
postconf -e smtp_generic_maps=regexp:/etc/postfix/generic.regexp
Text 2: Enable generic rewriting using the regular expression map.


Tell Postfix to reload its configuration.
postfix reload
Text 3: Signal the running Postfix service to reload its configuration.

Now any mail, to any address, sent via the hosts' Postfix service, will be driven not to the original address but to the shared "myfolder" folder.


Complex Queries With SQLAlchemy (Example#1)

There are lots of examples of how to use SQLAlchemy to provide your Python application with a first-rate ORM. But most of these examples tend to model very trivial queries;  but the real power of SQLAlchemy, unlike many ORM solutions, is that it doesn't hide / bury the power of the RDBMS - and if you aren't going to use that power why bother with an RDBMS at all [Aren't NoSQL solutions the IT fad of the year? You could be so hip!].  So in this post I'll provide a useful non-trivial query and how to perform the a complex query - only better - using SQLalchemy.
So first, a query:
SELECT process.process_id, op1.value_int, op2.value_string, op3.value_string
FROM process
  INNER JOIN route ON ( route.route_id = process.route_id )
  LEFT OUTER JOIN obj_property op1
    ON ( op1.obj_id = route.route_id AND
         op1.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op1.value_key = 'expireDays' )
  LEFT OUTER JOIN obj_property op2
    ON ( op2.obj_id = route.route_id AND
         op2.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op2.value_key = 'preserveAfterCompletion' )
  LEFT OUTER JOIN obj_property op3
    ON ( op3.obj_id = route.route_id AND
         op3.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op3.value_key = 'archiveAfterExpiration' )
WHERE process.db_status != 'archived'
  AND process.state IN ( 'C', 'F', 'Z' )
  AND process.status != 'archived';
Text 1: A sophisticated query that exploits the power of the PostgreSQL database engine.

This query returns the process_id value from the process table and then some values from multiple records from the table obj_property - if those records exist - correlated via an intermediate table route.  Complex, but also fast!  
Aside:With the correct DDL when creating the database PostgreSQL also enforces the integrity of the relations, maintains usage statistics, and provides multiple levels of atomicity.  I certainly wouldn't want to have to do all that myself.
So how to model such a query in SQLAlchemy?  First realize that the result is actually going to be better than what straight SQL / DBAPI would give us.  Instead of some values the query will return real objects, the advantage of this is the amount of code eliminated by having to do things based on ids or primary keys.  This example assumes that the ORM entities Process, Route, and ObjectProperty have already been described - there is lots of documentation about how to declare the relations between your tables and your objects using the declarative style.
from sqlalchemy.orm   import  aliased
from sqlalchemy       import and_, or_

db = ctx.db_session()

op1 = aliased(ObjectProperty)
op2 = aliased(ObjectProperty)
op3 = aliased(ObjectProperty)

q = db.query( Process, op1, op2, op3 ).\
       join( Route, Route.object_id == Process.route_id ).\
       outerjoin( op1, and_( op1.parent_id == Route.object_id,
                             op1.name=='expireDays' ), ).\
       outerjoin( op2, and_( op2.parent_id == Route.object_id,
                             op2.name=='preserveAfterCompletion' ), ).\
       outerjoin( op3, and_( op3.parent_id == Route.object_id,
                             op3.name=='archiveAfterExpiration' ), ).\
       filter( and_( Process.state.in_( [ 'C', 'F', 'Z' ] ),
                     Process.status != 'archived' ) )
Text 2: The same query as above, only expressed via the SQLAlchemy ORM.  But instead of returning values it returns live objects.
The SQLAlchemy aliased method declares multiple references to ObjectProperty that can be used independently: op1, op2, and op3.  The other advanced technique is to use the outerjoin method to relate the need for a LEFT OUTER join.
The results of this query will be tuples of four elements; the first being a Process object and the second, third, and fourth will either be ObjectProperty objects if the concomitant outer join identified a record or None if no record matched the join. The lovely upside of this is that the query results can be processed using a straight forward for-each construct:
for process, expire_days, preserve_after, archive_after in q.all():
   if expire_days:
Text 3: Iterate over the query results; the first step depends if the op1 is an object (a record matched the first outer join).
Personally I find the ORM code to be easier to visually parse than the native SQL. Especially if you need to build the query dynamically or modify it based on the applications needs - since q is an object additional filter and join conditions can continue to be added.  Imagine trying to do that with straight SQL?
q = q.filter(Process.owner_id == 10100)
q = q.limit(150)
Text 4: Add one more filter expression to the queries WHERE clause and limit the query to 150 results.
Another advantage to this method is that SQLAlchemy can adapt it's dialect to the specific back-end if, for example, you are stuck using a database other that PostgreSQL.  Without such an adaptive layer using anything other than the most trivial queries becomes daunting do to slight but important differences in how various engines express joins and nested queries.


Configuring Postfix As An SMTP Client

Every host needs to send mail; not just users.  Hosts send mail for a variety of reasons - from cron jobs, log watchers, error and exception reports, lots and lots of reasons.  But mail sent by hosts should be as secure as mail sent by users at least to the degree you trust the facilities generating the mail.  To achieve that the host's Postfix instance should encrypt it's traffic to the central SMTP relay and it should authenticate itself - just like a user.  Fortunately doing so is pretty straight-forward.  For this example I'm assuming the central SMTP server is smtp.example.com and you have a username/password you need to authenticate. 

A note about SASL mechs

Whether authentication is performed using DIGEST, PLAIN, CRAM, etc... doesn't really matter.  The only caveat for the type of authentication is that you need to have the appropriate SASL library installed; so for plain you need to "yum install cyrus-sasl-plain".  If you get to the end and don't have an appropriate SASL library installed for a type of authentication the central SMTP server supports you'll see messages like "SASL authentication failure: No worthy mechs found".  It is the central SMTP server that determines what authentication methods are acceptable - your client has to be able to match at least on of the methods it supports.

Step #1 : Point to the central SMTP server

Configure the Postfix instance to only listen to the local interface and to send all mail, regardless of destination, to the central relay.
postconf -e inet_interfaces=localhost
postconf -e relayhost='[smtp.example.com]'
Text 1: Setting the central SMTP server (relayhost)
Step #2 : Enable authentication & encryption

Of course you'll want to encrypt the traffic and the relay host will probably only permit authentication over an encrypted connection anyway.
postconf -e smtp_sasl_auth_enable=yes
postconf -e smtp_use_tls=yes
postconf -e smtp_tls_note_starttls_offer = yes
Text 2: Enable TLS & authentication
If the site is using their own CA certificate to create SSL certificates then that CA certification must be available on the host in order to verify the host certificate of the SMTP relay.
postconf -e smtp_tls_CAfile=/path/to/the/cacert.pem
Text 3: Set the path to the CA certificate
Step #3 : Establish the authentication credentials

Now the SMTP server needs some credentials.  These are written to a file and then a Postfix map is generated from that file. The format of the file is the host name of the remote, whitespace, and then the username and password delimited by a colon.  Note that the hostname must match the actual hostname of the remote or the local Postfix instance won't attempt to login - it will think it doesn't have credentials. The permissions on the sasl_passwd and sasl_passwd.db files should be secured so that only user root & group mail have access.

echo "smtp.example.com username:password" > sasl_passwd
postmap hash:/etc/postfix/sasl_passwd
postconf -e smtp_sasl_password_maps=hash:/etc/postfix/sasl_passwd
chown root:mail sasl_passwd sasl_passwd.db
chmod 740 sasl_passwd sasl_passwd.db
Text 4: Creating the sasl_passwd map
Optional Extra Paranoia

My personal preference, for a bit of added paranoia, is to also set the immutable flag of the three security sensitive files.
chattr +i cacert.pem sasl_passwd sasl_passwd.db
Text 5: Making the sensitive file immutable.

A file set as imutable canntot be modfied, deleted, renamed, or linked to.  Not even by root - at least not until the immutability flag is explicitly removed [chatter -i files].  This protects the file from being modified or deleted unintentionally as well as making them that much more difficult to modify maliciously.

Step #4: Test
Now you should be able to send some mail;  this is most easily accomplished with the mail command [which is provided in the mailx package].  Watch the /var/log/maillog file to see your message go; or see any errors. If you see messages like "certificate verification failed for ..." then Postfix doesn't accept the validity of the central SMTP relay's certificate.  Either the CA cert specified in Step#2 is invalid or the permissions are incorrect and Postfix can access the file.
When you receive the mail you've sent in your INBOX you can look at the headers and you should see something very much like:
Received: from client.example.com (client.example.com []) \
  (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No \
  client certificate requested) (Authenticated sender: smtpclient) by \
  smtp.example.com (Postfix) with ESMTP id 5FD712000C for \
  ; Wed,  8 Feb 2012 12:05:19 -0500 (EST)
Text 6: Example header from a secure and authenticated e-mail.
You server is now securely sending messages.


Installing PDO_INFORMIX on CentOS6

Step#1 : Install the Informix SDK / Client

This is as simple as copying the files to /opt/informix or using one of the various install methods provided.  But beyond that it is necessarily to initialize the required environment variables.  The simplest way to set the environment variables is to create an informix.sh profile script in /etc/profile.d - these scripts are executed by /etc/profile whenever a session is created [such as when a user logs in].  Additionally you'll need to set these same variables in /etc/sysconfig/httpd so that they are set in Apache's environment when started by the system start-up scripts.
$ (cat << EOF
export INFORMIXDIR=/opt/informix
export DBDATE='Y4MD-'
) > /etc/profile.d/informix.sh
Text 1: Creating /etc/profile.d/informix.sh
YOURINSTANCENAME needs to be defined in /opt/informix/etc/sqlhosts.  Your method of installing the SDK may or may not have set that up for you.

The system library path must also be extended to include the directories containing the SDK's libraries.
$ ( cat << EOF
 ) > /etc/ld.so.conf.d/informix.conf
Text 2: Extending the system's library path
 If the library path is not configured correctly applications, included httpd, will not be able to load the Informix libraries.  At this point the library cache can be refreshed by executing the /sbin/ldconfig command.  Once that has been performed either log out and back into the server, or just reboot the server, to verify that upon logging in you have the INFORMIXDIR, INFORMIXSERVER, and DBDATE variables in your enviroment.

Step#2 : Build the Informix PDO driver.

In order to build PHP PECL modules you must have php-devel, make, and gcc installed on the server.
$ pecl download PDO_INFORMIX-1.2.6
$ tar xzf PDO_INFORMIX-1.2.6.tgz
$ cd PDO_INFORMIX-1.2.6
$ phpize
$ configure
$ make
Text 3: Building PDO_INFORMIX
If your Informix SDK is installed correctly and you've properly initialized the environment everything should be found automatically and build without complaint.  Now move the PDO driver into place and inform the PHP interpreter that it needs to load the library.  Here we perform a dirty trick of first loading the base pdo.so library.  This shouldn't be necessary and PHP will grumble about it upon initialization, but it works around some wackiness regarding PDO versions.  Without this line pdo_informix.so will refuse to load because PDO isn't loaded yet because the need for PDO isn't automatically discovered.
$ cp /tmp/PDO_INFORMIX-1.2.6/modules/pdo_informix.so /usr/lib64/php/modules/
$ ( cat << EOF
 ) > > /etc/php.d/informix.ini
Text 4:  Install and register PDO_INFORMIX
Now we can try to start/restart the Apache service and see if our PDO module is available: service httpd restart.  But it won't work. The loading of the Informix SDK by Apache will be blocked by SELinux's security policy.

Step#3 : Provision SELinux

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib64/php/modules/pdo_informix.so' - libifcli.so: failed to map segment from shared object: Permission denied in Unknown on line 0
The message in /var/log/httpd/error_log indicating that loading the library failed with a "permission denied"; regardless of what you set the permissions too.
Text 4: SELinux blocking the loading of libifcli.so
The solution is not to disable SELinux; SELinux is your over-protective big brother.  Maybe annoying to have around sometimes, but worth it for those time when you need to take a short cut through a dark musty alley.  The correct solution is just to label the required library as a known and approved shared library.
$ chcon -t lib_t /opt/informix/lib/cli/libifcli.so
Text 5: Applying the appropriate label to libifcli.so
Step#3 : Get coffee

Restarting Apache know and you should see the pdo_informix driver available in phpinfo() output.  Also double check that INFORMIXDIR, INFORMIXSERVER, and DBDATE appear in the "Enviroment" section of phpinfo; without these variables the PDO driver will not be able to find your informix instance.

From here on out it is pretty much the web developer's problem.


Identifying The Hottest Tables (Informix)

Yesterday I posted about how to identify the hottest table in a PostgreSQL database.  Pretty much the same functionality is available for administrators of Informix databases as well; this kind of information is found in the "sysmaster" database which is the database engine's own database.
        TRIM(dbsname) || ':' || TRIM(tabname) AS relation_name,
        isreads AS records_read,
        pagreads AS page_reads,
        iswrites AS records_inserted
        bufwrites AS buffered_writes
FROM sysmaster:sysptprof
ORDER BY isreads DESC;
Text 1: List the basic read and write statistics for objects in the engine's databases.
This will list a record for every object in the database including indexes; the ratio of ISAM operations vs. buffer page operations can give you a hint as to the effectiveness of your server's configuration. If the ratio is very low for busy object your buffer pool is possibly too small.
If you are interested in the counts of various query operations the sysptprof table also provides the following values:
  • isrwrite - The number of records updated.
  • isrdelete - The number of records deleted.
These counters will reset whenever the database server is restarted. In versions 11.7 and later of the Informix persistent values are available from the sysmaster database.
Many more sysmaster queries can be found in the Informix Wiki.


Identifying The Hottest Tables (PostgreSQL)

In recent versions of PostgreSQL there is a magical view called pg_stat_user_tables which provides per table information on usage; there is one row per table and eight counters per row.
relname - The name of the table.
seq_scan - The number of sequential scans that have been performed on the table. A sequential scan is a read of the table from beginning to end, either because the table is very small or no indexes were available that could satisfy the filter criteria in an efficient way. Sequential scans are probably the most expensive operation the database server performs, some are however unavoidable. If proper indexing cannot resolve the need to sequentially scan a table it is imperative that the PostrgeSQL configuration provide enough resources to maintain a high cache rate.
seq_tup_read - The number of rows processed through sequential scans. This is not the number of records returned to the applications as results but the number of records processed in order to create the result set, which is probably a significant subset of this number. For example, if a query returns ten records but requires a sequential scan of the table then this value will increase by the number of records in the table, not by ten.
idx_scans - The number of indexes scans of the tables.
idx_tup_fetch - The number of rows processes through indexed scans. As with seq_tup_read this is not the count of records returned as the results of queries but those evaluated for queries due to index entries.
seq_tup_read - The number of records processed in order to create the result set of a query, not the number of records returned to the applications.
n_tup_ins - The number of rows inserted into the table.
n_tup_upd - The number of rows updated.
n_tup_del - The number of rows deleted.
Using this view an administrator can isolate the busiest tables in the database.
SELECT relname AS table_name,
seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
WHERE (seq_tup_read + idx_tup_fetch) > 0
Text 1: Query to return the ten hottest tables with their sequential and index tuple fetch values.
These results will reveal both table usage and the effectiveness of your indexes.  If you have lots of sequential scans occurring then the query engine isn't finding indexes that match the queries being performed.
     table_name                           seq_tup_read      idx_tup_fetch
doc                                       1,423,407,729,074    349,028,985,971
job_history                                        71,378,301     4,213,364,118
job_history_info                               74,454,363      4,207,594,850
date_company_assignment           31,059,671      1,305,469,897
enterprise                                    3,551,311,871     1,083,015,878
date_x                                               12,884,498        982,418,723
object_acl                                  15,942,621,939        137,179,721
job                                               39,956,712,914          46,912,825
project_info                                 1,709,329,011                         23
team                                           1,141,035,688                            0
Text 2: Example results.
In these example results it is apparent that the table doc is one of the hottest objects and while many records are being identified using index entries there is also a very large number of sequential processes occurring.  This may be because either the indexes do not match the queries being performed or the cardinality of the indexed values is too low. Now we know where to look. 
So don't grope about speculating about how to improve database performance or scalability - ask where to look, PostgreSQL wants to help you.  Much more information can be found at the PostgreSQL stats monitoring documentation.

Integrating Postfix And CLAMAV

The clamav-miler is packaged by most distributions in their "clamav" package can be used in conjunction with Postfix to protect your network from malware embedded in SMTP traffic. Integration of CLAMAV and Postfix involves four steps:

  1. Configuration and enabling of the clamd service.
  2. Updating the CLAMAV malware database and enabling the freshclam service
  3. Configuration and enabling of the clamav-milter service. Current versions of the clamav-milter require connectivity to the clamd daemon through either a local UNIX socker or a TCP/IP socket.
  4. Configuration of Postfix to utilize the available clamav-milter service.
Step#1 : Enabling the clamd service
LocalSocket /var/lib/clamav/clamd-socket
LogFacility LOG_MAIL
LogSyslog yes
PidFile /var/lib/clamav/clamd.pid
TCPSocket 3310
User vscan
Text 1: Typical settings overridden from defaults in /etc/clamd.conf

The clamd daemon typically reads its configuration from the /etc/clamd.conf file. Most importantly this file specifies, via the TCPSocket and TCPAddr directives, on what IP port and address the service listens for connections. These directives should be set to values appropriate for the host and which will be reachable by the clamav-milter. If the clamav-milter and the clamd daemon will be running on the same host the clamd service can be configured to listen to the localhost address [] to avoid any potential network firewall and traffic filtering issues.
The clamd.conf file also provides many other tunable values but almost all of these should be appropriate at the distributions defaults.
Once configured the clamd service must be started and enabled for automatic start following the system's boot-up sequence; on RPM based systems this is typically achieved using the service and chkconfig commands.

Step #2 : Enabling the freshclam service

The freshclam service is an instance of the freshclam command line tool started with the “-d” option which runs the command in daemon mode. Whether started from the command-line or running in daemon mode freshclam will read its configuration from the /etc/freshclam.conf file. When running the freshclam daemon will periodically check the CLAMAV project mirrors for new malware signatures and update the local database used by the clamd scanning service. The freshclam daemon should run as the same user context as the clamd service; the typical way to ensure this is to synchronize the values of DatabaseOwner in /etc/freshclam.conf and User in /etc/clamd.conf. The frequency which freshclam will check for new patterns is controlled by the Checks directive – the default is 12 [times a day], this value should be sufficient in most cases. When database update succeeds the freshclam service will notify the clamd service that newer patterns are now available [for this to work the NotifyClamd directive must indicate the correct path to the current clamd configuration file].
DatabaseMirror database.clamav.net
DatabaseOwner vscan
HTTPProxyPort 3128
HTTPProxyServer proxy.example.com
LogFacility LOG_MAIL
LogSyslog yes
NotifyClamd /etc/clamd.conf
OnErrorExecute /usr/local/bin/malware_update_fail.sh
OnUpdateExecute /usr/local/bin/malware_update_ok.sh
PidFile /var/lib/clamav/freshclam.pid
UpdateLogFile /var/log/freshclam.log
Text 2: Example /etc/freshclam.conf file (comments removed)

The most important considerations in configuration of freshclam is if your network configuration requires use of an HTTP proxy server in order to access the CLAMAV mirrors for updates and if you need to configure some form of notification concerning success or failure of the pattern updates – a security focused service like a malware milter doesn't help anyone if it is silently failing in the background.
The HTTPProxyPort and HTTPProxyServer directives allow an HTTP proxy to be specified; freshclam will use this proxy for all mirror requests whether running as a command-line utility or in daemon mode. Should your proxy require a username/password for authentication these can be provided using the additional HTTPProxyUsername and HTTPProxyPassword directives. However it is simpler and more reliable to simply approve the “database.clamav.net” domain and sub-domains on your HTTP proxy service; all mirror requests will be made to those domains.
For notification of successful or failed updates the OnUpdateExecute and OnErrorExecute directives are used respectively. Whatever command is specified here will execute in the security context of the DatabaseOwner. A useful approach is to enable the log file via the UpdateLogFile directive and have the tail-end of that file mailed to a responsible party such as a help-desk or system-administrator for periodic verification that the service is operational.

tail -25  /var/log/freshclam.log \
 | mail -s "[NOTICE] Malware Database Update Successful" \
    -r milter@example.com helpdesk@example.com
Text 3: A simple example script that might be used for OnUpdateExecute
The proper operation of freshclam can be tested by simply executing the freshclam utility on the command-line; it should check the mirrors and download any new patterns without an error message. Once configured and tested the freshclam service must be started and enabled for automatic start following the system's boot-up sequence.

Step #3 : Enabling the clamav-milter service
ClamdSocket tcp:
LogFacility LOG_MAIL
LogSyslog yes
MilterSocket inet:32767@
OnInfected Reject
PidFile /var/lib/clamav/clamav-milter.pid
ReportHostname mail.example.com
User vscan
VirusAction /usr/local/bin/virus_notify.sh
Text 4: Example clamav-milter.conf file (with comments removed)

Once the clamd scanning service is running and the freshclam service is maintaining the malware signatures the clamav-milter must be configured and started in order to connect the scanning service into Postfix's SMTP processing. The milter service is typically loads its configuration from the /etc/clamav-milter.conf file.
The service must be informed via the ClamdSocket directive where to find the clamd scanning service and via MilterSocket where to listen for connections from Postfix. The MitlerSocket directive is “inet:port@ip-address”. VirusAction and OnInfected directives can be used to control the behavior of the service when malware is identified; an OnInfected value of Quarantine will cause Postfix to hold the infected message in it's hold queue while a value of Reject will bounce the message with an SMTP error. Especially when used in Reject mode defining an appropriate VirusAction to notify the intended recipient of the message that a message has been discarded is important. The script named by VirusAction is executed in the security context of the scanning service and is provided seven parameters:
  1. Virus name-space
  2. Message queue id
  3. The sender's e-mail addres
  4. The e-mail address of the intended recipient.
  5. The subject of the message-id
  6. The message's Message-ID
  7. The date of the message.
Once configured the clamav-milter service must be started and set to automatically restart upon completion of system boot-up.

# Parameters:
#   virus name, queue id, sender, destination, subject, message id, message date

 echo "";
 echo "   A message containing malware has been discarded.";
 echo "";
 echo "   Malware:     $1";
 echo "   Sender:      $3";
 echo "   Destination: $4";
 echo "   Subject:     $5";
 echo "   Message-ID:  $6";
 echo "   Date:        $7";
 echo "   Queue-ID:    $2";
 echo "";
) | \
 mail -s '[ALERT] Infected Messages Discarded' \
  -r milter@example.com -c helpdesk@example.com $4
Text 5: A sample script for use as the VirusAction. This script notifies the intended recipient and help-desk that a message was identified as malware and discarded.
Connecting the Postfix service to clamav-milter

In order to integrate the scanning into Postfix the milter is configured in the main.cf file as an smtpd_milter. The default action of the milter should be set to “accept” so if for any reason the milter is unresponsive messages will still be delivered. As when connecting the other components it is important to verify that the Postfix service can reach the specified service [traffic is permitted by firewall's etc...].
smtpd_milters = inet:milter.example.com:32767
milter_default_action = accept
Text 6: Configuration directives from Postfix's main.cf

Upon modification of the main.cf file the Postfix service should be restarted.
Once configured the malware filtering service should be tested; this can be accomplished by acquiring a copy of the EICAR diagnostic virus and verifying that messages with this content attached are rejected and that the end-user's are notified of the rejection [according the clamav-milter's defined VirusAction].

clamd[11973]: instream( Eicar-Test-Signature FOUND
Text 7: Example clamd log message for identified malware.

When malware is detected a message will be logged by clamd via syslog regarding the event; this will typically be logged under the “mail” service. Depending on the distribution messages logged as mail will be written to either /var/log/mail or /var/log/maillog [at least with the default syslog configuration].