Sequestering E-Mail

When testing applications one of the constant concerns is that their actions don't affect the real world. One aspect of this is sending e-mail; the last thing you want is for the application you are testing to send a paid-in-full customer a flurry of e-mails claiming he owes you a zillion dollars. A simple and reliable method to avoid this is to adjust the Postfix server on the host used for testing to bury all mail in a shared folder.  This way:
  • You don't need to make any changes to the application between production and testing.
  • You can see the message content exactly as it would ordinarily have been delivered.
To accomplish this you can use Postfix's generic address rewriting feature; generic address rewriting processes the addresses of messages sent [vs. received, as is the more typical case for address rewriting] by the service.  For this example we'll rewrite every address to shared+myfolder@example.com using a regular expression.


Create the regular expression map.  Maps are how Postfix handles all rewriting; a match for the input address is looked up in the left hand [key] column and the address is rewritten in the form specified by the right hand [value] column.
echo "/(.)/           shared+myfolder@example.com" > /etc/postfix/generic.regexp
Text 1: Create the new regular expression map.
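Postfix consults a regexp table top-down and the first matching pattern wins. The behavior of this one-line map can be sketched in Python [a hypothetical emulation for illustration only; the real matching is done inside Postfix]:

```python
import re

# Emulate a Postfix regexp: lookup table as an ordered list of
# (pattern, result) pairs; the first pattern to match wins.
table = [
    (re.compile(r'(.)'), 'shared+myfolder@example.com'),
]

def rewrite(address, table):
    for pattern, result in table:
        if pattern.search(address):
            return result
    return address  # no match: the address passes through unchanged

print(rewrite('customer@bigcorp.com', table))
```

Because /(.)/  matches any address containing at least one character, every recipient is rewritten to the shared folder address.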

Configure Postfix to use the new map for generic address rewriting.
postconf -e smtp_generic_maps=regexp:/etc/postfix/generic.regexp
Text 2: Enable generic rewriting using the regular expression map.


Tell Postfix to reload its configuration.
postfix reload
Text 3: Signal the running Postfix service to reload its configuration.

Now any mail, to any address, sent via the host's Postfix service will be delivered not to the original address but to the shared "myfolder" folder.


Complex Queries With SQLAlchemy (Example#1)

There are lots of examples of how to use SQLAlchemy to provide your Python application with a first-rate ORM. But most of these examples tend to model very trivial queries. The real power of SQLAlchemy, unlike many ORM solutions, is that it doesn't hide or bury the power of the RDBMS - and if you aren't going to use that power why bother with an RDBMS at all [aren't NoSQL solutions the IT fad of the year? You could be so hip!].  So in this post I'll present a useful non-trivial query and show how to perform the same complex query - only better - using SQLAlchemy.
So first, a query:
SELECT process.process_id, op1.value_int, op2.value_string, op3.value_string
FROM process
  INNER JOIN route ON ( route.route_id = process.route_id )
  LEFT OUTER JOIN obj_property op1
    ON ( op1.obj_id = route.route_id AND
         op1.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op1.value_key = 'expireDays' )
  LEFT OUTER JOIN obj_property op2
    ON ( op2.obj_id = route.route_id AND
         op2.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op2.value_key = 'preserveAfterCompletion' )
  LEFT OUTER JOIN obj_property op3
    ON ( op3.obj_id = route.route_id AND
         op3.namespace_prefix = 'http://www.opengroupware.us/oie' AND
         op3.value_key = 'archiveAfterExpiration' )
WHERE process.db_status != 'archived'
  AND process.state IN ( 'C', 'F', 'Z' )
  AND process.status != 'archived';
Text 1: A sophisticated query that exploits the power of the PostgreSQL database engine.

This query returns the process_id value from the process table along with some values from multiple records of the table obj_property - if those records exist - correlated via an intermediate table route.  Complex, but also fast!
Aside: With the correct DDL when creating the database, PostgreSQL also enforces the integrity of the relations, maintains usage statistics, and provides multiple levels of atomicity.  I certainly wouldn't want to have to do all that myself.
So how to model such a query in SQLAlchemy?  First realize that the result is actually going to be better than what straight SQL / DBAPI would give us.  Instead of bare values the query will return real objects; the advantage of this is the amount of code eliminated by not having to do things based on ids or primary keys.  This example assumes that the ORM entities Process, Route, and ObjectProperty have already been described - there is lots of documentation about how to declare the relations between your tables and your objects using the declarative style.
from sqlalchemy.orm   import  aliased
from sqlalchemy       import and_, or_

db = ctx.db_session()

op1 = aliased(ObjectProperty)
op2 = aliased(ObjectProperty)
op3 = aliased(ObjectProperty)

q = db.query( Process, op1, op2, op3 ).\
       join( Route, Route.object_id == Process.route_id ).\
       outerjoin( op1, and_( op1.parent_id == Route.object_id,
                             op1.name=='expireDays' ), ).\
       outerjoin( op2, and_( op2.parent_id == Route.object_id,
                             op2.name=='preserveAfterCompletion' ), ).\
       outerjoin( op3, and_( op3.parent_id == Route.object_id,
                             op3.name=='archiveAfterExpiration' ), ).\
       filter( and_( Process.state.in_( [ 'C', 'F', 'Z' ] ),
                     Process.status != 'archived' ) )
Text 2: The same query as above, only expressed via the SQLAlchemy ORM.  But instead of returning values it returns live objects.
The SQLAlchemy aliased function declares multiple references to ObjectProperty that can be used independently: op1, op2, and op3.  The other technique worth noting is the use of the outerjoin method to express a LEFT OUTER JOIN.
The results of this query will be tuples of four elements; the first being a Process object, and the second, third, and fourth will either be ObjectProperty objects, if the corresponding outer join matched a record, or None if no record matched the join. The lovely upside of this is that the query results can be processed using a straightforward for-each construct:
for process, expire_days, preserve_after, archive_after in q.all():
    if expire_days:
        # a record matched the first outer join
        ...
Text 3: Iterate over the query results; the first step depends on whether op1 is an object (a record matched the first outer join).
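For readers who want to run the pieces above end-to-end, here is a minimal self-contained sketch against an in-memory SQLite database. The class and column mappings are guesses inferred from the SQL at the top of the post [the real OpenGroupware schema is more elaborate], but they are enough to demonstrate the aliased/outerjoin pattern:

```python
from sqlalchemy import Column, Integer, String, and_, create_engine
from sqlalchemy.orm import aliased, declarative_base, sessionmaker

Base = declarative_base()

# Hypothetical minimal mappings inferred from the SQL above.
class Process(Base):
    __tablename__ = 'process'
    process_id = Column(Integer, primary_key=True)
    route_id = Column(Integer)
    state = Column(String)
    status = Column(String)

class Route(Base):
    __tablename__ = 'route'
    object_id = Column('route_id', Integer, primary_key=True)

class ObjectProperty(Base):
    __tablename__ = 'obj_property'
    property_id = Column(Integer, primary_key=True)
    parent_id = Column('obj_id', Integer)
    name = Column('value_key', String)
    value_int = Column(Integer)
    value_string = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
db = sessionmaker(bind=engine)()

# One route with an expireDays property, one route without.
db.add_all([
    Route(object_id=5),
    Route(object_id=6),
    Process(process_id=1, route_id=5, state='C', status='active'),
    Process(process_id=2, route_id=6, state='F', status='active'),
    ObjectProperty(property_id=1, parent_id=5,
                   name='expireDays', value_int=30),
])
db.commit()

op1 = aliased(ObjectProperty)
q = (db.query(Process, op1)
       .join(Route, Route.object_id == Process.route_id)
       .outerjoin(op1, and_(op1.parent_id == Route.object_id,
                            op1.name == 'expireDays'))
       .filter(and_(Process.state.in_(['C', 'F', 'Z']),
                    Process.status != 'archived')))

for process, expire_days in q.all():
    print(process.process_id,
          expire_days.value_int if expire_days else None)
```

Process 1 comes back paired with a live ObjectProperty object; process 2 comes back paired with None because its outer join matched nothing.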
Personally I find the ORM code easier to visually parse than the native SQL, especially if you need to build the query dynamically or modify it based on the application's needs - since q is an object, additional filter and join conditions can continue to be added.  Imagine trying to do that with straight SQL.
q = q.filter(Process.owner_id == 10100)
q = q.limit(150)
Text 4: Add one more filter expression to the query's WHERE clause and limit the query to 150 results.
Another advantage of this method is that SQLAlchemy can adapt its dialect to the specific back-end if, for example, you are stuck using a database other than PostgreSQL.  Without such an adaptive layer anything but the most trivial queries becomes daunting due to slight but important differences in how various engines express joins and nested queries.


Installing PDO_INFORMIX on CentOS6

Step#1 : Install the Informix SDK / Client

This is as simple as copying the files to /opt/informix or using one of the various install methods provided.  But beyond that it is necessary to initialize the required environment variables.  The simplest way to set the environment variables is to create an informix.sh profile script in /etc/profile.d - these scripts are executed by /etc/profile whenever a session is created [such as when a user logs in].  Additionally you'll need to set these same variables in /etc/sysconfig/httpd so that they are present in Apache's environment when it is started by the system start-up scripts.
$ ( cat << EOF
export INFORMIXDIR=/opt/informix
export INFORMIXSERVER=YOURINSTANCENAME
export DBDATE='Y4MD-'
EOF
) > /etc/profile.d/informix.sh
Text 1: Creating /etc/profile.d/informix.sh
YOURINSTANCENAME needs to be defined in /opt/informix/etc/sqlhosts.  Your method of installing the SDK may or may not have set that up for you.

The system library path must also be extended to include the directories containing the SDK's libraries.
$ ( cat << EOF
/opt/informix/lib
/opt/informix/lib/esql
/opt/informix/lib/cli
EOF
) > /etc/ld.so.conf.d/informix.conf
Text 2: Extending the system's library path
 If the library path is not configured correctly applications, including httpd, will not be able to load the Informix libraries.  At this point the library cache can be refreshed by executing the /sbin/ldconfig command.  Once that has been performed either log out and back into the server, or just reboot the server, to verify that upon logging in you have the INFORMIXDIR, INFORMIXSERVER, and DBDATE variables in your environment.

Step#2 : Build the Informix PDO driver.

In order to build PHP PECL modules you must have php-devel, make, and gcc installed on the server.
$ pecl download PDO_INFORMIX-1.2.6
$ tar xzf PDO_INFORMIX-1.2.6.tgz
$ cd PDO_INFORMIX-1.2.6
$ phpize
$ ./configure
$ make
Text 3: Building PDO_INFORMIX
If your Informix SDK is installed correctly and you've properly initialized the environment, everything should be found automatically and build without complaint.  Now move the PDO driver into place and inform the PHP interpreter that it needs to load the library.  Here we perform a dirty trick of first loading the base pdo.so library.  This shouldn't be necessary, and PHP will grumble about it upon initialization, but it works around some wackiness regarding PDO versions.  Without this line pdo_informix.so will refuse to load: PDO isn't loaded yet, because the need for PDO isn't automatically discovered.
$ cp /tmp/PDO_INFORMIX-1.2.6/modules/pdo_informix.so /usr/lib64/php/modules/
$ ( cat << EOF
extension=pdo.so
extension=pdo_informix.so
EOF
) > /etc/php.d/informix.ini
Text 4:  Install and register PDO_INFORMIX
Now we can try to start/restart the Apache service and see if our PDO module is available: service httpd restart.  But it won't work; the loading of the Informix SDK by Apache will be blocked by SELinux's security policy.

Step#3 : Provision SELinux

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib64/php/modules/pdo_informix.so' - libifcli.so: failed to map segment from shared object: Permission denied in Unknown on line 0
The message in /var/log/httpd/error_log indicates that loading the library failed with a "permission denied" - regardless of what you set the permissions to.
Text 5: SELinux blocking the loading of libifcli.so
The solution is not to disable SELinux; SELinux is your over-protective big brother.  Maybe annoying to have around sometimes, but worth it for those times when you need to take a shortcut through a dark musty alley.  The correct solution is simply to label the required library as a known and approved shared library.
$ chcon -t lib_t /opt/informix/lib/cli/libifcli.so
Text 6: Applying the appropriate label to libifcli.so
Step#4 : Get coffee

Restart Apache now and you should see the pdo_informix driver available in phpinfo() output.  Also double-check that INFORMIXDIR, INFORMIXSERVER, and DBDATE appear in the "Environment" section of phpinfo; without these variables the PDO driver will not be able to find your Informix instance.

From here on out it is pretty much the web developer's problem.


Identifying The Hottest Tables (Informix)

Yesterday I posted about how to identify the hottest tables in a PostgreSQL database.  Pretty much the same functionality is available to administrators of Informix databases as well; this kind of information is found in the "sysmaster" database, which is the database engine's own database.
SELECT TRIM(dbsname) || ':' || TRIM(tabname) AS relation_name,
        isreads AS records_read,
        pagreads AS page_reads,
        iswrites AS records_inserted,
        bufwrites AS buffered_writes
FROM sysmaster:sysptprof
ORDER BY isreads DESC;
Text 1: List the basic read and write statistics for objects in the engine's databases.
This will list a record for every object in the database, including indexes; the ratio of ISAM operations vs. buffer page operations can give you a hint as to the effectiveness of your server's configuration. If the ratio is very low for a busy object your buffer pool is possibly too small.
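That rule of thumb can be sketched in Python [illustrative only; the function name and thresholds are invented here and should be tuned to your workload]:

```python
def buffer_pool_suspects(rows, busy_reads=1_000_000, min_ratio=5.0):
    """Given (name, isreads, pagreads) tuples from sysptprof, flag busy
    objects where few record reads are satisfied per physical page read -
    a possible sign the buffer pool is too small."""
    suspects = []
    for name, isreads, pagreads in rows:
        if isreads < busy_reads or pagreads == 0:
            continue  # too quiet to judge, or no physical reads at all
        if isreads / pagreads < min_ratio:
            suspects.append(name)
    return suspects

stats = [('app:orders',    9_000_000, 4_000_000),  # busy, low ratio
         ('app:customers', 8_000_000,   200_000),  # busy, healthy ratio
         ('app:audit_log',    50_000,    40_000)]  # not busy enough to judge
print(buffer_pool_suspects(stats))
```

Only app:orders is flagged: it is busy and averages barely two record reads per page read, so its working set is probably not staying cached.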
If you are interested in the counts of various query operations the sysptprof table also provides the following values:
  • isrwrite - The number of records updated.
  • isrdelete - The number of records deleted.
These counters reset whenever the database server is restarted. In versions 11.7 and later of Informix, persistent values are available from the sysmaster database.
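Because the counters reset on restart, a monitoring script that samples sysptprof periodically has to handle a counter that appears to go backwards. A small sketch [a hypothetical helper, not part of Informix]:

```python
def counter_delta(previous, current):
    """Per-table deltas between two samples of a monotonic counter.
    A decrease means the engine restarted, so the current value is
    itself the delta accumulated since the restart."""
    deltas = {}
    for table, value in current.items():
        prev = previous.get(table, 0)
        deltas[table] = value - prev if value >= prev else value
    return deltas

print(counter_delta({'orders': 100, 'items': 500},
                    {'orders': 150, 'items': 20}))
```

Here 'orders' grew normally (delta 50) while 'items' dropped from 500 to 20, which is interpreted as a restart followed by 20 new operations.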
Many more sysmaster queries can be found in the Informix Wiki.


Identifying The Hottest Tables (PostgreSQL)

In recent versions of PostgreSQL there is a magical view called pg_stat_user_tables which provides per-table information on usage; there is one row per table and a set of counters per row.
relname - The name of the table.
seq_scan - The number of sequential scans that have been performed on the table. A sequential scan is a read of the table from beginning to end, either because the table is very small or because no indexes were available that could satisfy the filter criteria in an efficient way. Sequential scans are probably the most expensive operation the database server performs; some, however, are unavoidable. If proper indexing cannot resolve the need to sequentially scan a table it is imperative that the PostgreSQL configuration provide enough resources to maintain a high cache rate.
seq_tup_read - The number of rows processed through sequential scans. This is not the number of records returned to applications as results but the number of records processed in order to create the result set - the result set is typically a small subset of this number. For example, if a query returns ten records but requires a sequential scan of the table then this value will increase by the number of records in the table, not by ten.
idx_scan - The number of index scans performed on the table.
idx_tup_fetch - The number of rows processed through index scans. As with seq_tup_read this is not the count of records returned as the results of queries but those evaluated for queries due to index entries.
n_tup_ins - The number of rows inserted into the table.
n_tup_upd - The number of rows updated.
n_tup_del - The number of rows deleted.
Using this view an administrator can isolate the busiest tables in the database.
SELECT relname AS table_name,
       seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
WHERE (seq_tup_read + idx_tup_fetch) > 0
ORDER BY (seq_tup_read + idx_tup_fetch) DESC
LIMIT 10;
Text 1: Query to return the ten hottest tables with their sequential and index tuple fetch values.
These results will reveal both table usage and the effectiveness of your indexes.  If you have lots of sequential scanning occurring then the query engine isn't finding indexes that match the queries being performed.
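One way to act on these numbers is to rank tables by total tuple traffic and compute what fraction of that traffic came from sequential scans; a high fraction on a busy table is where index work will pay off. A sketch [illustrative; in practice the rows would come from the query above via any DBAPI driver]:

```python
def rank_hot_tables(rows):
    """rows: (table_name, seq_tup_read, idx_tup_fetch) tuples, with a
    non-zero total as guaranteed by the WHERE clause of the query above.
    Returns tables hottest-first with the sequential-read fraction."""
    ranked = []
    for name, seq_read, idx_fetch in rows:
        total = seq_read + idx_fetch
        ranked.append((name, total, seq_read / total))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked

sample = [('doc',    1_423_407_729_074, 349_028_985_971),
          ('team',       1_141_035_688,               0),
          ('date_x',        12_884_498,     982_418_723)]
for name, total, seq_fraction in rank_hot_tables(sample):
    print(f'{name}: {seq_fraction:.0%} of {total:,} tuples read sequentially')
```

Note that team, although far less busy than doc, satisfies 100% of its reads sequentially - a candidate for a missing index.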
table_name                     seq_tup_read    idx_tup_fetch
doc                       1,423,407,729,074  349,028,985,971
job_history                      71,378,301    4,213,364,118
job_history_info                 74,454,363    4,207,594,850
date_company_assignment          31,059,671    1,305,469,897
enterprise                    3,551,311,871    1,083,015,878
date_x                           12,884,498      982,418,723
object_acl                   15,942,621,939      137,179,721
job                          39,956,712,914       46,912,825
project_info                  1,709,329,011               23
team                          1,141,035,688                0
Text 2: Example results.
In these example results it is apparent that the table doc is one of the hottest objects, and while many records are being identified using index entries there is also a very large amount of sequential processing occurring.  This may be because the indexes do not match the queries being performed or because the cardinality of the indexed values is too low. Now we know where to look.
So don't grope about speculating on how to improve database performance or scalability - ask where to look; PostgreSQL wants to help you.  Much more information can be found in the PostgreSQL stats monitoring documentation.

Integrating Postfix And CLAMAV

The clamav-milter, packaged by most distributions in their "clamav" package, can be used in conjunction with Postfix to protect your network from malware embedded in SMTP traffic. Integration of CLAMAV and Postfix involves four steps:

  1. Configuration and enabling of the clamd service.
  2. Updating the CLAMAV malware database and enabling the freshclam service
  3. Configuration and enabling of the clamav-milter service. Current versions of the clamav-milter require connectivity to the clamd daemon through either a local UNIX socket or a TCP/IP socket.
  4. Configuration of Postfix to utilize the available clamav-milter service.
Step#1 : Enabling the clamd service
LocalSocket /var/lib/clamav/clamd-socket
LogFacility LOG_MAIL
LogSyslog yes
PidFile /var/lib/clamav/clamd.pid
TCPSocket 3310
User vscan
Text 1: Typical settings overridden from defaults in /etc/clamd.conf

The clamd daemon typically reads its configuration from the /etc/clamd.conf file. Most importantly this file specifies, via the TCPSocket and TCPAddr directives, on what IP port and address the service listens for connections. These directives should be set to values appropriate for the host and which will be reachable by the clamav-milter. If the clamav-milter and the clamd daemon will be running on the same host the clamd service can be configured to listen on the localhost address [127.0.0.1] to avoid any potential network firewall and traffic filtering issues.
The clamd.conf file also provides many other tunable values but almost all of these should be appropriate at the distribution's defaults.
Once configured the clamd service must be started and enabled for automatic start following the system's boot-up sequence; on RPM based systems this is typically achieved using the service and chkconfig commands.

Step #2 : Enabling the freshclam service

The freshclam service is an instance of the freshclam command-line tool started with the "-d" option, which runs the command in daemon mode. Whether started from the command line or running in daemon mode, freshclam reads its configuration from the /etc/freshclam.conf file. When running as a daemon freshclam will periodically check the CLAMAV project mirrors for new malware signatures and update the local database used by the clamd scanning service. The freshclam daemon should run in the same user context as the clamd service; the typical way to ensure this is to synchronize the values of DatabaseOwner in /etc/freshclam.conf and User in /etc/clamd.conf. The frequency with which freshclam checks for new patterns is controlled by the Checks directive - the default is 12 [times a day], which should be sufficient in most cases. When a database update succeeds the freshclam service will notify the clamd service that newer patterns are available [for this to work the NotifyClamd directive must indicate the correct path to the current clamd configuration file].
DatabaseMirror database.clamav.net
DatabaseOwner vscan
HTTPProxyPort 3128
HTTPProxyServer proxy.example.com
LogFacility LOG_MAIL
LogSyslog yes
NotifyClamd /etc/clamd.conf
OnErrorExecute /usr/local/bin/malware_update_fail.sh
OnUpdateExecute /usr/local/bin/malware_update_ok.sh
PidFile /var/lib/clamav/freshclam.pid
UpdateLogFile /var/log/freshclam.log
Text 2: Example /etc/freshclam.conf file (comments removed)

The most important considerations in the configuration of freshclam are whether your network requires use of an HTTP proxy server in order to access the CLAMAV mirrors for updates and whether you need to configure some form of notification concerning the success or failure of pattern updates - a security-focused service like a malware milter doesn't help anyone if it is silently failing in the background.
The HTTPProxyPort and HTTPProxyServer directives allow an HTTP proxy to be specified; freshclam will use this proxy for all mirror requests whether running as a command-line utility or in daemon mode. Should your proxy require a username/password for authentication these can be provided using the additional HTTPProxyUsername and HTTPProxyPassword directives. However, it is simpler and more reliable to approve the "database.clamav.net" domain and sub-domains on your HTTP proxy service; all mirror requests will be made to those domains.
For notification of successful or failed updates the OnUpdateExecute and OnErrorExecute directives are used respectively. Whatever command is specified here will execute in the security context of the DatabaseOwner. A useful approach is to enable the log file via the UpdateLogFile directive and have the tail-end of that file mailed to a responsible party such as a help-desk or system-administrator for periodic verification that the service is operational.

tail -25  /var/log/freshclam.log \
 | mail -s "[NOTICE] Malware Database Update Successful" \
    -r milter@example.com helpdesk@example.com
Text 3: A simple example script that might be used for OnUpdateExecute
The proper operation of freshclam can be tested by simply executing the freshclam utility on the command-line; it should check the mirrors and download any new patterns without an error message. Once configured and tested the freshclam service must be started and enabled for automatic start following the system's boot-up sequence.

Step #3 : Enabling the clamav-milter service
ClamdSocket tcp:127.0.0.1:3310
LogFacility LOG_MAIL
LogSyslog yes
MilterSocket inet:32767@0.0.0.0
OnInfected Reject
PidFile /var/lib/clamav/clamav-milter.pid
ReportHostname mail.example.com
User vscan
VirusAction /usr/local/bin/virus_notify.sh
Text 4: Example clamav-milter.conf file (with comments removed)

Once the clamd scanning service is running and the freshclam service is maintaining the malware signatures, the clamav-milter must be configured and started in order to connect the scanning service into Postfix's SMTP processing. The milter service typically loads its configuration from the /etc/clamav-milter.conf file.
The service must be informed via the ClamdSocket directive where to find the clamd scanning service and via MilterSocket where to listen for connections from Postfix. The MilterSocket directive takes the form "inet:port@ip-address". The VirusAction and OnInfected directives can be used to control the behavior of the service when malware is identified; an OnInfected value of Quarantine will cause Postfix to keep the infected message in its hold queue while a value of Reject will bounce the message with an SMTP error. Especially when used in Reject mode, defining an appropriate VirusAction to notify the intended recipient that a message has been discarded is important. The script named by VirusAction is executed in the security context of the scanning service and is provided seven parameters:
  1. The virus name.
  2. The message queue id.
  3. The sender's e-mail address.
  4. The e-mail address of the intended recipient.
  5. The subject of the message.
  6. The message's Message-ID.
  7. The date of the message.
Once configured the clamav-milter service must be started and set to start automatically following system boot-up.

# Parameters:
#   virus name, queue id, sender, destination, subject, message id, message date

(
 echo "";
 echo "   A message containing malware has been discarded.";
 echo "";
 echo "   Malware:     $1";
 echo "   Sender:      $3";
 echo "   Destination: $4";
 echo "   Subject:     $5";
 echo "   Message-ID:  $6";
 echo "   Date:        $7";
 echo "   Queue-ID:    $2";
 echo "";
) | \
 mail -s '[ALERT] Infected Messages Discarded' \
  -r milter@example.com -c helpdesk@example.com $4
Text 5: A sample script for use as the VirusAction. This script notifies the intended recipient and help-desk that a message was identified as malware and discarded.
Connecting the Postfix service to clamav-milter

In order to integrate the scanning into Postfix the milter is configured in the main.cf file as an smtpd_milter. The default action of the milter should be set to "accept" so that if for any reason the milter is unresponsive messages will still be delivered. As when connecting the other components, it is important to verify that the Postfix service can reach the specified service [traffic is permitted by firewalls, etc...].
smtpd_milters = inet:milter.example.com:32767
milter_default_action = accept
Text 6: Configuration directives from Postfix's main.cf

Upon modification of the main.cf file the Postfix service should be restarted.
Once configured the malware filtering service should be tested; this can be accomplished by acquiring a copy of the EICAR diagnostic virus and verifying that messages with this content attached are rejected and that the end-users are notified of the rejection [according to the clamav-milter's defined VirusAction].
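The EICAR test file is a standard 68-byte string published by eicar.org that scanners are expected to detect; it is harmless. A small sketch that writes one to disk for attaching to a test message [the string is assembled from two halves so the file containing this snippet is not itself flagged by a scanner]:

```python
# The standard EICAR anti-virus test string, split into two halves so
# that this source file is not itself flagged by a scanner.
EICAR = (r'X5O!P%@AP[4\PZX54(P^)7CC)7}$'
         'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*')

with open('/tmp/eicar.com', 'w') as handle:
    handle.write(EICAR)

print(len(EICAR))  # the EICAR test file is defined as 68 bytes
```

Attach /tmp/eicar.com to a message sent through the milter-protected Postfix instance; the message should be rejected and the VirusAction notification delivered.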

clamd[11973]: instream( Eicar-Test-Signature FOUND
Text 7: Example clamd log message for identified malware.

When malware is detected a message will be logged by clamd via syslog regarding the event; this will typically be logged under the "mail" facility. Depending on the distribution, messages logged as mail will be written to either /var/log/mail or /var/log/maillog [at least with the default syslog configuration].