2014-03-11

gedit's amazing External Tools

In a few recent conversations I have become aware of an unawareness - an unawareness of the awesome that is gedit's best feature: External Tools.  External Tools allow you to effortlessly link the power of the shell, Python, or whatever else into an otherwise already excellent text editor, yielding maximum awesome.  External Tools, unlike similar features in many IDEs, are drop-dead simple to use - you do not need to go off somewhere and edit configuration files; you can create and use them without ever leaving the gedit UI.
Plugins tab of the Preferences dialog.

To enable External Tools [which is a plugin - as is nearly every feature in gedit] go to the Plugins tab of the Preferences dialog and check the box for "External Tools".  External Tools is now active.  Close the dialog and proceed to define the tools useful to you.
With External Tools enabled there will be a "Manage External Tools" option in the Tools menu.  Note that there is also an "External Tools" submenu in the Tools menu - every external tool you define will automatically be available there.  The list of defined tools in that submenu also shows whatever hot-key you may have bound to each tool - handy, as you likely will not remember them at first.
Manage External Tools Dialog

Within the Manage External Tools dialog you can start defining what tools are useful to you.  For me the most useful feature is the ability to perform in-place transformations of the current document; to accomplish this set Input to "Current Document" and Output to "Replace Current Document".  With that Input & Output the current document is streamed to your defined tool as standard input and the standard output from the tool replaces the document.  Don't worry - Undo [Ctrl-Z] still works if your tool did not do what you desired.
What are some useful External Tools?  That depends on what type of files and data you deal with on a regular basis.  I have previously written a post about turning a list of values into a set format - useful for cut-n-paste into either an SQL tool [for use as an IN clause] or into a Python editor [for x=set(....)].  That provides a simple way to take perhaps hundreds of rows and turn them into usable data.
Otherwise some tools I find useful are:

Format JSON to be nicely indented

#!/bin/sh
python -m json.tool

Use input/output settings to replace current document.

Open a terminal in the directory of the document


#!/bin/sh
gnome-terminal --working-directory="$GEDIT_CURRENT_DOCUMENT_DIR" &


Set the input/output for this action to "Nothing"

Remove leading spaces from lines


#!/bin/sh
sed 's/^[[:blank:]]*//'


Use input/output settings to replace current document. 

Remove trailing spaces from lines


#!/bin/sh
sed 's/[[:blank:]]*$//'


Use input/output settings to replace current document.

Keep only unique lines of the file


#!/bin/sh
sort | uniq


Use input/output settings to replace current document. 

Format an XML file with nice indentation


#!/bin/sh
xmllint --format -


Use input/output settings to replace current document.
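
Since an External Tool is just an executable script with a shebang, Python works as well as sh.  Here is a sketch of a hypothetical tool of my own devising that keeps only the first occurrence of each line - like "sort | uniq" above, but without reordering the document:

#!/usr/bin/env python
# Keep only the first occurrence of each line, preserving document order.
import sys

seen = set()
for line in sys.stdin:
    if line not in seen:
        seen.add(line)
        sys.stdout.write(line)

Use input/output settings to replace current document.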

2014-03-10

Installation & Initialization of PostGIS

Distribution: CentOS 6.x / RHEL 6.x

If you already have a current version of PostgreSQL server installed on your server from the PGDG repository you should skip these first two steps.

Enable PGDG repository

curl -O http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm
rpm -ivh pgdg-centos93-9.3-1.noarch.rpm


Disable all PostgreSQL packages from the distribution repositories. This involves editing the /etc/yum.repos.d/CentOS-Base.repo file. Add the line "exclude=postgresql*" to both the "[base]" and "[updates]" stanzas. If you skip this step everything will appear to work - but in the future a yum update may break your system.
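
After the edit the relevant stanzas should look something like this [existing lines elided]:

[base]
...
exclude=postgresql*

[updates]
...
exclude=postgresql*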

Install PostgreSQL Server

yum install postgresql93-server

Once installed you need to initialize and start the PostgreSQL instance
service postgresql-9.3 initdb
service postgresql-9.3 start

If you wish the PostgreSQL instance to start with the system at boot use chkconfig to enable it for the current runlevel.
chkconfig postgresql-9.3 on

The default data directory for this instance of PostgreSQL will be "/var/lib/pgsql/9.3/data". Note that this path is versioned - this prevents the installation of a downlevel or uplevel PostgreSQL package from destroying your database if you install one accidentally or forget to follow the appropriate version migration procedures. Most documentation will assume a data directory like "/var/lib/postgresql" [notably unversioned]; simply keep in mind that you always need to contextualize the paths used in documentation to your site's packaging and provisioning.

Enable EPEL Repository

The EPEL repository provides many of the dependencies of the PostGIS packages from the PGDG repository.
curl -O http://epel.mirror.freedomvoice.com/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh epel-release-6-8.noarch.rpm

Installing PostGIS

The PGDG package for PostGIS should now install without errors.
yum install postgis2_93

If EPEL is not successfully enabled when you attempt to install the PGDG PostGIS packages you will see dependency errors like:
---> Package postgis2_93-client.x86_64 0:2.1.1-1.rhel6 will be installed
--> Processing Dependency: libjson.so.0()(64bit) for package: postgis2_93-client-2.1.1-1.rhel6.x86_64
--> Finished Dependency Resolution
Error: Package: gdal-libs-1.9.2-4.el6.x86_64 (pgdg93)
           Requires: libcfitsio.so.0()(64bit)
Error: Package: gdal-libs-1.9.2-4.el6.x86_64 (pgdg93)
           Requires: libspatialite.so.2()(64bit)
Error: Package: gdal-libs-1.9.2-4.el6.x86_64 (pgdg93)
...

Initializing PostGIS

Many PostGIS applications expect the template database "template_postgis" to exist; this database is not created automatically.
su - postgres
createdb -E UTF8 -T template0 template_postgis
# ... see the following note about enabling plpgsql ...
psql -d template_postgis -f /usr/pgsql-9.3/share/contrib/postgis-2.1/postgis.sql
psql -d template_postgis -f /usr/pgsql-9.3/share/contrib/postgis-2.1/spatial_ref_sys.sql 

Using the PGDG packages, the PostgreSQL plpgsql embedded language, frequently used to develop stored procedures, is already enabled in the template0 database from which the template_postgis database is derived. If you are using other PostgreSQL packages, or have built PostgreSQL from source [are you crazy?], you will need to ensure that this language is enabled in your template_postgis database before importing the schema - to do so, run the following command immediately after the "createdb" command. If you see an error stating the language is already installed you are good to go; otherwise you should see a message stating the language was enabled. If creating the language fails for any reason other than it already being enabled, you must resolve that issue before proceeding to install your GIS applications.
$ createlang -d template_postgis plpgsql
createlang: language "plpgsql" is already installed in database "template_postgis"
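
To verify that everything is in place, create a database from the new template and ask PostGIS to report its version [the "gisdemo" database name here is just an example]:

createdb -T template_postgis gisdemo
psql -d gisdemo -c "SELECT PostGIS_full_version();"

This should report the PostGIS version along with the versions of the supporting libraries [GEOS, PROJ, etc.] it was built against.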

Celebrate
PostGIS is now enabled in your PostgreSQL instance and you can use and/or develop exciting new GIS & geographic applications.

2013-12-26

Translating...domain server....wait...be-frustrated

One of the most annoying features of Cisco's IOS is its assumption that anything you type which is not a command is a hostname.  So...

Router#dev
Translating "dev"...domain server (255.255.255.255)
 (255.255.255.255)
Translating "dev"...domain server (255.255.255.255)
....

... and when you are configuring a router which either (a) does not have DNS, (b) is on a network that is down, or (c) is on the workbench and not actually connected to a network - you get to enjoy the long pause of a DNS timeout.

Argh!

The solution is simple:

Router#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Router(config)#no ip domain-lookup
Router(config)#exit

The "no ip domain-lookup" disables this feature. Now at least it fails instantly:
Router#dev
Translating "dev"
Translating "dev"
% Bad IP address or host name
% Unknown command or computer name, or unable to find computer address
The downside is that the router will no longer perform DNS look-ups to translate host names to addresses.  That is bad for some specific use-cases [a VPN terminator is one possible example] - but generally that is not something that matters for a router.  Once a router is configured you can always turn domain-lookup back on.
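
For example, once the router is back in service with working DNS:

Router#conf t
Router(config)#ip domain-lookup
Router(config)#exit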

Note to self: every time you pull a router out of storage - do this first.

2013-05-01

Getting a native connection from the ORM

The SQLAlchemy ORM provides a powerful abstraction from the database allowing operations to be performed on objects and queries to be constructed based on object attributes rather than dealing with attribute-to-field correspondence.  But there are still some operations for which you need to talk directly to the underlying database. 
In 'normal' mode SQLAlchemy maintains a connection pool, releases connections from the pool to the application as needed, tracks them, and tries to keep everything tidy.  When the need arises for a 'native' DBAPI connection [for this example I'm using PostgreSQL] it is possible to explicitly check a connection out from the pool - after which it is yours to deal with: managing isolation, closing it, etc.
Assuming the database connection has already been created and bound to the session factory with something like:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
...
engine = create_engine( orm_dsn, **{ 'echo': orm_logging } )
Session = sessionmaker( )
Session.configure( bind=engine )

 - then sessions can be taken from the connection pool simply by calling "db = Session( )".  When the "db.close( )" is performed that session goes back into the pool.
If a connection is to be used outside of all the mechanics that SQLAlchemy provides it can be checked out, as mentioned before, using a rather verbose call:

conn = Session( ).connection( ).connection.checkout( ).connection

Now "conn" is a 'native' DBAPI connection.  You can perform low level operations such as setting the isolation level and creating cursors:

conn.set_isolation_level( psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT )
curs = conn.cursor( )
curs.execute( .... )

This is not a frequent need, but for very backend-specific operations it is the simplest approach [otherwise you can extend SQLAlchemy...]. One use case is using PostgreSQL's asynchronous notification channels to capture database events; for that purpose an application needs to select on the DBAPI connection, and there is no need for an ORM in the mix when you are just capturing events.
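
As a sketch of that use case [assuming a notification channel named "events" and the "conn" checked out above], the psycopg2 idiom looks something like:

import select
import psycopg2.extensions

# LISTEN/NOTIFY requires autocommit mode
conn.set_isolation_level( psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT )
curs = conn.cursor( )
curs.execute( 'LISTEN events;' )

while True:
    # select on the connection's underlying socket until it is readable
    if select.select( [ conn ], [ ], [ ], 5 ) == ( [ ], [ ], [ ] ):
        continue  # timed out, check again
    conn.poll( )
    while conn.notifies:
        notify = conn.notifies.pop( 0 )
        print( 'event on {0}: {1}'.format( notify.channel, notify.payload ) )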

2012-11-11

Interrogating the Infallible Secretary

Numerous applications in GNOME exhibit magically wonderful behavior, like they remember everything and know what you want.  One example of such an application is the excellent PDF reader Evince; every time I open a PDF it opens to the same page as the last time I looked at that document.  This means if I get my morning coffee, switch to the GNOME Activity Journal, see that it was the document "Informix_Python_Carsten_Haese.pdf" that I was reading at 16:59 the previous day, I click on that document and it opens to the same slide it was displaying when I closed it the previous day.  And GNOME applications do this kind of thing all day, like an infallible secretary.

This reminds me of the now very cliche Clarke's third law: "Any sufficiently advanced technology is indistinguishable from magic".  I could no longer resist looking behind the curtain, so I set off to discover how my infallible secretary accomplishes this.  The answer is "GVFS" - the GNOME Virtual Filesystem - which layers an extensible meta-data system on top of application I/O.

GVFS provides a command line tool, of course [this is UNIX!], that allows the savvy user to see into the filing cabinet of their infallible secretary.

$ gvfs-info -a "metadata::*" file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf
attributes:
  metadata::evince::page: 7
  metadata::evince::dual-page-odd-left: 0
  metadata::evince::zoom: 1
  metadata::evince::window_height: 594
  metadata::evince::sizing_mode: fit-width
  metadata::evince::sidebar_page: links
  metadata::evince::window_width: 1598
  metadata::evince::sidebar_size: 249
  metadata::evince::dual-page: 0
  metadata::evince::window_x: 1
  metadata::evince::window_y: 91
  metadata::evince::show_toolbar: 1
  metadata::evince::window_maximized: 0
  metadata::evince::inverted-colors: 0
  metadata::evince::continuous: 1
  metadata::evince::sidebar_visibility: 1
  metadata::evince::fullscreen: 0
And there it is - "metadata::evince::page: 7" - how Evince takes me back to the same page I left from.  As well as lots of other information.

Command line tools are indispensable, but the immediate next question is... can I access this data from Python?  Answer - of course!  With the GIO module the data is there, ready to be explored.
>>> import gio
>>> handle = gio.File('/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf')
>>> meta = handle.query_info('metadata')
>>> meta.has_attribute('metadata::evince::page')
True
>>> meta.get_attribute_string('metadata::evince::page')
'7'
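
The metadata is writable through the same tooling; for example [the "metadata::mynote" attribute name here is entirely my own invention]:

$ gvfs-set-attribute -t string "/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf" metadata::mynote "read this again"

Any attribute you set in the "metadata" namespace is stored and returned to you later, just like Evince's page number.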

Now knowing that, the System Administrator part of my psyche needs to know: where is all this metadata?  His first guess was that it was being stored in the filesystem using extended attributes:
getfattr --dump "/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
Bzzzzt! Nothing there.  Enough with guessing; every System Administrator worth his or her salt knows that guessing [ugh!] is for PC jockeys and web developers.  The correct approach is to drag the appropriate application out to the sheds and ... make it talk.  It turns out that gvfs-info doesn't put up much of a fight - one glimpse of strace and he's confessing everything.

$ strace -e trace=open gvfs-info -a "metadata::*" "file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
...
open("/home/awilliam/.local/share/gvfs-metadata/home", O_RDONLY) = 6
Yes, there it is.

$ file  /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home: data
$ fuser -u /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home:  2517m(awilliam)  2678m(awilliam) 26624m(awilliam)
$ ps -p 2517
  PID TTY          TIME CMD
 2517 ?        00:08:13 nautilus
$ ps -p 2678
  PID TTY          TIME CMD
 2678 ?        00:01:56 gvfsd-metadata
$ ps -p 26624
  PID TTY          TIME CMD
26624 ?        00:00:17 gedit
A memory-mapped database file [see the "m" after the PID in the output of fuser - that means memory-mapped], and the PIDs of the applications currently performing operations via GIO.  The use of memory-mapped files means that read operations require no IPC [inter-process communication], or even syscalls, for multiple applications to see the same state.  I had to do a little digging in the GVFS documentation to understand how they manage concurrency - as multiple writers to memory-mapped files is a dicey business [and GIO applications feel rock solid].  The answer is the gvfsd-metadata process: applications using GIO push all their writes / changes to that process over D-BUS, so only one process writes and everyone else reads through the memory-mapped file.  Concurrency issues are elegantly side-stepped.  Brilliant.

Now that the geek in me is sated I can go back to letting GNOME and its infallible secretary facilitate my productivity.

2012-10-26

Setting a course for UTC

Timezones and daylight savings times are confusing; it is much more complicated than offset-from-UTC.  There are times that occur more than once a year [yep, it hurts] as well as times that fall between two valid times but never happen.  It probably requires a Tardis to understand why anyone would want it to work this way.  But, sadly, it does work this way.
If you stick to the rules, you can safely manage times... so long as those times are all localized.   Naive times, times that are not localized, are the enemy.
Unfortunately there is a lot of code out there, even in important and widely used modules, that uses naive datetimes.  If you try to use a virtuously localized datetime object with those modules you will likely encounter the dreaded "can't compare offset-naive and offset-aware datetimes".
One hack is to make sure the time is localized to the system's timezone, then make it naive, call the module's function, and then re-localize the result (again).  Tedious and very prone to error.  The one real problem with this hack is that on most systems the Python process has no @*^$&*@* clue what time zone it is in.  Don't believe me? Try it:
>>> import time
>>> time.tzname
('EST', 'EDT')
Eh, that's a tuple.  And while "EST" is a time zone, "EDT" is not a timezone.  Yes, I can determine that I am currently in daylight savings time using time.daylight; but I can't localize a datetime to a daylight timezone because daylight is an attribute of a timezone, not a timezone itself.  That is true regardless of what time.tzname says.  And "EST" doesn't have daylight savings time; "US/Eastern" does.  "EST" is "US/Eastern" when not in daylight savings time. Gnarly.
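To illustrate [a quick interpreter sketch - pytz picks the correct abbreviation based on the date]:
>>> from datetime import datetime
>>> import pytz
>>> eastern = pytz.timezone( 'US/Eastern' )
>>> eastern.localize( datetime( 2012, 7, 1, 12, 0 ) ).tzname( )
'EDT'
>>> eastern.localize( datetime( 2012, 12, 1, 12, 0 ) ).tzname( )
'EST'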
But I want to use datetime objects reliably and safely with modules that require naive datetime objects....  The answer is to make the timezone known!  I cannot reliably get it from the system but I can make it what I want, and what I want is UTC!  Then my naive datetime objects do not need to be concerned with daylight savings time.  I can just localize them to UTC and subsequently convert them to whatever timezone the user needs to see.  This is accomplished using a combination of the os and time modules.  Early on in my Python application I move myself to UTC.  Here is an example that demonstrates the ugliness of naive times in an unknown timezone, and the beauty of the process being in UTC.
from datetime import datetime
import pytz, time, os

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format(datetime.utcnow( ) ) )
# What timezone is local?  Problem is, most of the time we just do not know.
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )

# Change to UTC
os.environ[ 'TZ' ] = 'UTC'
time.tzset( )

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format( datetime.utcnow( ) ) )
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )
And the output:
NOW: 2012-10-26 07:03:31.285486
UTCNOW: 2012-10-26 11:03:31.285570
LOCALIZEDNOW: 2012-10-26 07:03:31.285632+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285705+00:00
NOW: 2012-10-26 11:03:31.285787
UTCNOW: 2012-10-26 11:03:31.285812
LOCALIZEDNOW: 2012-10-26 11:03:31.285848+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285875+00:00

Now the danger of somehow getting a naive datetime into the mix is completely avoided - I can always safely localize a naive time to UTC.

2012-10-04

Simple NAT With Cisco IOS

Performing NAT with any variety of LINUX box is possibly one of the most redundantly documented applications on the Web.  Attempting to do the same with a Cisco IOS router is not documented in nearly so straight-forward a way.
This little snippet shows the configuration for an IOS router where vLAN 13 is a public network and vLAN 12 is a private network.  The router has a public IP address of A.B.C.D [netmask: E.F.G.H] and the gateway address is A.B.C.I.  The private network is a 10.0.0.0/8 with multiple /24 segments which all route to this NAT gateway.
interface FastEthernet0/0.12
 encapsulation dot1Q 12
 ip address 10.66.x.y 255.255.255.0
 ip nat inside
!        
interface FastEthernet0/0.13
 encapsulation dot1Q 13
 ip address A.B.C.D E.F.G.H
 ip nat outside
!        
ip nat inside source list 1 interface FastEthernet0/0.13 overload
ip classless
ip route 0.0.0.0 0.0.0.0 A.B.C.I
access-list 1 permit 10.0.0.0 0.255.255.255
Access-list 1 matches all 10.0.0.0/8 traffic and is used by the ip nat policy, which causes all matching traffic to be NATed with the source IP address of the vLAN 13 interface.  The template for the ip nat inside source command is:
ip nat inside source {list {access-list-number | access-list-name} | route-map name} {interface type number | pool name} [mapping-id map-name | vrf name] [overload]
The "overload" option is what enables the routers use of a single address to NAT many local addresses; this corresponds to the default behavior of most iptables configuration tools (does iptables have a "default" behavior?)

One nice feature of using a Cisco for NAT, rather than a host (besides the simplicity of no moving parts), is the very concise reporting provided by the "show ip nat translations" and "show ip nat statistics" commands.
Router#show ip nat statistics 
Total active translations: 208 (0 static, 208 dynamic; 208 extended)
Outside interfaces:
  FastEthernet0/0.13
Inside interfaces:
  FastEthernet0/0.12
Hits: 4890142  Misses: 52844
Expired translations: 52640
Dynamic mappings:
-- Inside Source
[Id: 3] access-list 1 interface FastEthernet0/0.13 refcount 208

Similar to "iptables -t nat -L -v" in LINUX.
Additional, and much more technical, documentation for this feature can be found in Cisco's IOS NAT configuration documentation.