2014-03-11

gedit's amazing External Tools

In a few recent conversations I have become aware of an unawareness - an unawareness of the awesome that is gedit's best feature: External Tools.  External Tools allow you to effortlessly link the power of the shell, Python, or whatever into an otherwise already excellent text editor yielding maximum awesome.  External Tools, unlike some similar features in many IDEs is drop-dead simple to use - you do not need to go somewhere and edit files, etc... you can create and use them without ever leaving the gedit UI.
Plugins tab of the Preferences dialog.

To enable External Tools [which is a plugin - as is nearly every feature in gedit] go to the Plugin tab of Preferences dialog and check the box for "External Tools".  External Tools is now active.  Close the dialog and proceed in defining the tools useful to you.
With External Tools enabled there will be a "Manage External Tools" option in the Tools menu.  When in the tools menu not there is also an "External Tools" submenu - every external tool you define will be available in the menu, automatically.  The list of defined tools in that submenu will also include whatever hot-key you may have bound to the tool - as you likely will not remember at first.
Manage External Tools Dialog

Within the Manage External Tools dialog you can start defining what tools are useful to you.  For myself the most useful feature is the ability to perform in-place transformations of the current document; to accomplish this set input to "Current Document" and Output to "Replace Current Document".  With that Input & Output the current document is streamed to your defined tool as standard input and the standard output from the tool replaces the document.  Don't worry - Undo [Ctrl-Z] still works if your tool did not do what you desired.
What are some useful External Tools?  That depends on what type of files and data you deal with on a regular basis.  I have previously written a post about turning a list of value into an set format - that is useful for cut-n-paste into either an SQL tool [for use as an IN clause] or into a Python editor [for x=set(....)].   That provides a simple way to take perhaps hundreds of rows and get them into data very simply. 
Otherwise some tools I find useful are:

Format JSON to be nicely indented

#!/bin/sh
python -m json.tool

Use input/output settings to replace current document.

Open a terminal in the directory of the document


#!/bin/sh
gnome-terminal --working-directory=$GEDIT_CURRENT_DOCUMENT_DIR &


Set the input/ouput for this action to "Nothing"

Remove leading spaces from lines


#!/bin/sh
sed 's/^[[:blank:]]*//'


Use input/output settings to replace current document. 

Remove trailing spaces from lines


#!/bin/sh
sed 's/[[:blank:]]*$//'


Use input/output settings to replace current document.

Keep only unique lines of the file


#!/bin/sh
sort | uniq


Use input/output settings to replace current document. 

Format an XML file with nice indentation


#!/bin/sh
xmllint --format - -


Use input/output settings to replace current document.

2013-05-01

Getting a native connection from the ORM

The SQLAlchemy ORM provides a powerful abstraction from the database allowing operations to be performed on objects and queries to be constructed based on object attributes rather than dealing with attribute-to-field correspondence.  But there are still some operations for which you need to talk directly to the underlying database. 
In 'normal' mode SQLAlchemy maintains a connection pool and releases connections from the pool to the application as needed, tracks them, and tries to keep everything tidy.  When the need arises for a 'native' DBAPI connection [for this example I'm using PostgreSQL] it is possible to explicitly check a connection out from the pool - after which it is yours to deal with, to track the isolation, close, etc..
Assuming the database connection has already been created and bound to the session factory with something like:

from sqlalchemy import create_engine, Session
...
engine = create_engine( orm_dsn, **{ 'echo': orm_logging } )
Session.configure( bind=engine )

 - then sessions can be taken from the connection pool simply by calling "db = Session( )".  When the "db.close( )" is performed that session goes back into the pool.
If a connection is to be used outside of all the mechanics that SQLAlchemy provides it can be checked out, as mentioned before, using a rather verbose call:

conn = Session( ).connection( ).connection.checkout( ).connection

Now "conn" is a 'native' DBAPI connection.  You can perform low level operations such as setting the isolation level and creating cursors:

conn.set_isolation_level( psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT )
curs = conn.cursor( )
curs.execute( .... )

This is not a frequent need, but for very database backend specific it is the simplest approach [otherwise you can extend SQLAlchemy...]. One use case is using PostreSQL's asyncronous notify channels to capture database events; for this purpose an application needs to select on the DBAPI connection, there's no need for an ORM in the mix when you are just capturing events.

2012-11-11

Interrogating the Infallible Secretary

Numerous applications in GNOME exhibit magically wonderful behavior, like they remember everything and know what you want.  One example of such an application is the excellent PDF reader Evince; every time I open a PDF it opens to the same page as the last time I looked at that document.  This means if I get my morning coffee, switch to the GNOME Activity Journal, see that it was the document "Informix_Python_Carsten_Haese.pdf" that I was reading at 16:59 the previous day, I click on that document and it opens to the same slide it was displaying when I closed it the previous day.  And GNOME applications do this kind of thing all day, like an infalliable secretary.

This reminds me of the now very cliche Niven's law: "Any sufficiently advanced technology is indistinguishable from magic" [no, that is not a quote from Arthur C. Clarke, as commonly attributed].  I could not longer resist looking behind the curtain, so I set off to discover how my infallible secretary accomplishes this.  The answer is "GVFS" - the GNOME Virtual Filesystem which layers an extensible meta-data system on top of application I/O.

GVFS provides a command line tool, of course [this is UNIX!], that allows the savy user to see into the filing cabinet of their infallible secretary.

$ gvfs-info -a "metadata::*" file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf
attributes:
  metadata::evince::page: 7
  metadata::evince::dual-page-odd-left: 0
  metadata::evince::zoom: 1
  metadata::evince::window_height: 594
  metadata::evince::sizing_mode: fit-width
  metadata::evince::sidebar_page: links
  metadata::evince::window_width: 1598
  metadata::evince::sidebar_size: 249
  metadata::evince::dual-page: 0
  metadata::evince::window_x: 1
  metadata::evince::window_y: 91
  metadata::evince::show_toolbar: 1
  metadata::evince::window_maximized: 0
  metadata::evince::inverted-colors: 0
  metadata::evince::continuous: 1
  metadata::evince::sidebar_visibility: 1
  metadata::evince::fullscreen: 0
And there it is - "metadata::evince::page: 7" - how Evince takes me back to the same page I left from.  As well as lots of other information.

Command line tools are indespensible, but the immediate next question.... can I access this data from Python?  Answer - of course!  With the GIO module the data is there ready to be explored.
>>> import gio
>>> handle = gio.File('/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf')
>>> meta = handle.query_info('metadata')
>>> meta.has_attribute('metadata::evince::page')
True
>>> meta.get_attribute_string('metadata::evince::page')
'7'

Now knowing that, the System Administrator part of my psyche needs to know: where is all this metadata?  His first guess what that it was being stored in the filesystems using extended attribites:
getfattr --dump "/home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
Bzzzzt! Nothing there.  Enough with guessing, every System Administrator worth his or her salt knows that guessing [ugh!] is for PC jockeys and web developers.  The correct approach is to drag the appropriate application out to the sheds and ... make it talk.  It turns out that gvfs-info doesn't put up much of a fight - one glimpse of strace and he's confessing everything.

$ strace -e trace=open gvfs-info -a "metadata::*" "file:///home/awilliam/Documents/Informix_Python_Carsten_Haese.pdf"
...
open("/home/awilliam/.local/share/gvfs-metadata/home", O_RDONLY) = 6
Yes, there it is.

$ file  /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home: data
$ fuser -u /home/awilliam/.local/share/gvfs-metadata/home
/home/awilliam/.local/share/gvfs-metadata/home:  2517m(awilliam)  2678m(awilliam) 26624m(awilliam)
$ ps -p 2517
  PID TTY          TIME CMD
 2517 ?        00:08:13 nautilus
$ ps -p 2678
  PID TTY          TIME CMD
 2678 ?        00:01:56 gvfsd-metadata
$ ps -p 26624
  PID TTY          TIME CMD
26624 ?        00:00:17 gedit
A memory-mapped database file [see the "m" after the PID in the output of fuser - that means memory mapped].  And PIDs or of the applications currently performing operations via GIO.   The use of memory mapped files means that read operations require no IPC [inter-process communications] or even syscalls for multiple applications to see the same state.  Now I had to do a little digging for GVFS documentation to understand how they manage concurrency - as multiple writers to memory mapped files is a dicey business [and GIO applications feel rock solid].  The answer is the gvfsd-metadata process.  Applications using GIO push all there writes / changes to that process over D-BUS; so only one process writes, everyone else reads through the memory mapped file.  Concurrency issues are elegantly side-stepped.  Brilliant. 

Now that the geek in me is sated I can go back to letting GNOME and its infallible secretary facilitate my productivity.

2012-10-26

Setting a course for UTC

Timezones and daylight savings times are confusing; it is much more complicated that offset-from-UTC.  There are times that occur more that once a year [yep, it hurts] as well as times that are between two valid times but never happen.  It probably requires a Tardis to understand why anyone would want it to work this way.  But, sadly, it does work this way. 
If you stick to the rules, you can safely manage times... so long as those times are all localized.   Naive times, times that are not localized, are the enemy.
Unfortunately there is a lot of code out there, even in important and widely used modules, that uses nieve datetimes.  If you try to use a virtuously localized datetime object with those modules you will likely encounter the dreaded "Cannot compare naive and localized values".
One hack is to make sure the time is localized to the system's timezone, then make it naive, call the module's function, and then re-localize the result (again). Tedious and very prone to error.  The one real problem with this hack is that on most systems the Python process has not @*^$&*@* clue what time zone it is in.  Don't believe me? Try it:
>>> import time
>>> time.tzname
('EST', 'EDT')
Eh, that's a tuple.  And while "EST" is a time zone "EDT" is not a timezone.  Yes, I can determine that I am in daylight savings time locally using time.daylight; but I can't localize a datetime to a daylight timezone because daylight is an attribute of a timezone, not a timezone itself.  That is true regardless of what time.tzname says.  And the "EST" doesn't have daylight savings time, "US/Eastern" does.  "EST" is "US/Eastern" when not it daylight savings time. Gnarly.
But I want to use datetime obejcts reliably and safely with modules that require naive datetime objects....  The answer is to make the timezone known!  I cannot reliably get it from the system but I can make it what I want, and what I want is UTC!  Then my naive datetime objects do not have to be concerned with daylight savings time.  I can just localize them to UTC and subsequently convert them to whatever timezone the user needs to see.  This is accomplished using a combination of the os and time modules.  Early on in my Python application I move myself to UTC.  Here is an example that demonstrates the ugliness of naive times in an unknown timezone, and the beauty of the process being in UTC.
from datetime import datetime
import pytz, time, os

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format(datetime.utcnow( ) ) )
# What timezone is local?  Problem is, most of the time we just do not know.
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )

#Change to UTC
os.environ[ 'TZ' ] = 'UTC'
time.tzset( )

print( 'NOW: {0}'.format( datetime.now( ) ) )
print( 'UTCNOW: {0}'.format( datetime.utcnow( ) ) )
print( 'LOCALIZEDNOW: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.now( ) ) ) )
print( 'LOCALIZEDUTC: {0}'.format( pytz.timezone( 'UTC' ).localize( datetime.utcnow( ) ) ) )
And the output:
NOW: 2012-10-26 07:03:31.285486
UTCNOW: 2012-10-26 11:03:31.285570
LOCALIZEDNOW: 2012-10-26 07:03:31.285632+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285705+00:00
NOW: 2012-10-26 11:03:31.285787
UTCNOW: 2012-10-26 11:03:31.285812
LOCALIZEDNOW: 2012-10-26 11:03:31.285848+00:00
LOCALIZEDUTC: 2012-10-26 11:03:31.285875+00:00

Now the danger of somehow getting a naive datetime into the mix is completely avoided - I can always safely localize a naive time to UTC.

2012-09-25

Idjit's Guide To Installing RabbitMQ On openSUSE 12.2

The RabbitMQ team provides a generic SUSE RPM which works on openSUSE 11.x, openSUSE 12.1, and I presume on the pay-to-play versions of SuSE Enterprise Server. About the only real dependency for RabbitMQ is the erlang platform which is packaged in the erlang language repo. So the only real trick is getting the RabbitMQ package itself [from this page].  Then install and start is as simple as:
zypper ar http://download.opensuse.org/repositories/devel:/languages:/erlang/openSUSE_12.2 erlang
zypper in erlang
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.8.6/rabbitmq-server-2.8.6-1.suse.noarch.rpm
rpm -Uvh rabbitmq-server-2.8.6-1.suse.noarch.rpm
Now before you start the rabbitmq-server you need to modify the /etc/init.d/rabbitmq-server file changing "LOCK_FILE=/var/lock/subsys/$NAME" to "LOCK_FILE=/var/run/rabbitmq/$NAME".  The directory "/var/lock/subsys/$NAME" doesn't exist on openSUSE, so this change puts the lock file over under /var/run/rabbitmq along with the PID file.  Otherwise you can create the /var/lock/subsys/$NAME directory with the appropriate permissions.

Every time you modify a service script in /etc/init.d you need to then run systemctl --system daemon-reload so that systemd knows to anticipate the changed file.
If you want to use Rabbit's management interface you now need to enable the appropriate plugins:
rabbitmq-plugins enable rabbitmq_management
rabbitmq-plugins enable rabbitmq_management_visualiser

By default RabbitMQ will listen on all your hosts interfaces for erlang kernel, AMQP, and HTTP (management interface) connections.  Especially in the case of a developement host you may want to restrict the availability of one or all of these services to the local machine.

In order to keep the erlang kernel and Rabbit's AMQ listeners restricted to the local host you'll need to add two exported environment variables to the service script - just put them in following the definition of PID_FILE.
export RABBITMQ_NODENAME=rabbit@localhost
export ERL_EPMD_ADDRESS=127.0.0.1
For the management inteface and other components you'll need to modify [and possibly create] the /etc/rabbitmq/rabbitmq.config configuration file.  RabbitMQ violates the only-have-one-way-to-configure rule of system administration; this is in part due to its reliance on the Erlang runtime - controlling the behavior of the run-time is a [poorly documented] black art.  Both the environment variables and the configuration file are required to restrict all the components to the local interface.  The following configuration file restricts HTTP [management interface] and AMQP services to the localhost and informs the RabbitMQ application that it should find the Erlang kernel at the address 127.0.0.1.
[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {kernel,[{inet_dist_use_interface,{127,0,0,1}}]},
  {rabbit, [{tcp_listeners, [{"127.0.0.1", 5672}]}]},
  {rabbitmq_management,  [ {http_log_dir,   "/tmp/rabbit-mgmt"} ] },
  {rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
  {rabbitmq_mochiweb, [ {listeners, [{mgmt, [{port, 55672},
                                             {ip, "127.0.0.1"}]}]},
                        {default_listener, [{port, 60000} ] } ] }
 ].
Always modify the configuration file when the RabbitMQ service is shutdown.  A botched configuration file can render the broker unable to shutdown properly leaving you to have to manually kill the processes old-school.

With the RABBITMQ_NODENAME defined in the services file you will either need to add that same variable to the administrator's and application's environment or specify the node name when attempting to connect to or manage the RabbitMQ broker service [your application probably already refers to a configured broker, but you'll certainly have to deal with this when using the rabbitmqclt command].

Now the service should start:
service rabbitmq-server start
The broker service should now be running and you can see the components' open TCP connections using the netstat command.  The management interface should also be available on TCP/55672 [via your browser of choice] unless you specified an alternative port in the rabbitmq.config file.

linux-nysu:/etc/init.d # netstat --listen --tcp --numeric --program
Active Internet connections (only servers)
Proto Local Address    Foreign Address State  PID/Program name  
tcp   127.0.0.1:5672   0.0.0.0:*       LISTEN 23180/beam.smp     
tcp   127.0.0.1:60712  0.0.0.0:*       LISTEN 23180/beam.smp     
tcp   127.0.0.1:4369   0.0.0.0:*       LISTEN 22681/epmd         
tcp   127.0.0.1:55672  0.0.0.0:*       LISTEN 23180/beam.smp     
Now you probably want to do some configuration and provisioning using the rabbitmqctl command; but your RabbitMQ instance is up and running.

2012-04-15

Using GNOME Terminal Custom Commands

There are numerous terminal session managers and profile managers, etc... for use by people [typically network and system administrators] who have to SSH or telnet to lots of hosts.   But most of these aren't packages or very well maintained - fortunately there is an almost stealth solution built into the familiar gnome-terminal application.
Multiple profiles can be created in gnome-terminal and profiles can be assigned to "Run a custom command instread of my shell";  this command can be anything.  So that profile can automatically telnet or SSH to another host, potentiall even launch an application on that host (such as firing up GNU screen).

The defined profiles are conveniently available in "Files" / "Open Terminal" and "File" / "Open Tab" menu.  Simply create a profile for each host or device that you typically jump on to. If SSH is in use the automatic tie-in to the GNOME passphrase and keyring will kick in.
Generally you don't want a terminal emulator to grab your keystrokes - you want them to go to the application or session.  Under gnome-terminal's "Edit" / "Keyboard Shortcuts" you can enable F10 to activate the ring-menu which will allow you to conveniently navigate to profiles using just the keyboard.

2012-03-05

Sound Converter

A common issue is to have an audio file in one format at to need it in another for compatibility with some specific application or device.  And how to covert the file and know the quality of the result, etc...?  Well... there is a simple application called ... wait for it .... "soundconverter".  It is packaged for just about every distribution and available on openSUSE through the normal repositories.  How obvious can it get?  Apparently not so obvious I couldn't have overlooked it for a long long time.
The soundconverter application.
With soundconverter you can convert between FLAC, MP3, OGG, and WAV.  Add the files you want to covert, use preferences to select the output format, and away you go.  Nice job.