2009-06-16

Logging *USEFUL* Web Access To PostgreSQL

It is pretty easy to configure syslog to log system messages to a PostgreSQL database; and that sure beats flat files. With some GRANT/REVOKE goodness this setup can be made quite secure - only allowing the message import process to insert messages and denying the ability to everyone else to modify the contents of the database. But all that data is still pretty ugly to deal with as the messages are just long strings; so some method is needed to break down that "data" into "information". I had the specific need to do this with SQUID weblogs. I already had syslog logging to a PostgreSQL database so the first step was just to have SQUID use syslog to log access. That is easily accomplished like:

cache_access_log syslog:LOG_LOCAL4

After a restart SQUID messages show up in the LOCAL4 facility (table: facility_local4) with the priority of "info". Then I created a table to hold the parsed messages:

CREATE TABLE proxy_audit (
timestamp TIMESTAMP,
elapsed INT,
source VARCHAR(40),
status VARCHAR(15),
size INT,
method VARCHAR(15),
url VARCHAR(128),
domain VARCHAR(60),
identity VARCHAR(25),
mimetype VARCHAR(45));


Last a trigger on the LOCAL4 log parses incoming messages into this table:

CREATE OR REPLACE FUNCTION p_proxy_audit() RETURNS trigger AS '
BEGIN
IF (new.hostname = ''hod-a'' AND new.priority = ''info'') THEN
INSERT INTO proxy_audit (timestamp, elapsed, source, status, size,
method, url, domain, identity, mimetype)
VALUES(''epoch''::TIMESTAMP + (split_part(split_part(new.message, '' '', 1), ''.'', 1)::INT) * ''1 second''::INTERVAL,
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 1)::INT,
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 2),
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 3)::VARCHAR(15),
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 4)::INT,
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 5)::VARCHAR(15),
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 6)::VARCHAR(128),
split_part(split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 6), ''/'', 3)::VARCHAR(60),
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 7)::VARCHAR(25),
split_part(TRIM(substring(new.message, strpos(new.message, '' ''))), '' '', 9));
END IF;
RETURN new;
END
' LANGUAGE plpgsql;
CREATE TRIGGER t_proxy_audit AFTER INSERT
ON facility_local4 FOR EACH ROW
EXECUTE PROCEDURE p_proxy_audit()


Once you get allot of data in your table, which happens quickly, you'll probably discover you need an index.

CREATE INDEX proxy_audit_i1 ON proxy_audit(identity, timestamp);

Now looking at a user's web usage is straight-forward. Finally.

No comments:

Post a Comment