Friday 15 November 2019

Ezproxy harvesting - walk through of the steps so far

I am genuinely happy at the prospect of being able to analyse usage statistics for our electronic resources.  I have heard myself telling colleagues it was impossible for so many years that I feel ashamed that I never seriously tried to pull it off before.

My Python -> MySQL model is shaping up well.

Here is an outline of how the process works so far.  (This will all be automated at a later stage; at the moment I am standing in for the scheduled jobs.)

Step One: get the Ezproxy logs

We host our own Ezproxy server, so I just FTP the most recent batch of logs over to a network drive where I can run Python against them (a rough sketch of this fetch step follows the list of filenames below).

The log files I need are named along the lines of:
  • ezproxy.log.04Nov2019
  • ezproxy.log.05Nov2019
  • ezproxy.log.06Nov2019
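
The fetch could be scripted as well.  Here is a rough sketch of the idea using Python's ftplib; the hostname, credentials and paths are all placeholders rather than our real details:

from ftplib import FTP
from pathlib import Path

# Placeholders only - not our real server, credentials or drive mapping.
FTP_HOST = "ezproxy.example.ac.uk"
REMOTE_DIR = "/logs"
LOCAL_DIR = Path("N:/ezproxy-logs")

def fetch_recent_logs():
    """Download any ezproxy.log.* files we do not already have locally."""
    LOCAL_DIR.mkdir(parents=True, exist_ok=True)
    with FTP(FTP_HOST) as ftp:
        ftp.login(user="statsuser", passwd="secret")
        ftp.cwd(REMOTE_DIR)
        for name in ftp.nlst():
            if not name.startswith("ezproxy.log."):
                continue
            target = LOCAL_DIR / name
            if target.exists():
                continue  # already fetched on an earlier run
            with open(target, "wb") as fh:
                ftp.retrbinary("RETR " + name, fh.write)

fetch_recent_logs()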

Step Two: extract the details I need

From these huge logfiles, I only need a tiny subset of information:
  • IP address of the requester
  • User name of the requester
  • Timestamp
  • Which of our electronic resources they viewed
I do this at the command line, by going through the logs and cutting out what I need:

cat ezproxy*.log* | cut -d' ' -f1,3,4,7 | grep 'connect?session' > ezproxy.out

(This basically retrieves columns 1, 3, 4, and 7 from the log files, keeping only the lines that show a user authenticating their session.)
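
As an aside, the same extraction could be done in Python rather than at the command line.  This is only a rough sketch of that idea (it is not the script from Step Three), and it assumes the filenames follow the pattern listed above:

import glob

# Mimic: cut -d' ' -f1,3,4,7 | grep 'connect?session'
with open("ezproxy.out", "w") as out:
    for path in sorted(glob.glob("ezproxy.log.*")):
        with open(path, errors="replace") as log:
            for line in log:
                fields = line.rstrip("\n").split(" ")
                if len(fields) < 7:
                    continue  # malformed line, skip it
                # cut's field numbers are 1-based, so these are fields 1, 3, 4 and 7
                kept = " ".join([fields[0], fields[2], fields[3], fields[6]])
                if "connect?session" in kept:
                    out.write(kept + "\n")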

With the user names redacted, the output looks like:



Step Three: run it through my Python script

Details in the next post.
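
Until then, the general shape is: read each line of ezproxy.out, split it back into its four fields, and insert those into MySQL.  A very rough sketch of that idea follows; the table name, column names and connection details are placeholders, not my actual schema, and the timestamp is kept as raw text here rather than being parsed into a proper DATETIME:

import mysql.connector  # one option; pymysql works in much the same way

# Read the extracted fields back in.
rows = []
with open("ezproxy.out") as f:
    for line in f:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip anything that did not extract cleanly
        rows.append(tuple(parts))  # (ip, username, timestamp, resource)

# Placeholder connection details and table - not the real schema.
conn = mysql.connector.connect(
    host="localhost", user="stats", password="secret", database="ezproxy_stats"
)
cur = conn.cursor()
cur.executemany(
    "INSERT INTO usage_raw (ip, username, logged_at, resource)"
    " VALUES (%s, %s, %s, %s)",
    rows,
)
conn.commit()
conn.close()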
