Wednesday, 16 October 2019

Electronic resources: getting informative usage statistics from our EZProxy logs using Python and Microsoft Power BI


Outline of intended model: September 2019

I started developing a process whereby relevant information could be extracted from the Ezproxy logs, augmented with information from other sources (a user’s school from Juno, and geographic location via IP address), and loaded into a MySQL database.
This enhanced data was then available for data visualisation in tools like Microsoft Power BI.
This was achieved by means of a Python script.

The model works as follows:
·         A UNIX shell script extracts the lines from the ezproxy log that identify a user authentication to a resource: gathering the user name, resource, ip address and timestamp.  [At the moment, I am doing this manually, but it can be automated for a production model].
·         A Python script works through the log extract. For each authentication, it calls the Alma API to get the user’s group, School, and course of study.
·         Python replaces the IP address with a geographical location.
·         Python writes the combined data to a MySQL database.
·         Visualisations can then easily be created from the MySQL database: using software of choice (Power BI in our case).
In terms of GDPR, the MySQL database will not hold anything that identifies an individual.

No comments:

Post a Comment