I’ve made a tool to analyse my article statistics on Weekend Notes. I’ve deployed it as a public site, so other writers can use it too. It runs on AWS Lambda, with the front end on my own site just because that’s easier, though the front end could be deployed to S3 and hosted as a static site since it just posts form data to my Lambda function. Here’s how it works and a bit about the issues I had while building it.
Analysing the Data
I’ve built my app to use BeautifulSoup to parse the HTML source code of the stats page. This way there’s no risk of exposing anything secret apart from stats about one particular article, and I don’t need to worry about logins in code - the user just logs in as normal in their web browser, grabs the page source and gives it to my tool.
I won’t get into the details of the analysis or DOM structure of the Weekend Notes statistics pages, but I’ll talk about the ways I structured the data and my code, as well as things I learnt.
I discovered Python’s calendar.month_name for converting an integer month number to its English name.
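As a minimal sketch of that lookup (the helper name and the range check are my own additions for illustration, not part of the tool): calendar.month_name is a sequence indexed from 1, with an empty string at index 0, and it follows the current locale, which is English by default.

```python
import calendar


def month_name(month_number: int) -> str:
    """Return the month name for a number from 1 to 12."""
    if not 1 <= month_number <= 12:
        raise ValueError(f"month must be 1-12, got {month_number}")
    # calendar.month_name[0] is an empty string, so index 1 is January.
    return calendar.month_name[month_number]


print(month_name(3))
```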
I discovered that you can pass a function to BeautifulSoup’s find methods so that your function can do the complex matching, much like a predicate in LINQ.
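Here is a small sketch of that idea. The markup, class names and data-views attribute are made up for the example and are not the real Weekend Notes page structure. find_all calls the function once per tag and keeps the tags for which it returns True.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real stats page is structured differently.
SAMPLE_HTML = """
<table>
  <tr class="stat" data-views="120"><td>March</td></tr>
  <tr class="stat"><td>April</td></tr>
  <tr class="header"><td>Month</td></tr>
</table>
"""


def is_stat_row_with_views(tag):
    """Match <tr class="stat"> elements that also carry a data-views attribute."""
    return (
        tag.name == "tr"
        and "stat" in tag.get("class", [])
        and tag.has_attr("data-views")
    )


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all(is_stat_row_with_views)
views = [int(row["data-views"]) for row in rows]
print(views)
```

The function can combine as many conditions as it needs, which is awkward to express with tag names and attribute filters alone.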
I tried to handle actions in methods rather than in global scope so that I can more easily handle errors in the input.
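A rough sketch of what that structure can look like, with hypothetical function names and input format: each step lives in a function that raises on bad input, and the caller decides how to report it.

```python
def parse_view_count(raw: str) -> int:
    """Convert a scraped string such as '1,234' to an int, or raise ValueError."""
    cleaned = raw.strip().replace(",", "")
    if not cleaned.isdecimal():
        raise ValueError(f"not a view count: {raw!r}")
    return int(cleaned)


def analyse(raw_counts):
    """Return (total, errors) so the caller can report bad input instead of crashing."""
    total, errors = 0, []
    for raw in raw_counts:
        try:
            total += parse_view_count(raw)
        except ValueError as exc:
            errors.append(str(exc))
    return total, errors


total, errors = analyse(["1,234", "10", "n/a"])
print(total, errors)
```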
You can sort a dictionary’s items by value so that when you loop over them, the values come out in order. (How do I sort a dictionary by value?) That’s handy.
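For example, with some made-up numbers: sorted() orders the (key, value) pairs by value, and because dictionaries keep insertion order in Python 3.7 and later, rebuilding a dict from the sorted pairs gives a dictionary that iterates in that order.

```python
# Made-up figures for illustration.
views_by_month = {"March": 120, "January": 45, "February": 300}

# Sort the (key, value) pairs by value, highest first, then rebuild a dict.
sorted_views = dict(
    sorted(views_by_month.items(), key=lambda item: item[1], reverse=True)
)

for month, views in sorted_views.items():
    print(month, views)
```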
Deploying The Script
I created the script to be run as a Flask app using Zappa for Python because I already had some experience with a previous data analysis project in Python. Read about my experiences with Zappa this time around: AWS Lambda using Zappa
Because Zappa needs to be in a Python virtual environment I knew I had to create the virtual environment and be able to put my Flask app code in it easily. I could develop in the virtual environment (which would keep Python projects from conflicting) but I don’t have any other Python projects on my personal PCs that I’m concerned about so I didn’t worry.
So I created two BASH scripts to:
- Create the virtual environment
- Copy the files needed and pull down any Python packages (using a requirements.txt file) and run the Zappa commands.
This way I can code away and just use a simple script to deploy everything for me. If I wanted to put in a CI/CD pipeline, I could do that too.
This worked well, except for copying files into a folder in a way that was repeatable. I had to use mkdir -p (Is there anyway to copy a file to a directory that does not exist?) to create the target folder first, and the -R flag on cp copies recursively. Handy!
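My scripts are BASH, but for comparison here is roughly the same repeatable copy in Python. This is not what the deploy script does, just an equivalent sketch with made-up folder names, and dirs_exist_ok needs Python 3.8 or later.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into(source_dir: Path, target_dir: Path) -> None:
    """Copy source_dir's contents into target_dir, creating it if needed."""
    # Like mkdir -p: create missing parents, no error if it already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Like a recursive cp: dirs_exist_ok lets the copy be re-run safely.
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "app"
    (src / "templates").mkdir(parents=True)
    (src / "templates" / "index.html").write_text("<form></form>")

    dest = Path(tmp) / "venv" / "project"
    copy_into(src, dest)
    copy_into(src, dest)  # running it a second time must not fail

    copied = sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*"))
    print(copied)
```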
Finally, due to PC issues I changed to use my Windows 10 desktop rather than my Mac laptop, so I needed to fire up Windows Subsystem for Linux to use my BASH scripts. Unfortunately my WSL install was so old that its Python was out of date. I tried to update it using apt-get upgrade, but that only upgrades packages and doesn’t actually upgrade the OS. (How To Upgrade to Ubuntu 16.04 LTS from Ubuntu 14.04 LTS). In the end I stumbled upon do-release-upgrade because the Microsoft Store advised that’s how to upgrade Linux inside WSL. (That page says so too, further down.) After a couple of upgrades and hours later I had a working WSL again and I could just develop on Windows and switch to WSL when I wanted to package up. Wonderful!
Building the Front End
But I soon realised it was going to be easier just to change my back-end to handle multipart/form-data and let the browser submit it as normal. That way, the browser also handles the returned page as a standard response (like when you submit any other form).
It’s hard to build plain HTTP POSTs by hand; it’s much easier to just use Postman or Insomnia. See Mozilla Developer Network: POST - HTTP
Luckily Flask can do both. Quickstart — Flask Documentation (1.1.x)
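A minimal sketch of the form-handling side, assuming a hypothetical /analyse route and a form field called source (neither name comes from my real app). Flask’s request.form exposes the text fields whether the browser posts them URL-encoded or as multipart/form-data, and uploaded files would be in request.files.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/analyse", methods=["POST"])  # hypothetical route name
def analyse():
    # "source" is a hypothetical field name for the pasted page source.
    source = request.form.get("source", "")
    if not source.strip():
        return "No page source supplied.", 400
    # The returned HTML is rendered by the browser like any other form response.
    return f"<p>Received {len(source)} characters.</p>"
```

Flask’s built-in test client can post to the route without starting a server, which is handy for checking the handler before deploying.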
At the same time, I discovered that pasting a large amount of data into a textarea causes Chrome to lock up. Firefox wasn’t locking up so I realised it was a browser issue, which meant it might be possible to code around it. I found:
and then disabled spell check, auto correct, and auto complete. Now Chrome is as responsive as Firefox is when pasting the source code from Weekend Notes.