
Organising Python Projects: Part V


Mastering the Art of Python Project Setup: A Step-by-Step Guide

Photo by Zoya Loonohod on Unsplash

Whether you’re a seasoned developer or just getting started with 🐍 Python, it’s important to know how to build robust and maintainable projects. This tutorial will guide you through the process of setting up a Python project using some of the most popular and effective tools in the industry. You’ll learn how to use GitHub and GitHub Actions for version control and continuous integration, as well as other tools for testing, documentation, packaging and distribution. The tutorial is inspired by resources such as Hypermodern Python and Best Practices for a modern Python project. However, this is not the only way to do things and you might have different preferences or opinions. The tutorial is intended to be beginner-friendly but also covers some advanced topics. In each section, you’ll automate some tasks and add badges to your project to show your progress and achievements.

The repository for this series can be found at github.com/johschmidt42/python-project-johannes

This part was inspired by this blog post:

Semantic release with Python, Poetry & GitHub Actions 🚀
I’m planning to add a few features to Dr. Sven thanks to some interest from my colleagues. Before doing so, I wanted to…

  • OS: Linux, Unix, macOS, Windows (WSL2 with e.g. Ubuntu 20.04 LTS)
  • Tools: python3.10, bash, git, tree
  • Version Control System (VCS) Host: GitHub
  • Continuous Integration (CI) Tool: GitHub Actions

It is expected that you are familiar with the version control system (VCS) git. If not, here’s a refresher for you: Introduction to Git

Commits will be based on best practices for git commits & Conventional commits. There is the conventional commit plugin for PyCharm or a VSCode Extension that help you write commits in this format.

Overview

Structure

  • Git Branching Strategy (GitHub flow)
  • What’s a release? (zip, tar.gz)
  • Semantic Versioning (v0.1.0)
  • Create a release manually (git tag, GitHub)
  • Create a release automatically (conventional commits, semantic releases)
  • CI/CD (release.yml)
  • Create a Personal Access Token (PAT)
  • GitHub Actions Flow (Orchestrating workflows)
  • Badge (Release)
  • Bonus (Implement conventional commits)

Releasing software is an important step in the software development process, as it makes new features and bug fixes available to users. One key aspect of releasing software is versioning, which helps to track and communicate the changes made in each release. Semantic versioning is a widely used standard for versioning software, which uses a version number in the format MAJOR.MINOR.PATCH (e.g. 1.2.3) to indicate the extent of the changes made in a release.

Conventional commits is a specification for adding human- and machine-readable meaning to commit messages. It’s a way to format commit messages consistently, which makes it easy to determine the type of change made. Conventional commits are commonly used together with semantic versioning, because the commit messages can be used to automatically determine the version number of a release. Together, semantic versioning and conventional commits provide a clear and consistent way to track and communicate the changes made in each release of a software project.

There are many different branching strategies out there for git. Many people gravitate towards GitFlow (or variants), Three Flow, or Trunk-based Flows. Some use strategies in between these, such as this one. I’m using the very simple GitHub flow branching strategy, where all bug fixes and features have their own separate branch, and when complete, each branch is merged to main and deployed. Simple, nice and easy.

GitHub Flow branching strategy

Whatever your strategy might be, in the end you merge a pull request and (probably) create a release.

In short, a release is packaging up the code of a version (e.g. as a zip) and pushing it to production (whatever that might be for you).

Release management can be messy. Therefore, there needs to be a concise process that you (and others) follow, which defines what a release means and what changes between one release and the next. If you don’t track the changes between releases, then you probably won’t understand what has changed in each release, and you can’t identify problems that might have been introduced with new code. Without a changelog, it can be difficult to understand how the software has evolved over time. It can also make it difficult to roll back changes if necessary.

Semantic Versioning is a numbering scheme and a standard practice in the software industry. It indicates the extent of the changes between this version and the previous one. A semantic version number, such as 1.8.42, has three parts that follow the pattern MAJOR.MINOR.PATCH.

Each of them signals a different degree of change. A PATCH release indicates bug fixes or trivial changes (e.g. from 1.0.0 to 1.0.1). A MINOR release indicates new functionality or other backwards compatible changes (e.g. from 1.0.0 to 1.1.0). A MAJOR release indicates backwards incompatible changes, such as removing functionality or other breaking changes (e.g. from 1.0.0 to 2.0.0).
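To make the three levels concrete, here is a minimal sketch of a version bump in plain Python. The function name `bump` and the level names are illustrative and not part of any library. Pre-release and build suffixes (e.g. `1.0.0-rc.1`) are deliberately not handled.

```python
def bump(version: str, level: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' bump.

    Assumes a plain MAJOR.MINOR.PATCH string without pre-release/build suffixes.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        # Breaking change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if level == "minor":
        # New backwards compatible functionality: reset PATCH
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        # Bug fix or trivial change
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level!r}")


print(bump("1.0.0", "patch"))  # 1.0.1
print(bump("1.0.0", "minor"))  # 1.1.0
print(bump("1.0.0", "major"))  # 2.0.0
```

Note that a bump always resets the lower-order parts, so 1.8.42 becomes 1.9.0 on a MINOR bump, not 1.9.42.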

I recommend a talk by Mike Miles if you want a visual introduction to releases with semantic versioning. It’s a summary of what releases are and how semantic versioning with git tags allows us to create releases.

About git tags: there are lightweight and annotated tags in git. A lightweight tag is just a pointer to a specific commit, whereas an annotated tag is a full object in git that also stores the tagger, a date and a message.

Let’s create a release manually first and then automate it.

If you remember, our example_app’s __init__.py file contains the version

# src/example_app/__init__.py

__version__ = "0.1.0"

as well as the pyproject.toml file

# pyproject.toml

[tool.poetry]
name = "example_app"
version = "0.1.0"
...
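Because the version lives in two places, it can drift. The following is a minimal sketch (not part of the tutorial’s tooling) that reads `__version__` with `ast` and `tool.poetry.version` with `tomllib`. It assumes Python 3.11+ for `tomllib`. On Python 3.10, which this series uses, the third-party `tomli` package provides the same API. The function names are hypothetical.

```python
import ast

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib


def version_from_init(source: str) -> str:
    """Extract the string assigned to __version__ from Python source code."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    raise LookupError("__version__ not found")


def version_from_pyproject(text: str) -> str:
    """Read tool.poetry.version from the text of a pyproject.toml file."""
    return tomllib.loads(text)["tool"]["poetry"]["version"]


# In a real project you would pass e.g. Path("src/example_app/__init__.py").read_text()
init_py = '__version__ = "0.1.0"\n'
pyproject = '[tool.poetry]\nname = "example_app"\nversion = "0.1.0"\n'
assert version_from_init(init_py) == version_from_pyproject(pyproject) == "0.1.0"
```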

So the first thing we have to do is create an annotated git tag v0.1.0 and add it to the latest commit in main:

> git tag -a v0.1.0 -m "version v0.1.0"

Please note that if no commit hash is specified at the end of the command, git will use the commit you are currently on.

We can get a list of tags with:

> git tag

v0.1.0

and if we want to delete it again:

> git tag -d v0.1.0

Deleted tag 'v0.1.0'

and get more information about the tag with:

> git show v0.1.0

tag v0.1.0

Tagger: Johannes Schmidt
Date: Sat Jan 7 12:55:15 2023 +0100
version v0.1.0
commit efc9a445cd42ce2f7ddfbe75ffaed1a5bc8e0f11 (HEAD -> main, tag: v0.1.0, origin/main, origin/HEAD)
Author: Johannes Schmidt <74831750+johschmidt42@users.noreply.github.com>
Date: Mon Jan 2 11:20:25 2023 +0100
...

We can push the newly created tag to origin with

> git push origin v0.1.0

Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 171 bytes | 171.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:johschmidt42/python-project-johannes.git
* [new tag] v0.1.0 -> v0.1.0

so that this git tag is now available on GitHub:

Let’s manually create a new release in GitHub with this git tag:

We click on Create a new release, select our existing tag (which is already bound to a commit) and then generate release notes automatically by clicking the Generate release notes button, before we finally publish the release with the Publish release button.

GitHub will automatically create a tar and a zip (assets) of the source code, but it will not build the application! The result will look like this:

To summarise, the steps for a release are:

  • create a new branch from your default branch (e.g. a feature or fix branch)
  • make changes and increase the version (e.g. pyproject.toml and __init__.py)
  • commit the feature/bug fix to the default branch (probably through a Pull Request)
  • add an annotated git tag (semantic version) to the commit
  • publish the release on GitHub with some additional information

As programmers, we don’t like to repeat ourselves, so there are plenty of tools that make these steps super easy for us. Here, I’ll introduce Python Semantic Release, a tool specifically for Python projects.

It’s a tool that automatically sets a version number in your repo, tags the code with the version number and creates a release! And all of this is done based on the contents of Conventional Commit style messages.

Conventional Commits

What’s the connection between semantic versioning and conventional-commits?

Certain commit types can be used to automatically determine a semantic version bump!

  • A fix commit is a PATCH.
  • A feat commit is a MINOR.
  • A commit with BREAKING CHANGE or ! is a MAJOR.

Other types, e.g. build, chore, ci, docs, style, refactor, perf, test, generally don’t increase the version.

Check out the bonus section at the end to find out how to implement conventional commits in your project!

Automatic semantic releases (locally)

We can add the library with:

> poetry add --group semver python-semantic-release

Let’s go through the configuration settings that allow us to automatically generate changelogs and releases. In the pyproject.toml, we can add semantic_release as a tool:

# pyproject.toml

...
[tool.semantic_release]
branch = "main"
version_variable = "src/example_app/__init__.py:__version__"
version_toml = "pyproject.toml:tool.poetry.version"
version_source = "tag"
commit_version_number = true # required for version_source = "tag"
tag_commit = true
upload_to_pypi = false
upload_to_release = false
hvcs = "github" # gitlab is also supported

  • branch: specifies the branch that the release should be based on, in this case the “main” branch.
  • version_variable: specifies the file path and variable name of the version number in the source code. In this case, the version number is stored in the __version__ variable in the file src/example_app/__init__.py.
  • version_toml: specifies the file path and variable name of the version number in the pyproject.toml file. In this case, the version number is stored in the tool.poetry.version variable of the pyproject.toml file.
  • version_source: specifies the source of the version number. In this case, the version number is obtained from the tag (instead of the commit).
  • commit_version_number: this parameter is required when version_source = "tag". It specifies whether the version number should be committed to the repository or not. In this case, it is set to true, which means that the version number will be committed.
  • tag_commit: specifies whether a new tag should be created for the release commit. In this case, it is set to true, which means that a new tag will be created.
  • upload_to_pypi: specifies whether the package should be uploaded to the PyPI package repository. In this case, it is set to false, which means that the package will not be uploaded to PyPI.
  • upload_to_release: specifies whether the package should be uploaded to the GitHub release page. In this case, it is set to false, which means that the package will not be uploaded to GitHub releases.
  • hvcs: specifies the hosting version control system of the project. In this case, it is set to “github”, which means that the project is hosted on GitHub. “gitlab” is also supported.

We can update the files in which we have defined the version of the project/module. For this we use the variable version_variable for regular files and version_toml for .toml files. The version_source defines the source of truth for the version. Because the version in these two files is tightly coupled to the annotated git tags (we create a git tag automatically with every release, since tag_commit is set to true), we can use the source tag instead of the default value commit, which looks for the last version in the commit messages. To be able to update the files and commit the changes, we need to set the commit_version_number flag to true. Because we don’t want to upload anything to the Python Package Index (PyPI), the flag upload_to_pypi is set to false. And for now we don’t want to upload anything to our releases either (upload_to_release = false). The hvcs is set to github (the default); another possible value is gitlab.
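To catch configuration mistakes before CI does, here is a hypothetical checker for the two rules stated above. First, version_source = "tag" needs commit_version_number = true. Second, version_variable/version_toml entries have the form path:variable. It is a sketch, not part of python-semantic-release, which does its own validation. It assumes `tomllib` (Python 3.11+) or `tomli` on 3.10.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib


def check_release_config(pyproject_text: str) -> list[str]:
    """Return a list of problems found in [tool.semantic_release] (empty if none)."""
    cfg = tomllib.loads(pyproject_text).get("tool", {}).get("semantic_release", {})
    problems = []
    if cfg.get("version_source") == "tag" and not cfg.get("commit_version_number", False):
        problems.append('version_source = "tag" requires commit_version_number = true')
    for key in ("version_variable", "version_toml"):
        entries = cfg.get(key, [])
        if isinstance(entries, str):
            entries = [entries]  # a single "path:variable" string
        for entry in entries:
            if ":" not in entry:
                problems.append(f"{key} should look like 'path:variable', got {entry!r}")
    return problems


# Usage (hypothetical): check_release_config(Path("pyproject.toml").read_text())
```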

We can test this locally by running a few commands, which I’ll add directly to our Makefile:

# Makefile

...

##@ Releases

current-version: ## returns the current version
	@semantic-release print-version --current

next-version: ## returns the next version
	@semantic-release print-version --next

current-changelog: ## returns the current changelog
	@semantic-release changelog --released

next-changelog: ## returns the next changelog
	@semantic-release changelog --unreleased

publish-noop: ## publish command (no-operation mode)
	@semantic-release publish --noop

With the command current-version we get the version from the last git tag in the git tree:

> make current-version

0.1.0

If we add a few commits in conventional commit style, e.g. feat: new cool feature or fix: nasty bug, then the command next-version will compute the version bump for them:

> make next-version

0.2.0

Right now, we don’t have a CHANGELOG file in our project, so when we run:

> make current-changelog

the output will be empty. But based on the commits, we can create the upcoming changelog with:

> make next-changelog

### Feature
* Add releases ([#8](https://github.com/johschmidt42/python-project-johannes/issues/8)) ([`5343f46`](https://github.com/johschmidt42/python-project-johannes/commit/5343f46d9879cc8af273a315698dd307a4bafb4d))
* Docstrings ([#5](https://github.com/johschmidt42/python-project-johannes/issues/5)) ([`fb2fa04`](https://github.com/johschmidt42/python-project-johannes/commit/fb2fa0446d1614052c133824150354d1f05a52e9))
* Add application in app.py ([`3f07683`](https://github.com/johschmidt42/python-project-johannes/commit/3f07683e787b708c31235c9c5357fb45b4b9f02d))
### Documentation
* Add search bar & github url ([#6](https://github.com/johschmidt42/python-project-johannes/issues/6)) ([`3df7c48`](https://github.com/johschmidt42/python-project-johannes/commit/3df7c483eca91f2954e80321a7034ae3edb2074b))
* Add badge pages.yml to README.py ([`b76651c`](https://github.com/johschmidt42/python-project-johannes/commit/b76651c5ecb5ab2571bca1663ffc338febd55b25))
* Add documentation to Makefile ([#3](https://github.com/johschmidt42/python-project-johannes/issues/3)) ([`2294ee1`](https://github.com/johschmidt42/python-project-johannes/commit/2294ee105b238410bcfd7b9530e065e5e0381d7a))
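The changelog above is essentially the commit subjects grouped by type. A minimal sketch of that grouping follows, under the assumption that commits are given as (short hash, message) pairs. The section names mirror the output above, and unlisted types are skipped. `render_changelog` and `SECTIONS` are hypothetical names. Unlike the real tool, this sketch does not generate links to issues or commits.

```python
import re
from collections import defaultdict

# Hypothetical mapping from commit type to changelog heading (order = section order)
SECTIONS = {"feat": "Feature", "fix": "Fix", "docs": "Documentation"}
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?!?: (?P<description>.+)")


def render_changelog(commits: list[tuple[str, str]]) -> str:
    """Group (short_hash, message) pairs by commit type into Markdown sections."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for short_hash, message in commits:
        match = HEADER.match(message)
        if match is None or match["type"] not in SECTIONS:
            continue  # skip non-conventional commits and types we don't list
        description = match["description"]
        entry = f"* {description[0].upper()}{description[1:]} (`{short_hash}`)"
        grouped[SECTIONS[match["type"]]].append(entry)
    lines = []
    for heading in SECTIONS.values():
        if grouped[heading]:
            lines.append(f"### {heading}")
            lines.extend(grouped[heading])
    return "\n".join(lines)


print(render_changelog([
    ("5343f46", "feat: add releases (#8)"),
    ("3df7c48", "docs: add search bar & github url (#6)"),
    ("aaaaaaa", "chore: bump dependencies"),
]))
```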

If we push new commits (directly to main or through a PR), we can now publish a new release with:

> semantic-release publish

The publish command will do a sequence of things:

  1. Update or create the changelog file.
  2. Run semantic-release version.
  3. Push changes to git.
  4. Run build_command and upload the distribution file to your repository.
  5. Run semantic-release changelog and post to your vcs provider.
  6. Attach the files created by build_command to GitHub releases.

Every step can, of course, be configured or deactivated!

Let’s build a CI pipeline with GitHub Actions that runs the publish command of semantic-release with every commit to the main branch.

While the overall structure remains the same as in lint.yml, test.yml or pages.yml, there are a few changes worth mentioning. In the step Checkout repository, we add a new token that is used to check out the branch. That’s because the default GITHUB_TOKEN does not have the required permissions to operate on protected branches. Therefore, we must use a secret (GH_TOKEN) that contains a Personal Access Token with the necessary permissions. I’ll show later how the Personal Access Token can be generated. We also set fetch-depth: 0 to fetch the full history for all branches and tags.

with:
  ref: ${{ github.head_ref }}
  token: ${{ secrets.GH_TOKEN }}
  fetch-depth: 0

We install only the dependencies that are required for the semantic-release tool with:

- name: Install requirements
  run: poetry install --only semver

In the last step, we modify some git configurations and run the publish command of semantic-release:

- name: Python Semantic Release
  env:
    GH_TOKEN: ${{ secrets.GH_TOKEN }}
  run: |
    set -o pipefail
    # Set git details
    git config --global user.name "github-actions"
    git config --global user.email "github-actions@github.com"
    # run semantic-release
    poetry run semantic-release publish -v DEBUG -D commit_author="github-actions "

By changing the git config, the user that commits will be “github-actions”. We run the publish command with DEBUG logs (stdout) and set the commit_author to “github-actions” explicitly. As an alternative to this command, we could use the GitHub Action from semantic-release directly. However, running the publish command takes only a few setup steps, and the action uses a Docker container that needs to be pulled every time. Because of that, I prefer a simple run step instead.

Since the publish command will make a commit, you might be worried that we could end up in an infinite loop of workflows being triggered. GitHub does not start new workflow runs for events triggered with the default GITHUB_TOKEN. Note, however, that pushes made with a Personal Access Token, as we do here, can trigger workflows. This is one more reason to mark the release commit with [skip ci], which we will do below.

Personal access tokens are an alternative to using passwords for authentication to GitHub Enterprise Server when using the GitHub API or the command line. Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App. For more information, see “About apps.”

In other words: we can create a Personal Access Token and have GitHub Actions store and use that secret to perform certain operations on our behalf. Keep in mind that if the PAT is compromised, it could be used to perform malicious actions on your GitHub repositories. It’s therefore recommended to use GitHub OAuth Apps & GitHub Apps in organisations. For the purposes of this tutorial, we will use a PAT to allow the GitHub Actions pipeline to operate on our behalf.

We can create a new access token by navigating to the Settings section of your GitHub user and following the instructions summarised in Creating a Personal Access Token. This will open a window that looks like this:

Personal Access Token of an admin account with push access to the repos.

By selecting the scopes, we define which permissions the token will have. For our use case, we need push access to the repositories, which is why the new PAT GH_TOKEN needs the repo permissions scope. That scope authorises pushes to protected branches, provided you don’t have Include administrators set in the protected branch’s settings.

Going back to the repository overview, in the Settings menu, we can add either an environment secret or a repository secret under the Secrets section:

Repository secrets are specific to a single repository (and all environments used in it), while environment secrets are specific to an environment. The GitHub runner can be configured to run in a specific environment, which allows it to access that environment’s secrets. This makes sense when thinking of different stages (e.g. DEV vs PROD), but for this tutorial a repository secret is fine.

Now that we have a few pipelines (linting, testing, releasing, documentation), we should think about the flow of actions triggered by a commit to main! There are a few things we should be aware of, some of them specific to GitHub.

Ideally, a commit to main creates a push event that triggers the Testing and Linting workflows. If these are successful, we run the release workflow, which is responsible for detecting whether a version bump is needed based on the conventional commits. If so, the release workflow pushes directly to main, bumping the versions, adding a git tag and creating a release. A published release should then, for example, update the documentation by running the documentation workflow.

Expected flow of actions

Problems & considerations

  1. If you read the last paragraph carefully or looked at the flowchart above, you might have noticed that there are two commits to main: an initial one (i.e. from a PR) and a second one for the release. Because our lint.yml and test.yml react to push events on the main branch, they would run twice! We should avoid this to save resources. To achieve this, we can add the [skip ci] string to our version commit message. A custom commit message can be defined in the pyproject.toml file for the tool semantic_release.
# pyproject.toml

...

[tool.semantic_release]
...
commit_message = "{version} [skip ci]" # skip triggering ci pipelines for version commits
...

2. The workflow pages.yml currently runs on a push event to main. Updating the documentation might be something we only want to do when there is a new release (we might be referencing the version in the documentation). We can change the trigger in the pages.yml file accordingly:

# pages.yml

name: Documentation

on:
  release:
    types: [published]

Building the documentation will now require a published release.

3. The Release workflow should depend on the success of the Linting & Testing workflows. Currently, we don’t have any dependencies defined in our workflow files. We could make the Release workflow depend on the completion of specific workflow runs on a given branch with the workflow_run event. However, if we specify multiple workflows for the workflow_run event:

on:
  workflow_run:
    workflows: [Testing, Linting]
    types:
      - completed
    branches:
      - main

only one of the workflows needs to be completed! This is not what we want. We expect that all workflows must be completed (and successful), and only then should the release workflow run. This is in contrast to what we get when we define dependencies between jobs within a single workflow. Read more about this inconsistency and shortcoming here.

Instead, we could use a sequential execution of pipelines:

The big downside of this idea is that a) it doesn’t allow parallel execution and b) we won’t be able to see the dependency graph in GitHub.
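The rules from points 1 and 3 can be summarised in a small toy model. This is not GitHub’s implementation, just a sketch of the behaviour described above. GitHub skips push-triggered runs whose commit message contains a marker such as [skip ci]. A job with needs runs only when all required jobs succeeded. A workflow_run trigger fires whenever any one watched workflow completes. All function names are hypothetical.

```python
SKIP_MARKERS = ("[skip ci]", "[ci skip]", "[no ci]", "[skip actions]", "[actions skip]")


def triggers_push_workflows(commit_message: str) -> bool:
    """Push-triggered workflows are skipped if the commit message has a skip marker."""
    return not any(marker in commit_message for marker in SKIP_MARKERS)


def release_should_run(results: dict[str, str], required=("Testing", "Linting")) -> bool:
    """`needs` semantics: every required job must have concluded with 'success'."""
    return all(results.get(name) == "success" for name in required)


def workflow_run_fires(completed_workflow: str, watched=("Testing", "Linting")) -> bool:
    """workflow_run semantics: fires when ANY single watched workflow completes."""
    return completed_workflow in watched


# The release commit created from commit_message = "{version} [skip ci]"
release_commit = "{version} [skip ci]".format(version="0.2.0")
print(triggers_push_workflows(release_commit))      # False: no second lint/test run
print(workflow_run_fires("Testing"))                # True, even while Linting is still running
print(release_should_run({"Testing": "success"}))   # False: Linting has not succeeded yet
```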

Solution

Currently, the only way I see to deal with the above-mentioned problems is to orchestrate the workflows in an orchestrator workflow.

Let’s create this workflow file:

The orchestrator is triggered when we push to the branch main.

Only if both workflows, Testing & Linting, are successful is the release workflow called. This is defined with the needs keyword. If we want more granular control over job (workflow) executions, consider using the if keyword as well, but be aware of its confusing behaviour, as explained in this article.

To make our workflows lint.yml, test.yml & release.yml callable by another workflow, we need to update their triggers:

# lint.yml

---
name: Linting

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
...

# test.yml

---
name: Testing

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
...

# release.yml

---
name: Release

on:
  workflow_call:

jobs:
...

Now the new workflow (Release) should only run if the quality-check workflows, in this case linting and testing, succeed.

To create a badge, this time I’ll use the platform shields.io.

It’s a website that generates badges for projects, which display information such as version, build status and code coverage. It offers a wide range of templates and allows you to customise their appearance and create custom badges. The badges are updated automatically, providing real-time information about the project.
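A badge is just a Markdown image wrapped in a link, so it can also be generated. The following sketch assumes the shields.io URL pattern `https://img.shields.io/github/v/release/<owner>/<repo>?sort=semver` for the “GitHub release (latest SemVer)” badge and links it to the repository’s releases page. Treat both URLs as assumptions and prefer the exact markdown that shields.io generates for you. `release_badge` is a hypothetical helper.

```python
from urllib.parse import quote


def release_badge(owner: str, repo: str) -> str:
    """Markdown for a 'GitHub release (latest SemVer)' badge (URL pattern assumed)."""
    owner, repo = quote(owner), quote(repo)
    image = f"https://img.shields.io/github/v/release/{owner}/{repo}?sort=semver"
    link = f"https://github.com/{owner}/{repo}/releases"
    return f"[![GitHub release (latest SemVer)]({image})]({link})"


print(release_badge("johschmidt42", "python-project-johannes"))
```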

For a release badge, I selected GitHub release (latest SemVer):

The badge markdown could be copied and added to the README.md:

Our GitHub landing page now looks like this ❤ (I’ve cleaned up a bit and added a description):
