Project Tooling & Automations

Poetry : python packaging

Poetry is a tool that makes Python packaging and dependency management easy. It is used in this project.

Markdown : documentation tooling

This project uses Markdown for its documentation. According to John Gruber, "Markdown is intended to be as easy-to-read and easy-to-write as is feasible". This is a nice guide to Markdown basic syntax.

Documentation build automation and serving are done with Read the Docs and mkdocs. To set up the documentation, the following steps were followed:
1. Set up your mkdocs project
2. Create the .readthedocs.yaml file by following those instructions
3. Follow the Read the Docs tutorial to link your GitHub project with Read the Docs.

Pylint : static Code Analysis (Linting)

Linting is the process of running a static code analysis with the goal of flagging programming errors, bugs, stylistic errors and suspicious constructs [[2]]. An example of a rule enforced by linting in this project is the use of snake_case, in which the words of a compound name are separated by underscores. The linter used in this project is Pylint. Development environments such as Visual Studio Code integrate linting tools such as Pylint and automatically highlight issues. Pylint settings are located in the .pylintrc file.
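The snake_case rule mentioned above can be illustrated with a small check. This is only a sketch under simplifying assumptions: `is_snake_case` and its regular expression are hypothetical names made up for this page, and Pylint's own naming rules are more detailed (they depend on the kind of name being checked and on the settings in .pylintrc).

```python
import re

# Hypothetical, simplified pattern (not Pylint's actual rule):
# lowercase letters, digits and underscores, not starting with a digit.
SNAKE_CASE = re.compile(r"[a-z_][a-z0-9_]*")


def is_snake_case(name: str) -> bool:
    """Return True when the whole name matches the simplified snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("training_report", "trainingReport", "TrainingReport"):
        print(candidate, is_snake_case(candidate))
```

A linter applies this kind of check to every name it finds in the source code and reports the ones that do not match.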

Black : automatic code formatting

Black is used for automatic code formatting. It rewrites code to enforce a programming style, which ensures a uniform code style and makes code maintenance easier.

Visual Studio Code : development environment

This project was developed using Visual Studio Code and includes a very limited set of workspace settings for activating the tools used to develop it. As of writing this documentation, the following tools are integrated with Visual Studio Code:
* Automatic Testing : Pytest
* Automatic Code Formatting : Black (runs when saving files)
* Linting : Pylint (runs when saving files)

Visual Studio Code workspace settings are located in .vscode/settings.json. Settings can be accessed with ⌘, on Mac and Ctrl+, on other operating systems.

By the way, what is the difference between user settings and workspace settings? From the Visual Studio Code documentation:
* User Settings - Settings that apply globally to any instance of VS Code you open.
* Workspace Settings - Settings stored inside your workspace and only apply when the workspace is opened.

Of course, using Visual Studio Code is neither required nor specifically recommended to work on this project. Use whatever tool you like!

Git : source control

Git is a "free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency"[[9]].

pre-commit : pre-commit verifications

Git can automatically call scripts on every commit (and on other particular events). "Git hook scripts are useful for identifying simple issues before submission to code review" [[8]]. pre-commit is used for managing pre-commit verifications. As of writing this documentation, it automatically:
* Checks YAML files
* Fixes end of files
* Trims trailing whitespace
* Runs automatic code formatting (Black)
* Runs Pytest pre-commit tests
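To give an idea of what the "fix end of files" and "trim trailing whitespace" verifications do, here is a minimal sketch. The function names are hypothetical and this is not the code that pre-commit runs; it assumes Unix line endings and only reproduces the general behaviour on a string.

```python
def trim_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs found at the end of each line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make non-empty text end with exactly one newline character."""
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


if __name__ == "__main__":
    source = "x = 1   \ny = 2\t\n\n\n"
    cleaned = fix_end_of_file(trim_trailing_whitespace(source))
    print(repr(cleaned))
```

Because these fixes are applied before the commit is created, whitespace-only differences do not reach code review.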

bump2version : versioning automation

bump2version automates project versioning. It is especially useful when the version number appears in multiple locations in a project. It is not yet used in this project as of writing this, but it is planned. This project uses Semantic Versioning 2.0.

See the versioning page for more details.
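The idea behind bumping a semantic version can be sketched in a few lines. `bump_version` is a hypothetical helper written for this page, not bump2version's implementation, and it assumes a plain MAJOR.MINOR.PATCH string without pre-release or build metadata.

```python
def bump_version(version: str, part: str) -> str:
    """Return a new MAJOR.MINOR.PATCH string with the requested part incremented."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        # Incompatible change: the lower parts are reset to zero.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward compatible feature: the patch number is reset to zero.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "minor"))
```

A tool such as bump2version adds the tedious part on top of this arithmetic: updating the version string in every file where it appears.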

GitHub workflow and automations

This project uses GitHub flow as a contribution workflow. A pull request template is implemented in .github/pull_request_template.md

GitHub Actions are used to automate workflows. GitHub Actions scripts are located in .github/workflows/

  • To create your own pull request template, see this
  • To create GitHub Actions of your own, see this

CML : continuous machine learning

CML is a tool for continuous integration in ML. Using this tool, a GitHub action trains the model, creates a training report and adds it as a pull request comment. It is located in .github/workflows/cml.yaml.
* A tutorial on the CML tool from Iterative
* Continuous machine learning explained

DVC : dataset version control, experiment tracking

DVC is a tool for dataset version control, experiment tracking and monitoring. A full page dedicated to DVC is here.

Background documentation

Project Structure

The following articles were used as inspiration for this project's folder structure:
* Folder Structure for Machine Learning Projects
* Machine Learning: Models to Production

Refactoring a data science project

YouTube series on refactoring a data science project by arjan_codes.
* Part 1
* Part 2
* Part 3

Learning References

Courses
* Full Stack Deep Learning 2022
* ML Ops Tutorials using iterative.io tools
* ML Ops Guide

Virtual environments

A Python virtual environment is a "self-contained directory tree that contains a Python installation for a particular version of Python" [[1]].
* Official Python Documentation
* Python Virtual Environments Primer by Martin Breuss on RealPython
* Managing Application Dependencies
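As an illustration of the definition above, the standard library `venv` module can create such a directory tree. This is a minimal sketch: the `create_environment` helper and the `example-env` folder name are assumptions made for this example, the environment is created in a temporary directory, and pip is left out to keep it quick.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return the path of its pyvenv.cfg."""
    # with_pip=False skips the pip bootstrap; only the directory tree is created.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_file = create_environment(Path(tmp) / "example-env")
        # pyvenv.cfg records which Python installation the environment is based on.
        print(config_file.read_text())
```

The same result is usually obtained from the command line with `python -m venv`, as described in the official documentation listed above.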

ML Ops Definitions
* By Databricks
* By Arrikto

ML Ops Challenges
* Why Production Machine Learning Fails — And How To Fix It
* The Ultimate Guide: Challenges of Machine Learning Model Deployment
* Model Deployment Challenges: 6 Lessons From 6 ML Engineers

Maturity model in ML Ops
* Three Levels of ML Software

Could be useful but not used in this project

Scikit-Learn Pipelines

MLflow : Machine Learning Lifecycle Management

An open source platform for the machine learning lifecycle, according to how they define themselves.
* MLflow
* YouTube Series ML Lifecycle by Isaac Reis
* Getting Started on Databricks Community Edition

Hyperopt : Distributed Asynchronous Hyper-parameter Optimization