Garage Cloud Lab for Data Science (Part 1)
Cloud-native Methodology for Reproducible Analytics
Software development has been migrating from desktop and server applications to cloud-native applications. How can we leverage cloud-native development tools, methodology, and workflows to create reproducible machine learning and analysis, on-premise or in the cloud?
Background
To enable faster development velocity while maintaining operational stability, most commercial software development has shifted to cloud-native systems built on dynamically allocated containers (such as Docker) and microservice-oriented applications. Kubernetes (k8s) has become the dominant container orchestrator for scaling elastic compute in the cloud. It was originally designed by Google to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts.”
Why Kubernetes?
Kubernetes offers some advantages as a platform; for example:
- consistent packaging of applications (containers) across the pipeline — from your laptop to the production cluster.
- running workloads over multiple commodity hardware nodes while abstracting away the underlying complexity and management of nodes.
- scaling based on demand (application as well as the cluster itself).
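For instance, demand-based scaling can be switched on with a single command. A minimal sketch, assuming a hypothetical Deployment named `web` already exists and the cluster runs the metrics server:

```shell
# Scale the (hypothetical) "web" Deployment to three replicas manually...
kubectl scale deployment web --replicas=3

# ...or let Kubernetes autoscale it between 1 and 5 replicas,
# targeting 80% average CPU utilization (requires metrics-server).
kubectl autoscale deployment web --min=1 --max=5 --cpu-percent=80
```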
Objective
Most home development environments have diverged from this cloud-native development trend. We aim to capitalize on this trend for our home development environment, to:
- conveniently and efficiently learn cloud-native application development, with GitLab deployed on a multi-node Kubernetes cluster on our local machine
- develop data science containers to reproducibly scale and accelerate queries, analysis, and machine learning workloads through the massive parallelism of graphics processing units (GPUs) on Kubernetes
- self-host workflows locally (on-premise) as much as possible, with the ability to easily scale workflows to cloud-based instances
Hardware Considerations
Our workstation should have plenty of cores, memory, and storage for gracefully handling our multi-node “garage cloud lab” environment, with at least one GPU for accelerated computing. For reference, our hardware setup is as follows:
- AMD Ryzen 7 1700 (8 cores, 16 threads @ 3.7GHz)
- 64GB Memory
- 2TB of Storage: 4x 1TB SATA SSD in a RAID-10 ZFS setup
- One GPU for the host workstation
- One NVIDIA GTX 1080 GPU for CUDA computation
Software Considerations
ProxMox VE 5.0 is based on Debian Linux (Stretch) and natively supports both KVM for hardware virtualization and LXC containers for Linux system virtualization. It supports installation to ZFS, a storage platform that encompasses the functionality of traditional filesystems, volume managers, and more, with consistent reliability and performance. We configured the ZFS equivalent of RAID10 — two mirrored drive pairs striped together — for improved read/write performance with some local redundancy in case of drive failure.
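The striped-mirror layout can be selected in the PVE installer, or created by hand. A minimal sketch for our four SSDs, with the pool name and device paths (/dev/sda through /dev/sdd) as assumptions:

```shell
# Create a pool of two mirrored pairs (ZFS's RAID10 equivalent):
# reads and writes stripe across both mirrors, and either drive in a
# pair can fail without data loss. ashift=12 aligns to 4K sectors.
zpool create -o ashift=12 tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd

# Verify the layout.
zpool status tank
```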
Since ProxMox VE is based on Debian Linux, we can easily install our development tools: a multi-node Kubernetes cluster provides the mechanisms for deploying, maintaining, and scaling applications, while GitLab provides everything required for end-to-end software development and operations. GitLab simplifies toolchain complexity, speeds up cycle times, and includes a container registry and Kubernetes integration for getting started with cloud-native development. We will use DinD (Docker-in-Docker) and kubeadm to provision Kubernetes, with MinIO providing persistent storage for our containers. We also want to synchronize our local GitLab repository to a cloud-based repository.
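At a high level, the kubeadm bootstrap looks like the following sketch; the pod CIDR is the one conventionally used by the flannel network plugin, the angle-bracket values are placeholders printed by `kubeadm init`, and the exact flags depend on the network plugin chosen:

```shell
# On the control-plane node: initialize the cluster.
kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker node: join using the token and CA hash
# printed at the end of `kubeadm init`.
kubeadm join <control-plane-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```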
The Plan
Create Cluster
Create a Kubernetes cluster with GPU-enabled nodes.
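Once the NVIDIA device plugin is running on the GPU node, pods can request GPUs through the `nvidia.com/gpu` extended resource. A minimal smoke-test sketch (the pod name and image tag are assumptions — any CUDA base image with `nvidia-smi` works):

```shell
# Run a throwaway pod that claims one GPU and prints the driver status.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # assumed tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```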
Setup GitLab
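For quick experimentation, GitLab CE can also be brought up as a single container before wiring it into the cluster; a sketch assuming the hypothetical hostname gitlab.example.com and host paths under /srv/gitlab:

```shell
# Run the official GitLab CE image, exposing HTTP, HTTPS, and SSH,
# and persisting config, logs, and data on the host.
docker run --detach \
  --hostname gitlab.example.com \
  --publish 80:80 --publish 443:443 --publish 2222:22 \
  --volume /srv/gitlab/config:/etc/gitlab \
  --volume /srv/gitlab/logs:/var/log/gitlab \
  --volume /srv/gitlab/data:/var/opt/gitlab \
  --name gitlab \
  gitlab/gitlab-ce:latest
```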
Add the Storage
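Workloads claim persistent storage through a PersistentVolumeClaim — for example, the volume MinIO itself mounts for its data directory. A minimal sketch, with the claim name and size as assumptions:

```shell
# Request a 50Gi read-write-once volume from the default storage class.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
EOF
```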
Next Steps…
This is Part 1 of a multipart tutorial. The next two parts will cover installation of PVE and server tweaks we can make to improve the performance of our VMs and containers.