Debian GNU/Linux at large scale locally - automating a local HTC cluster

Speaker: Carsten Aulbert

Track: MiniDebConf Berlin 2024

Type: Short Talk

Room: c-base

Time: May 18 (Sat): 17:00

Duration: 0:20

At the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) we run a largish computing facility called “Atlas”.

In this talk I want to briefly show how we scaled up from earlier iterations with 10 manually installed computers to the current setup of 3,500 servers, with Debian GNU/Linux as our foundation and only a small team.

I will try to at least touch on the basics of operation: automatic installation and configuration with FAI and Salt, getting work done with HTCondor, monitoring, daily tasks and chores, and how educating users usually pays off in the long run.

Ideally, anyone with some Linux background should be able to follow all of it, as I do not plan to go into much detail.