Debian GNU/Linux at large scale locally - automating a local HTC cluster
Speaker: Carsten Aulbert
Track: MiniDebConf Berlin 2024
Type: Short Talk
Room: c-base
Time: May 18 (Sat): 17:00
Duration: 0:20
At the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) we are running a largish computing facility called “Atlas”.
In this talk I want to briefly show how we scaled up from earlier iterations with 10 manually installed computers to the current set-up of 3,500 servers, with Debian GNU/Linux as our foundation and only a small team.
I will try to at least touch on the basics of operation, such as automatic installation and configuration with FAI/Salt, getting work done with HTCondor, monitoring, daily tasks and chores, and how educating users usually pays off in the long run.
Ideally, anyone with some Linux background should be able to follow all of it, as I do not plan to dive too deeply into details.