Organization

Last updated 11 months ago

Hierarchy

Lead Sysadmins

The Lead Sysadmins make up the Sysadmin Leadership Team together with the Faculty Sponsor and are the final decision-makers in the Sysadmins. They make the final call with respect to team organization/membership, access requests, and all decisions related to the Lab.

They are appointed by the outgoing Lead Sysadmins with approval from the Faculty Sponsor.

In another sense, the Lead Sysadmins are the Presidents. They may appoint Junior Lead Sysadmins (Vice Presidents), if those people are expected to become Lead Sysadmins in the next year.

Lead Sysadmins are also responsible for coordinating the Understudy program.

Team Lead(s)

Leads are the Directly Responsible Individuals by default on a team. They serve as the primary point of contact for the team. If there is an incident relating to their team, the Lead(s) must be the ones to report it.

Leads should:

  • stay apprised of their team's work

  • have extensive knowledge of the team's functional area

  • supervise the work done by their team members

  • report on their work to the broader Sysadmin team

Apple coined the term "directly responsible individual" (DRI) to refer to the one person with whom the buck stopped on any given project. The idea is that every project is assigned a DRI who is ultimately held accountable for the success (or failure) of that project.

They likely won't be the only person working on their assigned project, but it's "up to that person to get it done or find the resources needed."

... What's most important is that they're empowered.

Source: https://about.gitlab.com/handbook/people-operations/directly-responsible-individuals/

Team Members

Team members are people who significantly contribute to a team's goals. Passive involvement does not mean that someone is a team member. They operate under the direction of the Team Lead.

Team Members are often Junior Admins, but could also be other Team Leads interested in helping.

Roles

Roles delegate responsibility for important tasks that apply to the Lab as a whole.

Infrastructure Lead

The Infrastructure Lead is one of the Lead Sysadmins who is responsible for broadly supervising all facets of the Lab's infrastructure.

The Infrastructure Lead is also responsible for:

  • prioritizing work among the Sysadmins

  • allocating work among the teams

  • ensuring work is done in a timely manner

  • setting abuse guidelines

  • spearheading automation efforts

  • maintaining the GitLab issue tracker

The Infrastructure Lead provides recommendations and feedback on changes to the Lab's architecture or to substantial technical changes.

The Infrastructure Lead is NOT a person who takes on all responsibility. Instead, the Lead delegates work, authority, and responsibility to other teams and people.

Qualifications:

  • Has an extraordinary knowledge of the Lab and the relationship between its software, services, and technologies

  • Has a broad range of expertise working with various aspects of the Lab's infrastructure

  • Has shown an extraordinary level of dedication to the program and its mission/values

  • Is organized

Responsibilities:

  • LDAP

  • Kerberos

  • VM servers

  • iLOs

  • Security

  • CSL architectural decisions

Networking

The Networking team is responsible for managing the CSL's network infrastructure, including switches, networking connections, OpenVPN, NTP, DNS, and DHCP. They are responsible for the smooth flow of network traffic. They are also the people to ask when diagnosing networking connections on CSL systems.

Cybersecurity & Monitoring

The Cybersecurity team is responsible for ensuring that proper cybersecurity procedures are followed throughout the Lab and for responding quickly to any vulnerabilities detected. They also handle communication with FCPS's Office of Cybersecurity (OCS). In addition, they are responsible for observability in the CSL, including logging, alerts, and metrics, and for maintaining the systems that provide monitoring capability, such as Grafana and Prometheus.

Documentation

The Documentation team is responsible for accurate, comprehensive, and well-written documentation for the Sysadmins. They assist and strongly encourage other teams in documenting everything in both our Runbooks and this Docsite.

Teams

Intranet

The Intranet team is responsible for the administration, maintenance, and development of Ion. They are also responsible for printing operations in the Lab, including the CUPS server and the printers.

Director

The Director team is responsible for the administration, maintenance, and development of Director. They are also responsible for ensuring the high availability of websites hosted on Director.

Web Services

The Web Services (or WWW) team is responsible for maintaining the web presence of the Lab not supervised by other teams. This includes *.tjhsst.edu domains not supervised by other teams or by the Infrastructure Lead, and software used by TJHSST classes like Othello, the TJHSST AI Grader, and Tin. They are also responsible for managing TJ's proxy configuration file.

Mail

The Mail team is responsible for maintaining TJ's mail servers, list servers, and webmail clients.

Storage

The Storage team is responsible for the storage of data in the Lab, including Ceph and OpenAFS. They are also responsible for the CSL's data backups.

Cluster

The Cluster team is responsible for maintaining TJ's clusters (Borg and HPC) and hardware used for advanced computing, including GPUs.

Workstations

The Workstations team is responsible for TJ's workstations in rooms 200 and 202.

Signage

The Signage team is responsible for maintaining TJ's Signage displays. They work closely with the Ion team in this regard.

Printing

The Printing team is responsible for maintaining TJ's Printers. They use CUPS and maintain the availability of printers. They work closely with the Ion team in this regard.
