Responsibilities:
- Design and implement the architecture of the DAG controller and the orchestration layer that runs initial calibrations and periodic recalibrations.
- Define node semantics: inputs, outputs, idempotence, delays, retries, and safe rollback behavior.
- Implement a policy engine for top-down controls and conditional decision trees that trigger recalibrations.
- Develop operator workflows for pausing runs, running diagnostics, suggesting fixes, and resuming from failed nodes.
- Collaborate with experimentalists to translate calibration procedures into robust orchestration primitives.
- Ensure reliability through automated testing (unit, integration, hardware-in-the-loop) and runbook automation.
Requirements:
- At least 5 years of experience in software engineering, ideally on production orchestration or workflow systems.
- Strong Python skills; experience developing reliable control software for hardware.
- Experience with workflow/DAG frameworks (e.g., Airflow, Prefect, or custom orchestration).
- Proven track record with fault-tolerant systems, including retries, idempotence, and observability.
- Knowledge of instrument and hardware control interfaces (SCPI, VISA, gRPC, serial), or a willingness to learn them.
- Experience with hardware-in-the-loop testing strategies and CI/CD pipelines.
- Excellent communication skills and the ability to collaborate with experimental scientists.
