Senior Platform Engineer
Industry | See below |
Employment type | See below |
Hours | See below |
Location | Uitgeest |
Education level | See below |
Organization | ING Netherlands |
Contact person | See below |
Information
Within Financial Markets (FM), in the Competitive Pricing area, we run the ING Pricing Architecture (IPA) platform. It supports multiple products that enable real-time and batch calculations of financial risk metrics and simulations (XVA, PFE, Value at Risk, Expected Shortfall, Bilateral Margining, pre-deal derivatives pricing). The products on the platform are driven by new banking regulations as well as advanced risk analysis on the derivative product portfolio of Financial Markets (interest rates, credits, foreign exchange). The platform itself is an array of containers, virtual servers and bare-metal servers, a portion of which provides high performance computing.
The team
As an HPC platform engineer, you will contribute to the daily support, maintenance, deployment and resilience exercises on the platform. You will use observability principles to report on its state, and leverage your knowledge, metrics, and industry best practices to ensure stable and reliable business delivery by our products. The platform is at the heart of the IT landscape for our global dealing rooms and risk managers in Asia, Europe and the Americas, with approx. 30 scrum teams spread across 4 locations (Amsterdam, Brussels, Bucharest and Singapore) cooperating to evolve it towards the target ING Financial Markets IT vision.
Roles and responsibilities
- Translate incident management and root cause analysis recommendations into improvement points for the platform.
- Work with the latest (automation) tooling with a strong focus on performance, reliability, observability and security.
- Define platform lifecycle management, resilience patterns, architecture and roadmaps together with solution, domain and enterprise architects.
- Align expected platform changes with stakeholders and financial controllers, and report on platform volumes to the area lead.
- Present platform and automation best practices to the team and at internal and external engineering events.
- Report on the state of IT risk and security controls on the platform as per the ING Information Security Management Policy.
- Apply CI/CD using Azure DevOps as well as remote operations on the platform.
- Through Agile/Scrum, collaborate with the other engineers to bring new sprint releases live to Acceptance and Production every 2-4 weeks.
- You are committed to staying up to date with the latest developments in HPC and cloud technology; participating in relevant workshops, conferences, and training programs is part of your nature.
- You meet frequently with product managers, analysts and researchers to gather and incorporate stakeholder feedback to improve HPC services.
How to succeed
We hire smart people like you for your potential. Our biggest expectation is that you'll stay curious. Keep learning. Take on responsibility. In return, we'll back you to develop into an even more awesome version of yourself.
Experience: 5+ years of software engineering / operations experience
Tech stack / knowledge
- IT Operations/Support experience combined with analytical skills to identify root causes in incidents (data, technical, functional).
- Strong understanding of high performance computing environments, including HPC (GPU) clusters, and of parallel and distributed computing principles and techniques.
- Strong understanding of using GPU technology as a computational accelerator, and proficiency in cluster management and job scheduling systems (e.g., DataSynapse, Slurm, PBS, LSF).
- Knowledge of GPU architectures and technologies (e.g., NVIDIA CUDA, AMD ROCm).
- Experience with deploying and maintaining parallel programming models and libraries (e.g., MPI, OpenMP, CUDA), middleware and supporting software.
- Ability to identify and contribute to resolving performance bottlenecks in HPC applications via monitoring / observability practices.
- Knowledge of CI/CD; experience with Git, Python, Ansible and shell scripting; and working experience with monitoring practices and alerting tools.
- Strong Linux (RHEL 8 or 9) and Azure DevOps experience, pipeline and Ansible skills, and experience working with certificates and encryption technology.
- Strong experience in translating computational requirements to IT concepts like system sizing.
- Experience with Grafana and alerting tools like Prometheus, as well as a strong understanding of complex subsystem monitoring and alerting.
- Good understanding of the ELK Stack and how to interact with it.
- Experience in mentoring junior engineers and providing technical guidance.
- Thoroughness in testi...