Virgil Chereches

Site Reliability Engineer | DevOps Engineer

#5 Al. Budacu, Bucharest, Romania, RO
(+40) 744440910
English, Romanian

Background


About

Site Reliability Engineer with 20+ years of progressive experience in large-scale distributed systems, infrastructure automation, and production operations. Deep expertise in Kubernetes orchestration, OpenStack cloud platforms, and containerization technologies. Proven track record of designing and maintaining highly available systems, implementing automation frameworks that eliminate most manual work, and optimizing system performance and reliability. Strong background in incident response, capacity planning, and blameless postmortem culture. Experienced in building and scaling production infrastructure with a focus on availability, latency, and operational excellence.

Work Experience

  • DevOps Engineer / Site Reliability Engineer, Bosch Service Solutions S.R.L

    Sep, 2021 - Present

    Design, implement and maintain medium-scale Kubernetes production environments with a focus on reliability, performance, and operational excellence for the Nexeed IAS platform

    • Design and operate medium-scale Kubernetes clusters across multiple environments (AKS, Rancher RKE2, k3s, OpenShift) serving production workloads with 99.9% uptime SLA

    • Architect and deploy cloud-native solutions on Azure with focus on scalability, achieving 35% cost optimization through autoscaling, right-sizing and resource optimization

    • Engineer deployment automation reducing deployment time from 4 hours to 10 minutes using advanced Helm 3 features and Kubernetes operators

    • Build CI/CD pipelines for container image lifecycle including vulnerability scanning (FossID, Mend, Trivy) and SBOM generation ensuring security compliance

    • Practice sustainable incident response with on-call rotation, troubleshooting application and infrastructure issues across target Kubernetes environments

    • Identify and implement 12 process improvements eliminating 90% of manual operational tasks through automation and scripting

    • Collaborate with development teams providing system design consulting, capacity planning, and performance optimization recommendations

    • Contribute to infrastructure automation using Infrastructure as Code principles, Terraform and configuration management tools

    • Design and implement backend services for an agentic-AI project using Python (FastAPI, Pydantic, SQLModel, AutoGen), delivering two major features (authN, component serialization) with full lifecycle ownership

  • Senior Cloud Developer, Orange Labs for Networks

    Dec, 2018 - Sep, 2021 (2 years 9 months)

    Build and maintain a large-scale OpenStack private cloud distribution with a focus on automation, reliability, and operational efficiency

    • Design and implement a Kubernetes-based automation framework for OpenStack private cloud deployment and lifecycle management, increasing automation coverage from 60% to 95%

    • Engineer Kubernetes operators for infrastructure components (libvirtd) demonstrating deep understanding of operator patterns and Kubernetes extension mechanisms

    • Build CI/CD pipelines achieving continuous delivery milestone for infrastructure automation code, enabling rapid iteration and deployment velocity

    • Architect deployment solutions using Helm charts for OpenStack components, pioneering cloud-native approaches in traditional infrastructure

    • Implement validation framework reducing configuration errors by 75%, improving system reliability and reducing incident response time

    • Design and deploy security solutions implementing Consul ACL and Vault auth agent achieving 100% secrets management compliance

    • Develop nested virtualization solution deploying OpenStack on OpenStack VMs demonstrating expertise in complex distributed systems

    • Build automation tools using GitLab webhooks reducing manual code review time by 25% and improving development workflow efficiency

    • Contribute to open-source kanod project demonstrating collaboration with distributed teams and community engagement

  • IT Solutions Architect, Orange Labs for Services

    Oct, 2017 - Dec, 2018 (1 year 2 months)

    • Build self-service automation platform using Rundeck, Kubernetes, Terraform and Ansible enabling 10+ developers to provision resources in under 30 minutes

    • Design and implement testing framework for OpenStack clouds using Python and Terraform ensuring infrastructure reliability before production deployment

    • Engineer self-service workflows reducing tenant provisioning time from 2 days to 4 hours through automation and Infrastructure as Code

    • Evaluate and integrate open-source technologies (Skydive, Kubespray, LOCI, OpenStack Helm), deploying 4 of the 10 evaluated into production

    • Conduct technical workshops and training sessions for 25+ engineers achieving 95% satisfaction score and improving team capabilities

    • Implement customizations for Cloud Foundry application releases demonstrating platform engineering expertise

  • Infrastructure Solutions Architect, Orange Services S.R.L.

    Dec, 2013 - Oct, 2017 (3 years 10 months)

    Design and implement a large-scale automation framework and configuration management infrastructure for enterprise systems

    • Architect and implement enterprise automation framework based on Puppet, Foreman and Rundeck automating 80% of infrastructure provisioning and reducing manual effort by 60%

    • Design blue-green deployment pipeline using Rundeck, Ansible and Consul reducing application downtime from twenty minutes to seconds

    • Build storage inventory and performance monitoring solution using Rundeck, Sidekiq and Superset managing Orange Romania storage assets with 90% accuracy

    • Pioneer Docker and Rancher container technology adoption integrating containerization into existing automation framework

    • Implement SSH key distribution engine using Puppet and Active Directory ensuring secure access management at scale

    • Establish storage performance testing methodology and capacity planning processes

    • Conduct 10+ training sessions for 80+ engineers achieving 4.5/5 satisfaction score and improving organizational technical capabilities

  • Infrastructure Operations Manager, Orange Romania S.A.

    Aug, 2011 - Dec, 2013 (2 years 4 months)

    Lead the Infrastructure Operations team with a focus on standardization, efficiency and cost optimization

    • Lead heterogeneous team implementing common working standards and operational practices reducing task completion variance by 50%

    • Drive server virtualization initiative improving virtualization ratio from 40% to 85% enhancing resource utilization and operational efficiency

    • Implement capacity management processes ensuring optimal resource allocation based on performance requirements

    • Achieve six-figure euro annual cost savings through vendor negotiations and strategic RFQ management

    • Optimize licensing costs eliminating five-figure euros in annual expenses through license optimization and unused support discontinuation

    • Modernize infrastructure refreshing obsolete technologies and improving overall system reliability

  • Infrastructure Solutions Architect, Orange Romania S.A.

    Jun, 2008 - Aug, 2011 (3 years 2 months)

    Provide technical leadership for server and storage infrastructure design and implementation

    • Design and implement secure remote access solution using Juniper SA serving franchise network

    • Deploy HP Server Automation implementing early automation practices and reducing manual operational overhead

    • Develop ITIL-compliant web forms integrated with OTRS improving service management workflows

    • Optimize EMC storage infrastructure performance through continuous analysis and tuning

    • Analyze workload patterns and recommend virtualization candidates optimizing infrastructure utilization

Skills

  • Container Orchestration & Cloud Platforms

    Kubernetes (AKS, RKE2, k3s, OpenShift)

    OpenStack

    Docker

    Rancher

    vCluster

    Helm

    Kubernetes Operators

  • Programming & Scripting

    Python

    Go

    Shell scripting (Bash)

    Ruby

  • Python Frameworks & Libraries

    FastAPI

    Flask

    Pydantic

    SQLModel

    Kopf (Kubernetes Operators)

    Operator SDK

    Celery

    AutoGen

  • Infrastructure as Code & Automation

    Terraform

    Ansible

    Puppet

    Foreman

    Rundeck

  • Monitoring, Logging & Observability

    Prometheus

    Grafana

    Elasticsearch

    Fluentd

    System metrics and alerting

  • CI/CD & Development Tools

    GitLab CI/CD

    Azure DevOps

    GitHub Actions

    Git

    Jira

    Confluence

  • Linux & Systems Engineering

    Red Hat/CentOS/Ubuntu

    Linux performance tuning

    Networking

    System troubleshooting

    HP-UX

    Sun Solaris

  • Security & Authentication

    Vault

    Consul

    OpenID Connect (OIDC)

    OAuth

    Keycloak

    JWT

    Container security scanning (Trivy, Mend)

  • Virtualization & Containerization

    VMware

    KVM

    Docker

    Container runtimes

  • Database & Message Queues

    PostgreSQL

    Redis

    Message queue systems

Education

  • Radio Design, Bachelor, University of Electronics and Telecommunications

    Oct, 1993 - Jun, 1998

Certificates

Publications

  • Our Journey to Continuous Delivery, I T.A.K.E Unconference

    Published on: May 28, 2015

    Presentation on implementing continuous delivery practices and DevOps patterns in a large-scale telecom environment, demonstrating thought leadership in reliability engineering and operational agility

Interests

  • Distributed Systems & SRE Practices

    Distributed system design

    Performance optimization

    Incident response

    Blameless postmortems

  • Hiking

  • Reading

    Technical literature

    System design