Iodine Software

Site Reliability Engineer - AWS

Posted Yesterday
Remote or Hybrid
Hiring Remotely in USA
110K-137K Annually
Senior level
The Site Reliability Engineer will manage AWS infrastructure, implementing cloud strategies and automation tools while ensuring reliability, security, and cost-efficiency.
The summary above was generated by AI

Join us. Let’s make a direct impact in healthcare.

Being an Iodine employee means becoming part of something bigger - using clinical AI technology to drive smarter healthcare processes and positively impact patient care.

Who we are:

Recognized as one of Austin’s best places to work, we are a collaborative and dedicated team with innovation built into our DNA. Iodine is an enterprise AI company that is championing a radical rethink of how to create value for healthcare professionals, leaders, and their organizations - by automating complex clinical tasks, generating insights and empowering intelligent care. Powered by one of the largest sets of clinical data and use cases available, our groundbreaking clinical machine-learning engine, Cognitive ML, constantly ingests the patient record to generate real-time, highly focused, predictive insights that clinicians and hospital administrators can leverage to dramatically augment the management of care delivery.

Site Reliability Engineer – AWS

We are seeking a highly skilled Site Reliability Engineer (SRE) with AWS Cloud expertise to design, build, and optimize our cloud platform and infrastructure. This role demands deep hands-on experience with AWS cloud services across compute, storage, databases, networking, and security, combined with strong cost optimization strategies. You will implement the cloud roadmap and strategy, design scalable solutions, and drive reliability, security, cost optimization, and operational excellence across our platform. You will be responsible for the scalability of the platform and infrastructure, ensuring it can support business growth while maintaining high availability and performance. Additionally, you will participate in key architectural discussions with product engineering and security teams to ensure new and existing services follow best practices and meet operational excellence standards.

If you are an experienced SRE/Cloud Engineer with AWS expertise and a strong SRE mindset, and you are passionate about high availability, security, automation, and operational and cost efficiency, we would love to hear from you.

Key Responsibilities

Cloud Strategy & Roadmap
  • Implement the cloud roadmap and strategy to drive scalability, reliability, security, and cost efficiency.

  • Drive cloud adoption initiatives, ensuring alignment with business objectives.

  • Implement and support initiatives on cloud governance, architectural best practices, and modernization strategies.

Automation & Reliability Engineering
  • Develop Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK for fully automated provisioning and deployment.

  • Own and improve infrastructure CI/CD pipelines using GitLab, Ansible (AWX), Argo CD, and Helm.

  • Implement self-healing, fault-tolerant architectures that can automatically recover from failures.

  • Optimize infrastructure monitoring and observability using Prometheus, Grafana, Loki, Tempo, Mimir, AWS CloudWatch, AWS CloudTrail, and New Relic.

  • Participate in architecture discussions with product engineering teams for onboarding new services, ensuring they are scalable, cost-optimized, and aligned with best engineering practices.

  • Collaborate with software developers to optimize application performance and cloud-native designs.

Operational Duties & Business Support

  • Perform regular system and infrastructure maintenance including OS-level patching, AMI refreshes, and kernel upgrades.

  • Lead and coordinate planned upgrade cycles for core services like RDS, EKS, and Kubernetes clusters to ensure security and feature compatibility.

  • Troubleshoot and resolve infrastructure and application-level issues, collaborating directly with internal teams and business stakeholders.

  • Participate in customer support escalations and provide technical guidance for resolution.

Incident Response & Operational Excellence
  • Lead and refine incident management processes for the SRE team, ensuring minimal downtime and fast recovery.

  • Implement SLOs, SLIs, and error budgets to drive system reliability.

  • Conduct post-mortems and drive root cause analysis to prevent recurring issues.

Security, Compliance, and Best Practices
  • Ensure cloud security best practices are embedded into all solutions, including IAM policies, VPC security, encryption, and compliance with industry standards (such as SOC 2, HIPAA).

  • Implement least privilege access, network segmentation, and automated security controls across AWS services.

  • Collaborate with InfoSec teams to enforce threat detection, logging, and security monitoring using AWS GuardDuty, Security Hub, and CloudTrail.

Solution Architecture & Infrastructure Design
  • Design and build highly available, scalable, and fault-tolerant AWS architectures using AWS services such as EC2, S3, RDS, DocumentDB, Lambda, EKS, Secrets Manager, SSM, API Gateway, and CloudFront, along with related technologies such as HashiCorp Terraform, Vault, and Consul, and Ansible (AWX).

  • Implement and support resilient storage, compute, and database solutions optimized for performance and cost.

  • Drive the execution of multi-region disaster recovery (DR) and backup strategies.

AWS Cost Optimization & FinOps
  • Continuously monitor and optimize AWS infrastructure costs using AWS Cost Explorer, Trusted Advisor, and Savings Plans/Reserved Instances.

  • Drive FinOps culture, ensuring teams design and deploy cost-efficient cloud solutions.

  • Implement auto-scaling, rightsizing strategies, and storage lifecycle policies to reduce costs.

Required Qualifications
  • 5+ years of experience in SRE/DevOps roles in AWS.

  • Hands-on expertise with AWS services, including EC2, S3, Lambda, EKS, VPC, IAM, Secrets Manager, and SSM, and with technologies such as HashiCorp Vault and Consul.

  • Strong knowledge of cost optimization techniques in AWS, including autoscaling, right-sizing, storage lifecycle policies, and Reserved Instances/Savings Plans.

  • Strong hands-on experience with Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK.

  • Proficiency in Linux Administration, Python, or Bash scripting for automation.

  • Experience with Kubernetes (EKS), Docker, and container orchestration.

  • Strong security and compliance knowledge, including IAM, security groups, encryption, AWS WAF, and logging with CloudTrail.

  • Hands-on experience with monitoring and observability tools like Prometheus, Grafana, AWS CloudWatch, Loki, and New Relic.

  • Experience in approving merge and pull requests, ensuring high-quality infrastructure code.

  • Strong team collaboration, documentation and communication skills.

  • Travel to and from company headquarters is required for mandatory onboarding and company meetings.

Preferred Qualifications

  • AWS Certifications (e.g., AWS Certified Solutions Architect - Professional, AWS Certified DevOps Engineer).

  • Experience with multi-account AWS organizations and AWS Control Tower.

  • Familiarity with service meshes (Istio, Linkerd) and API gateways.

  • Experience with Fortinet (FortiGate) firewalls and AWS networking (VPC, Transit Gateway, Direct Connect, etc.).

  • Background in database administration (PostgreSQL, MySQL, DocumentDB, or NoSQL databases).

  • Experience implementing resilience testing and chaos engineering.

Why Join Us?
  • Work on cutting-edge cloud technologies in a high-impact role.

  • Lead AWS cloud strategy and architecture, shaping the company’s infrastructure vision.

  • Be a mentor and leader, driving best practices in SRE and cloud engineering.

  • Optimize cloud costs, ensuring efficiency and scalability.

  • Collaborate with top engineering teams, influencing product and infrastructure decisions.

What we offer:

  • Comprehensive Healthcare: Fully covered medical, vision, and dental benefits for employees, plus generous dependent coverage.

  • Telehealth Services: Convenient access to telehealth services tailored for remote work.

  • Savings Accounts: Tax-advantaged savings accounts for healthcare and dependent care expenses.

  • Ancillary Benefits: Life, AD&D, and disability insurance paid by Iodine for peace of mind.

  • Retirement Plan: Competitive 401(k) retirement plan with a considerable company match.

  • Extra Life Insurance: Optional additional life insurance coverage for you and your dependents.

  • Accident Insurance: Financial protection against unexpected accidents and critical health issues.

  • Critical Illness Insurance: Provides financial support for medical costs and living expenses during serious illness.

  • Hospital Indemnity Insurance: Additional support for hospital-related expenses through indemnity insurance.

  • Pet Insurance: Affordable options for discounted pet insurance.

  • Legal and Identity Protection: Legal and ID theft protection to safeguard personal information.

  • Employee Assistance: Confidential employee assistance program for personal and professional challenges.

  • Education Allowance: Annual funding for educational pursuits and continuing education to support professional development and skill enhancement.

  • Reimbursements: Annual reimbursement for eligible wellness expenses, monthly reimbursement for cell phone and WiFi costs, and a one-time equipment allowance for creating a comfortable home office.

Why should you join Iodine?
This is a unique opportunity to join a close-knit, rapidly growing team and help us improve a key piece of the organization. You will have the opportunity to drive smarter healthcare processes through technology, so hospitals can stay focused on patient care. You will join a passionate and ambitious team with a proven record of success building multiple companies. Learn more about our company culture on Built In Austin and on our website at www.iodinesoftware.com.

Compensation Range: $110K - $137K


Top Skills

Ansible
API Gateway
Argo CD
AWS
AWS CDK
AWS CloudTrail
AWS CloudWatch
Bash
CloudFormation
CloudFront
Docker
DocumentDB
EC2
EKS
GitLab
Grafana
HashiCorp Vault
Helm
Kubernetes
Lambda
Loki
Mimir
New Relic
Prometheus
Python
RDS
S3
Secrets Manager
SSM
Tempo
Terraform
