At Robots & Pencils, we build meaningful, scalable digital products by blending strategy, design, and engineering. We are seeking a Level 4 AI Engineer to build production LLM applications for an enterprise client as part of a long-term, delivery-focused engagement.
You will own the AI stack end-to-end, including RAG pipelines, prompt engineering, and evaluation frameworks. This is a hands-on role: you will write production code, tune prompts, build evaluation and observability systems, and iterate based on real user feedback.
There is a working proof of concept in place. Your responsibility is to make it production-ready and extend it with intelligent, reliable features that operate at enterprise scale.
What You’ll Do
AI & LLM Application Delivery
· Build, optimize, and evolve RAG pipelines, including retrieval strategies, chunking, and re-ranking
· Develop prompts and guardrails for domain-specific LLM applications
· Implement hallucination detection, mitigation, and fact-checking mechanisms
· Build embeddings-based search and recommendation features
· Validate AI features with real users and iterate based on qualitative and quantitative feedback
Evaluation, Monitoring & Reliability
· Set up and maintain LLM evaluation frameworks to measure quality, relevance, and reliability
· Implement observability and monitoring for production AI systems
· Monitor live AI systems and resolve quality, accuracy, and performance issues
· Continuously improve AI outputs based on evaluation data and user behavior
Platform & System Integration
· Work closely with product and engineering teams to integrate AI into user-facing features
· Build and maintain backend services in Python
· Integrate with vector databases to support retrieval and semantic search workflows
· Ensure AI solutions meet enterprise requirements for security, scalability, and maintainability
Delivery & Collaboration
· Collaborate with cross-functional partners across product, engineering, and design
· Operate effectively in environments with evolving requirements and ambiguity
· Communicate clearly with technical and non-technical stakeholders
· Take ownership of delivery outcomes from experimentation through production
Required Skills & Experience
· 8+ years of professional software engineering experience, with 4+ years focused on applied AI/ML or data-driven systems in production environments
· 3+ years building and operating production AI systems
· Strong hands-on experience with LLM applications, including RAG, prompt engineering, and evaluation
· Experience implementing hallucination detection and mitigation techniques
· Proficiency in Python
· Experience working with vector databases (Weaviate, Pinecone, or similar)
· Experience with LLM evaluation frameworks (Langfuse, Weights & Biases, or custom solutions)
· Production experience using Claude and/or GPT APIs
· Strong understanding of embeddings and semantic search
· Comfortable working with ambiguity and iterating on unclear problems
· Bachelor's degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent practical experience
· Advanced degree (Master’s or PhD) in a relevant field
Nice to Have
· Experience with Azure AI services, including Azure OpenAI and Cognitive Services
· Experience with document processing (PDF extraction, OCR)
· Exposure to audio or speech processing (e.g., Whisper or similar tools)
· Experience building enterprise B2B software
· Experience with ML classification and model training
Tech Stack
· LLMs: Claude (Anthropic), Azure OpenAI
· Vector Database: Weaviate
· Backend: Python
· Infrastructure: Azure
· Evaluation & Observability: Langfuse or similar
How You Work
· You are hands-on and delivery-focused, writing code and owning outcomes
· You balance speed with quality in production environments
· You communicate clearly and collaborate effectively across disciplines
· You take ownership of ambiguous problems and drive them to resolution
· You prioritize reliability, maintainability, and real-world impact
Why Robots & Pencils
· Real production impact, not a POC that sits on a shelf
· Exposure to the full AI lifecycle: RAG, LLM applications, evaluation, classification, and monitoring
· End-to-end ownership of the AI stack and technical decision-making
· A small, senior team with direct access to enterprise clients


