South Geeks

Senior Data Engineer (AI)

Posted Yesterday
Remote
Hiring Remotely in USA
Senior level
Design, build, and operate end-to-end ELT pipelines that extract structured JSON from complex leasing documents using LLMs. Optimize LLM API calls and prompts, implement validation and monitoring, and collaborate with product and engineering to evolve schemas and ensure production-ready data quality.
The summary above was generated by AI

Hi there :)

Thanks for checking in to find out about our open position. We'll provide as much information as possible, but please feel free to reach out to us if you have further questions. We'll be happy to see your application, even if there are skills you haven't quite mastered!

About Us

At South Geeks, we connect top LATAM engineering talent with innovative companies building impactful products worldwide. We focus on long-term partnerships, strong technical environments, and creating spaces where professionals can grow, contribute, and thrive.

About the Client

Our client is a real estate technology startup transforming how commercial real estate teams negotiate and manage leases through AI-driven intelligence.

Their platform combines advanced AI, structured data pipelines, and user-centered design to automate complex lease workflows, extract market-aligned insights, and streamline proposal generation. The goal is to bring speed, clarity and data-backed confidence to the entire deal lifecycle.

About the Role

We’re looking for a Senior Data Engineer who thrives at the intersection of data engineering and applied AI.

This is a hands-on, high-ownership role where you will design, build, and operate systems that extract, transform, and validate structured data from complex leasing documents. You will own the full ELT loop, turning messy, real-world documents into clean, reliable JSON that powers web applications and downstream systems.

In this early-stage environment, iteration and agility are key. You’ll scope ambiguous problems, experiment with AI-driven extraction techniques, and continuously refine pipelines to improve accuracy and scalability.

Key Responsibilities
  • Design and iterate on data extraction and transformation pipelines that convert unstructured leasing documents into structured JSON stores.

  • Write and optimize LLM API calls and prompts to extract and interpret text data at scale.

  • Orchestrate AI-driven workflows integrating multiple LLMs to handle diverse document types and edge cases.

  • Build and maintain ELT workflows in Python, managing data flows between cloud storage and relational databases.

  • Develop data quality and validation frameworks to ensure structured outputs are accurate and production-ready.

  • Implement monitoring, alerting, and automated quality checks across extraction pipelines.

  • Collaborate with product and engineering teams to define and evolve data schemas.

  • Own the pipeline end-to-end — from raw ingestion to validated structured output.
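
To give a rough sense of the validation side of this work, here is a minimal Python sketch of checking an LLM's JSON output before it is loaded downstream. It is an illustration only: the field names (`tenant_name`, `monthly_rent_usd`, `lease_start`, `lease_end`) and the `validate_extraction` helper are hypothetical and are not the client's actual schema or code.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical required fields, chosen only for illustration.
REQUIRED_FIELDS = {
    "tenant_name": str,
    "monthly_rent_usd": (int, float),
    "lease_start": str,
    "lease_end": str,
}


@dataclass
class ValidationResult:
    record: Optional[dict]
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_extraction(raw_output: str) -> ValidationResult:
    """Parse raw LLM output and run basic structural and sanity checks."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return ValidationResult(None, [f"invalid JSON: {exc.msg}"])
    if not isinstance(record, dict):
        return ValidationResult(None, ["top-level value is not an object"])

    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            errors.append(f"wrong type for field: {name}")

    # Cross-field checks only make sense once every field is present and typed.
    if not errors:
        if record["monthly_rent_usd"] <= 0:
            errors.append("monthly_rent_usd must be positive")
        try:
            start = date.fromisoformat(record["lease_start"])
            end = date.fromisoformat(record["lease_end"])
        except ValueError:
            errors.append("lease dates must be ISO 8601 dates")
        else:
            if end <= start:
                errors.append("lease_end must be after lease_start")

    return ValidationResult(record, errors)
```

In a real pipeline, records that fail checks like these would typically be routed to a retry or review queue rather than written to the database, and the error messages would feed the monitoring and alerting mentioned above.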

Required Skills & Experience
  • Strong Python engineering experience building data extraction and transformation workflows.

  • Experience calling LLM APIs (OpenAI, Anthropic, or similar) and crafting prompts for structured data extraction.

  • Solid understanding of ELT patterns and data pipeline architecture.

  • Experience working with AWS S3 (or similar object storage) and PostgreSQL (or similar relational databases).

  • Experience designing JSON schemas and handling nested or semi-structured data.

  • Strong data validation mindset and experience implementing quality controls.

  • Ability to work independently in a fast-moving, early-stage environment.

Nice to Have
  • Experience building document processing pipelines (PDFs, contracts, leases, or similar).

  • Experience evaluating and comparing LLM outputs for consistency and accuracy.

  • Familiarity with AI orchestration platforms.

  • Background in real estate, leasing, or financial document processing.

Our Team

We strive to create an inspiring and growth-oriented environment where everyone feels valued, heard, and empowered. We promote both personal and professional development, with individualized support for your needs and goals. We aim to build a space where everyone can thrive.

What We Offer
  • Long-term projects

  • 100% remote work

  • Payment in USD

  • Paid Time Off (PTO)

  • Work-from-home & training reimbursement

  • English lessons

  • Technical training

  • Career coaching

Top Skills

Python, OpenAI, Anthropic, LLM APIs, AWS S3, PostgreSQL, JSON, ELT
