Head of Data Science
Lead the applied-research function — from evals and retrieval to the product analytics that tell us whether our agents are getting smarter.
- Team: Data Science
- Location: Remote (US / EU)
- Level: Executive
- Comp: $250K – $310K + equity
About the role
You'll lead the function that turns our AI stack into a measurable, improving system: retrieval quality, agent evals, model selection, and the telemetry that closes the loop between production outcomes and research priorities.
About Lytica Labs
Lytica Labs is building the agentic analytics layer for modern revenue, data, and operations teams. Our platform sits on top of the tools you already use — Snowflake, BigQuery, Salesforce, HubSpot, Stripe, Segment, Slack, Linear, Notion, and a long tail of others — and lets cross-functional teams ask complex business questions in natural language, trigger workflows from the answers, and keep every team aligned on the same source of truth.
Under the hood, we run a multi-model AI stack (OpenAI, Anthropic, and open-source models served via the Vercel AI SDK) orchestrated by a type-safe tool loop that reads live data through Convex and executes side effects through Inngest. The front end is a Next.js 15 App Router monorepo (React 19, TypeScript, Tailwind, Radix) deployed on Vercel; our backend uses Convex for reactive data, WorkOS for auth, Stripe for billing, Resend for transactional email, Sentry for observability, and PostHog for product analytics. Everything is typed end-to-end, everything ships behind feature flags, and everything is observable from day one.
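To make that concrete, here is a minimal sketch of a typed tool loop in the Vercel AI SDK's v4 style (generateText with tools and maxSteps). The tool, its schema, and the Convex call it stands in for are illustrative assumptions, not our production code.

```ts
// Illustrative sketch only: the tool, query, and prompt are hypothetical.
// Assumes the AI SDK v4-style API (generateText, tool, maxSteps) with the
// @ai-sdk/openai provider and zod for parameter schemas.
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const result = await generateText({
  model: openai("gpt-4o"),
  maxSteps: 5, // let the model call tools and read their results in a loop
  tools: {
    // Hypothetical tool: in a stack like ours, `execute` would run a Convex
    // query against live data instead of returning a stub.
    revenueByRegion: tool({
      description: "Sum revenue for a region over a date range",
      parameters: z.object({
        region: z.string(),
        start: z.string().describe("ISO date"),
        end: z.string().describe("ISO date"),
      }),
      execute: async ({ region, start, end }) => {
        // e.g. await convex.query(api.metrics.revenue, { region, start, end })
        return { region, start, end, totalUsd: 1_234_567 };
      },
    }),
  },
  prompt: "How did EMEA revenue trend last quarter?",
});

console.log(result.text);
```

The production loop carries more than this skeleton (routing, tracing, Inngest-backed side effects), but the shape is the same: typed tools, live data, observable steps.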
We're a small, senior-heavy team that ships quickly without cutting corners. We believe in clear writing, small PRs, durable abstractions, and treating production traffic like the honor it is. If the intersection of AI agents, real-time systems, and high-leverage analytics is the most interesting problem you can think of right now — we'd love to talk.
What you will own
- The evaluation harness: offline datasets, CI integration, and the online experiments that gate every model change (see the sketch after this list).
- Retrieval and context engineering — vector stores, hybrid search, and the feature work that makes prompts dense and relevant.
- Model selection and routing decisions across OpenAI, Anthropic, and open-source providers, with cost and latency accounted for.
- Partnering with engineering to keep the AI layer simple, measurable, and honest about what it can't yet do.
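To make the evaluation bullet concrete, here is a minimal sketch of an offline eval gate of the kind that could run in CI before a model or prompt change merges. The dataset shape, the exact-match grader, and the 0.9 threshold are illustrative assumptions, not our actual harness.

```ts
// Illustrative sketch of a CI eval gate, not an actual harness.
// The case shape, grader, and threshold are all assumptions.
type EvalCase = { input: string; expected: string };

// A grader returns a score in [0, 1]; exact match is the simplest possible one.
const exactMatch = (actual: string, expected: string): number =>
  actual.trim() === expected.trim() ? 1 : 0;

async function runEvalGate(
  cases: EvalCase[],
  run: (input: string) => Promise<string>, // candidate model/prompt under test
  threshold = 0.9,
): Promise<void> {
  const scores = await Promise.all(
    cases.map(async (c) => exactMatch(await run(c.input), c.expected)),
  );
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  console.log(`eval score: ${mean.toFixed(3)} on ${cases.length} cases`);
  if (mean < threshold) {
    // Failing the gate fails the CI job, blocking the change from merging.
    process.exit(1);
  }
}

// Hypothetical usage: await runEvalGate(goldenCases, (q) => agent.answer(q));
```

The offline score is only half the loop; the online experiments that gate rollout are what connect it back to production outcomes.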
What we are looking for
- PhD or equivalent industry experience in ML / NLP, with at least 3 years building LLM-backed systems in production.
- Strong software engineering fundamentals — you can ship code, not just notebooks.
- Experience designing eval + experimentation systems that meaningfully changed product trajectories.
- You're a clear writer, willing to publish internally (and occasionally externally) on what we're learning.