Designing Data and AI Systems That Hold Up in Production


Do you see yourself as a full-stack developer? How does your experience across the entire stack (from frontend to database) change the way you view the data scientist role?

I do, but not in the sense of personally constructing every layer. For me, full-stack means understanding how architectural decisions at one layer shape system behavior, risk, and value over time. That perspective is crucial when designing systems that have to survive change.

This attitude also influences how I view the data scientist role. Models created in notebooks are only the start. Real value emerges when those models are embedded in production systems with proper data pipelines, APIs, governance, and user-facing interfaces. Data science becomes impactful when it’s treated as a core part of a bigger system, not as an isolated activity.

You cover a wide range of topics. How do you decide what to focus on next, and how do you know when a new topic is worth exploring?

I tend to follow recurring friction. When I see multiple teams struggle with the same problems, whether technical or organizational, I take that as a signal that the issue is structural rather than individual, and worth addressing at the architectural or process level.

I also deliberately experiment with new technologies, not for novelty, but to understand their trade-offs. A topic becomes worth writing about when it either solves a real problem I’m currently facing or reveals risks that are not yet widely understood. Finally, I write about topics I personally find interesting and worth exploring, because sustained interest is what allows me to go deep.

You’ve written about LangGraph, MCP, and self-hosted agents. What’s the biggest misconception you think people have about AI agents today?

Agents are genuinely powerful and open up new possibilities. The misconception is that they’re easy. It is simple today to assemble cloud infrastructure, connect an agent framework, and produce something that appears to work. That accessibility is valuable, but it masks a lot of complexity.

Once agents move beyond demos, the real challenges surface. State management, permissions, cost control, observability, and failure handling are often underestimated. Without clear boundaries and ownership, agents become unpredictable, expensive, and dangerous to operate. They are not just prompts with tools; they’re long-lived software systems and must be engineered and operated accordingly.

In your article on Layered Architecture, you mention that adding features can often feel like “open-heart surgery.” For a beginner or a small data team seeking to avoid this, what’s your key advice on setting up an architecture?

“The only constant is change” is a cliché for a good reason, so optimize for change rather than for initial delivery speed. Even a minimal form of layered thinking helps: separating domain logic, application flow, and infrastructure concerns.

The goal is not architectural perfection on day one or perfect categorization. It’s about creating clear boundaries that allow the system to evolve without constant rewrites. Small upfront discipline pays off significantly as systems grow.
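As a minimal sketch of that layered separation (all names here are hypothetical, not from the article): domain logic stays pure, the application layer orchestrates, and infrastructure sits behind an interface so it can be swapped without touching the other layers.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    amount: float


# Domain layer: a pure business rule, no I/O, no framework imports.
def is_high_value(order: Order, threshold: float = 1000.0) -> bool:
    return order.amount >= threshold


# Infrastructure boundary: the interface the application layer depends on.
class OrderRepository(Protocol):
    def fetch(self, order_id: str) -> Order: ...


# Application layer: orchestrates domain logic and infrastructure.
def flag_order(repo: OrderRepository, order_id: str) -> bool:
    order = repo.fetch(order_id)
    return is_high_value(order)


# Swapping the database later means writing a new OrderRepository
# implementation; the domain and application code stay untouched.
class InMemoryOrders:
    def __init__(self, orders: dict[str, Order]):
        self._orders = orders

    def fetch(self, order_id: str) -> Order:
        return self._orders[order_id]


repo = InMemoryOrders({"a1": Order("a1", 2500.0)})
print(flag_order(repo, "a1"))  # True
```

The point of the `Protocol` boundary is exactly the "open-heart surgery" problem: new storage backends become additive changes rather than rewrites.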

You’ve benchmarked PostgreSQL insert strategies and noted that “faster is not always better.” In a production ML pipeline, what’s a scenario where you’d deliberately choose a slower, safer insertion method?

When correctness, traceability, and recoverability matter more than raw throughput. In many pipelines, reducing runtime by a few seconds offers little benefit compared to the risk introduced by weaker guarantees.

For instance, pipelines that feed regulatory reporting, financial decision-making, or long-lived training datasets benefit from transactional safety and explicit validation. Silent data corruption is far more costly than accepting modest performance trade-offs, especially when data becomes a long-term asset others will build on.
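A hypothetical sketch of that trade-off: a validated, transactional insert is slower than a bulk COPY-style load, but one bad row rolls back the whole batch instead of silently corrupting the table. SQLite stands in for PostgreSQL here purely to keep the example self-contained; the table and validation rule are invented for illustration.

```python
import sqlite3


def validated_insert(conn, rows):
    """Insert rows atomically; reject the whole batch on a bad row."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            for row in rows:
                if row["amount"] < 0:  # explicit validation before writing
                    raise ValueError(f"negative amount: {row}")
                conn.execute(
                    "INSERT INTO payments (id, amount) VALUES (?, ?)",
                    (row["id"], row["amount"]),
                )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, amount REAL)")

ok = validated_insert(
    conn,
    [{"id": "p1", "amount": 10.0}, {"id": "p2", "amount": -5.0}],
)
count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(ok, count)  # False 0 -- the valid row was rolled back along with the bad one
```

The per-row validation and transaction overhead is exactly what a fast bulk load skips; for regulatory or training data, that overhead is usually a fair price for knowing nothing partial or invalid ever landed.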

In your Personal, Agentic Assistants article, you built a 100% private, self-hosted platform. Why was avoiding “token costs” and “privacy leaks” more important to you than using a more powerful, cloud-based LLM?

In my daily work I’ve experienced that trusting a system is key to its adoption. Token costs, opaque data flows, and external dependencies subtly influence how systems are used and perceived.

I also made a conscious choice not to route my personal or sensitive data through external cloud providers, since there are limited guarantees on how data is handled over time. By keeping the system self-hosted, I could design an assistant that’s predictable, auditable, and aligned with European privacy expectations. Users have full control over what the assistant has access to, and this lowers the barrier to using it.

Finally, not every use case requires the biggest or most expensive model. By decoupling the system from a single provider, users can choose the model that best fits their requirements, balancing capability, cost, and risk.
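That decoupling can be sketched as a tiny registry behind a common interface, so the model is a configuration choice rather than a hard dependency. Everything here (the registry, the stubbed backends) is a hypothetical illustration, not the article's actual implementation.

```python
from typing import Callable

# A completion backend is just "prompt in, text out".
CompletionFn = Callable[[str], str]


def make_local_model() -> CompletionFn:
    # e.g. a self-hosted model behind an HTTP API; stubbed for the sketch
    return lambda prompt: f"[local] {prompt}"


def make_cloud_model() -> CompletionFn:
    # a cloud provider would go here, wrapped behind the same signature
    return lambda prompt: f"[cloud] {prompt}"


REGISTRY: dict[str, Callable[[], CompletionFn]] = {
    "local": make_local_model,
    "cloud": make_cloud_model,
}


def get_model(name: str) -> CompletionFn:
    """Pick a backend by name; the rest of the assistant never changes."""
    return REGISTRY[name]()


assistant = get_model("local")
print(assistant("summarize my notes"))  # [local] summarize my notes
```

Because the assistant only sees `CompletionFn`, switching between a private self-hosted model and a larger cloud model is a one-line configuration change, which is what makes the capability/cost/risk trade-off a per-use-case decision.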

How do you see the day-to-day work of a data professional changing in 2026?

Despite common stereotypes, data and software engineering are highly social professions. I strongly believe that the most significant part of the work happens before writing code: aligning with stakeholders, understanding the problem space, and designing solutions that fit existing systems and teams.

This upfront work becomes even more important as agent-assisted development accelerates implementation. Without clear goals, context, and constraints, agents amplify confusion rather than productivity.

In 2026, data professionals will spend more time shaping systems, defining boundaries, validating assumptions, and ensuring responsible behavior in production environments.

Looking ahead at the rest of 2026, what big topics will define the year for data professionals, in your opinion? Why?

Generative AI and agent-based systems will continue to grow, but the bigger shift is their maturation into first-class production systems rather than experiments.

That transition depends on trustworthy, high-quality, accessible data and robust engineering practices. Consequently, full-stack thinking and system-level design will become increasingly important for organizations that want to apply AI responsibly and at scale.

To learn more about Mike’s work and stay up to date with his latest articles, you can follow him on TDS or LinkedIn.
